Re: [PATCH 2/2] KVM: SVM: Make stepping out of NMI handlers more robust
On Wed, Feb 17, 2010 at 08:16:45PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Tue, Feb 16, 2010 at 12:08:58PM +0200, Gleb Natapov wrote:
> > Besides this, proper #DB forwarding to the guest was missing.
> During NMI injection? How to reproduce?
> >>> Inject, e.g., an NMI over code with TF set. A bit harder is placing a
> >>> guest HW breakpoint at the spot the NMI handler returns to.
> >>>
> >> Will try to reproduce.
> >>
> > How can I make gdb run the debugged process with TF set? Does this patch
> > fix it:
> >
> >
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 52f78dd..b85b200 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -109,6 +109,7 @@ struct vcpu_svm {
> >  	struct nested_state nested;
> >  
> >  	bool nmi_singlestep;
> > +	bool nmi_singlestep_tf;
> >  };
> >  
> >  /* enable NPT for AMD64 and X86 with PAE */
> > @@ -1221,9 +1222,14 @@ static int db_interception(struct vcpu_svm *svm)
> >  
> >  	if (svm->nmi_singlestep) {
> >  		svm->nmi_singlestep = false;
> > -		if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
> > +		if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
> >  			svm->vmcb->save.rflags &=
> >  				~(X86_EFLAGS_TF | X86_EFLAGS_RF);
> > +			if (svm->nmi_singlestep_tf) {
> > +				svm->vmcb->save.rflags |= X86_EFLAGS_TF;
> > +				kvm_queue_exception(&svm->vcpu, DB_VECTOR);
> > +			}
> > +		}
> >  		update_db_intercept(&svm->vcpu);
> >  	}
> >  
> > @@ -2586,6 +2592,7 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
> >  	   possible problem (IRET or exception injection or interrupt
> >  	   shadow) */
> >  	svm->nmi_singlestep = true;
> > +	svm->nmi_singlestep_tf = (svm->vmcb->save.rflags | X86_EFLAGS_TF);
> >  	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> >  	update_db_intercept(vcpu);
> >  }
>
> That's closer. However, I've a version here that restores TF&RF only if
> you did not execute an IRET but stepped over the shadow (which is still
> not correct either, e.g. when stepping popf). I will break up my patch
> into parts that fix the issues separately so that we can decide what to
> merge.
>
I am not sure what you mean here. Why should we restore RF? It is cleared
after each instruction execution, and popf is not special in this regard;
the SDM explicitly says so.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
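The idea in the draft above (remember whether the guest had TF set before KVM forces a single step out of the NMI window, then restore it and forward the #DB) can be modelled in isolation. The sketch below is hypothetical: `struct nmi_step_state`, `arm_nmi_singlestep()` and `finish_nmi_singlestep()` are illustrative names, not KVM code. Testing whether a flag is set needs `&`; an expression such as `rflags | X86_EFLAGS_TF` is always non-zero. Following Gleb's point, RF is not restored, and like the draft it does not handle instructions such as IRET or POPF that change TF themselves while being stepped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural RFLAGS bits: TF is bit 8, RF is bit 16. */
#define X86_EFLAGS_TF (UINT64_C(1) << 8)
#define X86_EFLAGS_RF (UINT64_C(1) << 16)

/* Hypothetical, reduced vCPU state; not the real struct vcpu_svm. */
struct nmi_step_state {
    uint64_t rflags;      /* guest RFLAGS as stored in the VMCB */
    bool nmi_singlestep;  /* KVM forced TF to leave the NMI-blocked window */
    bool guest_tf;        /* guest's own TF before KVM forced it */
};

/* Arm a single step out of the NMI window, remembering the guest's TF. */
static void arm_nmi_singlestep(struct nmi_step_state *s)
{
    assert(!s->nmi_singlestep);
    s->nmi_singlestep = true;
    s->guest_tf = (s->rflags & X86_EFLAGS_TF) != 0;  /* '&' tests the bit */
    s->rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
}

/*
 * Handle the #DB caused by the forced step: clear the injected flags,
 * restore the guest's TF, and return true if the guest was itself
 * single-stepping, i.e. the #DB must be forwarded to it.
 */
static bool finish_nmi_singlestep(struct nmi_step_state *s)
{
    if (!s->nmi_singlestep)
        return false;  /* not our step: nothing to undo */
    s->nmi_singlestep = false;
    s->rflags &= ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
    if (s->guest_tf) {
        s->rflags |= X86_EFLAGS_TF;
        return true;
    }
    return false;
}
```

A guest without TF gets its RFLAGS back unchanged and no #DB; a guest that was single-stepping keeps TF and receives the #DB.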
Re: [PATCH 00/18] KVM: PPC: Virtualize Gekko guests
On 02/17/2010 08:07 PM, Alexander Graf wrote:
> On 17.02.2010, at 17:34, Avi Kivity wrote:
>> On 02/17/2010 06:23 PM, Alexander Graf wrote:
>>> On 17.02.2010, at 17:03, Avi Kivity wrote:
>>>> On 02/17/2010 04:56 PM, Alexander Graf wrote:
>>>>> So I changed the code according to your input by making all FPU calls
>>>>> explicit, getting rid of all binary patching.
>>>>>
>>>>> On the PowerStation again I'm running this code (simplified to the
>>>>> important instructions) using kvmctl:
>>>>>
>>>>>   li    r2, 0x1234
>>>>>   std   r2, 0(r1)
>>>>>   lfd   f3, 0(r1)
>>>>>   lfd   f4, 0(r1)
>>>>> do_mul:
>>>>>   fmul  f0, f3, f4
>>>>>   b     do_mul
>>>>>
>>>>> With the following kvm_stat output:
>>>>>
>>>>>  dec              2236      53
>>>>>  exits        60797802 1171403
>>>>>  ext_intr          379       4
>>>>>  halt_wakeup         0       0
>>>>>  inst_emu     60795247 1171344
>>>>>  ld           60795132 1171348
>>>>>
>>>>> So I'm getting 1171403 fmul operations per second. And that's even
>>>>> with non-optimized instruction fetching. Not bad.
>>>>
>>>> It's a large number, but won't real hardware be three orders of
>>>> magnitude faster?
>>>
>>> Yes, it would. But we don't have to care. The only thing we need to
>>> worry about is being fast enough to emulate enough FPU instructions
>>> actually used in normal guests so the guest runs at full speed. And
>>> 1000k > 250k, so we can do that apparently, leaving some spare cycles
>>> for non-fpu instructions.
>>
>> I'm sure 250k isn't representative of a floating point intensive
>> program (but maybe there aren't fpu intensive applications on that cpu).
>
> Now you made me check how fast the real hw is. I get about 65,000,000 fmul
> operations per second on it.

That's surprisingly low.

> So we're roughly 55x slower on a PowerStation. And that's for a tight FPU
> only loop.
>
> I'm still not convinced we're running into major problems.

Well, it's up to you. I just hope we don't end up underperforming due to
this.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
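The throughput figures quoted above can be cross-checked with a few lines of C. The constants are the numbers from the thread (1,171,403 emulated fmul exits per second from kvm_stat, about 65,000,000 fmul per second on the real hardware, and the 250k per second FPU rate cited for normal guests); the helper names `slowdown()` and `headroom()` are illustrative only.

```c
#include <assert.h>

/* Rates quoted in the thread, in fmul operations per second. */
#define EMULATED_FMUL_PER_SEC 1171403.0   /* kvm_stat "exits" rate under KVM */
#define NATIVE_FMUL_PER_SEC   65000000.0  /* same loop on the real hardware */
#define GUEST_FPU_RATE        250000.0    /* FPU rate cited for normal guests */

/* Factor by which emulation is slower than native execution. */
static double slowdown(double native_rate, double emulated_rate)
{
    assert(emulated_rate > 0.0);
    return native_rate / emulated_rate;
}

/* Factor by which the emulated rate exceeds a guest's FPU demand. */
static double headroom(double emulated_rate, double demand_rate)
{
    assert(demand_rate > 0.0);
    return emulated_rate / demand_rate;
}
```

`slowdown(NATIVE_FMUL_PER_SEC, EMULATED_FMUL_PER_SEC)` comes out at about 55.5 and `headroom(EMULATED_FMUL_PER_SEC, GUEST_FPU_RATE)` at about 4.7, so the emulated path covers the cited guest rate roughly 4.7 times over while a tight FPU loop runs roughly 55 times slower than native.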
Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP
On Wed, Feb 17, 2010 at 08:17:28PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Wed, Feb 17, 2010 at 12:23:39PM +0100, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Wed, Feb 17, 2010 at 01:13:29PM +0200, Avi Kivity wrote:
> On 02/17/2010 12:43 PM, Gleb Natapov wrote:
> >> And, again: This is an _existing_ user space ABI. We could only provide
> >> an alternative, but we have to maintain what is there at least for some
> >> longer grace period.
> >>
> > But it was always broken for SVM and was broken for VMX for a year and
> > nobody noticed, so maybe instead of reintroducing the old interface we
> > should do it right this time?
> We need to fix the existing interface first, and then think long and
> hard if we want yet another interface, since we're likely to screw
> it up as well.
>
> The more interfaces we introduce, the harder maintenance becomes.
>
> >>> We are in a sad state if we cannot improve the interface. The current one
> >>> outsources part of CPU functionality into userspace. This should be a big
> >>> no-no.
> >> I still disagree on this. Moving the decision logic to user space
> >> avoided having to re-implement a gdbstub in kernel space. I overlooked that
> >> re-injecting #BP over older SVM was broken, but it is now fixed for all
> >> vendors. So there is actually no long-term reason to move it back to the kernel.
> >>
> > There were patches to implement gdbstub in kernel space! And not so long
> > ago :)
>
> Yes, a good reason to implement yet another one. :)
>
We can unify them later :). But seriously, I am not proposing anything like
a gdbstub in the kernel, just tracking inserted breakpoints in the kernel.

> > But I want to move only a tiny bit of logic into the kernel space.
> > And #BP reinjection brokenness is a different issue. It should be fixed
> > anyway no matter where the decision about reinjection happens.
> >
> > If maintainers think that we should not have an improved interface and we
> > should support reinjection of #DB from userspace, then this patch should
> > be applied. I don't have other objections to it. But I, at least, would
> > prefer the old interface for #DB reinjection (KVM_GUESTDBG_INJECT_DB)
> > and not the new one. The old one makes it explicit what we are doing;
> > the new one allows injection of any event and should be used only during
> > migration or CPU reset. It would even be a good idea to fail setting
> > events if the CPU is running.
>
> Event injection is well supported by both vendors (except for those
> software-triggered events). Just because QEMU mostly uses it for reset
> and migration doesn't mean we have to restrict other users to only those
> cases as well.
Yes, we have to! QEMU implements the device model, and the way devices
communicate with the CPU is well defined and called interrupts, so we have
a way to inject interrupts (KVM_IRQ_LINE/KVM_INTERRUPT). Input is validated
and passed into the VCPU at the right time; we do not inject interrupts
directly into the VCPU using event injection. Exceptions, on the other
hand, are a completely internal CPU thing. QEMU shouldn't be a part of CPU
emulation.

>
> And as we have true event injection now, and as it naturally conflicts
Now we have a bug that should be fixed ASAP. We should allow setting some
VCPU state only when the VCPU is stopped, and only for migration/reset
purposes.

> with the special KVM_SET_GUEST_DEBUG interface, I have a patch that
> consolidates this usage for QEMU: use the old interface of
> SET_GUEST_DEBUG for pre-2.6.33 kernels, switch to SET_VCPU_EVENTS on
> recent ones.
Please don't do that; it will encourage use of SET_VCPU_EVENTS for
something it shouldn't be used for.

--
			Gleb.
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
"We think"? I mean - yes, I think so too. But have you actually measured it?
How much improvement are we talking about here? Is it still faster when a
bswap is involved?

Thanks for pointing this out. I will post the data for x86 later. However,
I don't have a test environment to check the impact of bswap. Would you
please measure the run time of the following section if possible?

It'd make more sense to have a real standalone test program, no? I can try
to write one today, but I have some really nasty important bugs to fix first.

OK. I will prepare test code with sample data. Since I found a ppc machine
around, I will run the code and post the results for x86 and ppc.

By the way, the following data is a result for x86 measured in QEMU/KVM.
This data shows how many times the function is called (#called), the
runtime of the original function (orig.), the runtime of this patch
(patch), and the speedup ratio (ratio).

That does indeed look promising! Thanks for doing this micro-benchmark. I
just want to be 100% sure that it doesn't affect performance for big endian
badly.

I measured the runtime of the test code with sample data. My test
environment and results are described below.

x86 Test Environment:
CPU: 4x Intel Xeon Quad Core 2.66GHz
Mem size: 6GB

ppc Test Environment:
CPU: 2x Dual Core PPC970MP
Mem size: 2GB

The sample dirty-bitmap data was produced by QEMU/KVM while the guest OS
was live migrating. To measure the runtime, I copied cpu_get_real_ticks()
from QEMU into my test program.

Experimental results:

Test1: Guest OS read a 3GB file, which is bigger than memory.
       orig.(msec)  patch(msec)  ratio
x86        0.3          0.1       6.4
ppc        7.9          2.7       3.0

Test2: Guest OS read/wrote a 3GB file, which is bigger than memory.
       orig.(msec)  patch(msec)  ratio
x86       12.0          3.2       3.7
ppc      251.1        123         2.0

I also measured the runtime of bswap itself on ppc, and found it was only
0.3% to 0.7% of the runtimes above.
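The general idea behind this kind of dirty-bitmap speed-up can be sketched as follows, assuming the bitmap is an array of `unsigned long` words already in host bit order: test a whole word at once and walk the set bits only in non-zero words. `collect_dirty_pages()` is a hypothetical helper, not the code of the patch under discussion, and `__builtin_ctzl` is a GCC/Clang builtin. On a big-endian host reading a bitmap laid out as little-endian longs, each non-zero word would first need a byte swap, which is the bswap cost measured above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Collect the page numbers of all set bits in a dirty bitmap.
 * Writes at most max_out page numbers to out and returns the total
 * number of dirty pages found.
 */
static size_t collect_dirty_pages(const unsigned long *bitmap, size_t nwords,
                                  size_t *out, size_t max_out)
{
    size_t count = 0;

    assert(bitmap != NULL || nwords == 0);
    for (size_t i = 0; i < nwords; i++) {
        unsigned long word = bitmap[i];

        /* A clean word covers BITS_PER_WORD pages with a single test. */
        while (word != 0) {
            /* Index of the lowest set bit (GCC/Clang builtin). */
            size_t bit = (size_t)__builtin_ctzl(word);

            if (count < max_out)
                out[count] = i * BITS_PER_WORD + bit;
            count++;
            word &= word - 1;  /* clear the lowest set bit */
        }
    }
    return count;
}
```

During migration most words are zero, so the common case costs one comparison per word instead of one per page.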
[patch uq/master 3/4] qemu: kvm: consume internal signal with sigtimedwait
Change the way the internal qemu signal, used for communication between
iothread and vcpus, is handled.

Block and consume it with sigtimedwait on the outer vcpu loop, which
allows more precise timing control.

Change from a standard signal (SIGUSR1) to a real-time one, so multiple
signals are not collapsed.

Set the signal number on KVM's in-kernel allowed sigmask.

Signed-off-by: Marcelo Tosatti

Index: qemu-kvm/vl.c
===
--- qemu-kvm.orig/vl.c
+++ qemu-kvm/vl.c
@@ -271,6 +271,12 @@ uint8_t qemu_uuid[16];
 static QEMUBootSetHandler *boot_set_handler;
 static void *boot_set_opaque;
 
+#ifdef SIGRTMIN
+#define SIG_IPI (SIGRTMIN+4)
+#else
+#define SIG_IPI SIGUSR1
+#endif
+
 static int default_serial = 1;
 static int default_parallel = 1;
 static int default_virtcon = 1;
@@ -3379,7 +3385,8 @@ static QemuCond qemu_cpu_cond;
 static QemuCond qemu_system_cond;
 static QemuCond qemu_pause_cond;
 
-static void block_io_signals(void);
+static void tcg_block_io_signals(void);
+static void kvm_block_io_signals(CPUState *env);
 static void unblock_io_signals(void);
 static int tcg_has_work(void);
 static int cpu_has_work(CPUState *env);
@@ -3431,11 +3438,36 @@ static void qemu_wait_io_event(CPUState
     qemu_wait_io_event_common(env);
 }
 
+static void qemu_kvm_eat_signal(CPUState *env, int timeout)
+{
+    struct timespec ts;
+    int r, e;
+    siginfo_t siginfo;
+    sigset_t waitset;
+
+    ts.tv_sec = timeout / 1000;
+    ts.tv_nsec = (timeout % 1000) * 1000000;
+
+    sigemptyset(&waitset);
+    sigaddset(&waitset, SIG_IPI);
+
+    qemu_mutex_unlock(&qemu_global_mutex);
+    r = sigtimedwait(&waitset, &siginfo, &ts);
+    e = errno;
+    qemu_mutex_lock(&qemu_global_mutex);
+
+    if (r == -1 && !(e == EAGAIN || e == EINTR)) {
+        fprintf(stderr, "sigtimedwait: %s\n", strerror(e));
+        exit(1);
+    }
+}
+
 static void qemu_kvm_wait_io_event(CPUState *env)
 {
     while (!cpu_has_work(env))
         qemu_cond_timedwait(env->halt_cond, &qemu_global_mutex, 1000);
 
+    qemu_kvm_eat_signal(env, 0);
     qemu_wait_io_event_common(env);
 }
 
@@ -3445,11 +3477,12 @@ static void *kvm_cpu_thread_fn(void *arg
 {
     CPUState *env = arg;
 
-    block_io_signals();
     qemu_thread_self(env->thread);
     if (kvm_enabled())
         kvm_init_vcpu(env);
 
+    kvm_block_io_signals(env);
+
     /* signal CPU creation */
     qemu_mutex_lock(&qemu_global_mutex);
     env->created = 1;
@@ -3474,7 +3507,7 @@ static void *tcg_cpu_thread_fn(void *arg
 {
     CPUState *env = arg;
 
-    block_io_signals();
+    tcg_block_io_signals();
     qemu_thread_self(env->thread);
 
     /* signal CPU creation */
@@ -3500,7 +3533,7 @@ void qemu_cpu_kick(void *_env)
     CPUState *env = _env;
     qemu_cond_broadcast(env->halt_cond);
     if (kvm_enabled())
-        qemu_thread_signal(env->thread, SIGUSR1);
+        qemu_thread_signal(env->thread, SIG_IPI);
 }
 
 int qemu_cpu_self(void *_env)
@@ -3519,7 +3552,7 @@ static void cpu_signal(int sig)
         cpu_exit(cpu_single_env);
 }
 
-static void block_io_signals(void)
+static void tcg_block_io_signals(void)
 {
     sigset_t set;
     struct sigaction sigact;
@@ -3532,12 +3565,44 @@ static void block_io_signals(void)
     pthread_sigmask(SIG_BLOCK, &set, NULL);
 
     sigemptyset(&set);
-    sigaddset(&set, SIGUSR1);
+    sigaddset(&set, SIG_IPI);
     pthread_sigmask(SIG_UNBLOCK, &set, NULL);
 
     memset(&sigact, 0, sizeof(sigact));
     sigact.sa_handler = cpu_signal;
-    sigaction(SIGUSR1, &sigact, NULL);
+    sigaction(SIG_IPI, &sigact, NULL);
+}
+
+static void dummy_signal(int sig)
+{
+}
+
+static void kvm_block_io_signals(CPUState *env)
+{
+    int r;
+    sigset_t set;
+    struct sigaction sigact;
+
+    sigemptyset(&set);
+    sigaddset(&set, SIGUSR2);
+    sigaddset(&set, SIGIO);
+    sigaddset(&set, SIGALRM);
+    sigaddset(&set, SIGCHLD);
+    sigaddset(&set, SIG_IPI);
+    pthread_sigmask(SIG_BLOCK, &set, NULL);
+
+    pthread_sigmask(SIG_BLOCK, NULL, &set);
+    sigdelset(&set, SIG_IPI);
+
+    memset(&sigact, 0, sizeof(sigact));
+    sigact.sa_handler = dummy_signal;
+    sigaction(SIG_IPI, &sigact, NULL);
+
+    r = kvm_set_signal_mask(env, &set);
+    if (r) {
+        fprintf(stderr, "kvm_set_signal_mask: %s\n", strerror(r));
+        exit(1);
+    }
 }
 
 static void unblock_io_signals(void)
@@ -3551,7 +3616,7 @@ static void unblock_io_signals(void)
     pthread_sigmask(SIG_UNBLOCK, &set, NULL);
 
     sigemptyset(&set);
-    sigaddset(&set, SIGUSR1);
+    sigaddset(&set, SIG_IPI);
     pthread_sigmask(SIG_BLOCK, &set, NULL);
 }
 
@@ -3560,7 +3625,7 @@ static void qemu_signal_lock(unsigned in
     qemu_mutex_lock(&qemu_fair_mutex);
 
     while (qemu_mutex_trylock(&qemu_global_mutex)) {
-        qemu_thread_signal(tcg_cpu_thread, SIGUSR1);
+        qemu_thread_signal(tcg_cpu_thread, SIG_IPI);
         if (!qemu_mutex_timedlock(&qemu_global_mutex, msecs))
             break;
     }
@@ -3601,7 +3
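The mechanism described in the changelog (block the real-time IPI, then consume it synchronously with a bounded sigtimedwait) can be tried outside QEMU. This is a single-threaded sketch under stated assumptions: `DEMO_SIG_IPI`, `block_ipi()`, `eat_ipi()` and `ms_to_timespec()` are illustrative names, and it uses `sigprocmask` because there is only one thread, whereas QEMU's vcpu threads use `pthread_sigmask` as the patch does. The millisecond-to-nanosecond conversion multiplies by 1,000,000.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <time.h>

/* Illustrative stand-in for the patch's SIG_IPI (SIGRTMIN+4). */
#define DEMO_SIG_IPI (SIGRTMIN + 4)

/* Convert a timeout in milliseconds to a timespec (1 ms = 1,000,000 ns). */
static struct timespec ms_to_timespec(int timeout_ms)
{
    struct timespec ts;

    assert(timeout_ms >= 0);
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (long)(timeout_ms % 1000) * 1000000L;
    return ts;
}

/* Block the IPI so it stays pending instead of running a handler. */
static void block_ipi(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, DEMO_SIG_IPI);
    sigprocmask(SIG_BLOCK, &set, NULL);
}

/*
 * Consume one pending IPI, waiting at most timeout_ms.
 * Returns 1 if a signal was consumed, 0 on timeout or EINTR, -1 on error.
 */
static int eat_ipi(int timeout_ms)
{
    sigset_t waitset;
    struct timespec ts = ms_to_timespec(timeout_ms);
    int r;

    sigemptyset(&waitset);
    sigaddset(&waitset, DEMO_SIG_IPI);
    r = sigtimedwait(&waitset, NULL, &ts);
    if (r == DEMO_SIG_IPI)
        return 1;
    if (r == -1 && (errno == EAGAIN || errno == EINTR))
        return 0;
    return -1;
}
```

Because real-time signals are queued rather than merged, two raises of the blocked signal are consumed by two separate `eat_ipi(0)` calls, after which a zero-timeout wait returns 0; with a standard signal such as SIGUSR1 the second raise would be collapsed into the first.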
[patch uq/master 4/4] qemu: kvm: remove pre-entry exit_request check with iothread enabled
With SIG_IPI blocked, vcpu loop exit notification happens via -EAGAIN
from KVM_RUN.

Signed-off-by: Marcelo Tosatti

Index: qemu/kvm-all.c
===
--- qemu.orig/kvm-all.c
+++ qemu/kvm-all.c
@@ -753,11 +753,13 @@ int kvm_cpu_exec(CPUState *env)
     dprintf("kvm_cpu_exec()\n");
 
     do {
+#ifndef CONFIG_IOTHREAD
         if (env->exit_request) {
             dprintf("interrupt exit requested\n");
             ret = 0;
             break;
         }
+#endif
 
         if (env->kvm_vcpu_dirty) {
             kvm_arch_put_registers(env);
[patch uq/master 2/4] qemu: kvm specific wait_io_event
In KVM mode the global mutex is released when vcpus are executing,
which means acquiring the fairness mutex is not required.

Also, for KVM there is one thread per vcpu, so tcg_has_work is meaningless.

Add a new qemu_wait_io_event_common function to hold code common to
TCG and KVM.

Signed-off-by: Marcelo Tosatti

Index: qemu/vl.c
===
--- qemu.orig/vl.c
+++ qemu/vl.c
@@ -3382,6 +3382,7 @@ static QemuCond qemu_pause_cond;
 static void block_io_signals(void);
 static void unblock_io_signals(void);
 static int tcg_has_work(void);
+static int cpu_has_work(CPUState *env);
 
 static int qemu_init_main_loop(void)
 {
@@ -3402,6 +3403,15 @@ static int qemu_init_main_loop(void)
     return 0;
 }
 
+static void qemu_wait_io_event_common(CPUState *env)
+{
+    if (env->stop) {
+        env->stop = 0;
+        env->stopped = 1;
+        qemu_cond_signal(&qemu_pause_cond);
+    }
+}
+
 static void qemu_wait_io_event(CPUState *env)
 {
     while (!tcg_has_work())
@@ -3418,11 +3428,15 @@ static void qemu_wait_io_event(CPUState
     qemu_mutex_unlock(&qemu_fair_mutex);
 
     qemu_mutex_lock(&qemu_global_mutex);
-    if (env->stop) {
-        env->stop = 0;
-        env->stopped = 1;
-        qemu_cond_signal(&qemu_pause_cond);
-    }
+    qemu_wait_io_event_common(env);
+}
+
+static void qemu_kvm_wait_io_event(CPUState *env)
+{
+    while (!cpu_has_work(env))
+        qemu_cond_timedwait(env->halt_cond, &qemu_global_mutex, 1000);
+
+    qemu_wait_io_event_common(env);
 }
 
 static int qemu_cpu_exec(CPUState *env);
@@ -3448,7 +3462,7 @@ static void *kvm_cpu_thread_fn(void *arg
     while (1) {
         if (cpu_can_run(env))
             qemu_cpu_exec(env);
-        qemu_wait_io_event(env);
+        qemu_kvm_wait_io_event(env);
     }
 
     return NULL;
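The stop/stopped handshake that qemu_wait_io_event_common() factors out can be modelled with plain pthreads (build with -pthread on older toolchains). This is a hypothetical sketch, not QEMU code: `struct demo_vcpu`, `demo_vcpu_thread()`, `pause_vcpu()` and the three globals are illustrative, and the vCPU waits on its condition variable indefinitely where the patch uses a 1000 ms qemu_cond_timedwait(). The I/O thread sets `stop` and waits on the pause condition; the vCPU thread, holding the global mutex, turns `stop` into `stopped` and signals back.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, reduced per-vCPU state. */
struct demo_vcpu {
    bool stop;     /* set by the I/O thread: please pause */
    bool stopped;  /* set by the vCPU thread: pause acknowledged */
};

static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t halt_cond = PTHREAD_COND_INITIALIZER;   /* kicks the vCPU */
static pthread_cond_t pause_cond = PTHREAD_COND_INITIALIZER;  /* vCPU -> I/O thread */

/* Shared stop handling; the caller holds global_mutex. */
static void wait_io_event_common(struct demo_vcpu *v)
{
    if (v->stop) {
        v->stop = false;
        v->stopped = true;
        pthread_cond_signal(&pause_cond);
    }
}

/* vCPU thread: wait for work (here only a stop request), then acknowledge. */
static void *demo_vcpu_thread(void *arg)
{
    struct demo_vcpu *v = arg;

    pthread_mutex_lock(&global_mutex);
    while (!v->stopped) {
        wait_io_event_common(v);
        /* Checked under the mutex, so a stop request cannot slip in
         * between this check and the wait. */
        if (!v->stopped)
            pthread_cond_wait(&halt_cond, &global_mutex);
    }
    pthread_mutex_unlock(&global_mutex);
    return NULL;
}

/* I/O thread: request a stop, kick the vCPU, wait for the acknowledgement. */
static void pause_vcpu(struct demo_vcpu *v)
{
    pthread_mutex_lock(&global_mutex);
    v->stop = true;
    pthread_cond_broadcast(&halt_cond);
    while (!v->stopped)
        pthread_cond_wait(&pause_cond, &global_mutex);
    pthread_mutex_unlock(&global_mutex);
}
```

Both sides loop on their predicate, so spurious wakeups are harmless and the handshake works whether the stop request arrives before or after the vCPU starts waiting.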
[patch uq/master 0/4] uq/master: iothread consume signals via sigtimedwait and cleanups
See individual patches for details.
[patch uq/master 1/4] qemu: block SIGCHLD in vcpu thread(s)
Otherwise a vcpu thread can run the SIGCHLD handler, causing waitpid()
in the iothread to fail.

Signed-off-by: Marcelo Tosatti

Index: qemu/vl.c
===
--- qemu.orig/vl.c
+++ qemu/vl.c
@@ -3514,6 +3514,7 @@ static void block_io_signals(void)
     sigaddset(&set, SIGUSR2);
     sigaddset(&set, SIGIO);
     sigaddset(&set, SIGALRM);
+    sigaddset(&set, SIGCHLD);
     pthread_sigmask(SIG_BLOCK, &set, NULL);
 
     sigemptyset(&set);
Re: [PATCH] qemu-kvm Set kvm_features name for kvm_cr3_cache
On Wed, Feb 17, 2010 at 10:26:56PM +0100, Jes Sorensen wrote:
> On 02/17/10 22:08, Marcelo Tosatti wrote:
> >The KVM_CAP_CR3_CACHE reference can be removed since the feature
> >was never implemented/included.
>
> Ok that works too. Would you rather have a patch to remove all references
> to it, or leave it in in case someone decides to pick it up later?

I'd say remove all references; it's obsolete due to EPT/NPT.

>
> Cheers,
> Jes
>
Re: [PATCH] qemu-kvm Set kvm_features name for kvm_cr3_cache
On 02/17/10 22:08, Marcelo Tosatti wrote:
> The KVM_CAP_CR3_CACHE reference can be removed since the feature
> was never implemented/included.

Ok that works too. Would you rather have a patch to remove all references
to it, or leave it in in case someone decides to pick it up later?

Cheers,
Jes
Re: [PATCH] qemu-kvm Set kvm_features name for kvm_cr3_cache
On Wed, Feb 17, 2010 at 06:44:12PM +0100, Jes Sorensen wrote:
> Hi,
>
> Comparing the features tested for in get_para_features() with the
> kvm_feature_names in target-i386/helper.c, I noticed that we didn't
> list the cr3_cache feature in the real name table.
>
> I presume this is unintentional so here's a patch to correct it.
>
> Cheers,
> Jes
>

The KVM_CAP_CR3_CACHE reference can be removed since the feature
was never implemented/included.
Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP
Gleb Natapov wrote:
> On Wed, Feb 17, 2010 at 12:23:39PM +0100, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Wed, Feb 17, 2010 at 01:13:29PM +0200, Avi Kivity wrote:
On 02/17/2010 12:43 PM, Gleb Natapov wrote:
>> And, again: This is an _existing_ user space ABI. We could only provide
>> an alternative, but we have to maintain what is there at least for some
>> longer grace period.
>>
> But it was always broken for SVM and was broken for VMX for a year and
> nobody noticed, so maybe instead of reintroducing the old interface we
> should do it right this time?
We need to fix the existing interface first, and then think long and
hard if we want yet another interface, since we're likely to screw
it up as well.

The more interfaces we introduce, the harder maintenance becomes.

>>> We are in a sad state if we cannot improve the interface. The current one
>>> outsources part of CPU functionality into userspace. This should be a big
>>> no-no.
>> I still disagree on this. Moving the decision logic to user space
>> avoided having to re-implement a gdbstub in kernel space. I overlooked that
>> re-injecting #BP over older SVM was broken, but it is now fixed for all
>> vendors. So there is actually no long-term reason to move it back to the kernel.
>>
> There were patches to implement gdbstub in kernel space! And not so long
> ago :)

Yes, a good reason to implement yet another one. :)

> But I want to move only a tiny bit of logic into the kernel space.
> And #BP reinjection brokenness is a different issue. It should be fixed
> anyway no matter where the decision about reinjection happens.
>
> If maintainers think that we should not have an improved interface and we
> should support reinjection of #DB from userspace, then this patch should
> be applied. I don't have other objections to it. But I, at least, would
> prefer the old interface for #DB reinjection (KVM_GUESTDBG_INJECT_DB)
> and not the new one. The old one makes it explicit what we are doing;
> the new one allows injection of any event and should be used only during
> migration or CPU reset. It would even be a good idea to fail setting
> events if the CPU is running.

Event injection is well supported by both vendors (except for those
software-triggered events). Just because QEMU mostly uses it for reset
and migration doesn't mean we have to restrict other users to only those
cases as well.

And as we have true event injection now, and as it naturally conflicts
with the special KVM_SET_GUEST_DEBUG interface, I have a patch that
consolidates this usage for QEMU: use the old interface of
SET_GUEST_DEBUG for pre-2.6.33 kernels, switch to SET_VCPU_EVENTS on
recent ones.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [PATCH 2/2] KVM: SVM: Make stepping out of NMI handlers more robust
Gleb Natapov wrote:
> On Tue, Feb 16, 2010 at 12:08:58PM +0200, Gleb Natapov wrote:
> Besides this, proper #DB forwarding to the guest was missing.
During NMI injection? How to reproduce?
>>> Inject, e.g., an NMI over code with TF set. A bit harder is placing a
>>> guest HW breakpoint at the spot the NMI handler returns to.
>>>
>> Will try to reproduce.
>>
> How can I make gdb run the debugged process with TF set? Does this patch
> fix it:
>
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 52f78dd..b85b200 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -109,6 +109,7 @@ struct vcpu_svm {
>  	struct nested_state nested;
>  
>  	bool nmi_singlestep;
> +	bool nmi_singlestep_tf;
>  };
>  
>  /* enable NPT for AMD64 and X86 with PAE */
> @@ -1221,9 +1222,14 @@ static int db_interception(struct vcpu_svm *svm)
>  
>  	if (svm->nmi_singlestep) {
>  		svm->nmi_singlestep = false;
> -		if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
> +		if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
>  			svm->vmcb->save.rflags &=
>  				~(X86_EFLAGS_TF | X86_EFLAGS_RF);
> +			if (svm->nmi_singlestep_tf) {
> +				svm->vmcb->save.rflags |= X86_EFLAGS_TF;
> +				kvm_queue_exception(&svm->vcpu, DB_VECTOR);
> +			}
> +		}
>  		update_db_intercept(&svm->vcpu);
>  	}
>  
> @@ -2586,6 +2592,7 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
>  	   possible problem (IRET or exception injection or interrupt
>  	   shadow) */
>  	svm->nmi_singlestep = true;
> +	svm->nmi_singlestep_tf = (svm->vmcb->save.rflags | X86_EFLAGS_TF);
>  	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
>  	update_db_intercept(vcpu);
>  }

That's closer. However, I've a version here that restores TF&RF only if
you did not execute an IRET but stepped over the shadow (which is still
not correct either, e.g. when stepping popf). I will break up my patch
into parts that fix the issues separately so that we can decide what to
merge.

Jan
[ kvm-Bugs-2915201 ] Nested kvm (SVM)
Bugs item #2915201, was opened at 2009-12-15 17:35
Message generated for change (Comment added) made by alex_williamson
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2915201&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: amd
Group: v1.0 (example)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: jbl001 (jbl001)
Assigned to: Nobody/Anonymous (nobody)
Summary: Nested kvm (SVM)

Initial Comment:
I have seen a couple of messages where people have stated that nested SVM works properly, but I cannot replicate it. I first attempted to use the following configurations:

Hardware:
desktop system: Gigabyte MA785GM board with Athlon X2 4400
server system: Tyan h2000M board with Opteron 2354

Software:
Host OS: Ubuntu 9.10 with production kernel
Host KVM: tried kmod 2.6.32 with qemu 0.11.1 and qemu 0.12.0rc2, also tried git-tip
Guest VMM Host OS: Ubuntu 9.10
Guest VMM KVM: tried kmod 2.6.32 with qemu 0.11.1 and qemu 0.12.0rc2, also tried git-tip
True guest: tried Slackware 10.2, 64-bit Ubuntu 8.10, 64-bit Ubuntu 9.10, and 32-bit XP

All configurations result in the true guest not booting, but the Slackware 10.2 true guest is the easiest to analyze. It hangs at various places during boot, the most common being the "calibrating delay loop", "testing HLT instruction", mounting the hard disks, or starting the INIT processes. It seems to be losing interrupts.

I also tried an older host (64-bit Ubuntu 8.10) and guest VMM (64-bit Ubuntu 8.10) with the KVM-88 release. With this configuration, the Slackware 10.2 true guest will usually boot, but will then get a constant flow of "hda: lost interrupt" and "hda: dma_timer_expiry: dma status == 0x24". Again, it seems to be losing interrupts.

I have ensured that nested=1 is passed to the module and that enable-nesting is passed to qemu. It obviously works for some time, and I've tried printing out exit reasons in the handle_exit() function of the guest VMM, but it consistently fails in some form or another across all the hardware and software I have to try it on.

--
Comment By: Alex Williamson (alex_williamson)
Date: 2010-02-17 12:02

Message:
Try reverting cd3ff653ae0b45bac7a19208e9c75034fcacc85f from kvm-kmod (kvm-svm). I ran into trouble with nested kvm about a month ago and bisected it back to this change. I alerted Joerg, but he might need another poke if this fixes nesting for you too.

--
Comment By: jbl001 (jbl001)
Date: 2010-02-17 10:47

Message:
I tried this again with qemu-0.12.2 and kvm-kmod-2.6.32.3 while passing no-kvmclock to both the host and guest VMM kernels. It did not help the problem of lost interrupts in the true guest, however.

--
Comment By: Brian Jackson (iggy_cav)
Date: 2010-02-09 12:07

Message:
Can you try disabling kvmclock in both guests?
Re: [PATCH] kvm-kmod: Build fix for #define KVM_DEBUG
Tsuyoshi Ozawa wrote:
>>> Copy Jan - he maintains kvm-kmod, and probably didn't see your patch.
>>>
>> Yes, I did. Proper subject prefixing can help a lot here...
>>
>
> I'm sorry I forgot to prefix "kvm-kmod", and thank you for telling me this.
> I will keep it in mind from now on.
>
>> Could you please repost, making sure the patch is not line-wrapped, and
>> give it an up-to-date changelog?
>
> Yes, this new patch for the newest commit passed checkpatch.

Thanks, merged. [And as your original changelog was even better, I
included it as well.]

Jan
Re: [PATCH] kvm-kmod: Build fix for #define KVM_DEBUG
>> Copy Jan - he maintains kvm-kmod, and probably didn't see your patch.
>>
>
> Yes, I did. Proper subject prefixing can help a lot here...
>
I'm sorry I forgot to prefix "kvm-kmod", and thank you for telling me this.
I will keep it in mind from now on.

> Could you please repost, making sure the patch is not line-wrapped, and
> give it an up-to-date changelog?

Yes, this new patch for the newest commit passed checkpatch.

0001-Build-fix-for-define-KVM_DEBUG.patch
Re: [PATCH 00/18] KVM: PPC: Virtualize Gekko guests
On 17.02.2010, at 17:34, Avi Kivity wrote:

> On 02/17/2010 06:23 PM, Alexander Graf wrote:
>> On 17.02.2010, at 17:03, Avi Kivity wrote:
>>
>>
>>> On 02/17/2010 04:56 PM, Alexander Graf wrote:
>>>> So I changed the code according to your input by making all FPU calls
>>>> explicit, getting rid of all binary patching.
>>>>
>>>> On the PowerStation again I'm running this code (simplified to the
>>>> important instructions) using kvmctl:
>>>>
>>>>   li    r2, 0x1234
>>>>   std   r2, 0(r1)
>>>>   lfd   f3, 0(r1)
>>>>   lfd   f4, 0(r1)
>>>> do_mul:
>>>>   fmul  f0, f3, f4
>>>>   b     do_mul
>>>>
>>>> With the following kvm_stat output:
>>>>
>>>>  dec              2236      53
>>>>  exits        60797802 1171403
>>>>  ext_intr          379       4
>>>>  halt_wakeup         0       0
>>>>  inst_emu     60795247 1171344
>>>>  ld           60795132 1171348
>>>>
>>>> So I'm getting 1171403 fmul operations per second. And that's even
>>>> with non-optimized instruction fetching. Not bad.
>>>>
>>> It's a large number, but won't real hardware be three orders of
>>> magnitude faster?
>>>
>> Yes, it would. But we don't have to care. The only thing we need to
>> worry about is being fast enough to emulate enough FPU instructions
>> actually used in normal guests so the guest runs at full speed. And
>> 1000k > 250k, so we can do that apparently, leaving some spare cycles
>> for non-fpu instructions.
>>
>
> I'm sure 250k isn't representative of a floating point intensive
> program (but maybe there aren't fpu intensive applications on that cpu).

Now you made me check how fast the real hw is. I get about 65,000,000 fmul
operations per second on it. So we're roughly 55x slower on a PowerStation.
And that's for a tight FPU only loop.

I'm still not convinced we're running into major problems.

Alex
Re: [RFC] KVM: Balloon support for device assignment
On Wed, Feb 17, 2010 at 12:27:09PM +0200, Avi Kivity wrote: > On 02/17/2010 11:43 AM, bor...@il.ibm.com wrote: > >From: Eran Borovik > > > >This patch adds modifications to allow correct > >balloon operation when a virtual guest uses a direct assigned device. > >The modifications include a new interface between qemu and kvm to allow > >mapping and unmapping the pages from the IOMMU as well as pinning and > >unpinning as needed. > > The plan for iommu support is to push it into uio. Instead of kvm > managing the iommu directly, I'd like qemu to open a uio device and > set up an iommu mapping there, which will just happen to match the > kvm memory slots. Similarly, interrupts will be forwarded using > irqfds. This will allow using the iommu without kvm, and reduce the > amount of special purpose kvm code. > > These patches make the transition more difficult which worries me. That's a fair point, but they also address a real shortcoming of the current device assignment code, which pins all of the guest's memory unconditionally. Unless the uio effort is in progress and expected to complete shortly, I would think the benefit of these simple patches trumps the cost. Cheers, Muli
[ kvm-Bugs-2915201 ] Nested kvm (SVM)
Bugs item #2915201, was opened at 2009-12-15 16:35 Message generated for change (Comment added) made by jbl001 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2915201&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: amd Group: v1.0 (example) >Status: Open Resolution: None Priority: 5 Private: No Submitted By: jbl001 (jbl001) Assigned to: Nobody/Anonymous (nobody) Summary: Nested kvm (SVM) Initial Comment: I have seen a couple messages where people have stated that nested SVM works properly, but I cannot replicate it. I first attempted to use the following configurations: Hardware: desktop system: Gigabyte MA785GM board with Athlon X2 4400 server system: Tyan h2000M board with Opteron 2354 Software: Host OS: Ubuntu 9.10 with production kernel Host KVM: tried kmod 2.6.32 with qemu 0.11.1 and qemu 0.12.0rc2, also tried git-tip Guest VMM Host OS: Ubuntu 9.10 Guest VMM KVM: tried kmod 2.6.32 with qemu 0.11.1 and qemu 0.12.0rc2, also tried git-tip True guest: tried Slackware 10.2, 64-bit Ubuntu 8.10, 64-bit Ubuntu 9.10, and 32-bit XP All configurations result in the true guest not booting, but the Slackware 10.2 true guest is the easiest to analyze. It hangs at various places during boot with the most common being the "calibrating delay loop", "testing HLT instruction", mounting the hard disks, or starting the INIT processes. It seems it is losing interrupts. I also tried an older host (64-bit Ubuntu 8.10) and guest VMM (64-bit Ubuntu 8.10) with the KVM-88 release. With this configuration, the Slackware 10.2 true guest will usually boot, but will then get a constant flow of "hda: lost interrupt" and "hda: dma_timer_expiry: dma status == 0x24". Again, it seems to be losing interrupts. I have ensured that the nested=1 is passed to the module and that enable-nesting is passed to the qemu. 
It obviously works for some time and I've tried printing out exit reasons in the handle_exit() function of the guest VMM, but it consistently fails in some form or another across all the hardware and software I have to try it on. -- >Comment By: jbl001 (jbl001) Date: 2010-02-17 09:47 Message: I tried this again with qemu-0.12.2 and kvm-kmod-2.6.32.3 while passing no-kvmclock to both the host and guest VMM kernels. It did not help the problem of lost interrupts in the true guest, however. -- Comment By: Brian Jackson (iggy_cav) Date: 2010-02-09 11:07 Message: Can you try disabling kvmclock in both guests?
[PATCH] qemu-kvm Set kvm_features name for kvm_cr3_cache
Hi, Comparing the features tested for in get_para_features() with the kvm_feature_names in target-i386/helper.c, I noticed that we didn't list the cr3_cache feature in the real name table. I presume this is unintentional so here's a patch to correct it. Cheers, Jes commit 39cb576d15a6ffbbcade3c4f282c2f3e76e3098a Author: Jes Sorensen Date: Wed Feb 17 18:03:37 2010 +0100 Add kvm_cr3_cache to the list of KVM features. This is to match the features automatically added by target-i386/kvm.c:get_para_features() Signed-off-by: Jes Sorensen diff --git a/target-i386/helper.c b/target-i386/helper.c index f9d63f6..2cd3dca 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -61,7 +61,8 @@ static const char *ext3_feature_name[] = { }; static const char *kvm_feature_name[] = { -"kvmclock", "kvm_nopiodelay", "kvm_mmu", NULL, NULL, NULL, NULL, NULL, +"kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvm_cr3_cache", +NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
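The patch matters because the feature-name table is indexed by bit position, so a bit that get_para_features() tests for but that has no table entry has no printable or parsable name. Below is a minimal sketch of that bit-to-name lookup. The 32-entry size, the "?" placeholder for unnamed bits and the helper format_kvm_features() are assumptions for illustration, not qemu's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Index = bit position in the KVM paravirt feature word (32 entries).
 * The first four names follow the patched table; the rest stay NULL. */
static const char *kvm_feature_name[32] = {
    "kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvm_cr3_cache",
};

/*
 * Hypothetical helper (not qemu code): write the names of all bits set in
 * 'features' into 'buf', space separated. Bits without a table entry are
 * printed as "?" so a missing name is visible. Returns the number of set bits.
 */
static int format_kvm_features(uint32_t features, char *buf, size_t len)
{
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (int bit = 0; bit < 32; bit++) {
        if (!(features & (UINT32_C(1) << bit)))
            continue;
        const char *name = kvm_feature_name[bit] ? kvm_feature_name[bit] : "?";
        size_t used = strlen(buf);
        /* snprintf truncates safely if buf is too small. */
        snprintf(buf + used, len - used, "%s%s", used ? " " : "", name);
        count++;
    }
    return count;
}
```

Before the patch, bit 3 would print as "?" in this sketch. With the patch it prints as "kvm_cr3_cache".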
buildbot failure in qemu-kvm on default_x86_64_out_of_tree
The Buildbot has detected a new failure of default_x86_64_out_of_tree on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_out_of_tree/builds/218 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_1 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Amit Shah ,Anthony Liguori ,Artyom Tarasenko ,Aurelien Jarno ,Avi Kivity ,Blue Swirl ,Brian Jackson ,Christian Krause ,Christoph Hellwig ,David S. Ahern ,Dirk Ullrich ,Edgar E. Iglesias ,Evgeniy Dushistov ,Isaku Yamahata ,Jan Kiszka ,Jim Meyering ,Kevin Wolf ,Liran Schour ,Loïc Minier ,Luiz Capitulino ,Marcelo Tosatti ,Markus Armbruster ,Michael S. Tsirkin ,OHMURA Kei ,Paolo Bonzini ,Richard Henderson ,Riku Voipio ,Roy Tam ,Scott Tsai ,Sheng Yang ,Stefan Weil ,TeLeMan ,Tom Lendacky ,h...@lst.de ,malc BUILD FAILED: failed compile sincerely, -The Buildbot
buildbot failure in qemu-kvm on default_i386_out_of_tree
The Buildbot has detected a new failure of default_i386_out_of_tree on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_out_of_tree/builds/216 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_2 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Amit Shah ,Anthony Liguori ,Artyom Tarasenko ,Aurelien Jarno ,Avi Kivity ,Blue Swirl ,Brian Jackson ,Christian Krause ,Christoph Hellwig ,David S. Ahern ,Dirk Ullrich ,Edgar E. Iglesias ,Evgeniy Dushistov ,Isaku Yamahata ,Jan Kiszka ,Jim Meyering ,Kevin Wolf ,Liran Schour ,Loïc Minier ,Luiz Capitulino ,Marcelo Tosatti ,Markus Armbruster ,Michael S. Tsirkin ,OHMURA Kei ,Paolo Bonzini ,Richard Henderson ,Riku Voipio ,Roy Tam ,Scott Tsai ,Sheng Yang ,Stefan Weil ,TeLeMan ,Tom Lendacky ,h...@lst.de ,malc BUILD FAILED: failed compile sincerely, -The Buildbot
buildbot failure in qemu-kvm on default_i386_debian_5_0
The Buildbot has detected a new failure of default_i386_debian_5_0 on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_debian_5_0/builds/279 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_2 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Amit Shah ,Anthony Liguori ,Artyom Tarasenko ,Aurelien Jarno ,Avi Kivity ,Blue Swirl ,Brian Jackson ,Christian Krause ,Christoph Hellwig ,David S. Ahern ,Dirk Ullrich ,Edgar E. Iglesias ,Evgeniy Dushistov ,Isaku Yamahata ,Jan Kiszka ,Jim Meyering ,Kevin Wolf ,Liran Schour ,Loïc Minier ,Luiz Capitulino ,Marcelo Tosatti ,Markus Armbruster ,Michael S. Tsirkin ,OHMURA Kei ,Paolo Bonzini ,Richard Henderson ,Riku Voipio ,Roy Tam ,Scott Tsai ,Sheng Yang ,Stefan Weil ,TeLeMan ,Tom Lendacky ,h...@lst.de ,malc BUILD FAILED: failed compile sincerely, -The Buildbot
Re: [PATCH 00/18] KVM: PPC: Virtualize Gekko guests
On 02/17/2010 06:23 PM, Alexander Graf wrote: On 17.02.2010, at 17:03, Avi Kivity wrote: On 02/17/2010 04:56 PM, Alexander Graf wrote: So I changed the code according to your input by making all FPU calls explicit, getting rid of all binary patching. On the PowerStation again I'm running this code (simplified to the important instructions) using kvmctl: li r2, 0x1234 std r2, 0(r1) lfd f3, 0(r1) lfd f4, 0(r1) do_mul: fmul f0, f3, f4 b do_mul With the following kvm_stat output: dec 2236 53 exits 60797802 1171403 ext_intr 379 4 halt_wakeup 0 0 inst_emu 60795247 1171344 ld 60795132 1171348 So I'm getting 1171403 fmul operations per second. And that's even with non-optimized instruction fetching. Not bad. It's a large number, but won't real hardware be three orders of magnitude faster? Yes, it would. But we don't have to care. The only thing we need to worry about is being fast enough to emulate enough FPU instructions actually used in normal guests so the guest runs at full speed. And 1000k > 250k, so we can do that apparently, leaving some spare cycles for non-fpu instructions. I'm sure 250k isn't representative of a floating point intensive program (but maybe there aren't fpu intensive applications on that cpu). -- error compiling committee.c: too many arguments to function
buildbot failure in qemu-kvm on disable_kvm_i386_debian_5_0
The Buildbot has detected a new failure of disable_kvm_i386_debian_5_0 on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_debian_5_0/builds/268 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_2 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Amit Shah ,Anthony Liguori ,Artyom Tarasenko ,Aurelien Jarno ,Avi Kivity ,Blue Swirl ,Brian Jackson ,Christian Krause ,Christoph Hellwig ,David S. Ahern ,Dirk Ullrich ,Edgar E. Iglesias ,Evgeniy Dushistov ,Isaku Yamahata ,Jan Kiszka ,Jim Meyering ,Kevin Wolf ,Liran Schour ,Loïc Minier ,Luiz Capitulino ,Marcelo Tosatti ,Markus Armbruster ,Michael S. Tsirkin ,OHMURA Kei ,Paolo Bonzini ,Richard Henderson ,Riku Voipio ,Roy Tam ,Scott Tsai ,Sheng Yang ,Stefan Weil ,TeLeMan ,Tom Lendacky ,h...@lst.de ,malc BUILD FAILED: failed compile sincerely, -The Buildbot
buildbot failure in qemu-kvm on disable_kvm_x86_64_out_of_tree
The Buildbot has detected a new failure of disable_kvm_x86_64_out_of_tree on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_out_of_tree/builds/216 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_1 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Amit Shah ,Anthony Liguori ,Artyom Tarasenko ,Aurelien Jarno ,Avi Kivity ,Blue Swirl ,Brian Jackson ,Christian Krause ,Christoph Hellwig ,David S. Ahern ,Dirk Ullrich ,Edgar E. Iglesias ,Evgeniy Dushistov ,Isaku Yamahata ,Jan Kiszka ,Jim Meyering ,Kevin Wolf ,Liran Schour ,Loïc Minier ,Luiz Capitulino ,Marcelo Tosatti ,Markus Armbruster ,Michael S. Tsirkin ,OHMURA Kei ,Paolo Bonzini ,Richard Henderson ,Riku Voipio ,Roy Tam ,Scott Tsai ,Sheng Yang ,Stefan Weil ,TeLeMan ,Tom Lendacky ,h...@lst.de ,malc BUILD FAILED: failed compile sincerely, -The Buildbot
buildbot failure in qemu-kvm on disable_kvm_x86_64_debian_5_0
The Buildbot has detected a new failure of disable_kvm_x86_64_debian_5_0 on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_debian_5_0/builds/267 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_1 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Amit Shah ,Anthony Liguori ,Artyom Tarasenko ,Aurelien Jarno ,Avi Kivity ,Blue Swirl ,Brian Jackson ,Christian Krause ,Christoph Hellwig ,David S. Ahern ,Dirk Ullrich ,Edgar E. Iglesias ,Evgeniy Dushistov ,Isaku Yamahata ,Jan Kiszka ,Jim Meyering ,Kevin Wolf ,Liran Schour ,Loïc Minier ,Luiz Capitulino ,Marcelo Tosatti ,Markus Armbruster ,Michael S. Tsirkin ,OHMURA Kei ,Paolo Bonzini ,Richard Henderson ,Riku Voipio ,Roy Tam ,Scott Tsai ,Sheng Yang ,Stefan Weil ,TeLeMan ,Tom Lendacky ,h...@lst.de ,malc BUILD FAILED: failed compile sincerely, -The Buildbot
buildbot failure in qemu-kvm on default_x86_64_debian_5_0
The Buildbot has detected a new failure of default_x86_64_debian_5_0 on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_debian_5_0/builds/277 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_1 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Amit Shah ,Anthony Liguori ,Artyom Tarasenko ,Aurelien Jarno ,Avi Kivity ,Blue Swirl ,Brian Jackson ,Christian Krause ,Christoph Hellwig ,David S. Ahern ,Dirk Ullrich ,Edgar E. Iglesias ,Evgeniy Dushistov ,Isaku Yamahata ,Jan Kiszka ,Jim Meyering ,Kevin Wolf ,Liran Schour ,Loïc Minier ,Luiz Capitulino ,Marcelo Tosatti ,Markus Armbruster ,Michael S. Tsirkin ,OHMURA Kei ,Paolo Bonzini ,Richard Henderson ,Riku Voipio ,Roy Tam ,Scott Tsai ,Sheng Yang ,Stefan Weil ,TeLeMan ,Tom Lendacky ,h...@lst.de ,malc BUILD FAILED: failed compile sincerely, -The Buildbot
buildbot failure in qemu-kvm on disable_kvm_i386_out_of_tree
The Buildbot has detected a new failure of disable_kvm_i386_out_of_tree on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_out_of_tree/builds/216 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_2 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Amit Shah ,Anthony Liguori ,Artyom Tarasenko ,Aurelien Jarno ,Avi Kivity ,Blue Swirl ,Brian Jackson ,Christian Krause ,Christoph Hellwig ,David S. Ahern ,Dirk Ullrich ,Edgar E. Iglesias ,Evgeniy Dushistov ,Isaku Yamahata ,Jan Kiszka ,Jim Meyering ,Kevin Wolf ,Liran Schour ,Loïc Minier ,Luiz Capitulino ,Marcelo Tosatti ,Markus Armbruster ,Michael S. Tsirkin ,OHMURA Kei ,Paolo Bonzini ,Richard Henderson ,Riku Voipio ,Roy Tam ,Scott Tsai ,Sheng Yang ,Stefan Weil ,TeLeMan ,Tom Lendacky ,h...@lst.de ,malc BUILD FAILED: failed compile sincerely, -The Buildbot
Re: [PATCH 00/18] KVM: PPC: Virtualize Gekko guests
On 17.02.2010, at 17:03, Avi Kivity wrote: > On 02/17/2010 04:56 PM, Alexander Graf wrote: >> >> So I changed the code according to your input by making all FPU calls >> explicit, getting rid of all binary patching. >> >> On the PowerStation again I'm running this code (simplified to the important >> instructions) using kvmctl: >> >> li r2, 0x1234 >> std r2, 0(r1) >> lfd f3, 0(r1) >> lfd f4, 0(r1) >> do_mul: >> fmul f0, f3, f4 >> b do_mul >> >> >> With the following kvm_stat output: >> >> dec 2236 53 >> exits 60797802 1171403 >> ext_intr 379 4 >> halt_wakeup 0 0 >> inst_emu 60795247 1171344 >> ld 60795132 1171348 >> >> So I'm getting 1171403 fmul operations per second. And that's even with >> non-optimized instruction fetching. Not bad. >> > > It's a large number, but won't real hardware be three orders of magnitude > faster? Yes, it would. But we don't have to care. The only thing we need to worry about is being fast enough to emulate enough FPU instructions actually used in normal guests so the guest runs at full speed. And 1000k > 250k, so we can do that apparently, leaving some spare cycles for non-fpu instructions. The kernel on my PS3 is still compiling. Let's see how fast I get there. Alex
Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP
On Wed, Feb 17, 2010 at 04:13:11PM +0100, Jan Kiszka wrote: > Gleb Natapov wrote: > > On Wed, Feb 17, 2010 at 12:32:05PM +0100, Jan Kiszka wrote: > >> Gleb Natapov wrote: > >>> On Mon, Feb 15, 2010 at 02:20:31PM +0100, Jan Kiszka wrote: > Jan Kiszka wrote: > > Gleb Natapov wrote: > >> Let's check if SVM works. I can do that if you tell me how. > > - Fire up some Linux guest with gdb installed > > - Attach gdb to gdbstub of the VM > > - Set a soft breakpoint in guest kernel, ideally where it does not > > immediately trigger, e.g. on sys_reboot (use grep sys_reboot > > /proc/kallsyms if you don't have symbols for the guest kernel) > > - Start gdb /bin/true in the guest > > - run > > > > As gdb sets some automatic breakpoints, this already exercises the > > reinjection of #BP. > I just did this on our primary AMD platform (Embedded Opteron, 13KS EE), > and it just worked. > > >>> I tested it on a processor without NextRIP and your test case works there > >>> too, > >>> but it shouldn't have, so I looked deeper into that and what I see is > >>> that GDB outsmarts us. It doesn't matter if we inject the event before the int3 > >>> inserted by GDB or after it: GDB correctly finds the breakpoint that > >>> triggered and restarts the instruction correctly. I assume it doesn't use > >>> an exact match between the rip where int3 was inserted and where the exception > >>> triggers. > >> At the latest when you have two successive breakpoints on single-byte > >> instructions, gdb will reach its limits (for it failed earlier, BTW). > >> And other debuggers under other OSes may become unhappy as well. > > Yes, and that is why I am saying checking with GDB is not a good test. > > GDB may work, but it doesn't mean injection works correctly. It took me > > some time to write a test that finally confused gdb.
It was like this: > > > > 1: int main(int argc, char **argv) > > 2: { > > 3: if (argc == 1) > > 4: goto a; > > 5: asm("cmc"); > > 6: a: > > 7: asm("cmc"); > > 8: return 0; > > 9: } > > > > If you set breakpoints on lines 5 and 7, then when the breakpoint triggers GDB > > thinks it is on line 5. > > > > So can you run the int3 test below on master on AMD with NextRIP support? > > I doubt the result will be correct. > > If you meant your test above: Works out of the box with unpatched kvm on > modern AMD CPUs, i.e., gdb always stops at line 7 even if host debugging > is active. > I meant a test that executes asm("int3") and checks that the rip it reports is the same with and without host debugging active and points after the int3. But I guess if the program above works correctly, the int3 test should work too. Thanks. -- Gleb.
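Gleb's proposed int3 check can be approximated in user space. A program executes int3, catches the resulting SIGTRAP, and compares the RIP recorded in the signal context with the address of the instruction that follows the int3. Because #BP is a trap, those two addresses should be equal. This is a minimal sketch, not the test Gleb used. It assumes x86-64 Linux with glibc or musl, where REG_RIP requires _GNU_SOURCE. The helper name int3_rip_points_after_int3() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>

#if defined(__x86_64__) && defined(__linux__)
#include <ucontext.h>
#endif

#if defined(__x86_64__) && defined(__linux__) && defined(REG_RIP)
static volatile unsigned long trap_rip;

/* SIGTRAP handler: remember the RIP reported for the #BP. */
static void on_sigtrap(int sig, siginfo_t *info, void *ctx)
{
    (void)info;
    assert(sig == SIGTRAP);
    trap_rip = (unsigned long)((ucontext_t *)ctx)->uc_mcontext.gregs[REG_RIP];
}

/*
 * Execute one int3 and compare the reported RIP with the address of the
 * instruction right after it (local label 1). Returns 1 on match and 0 on
 * mismatch or setup failure.
 */
static int int3_rip_points_after_int3(void)
{
    struct sigaction sa, old;
    unsigned long after_int3;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigtrap;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTRAP, &sa, &old) != 0)
        return 0;

    trap_rip = 0;
    __asm__ __volatile__("int3\n"
                         "1:\n\t"
                         "lea 1b(%%rip), %0"
                         : "=r"(after_int3)
                         :
                         : "memory");

    sigaction(SIGTRAP, &old, NULL);
    return trap_rip == after_int3;
}
#else
/* Not x86-64 Linux: the check does not apply, so report -1. */
static int int3_rip_points_after_int3(void)
{
    return -1;
}
#endif
```

The idea is to run this inside the guest once with the host gdbstub attached and once without. Going by Gleb's description, correct #BP reinjection should make both runs return 1.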
Re: [PATCH 00/18] KVM: PPC: Virtualize Gekko guests
On 02/17/2010 04:56 PM, Alexander Graf wrote: So I changed the code according to your input by making all FPU calls explicit, getting rid of all binary patching. On the PowerStation again I'm running this code (simplified to the important instructions) using kvmctl: li r2, 0x1234 std r2, 0(r1) lfd f3, 0(r1) lfd f4, 0(r1) do_mul: fmul f0, f3, f4 b do_mul With the following kvm_stat output: dec 2236 53 exits 60797802 1171403 ext_intr 379 4 halt_wakeup 0 0 inst_emu 60795247 1171344 ld 60795132 1171348 So I'm getting 1171403 fmul operations per second. And that's even with non-optimized instruction fetching. Not bad. It's a large number, but won't real hardware be three orders of magnitude faster? -- error compiling committee.c: too many arguments to function
Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP
Gleb Natapov wrote: > On Wed, Feb 17, 2010 at 12:32:05PM +0100, Jan Kiszka wrote: >> Gleb Natapov wrote: >>> On Mon, Feb 15, 2010 at 02:20:31PM +0100, Jan Kiszka wrote: Jan Kiszka wrote: > Gleb Natapov wrote: >> Let's check if SVM works. I can do that if you tell me how. > - Fire up some Linux guest with gdb installed > - Attach gdb to gdbstub of the VM > - Set a soft breakpoint in guest kernel, ideally where it does not > immediately trigger, e.g. on sys_reboot (use grep sys_reboot > /proc/kallsyms if you don't have symbols for the guest kernel) > - Start gdb /bin/true in the guest > - run > > As gdb sets some automatic breakpoints, this already exercises the > reinjection of #BP. I just did this on our primary AMD platform (Embedded Opteron, 13KS EE), and it just worked. >>> I tested it on a processor without NextRIP and your test case works there too, >>> but it shouldn't have, so I looked deeper into that and what I see is >>> that GDB outsmarts us. It doesn't matter if we inject the event before the int3 >>> inserted by GDB or after it: GDB correctly finds the breakpoint that >>> triggered and restarts the instruction correctly. I assume it doesn't use >>> an exact match between the rip where int3 was inserted and where the exception >>> triggers. >> At the latest when you have two successive breakpoints on single-byte >> instructions, gdb will reach its limits (for it failed earlier, BTW). >> And other debuggers under other OSes may become unhappy as well. > Yes, and that is why I am saying checking with GDB is not a good test. > GDB may work, but it doesn't mean injection works correctly. It took me > some time to write a test that finally confused gdb. It was like this: > > 1: int main(int argc, char **argv) > 2: { > 3: if (argc == 1) > 4: goto a; > 5: asm("cmc"); > 6: a: > 7: asm("cmc"); > 8: return 0; > 9: } > > If you set breakpoints on lines 5 and 7, then when the breakpoint triggers GDB > thinks it is on line 5. > > So can you run the int3 test below on master on AMD with NextRIP support?
> I doubt the result will be correct. If you meant your test above: Works out of the box with unpatched kvm on modern AMD CPUs, i.e., gdb always stops at line 7 even if host debugging is active. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [PATCH 00/18] KVM: PPC: Virtualize Gekko guests
On 09.02.2010, at 13:27, Avi Kivity wrote: > On 02/09/2010 01:13 PM, Alexander Graf wrote: >> Avi Kivity wrote: >> >>> On 02/09/2010 01:00 PM, Alexander Graf wrote: >>> > That's pretty impressive (never saw x86 with this exit rate) but it's > more than 1000 times slower than the hardware, assuming 1 fpu IPC (and > the processor can probably do more). An fpu intensive application > will slow to a crawl. > > Measuring a typical Gekko application, I get about 200k-250k fpu (incl. paired singles) instructions per second. >>> Virtualized, yes? What's the rate on bare metal? >>> >> >> Emulated. I can't measure anything on bare metal. >> > > Well, then, the rate may be low due to virtualization overhead. Any way to compare absolute performance? So I changed the code according to your input by making all FPU calls explicit, getting rid of all binary patching. On the PowerStation again I'm running this code (simplified to the important instructions) using kvmctl: li r2, 0x1234 std r2, 0(r1) lfd f3, 0(r1) lfd f4, 0(r1) do_mul: fmul f0, f3, f4 b do_mul With the following kvm_stat output: dec 2236 53 exits 60797802 1171403 ext_intr 379 4 halt_wakeup 0 0 inst_emu 60795247 1171344 ld 60795132 1171348 So I'm getting 1171403 fmul operations per second. And that's even with non-optimized instruction fetching. Not bad. Alex
Re: [PATCH 2/3] KVM: x86: Save&restore interrupt shadow mask
On Wed, Feb 17, 2010 at 11:10:07AM +0200, Gleb Natapov wrote: > On Wed, Feb 17, 2010 at 10:03:58AM +0100, Jan Kiszka wrote: > > > > > > Also, as Avi mentioned it would be better to avoid this. Is it not > > > possible to disallow migration while interrupt shadow is present? > > > > Which means disallowing user space exits while the shadow is set? Or > > should we introduce some flag for user space that tells it "do not > > migrate now, resume the guest till next exit"? > > > I think disabling migration is a slippery slope. Guest may abuse it. Maybe > it will be hard to do with interrupt shadow, but the mechanism will be > used for other cases too. I remember there was an argument that we > should not migrate while vcpu is in a nested guest mode. Agreed that the guest may abuse it. Better to save/restore blocking-by-sti and blocking-by-mov-ss individually. I was thinking the writeback of interrupt shadow / interruptibility state would be too complicated (e.g., the need to care about ordering, etc.), but now I see it's handled in the kernel (inject_pending_event and friends).
[patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks
The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert them to raw_spinlock. No change for !RT kernels. Signed-off-by: Thomas Gleixner --- arch/x86/kvm/i8254.c | 10 +- arch/x86/kvm/i8254.h |2 +- arch/x86/kvm/i8259.c | 31 --- arch/x86/kvm/irq.h |2 +- arch/x86/kvm/x86.c |8 5 files changed, 27 insertions(+), 26 deletions(-) Index: linux-2.6-tip/arch/x86/kvm/i8254.c === --- linux-2.6-tip.orig/arch/x86/kvm/i8254.c +++ linux-2.6-tip/arch/x86/kvm/i8254.c @@ -242,11 +242,11 @@ static void kvm_pit_ack_irq(struct kvm_i { struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state, irq_ack_notifier); - spin_lock(&ps->inject_lock); + raw_spin_lock(&ps->inject_lock); if (atomic_dec_return(&ps->pit_timer.pending) < 0) atomic_inc(&ps->pit_timer.pending); ps->irq_ack = 1; - spin_unlock(&ps->inject_lock); + raw_spin_unlock(&ps->inject_lock); } void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu) @@ -624,7 +624,7 @@ struct kvm_pit *kvm_create_pit(struct kv mutex_init(&pit->pit_state.lock); mutex_lock(&pit->pit_state.lock); - spin_lock_init(&pit->pit_state.inject_lock); + raw_spin_lock_init(&pit->pit_state.inject_lock); kvm->arch.vpit = pit; pit->kvm = kvm; @@ -723,12 +723,12 @@ void kvm_inject_pit_timer_irqs(struct kv /* Try to inject pending interrupts when * last one has been acked. 
*/ - spin_lock(&ps->inject_lock); + raw_spin_lock(&ps->inject_lock); if (atomic_read(&ps->pit_timer.pending) && ps->irq_ack) { ps->irq_ack = 0; inject = 1; } - spin_unlock(&ps->inject_lock); + raw_spin_unlock(&ps->inject_lock); if (inject) __inject_pit_timer_intr(kvm); } Index: linux-2.6-tip/arch/x86/kvm/i8254.h === --- linux-2.6-tip.orig/arch/x86/kvm/i8254.h +++ linux-2.6-tip/arch/x86/kvm/i8254.h @@ -27,7 +27,7 @@ struct kvm_kpit_state { u32speaker_data_on; struct mutex lock; struct kvm_pit *pit; - spinlock_t inject_lock; + raw_spinlock_t inject_lock; unsigned long irq_ack; struct kvm_irq_ack_notifier irq_ack_notifier; }; Index: linux-2.6-tip/arch/x86/kvm/i8259.c === --- linux-2.6-tip.orig/arch/x86/kvm/i8259.c +++ linux-2.6-tip/arch/x86/kvm/i8259.c @@ -44,18 +44,19 @@ static void pic_clear_isr(struct kvm_kpi * Other interrupt may be delivered to PIC while lock is dropped but * it should be safe since PIC state is already updated at this stage. */ - spin_unlock(&s->pics_state->lock); + raw_spin_unlock(&s->pics_state->lock); kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq); - spin_lock(&s->pics_state->lock); + raw_spin_lock(&s->pics_state->lock); } void kvm_pic_clear_isr_ack(struct kvm *kvm) { struct kvm_pic *s = pic_irqchip(kvm); - spin_lock(&s->lock); + + raw_spin_lock(&s->lock); s->pics[0].isr_ack = 0xff; s->pics[1].isr_ack = 0xff; - spin_unlock(&s->lock); + raw_spin_unlock(&s->lock); } /* @@ -156,9 +157,9 @@ static void pic_update_irq(struct kvm_pi void kvm_pic_update_irq(struct kvm_pic *s) { - spin_lock(&s->lock); + raw_spin_lock(&s->lock); pic_update_irq(s); - spin_unlock(&s->lock); + raw_spin_unlock(&s->lock); } int kvm_pic_set_irq(void *opaque, int irq, int level) @@ -166,14 +167,14 @@ int kvm_pic_set_irq(void *opaque, int ir struct kvm_pic *s = opaque; int ret = -1; - spin_lock(&s->lock); + raw_spin_lock(&s->lock); if (irq >= 0 && irq < PIC_NUM_PINS) { ret = pic_set_irq1(&s->pics[irq >> 3], irq & 7, level); pic_update_irq(s); 
trace_kvm_pic_set_irq(irq >> 3, irq & 7, s->pics[irq >> 3].elcr, s->pics[irq >> 3].imr, ret == 0); } - spin_unlock(&s->lock); + raw_spin_unlock(&s->lock); return ret; } @@ -203,7 +204,7 @@ int kvm_pic_read_irq(struct kvm *kvm) int irq, irq2, intno; struct kvm_pic *s = pic_irqchip(kvm); - spin_lock(&s->lock); + raw_spin_lock(&s->lock); irq = pic_get_irq(&s->pics[0]); if (irq >= 0) { pic_intack(&s->pics[0], irq); @@ -228,7 +229,7 @@ int kvm_pic_read_irq(struct kvm *kvm) intno = s->pics[0].irq_base + irq; } pic_update_irq(s); - spin_unlock(&s->lock); + raw_spin
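The patch changes only the lock type. On preempt-rt, spinlock_t becomes a sleeping lock, while raw_spinlock_t keeps busy-waiting, which is what the changelog says these i8254/i8259 paths need. The state that inject_lock protects is a small ack/inject handshake: an ack decrements the pending count without letting it go below zero, and a new injection happens only after the previous one was acked. The sketch below models that handshake in user space. The demo_* names are hypothetical, a C11 atomic_flag spinlock stands in for raw_spinlock_t, and the initial irq_ack value of 1 is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-space stand-in for raw_spinlock_t: a plain busy-wait lock. */
typedef struct { atomic_flag locked; } demo_spinlock_t;

static void demo_spin_lock(demo_spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ; /* spin */
}

static void demo_spin_unlock(demo_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Mirrors the kvm_kpit_state fields touched in the patch. */
struct demo_pit_state {
    atomic_int pending;          /* timer ticks not yet delivered */
    unsigned long irq_ack;       /* guest acked the last injected IRQ */
    demo_spinlock_t inject_lock;
};

static void demo_pit_init(struct demo_pit_state *ps)
{
    atomic_init(&ps->pending, 0);
    ps->irq_ack = 1;             /* assumed: first injection needs no ack */
    atomic_flag_clear(&ps->inject_lock.locked);
}

/* Same logic as kvm_pit_ack_irq(): decrement pending, never below zero. */
static void demo_pit_ack_irq(struct demo_pit_state *ps)
{
    demo_spin_lock(&ps->inject_lock);
    if (atomic_fetch_sub(&ps->pending, 1) - 1 < 0)
        atomic_fetch_add(&ps->pending, 1);
    assert(atomic_load(&ps->pending) >= 0);
    ps->irq_ack = 1;
    demo_spin_unlock(&ps->inject_lock);
}

/* Same check as kvm_inject_pit_timer_irqs(): inject only after an ack. */
static bool demo_pit_should_inject(struct demo_pit_state *ps)
{
    bool inject = false;

    demo_spin_lock(&ps->inject_lock);
    if (atomic_load(&ps->pending) && ps->irq_ack) {
        ps->irq_ack = 0;
        inject = true;
    }
    demo_spin_unlock(&ps->inject_lock);
    return inject;
}
```

Here is one pass through the handshake. With two ticks pending, the first injection proceeds. A second injection is then refused until demo_pit_ack_irq() runs. A spurious extra ack leaves pending at zero rather than driving it negative.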
[PATCH 05/20] KVM: kvm->arch.vioapic should be NULL if kvm_ioapic_init() failure
From: Wei Yongjun kvm->arch.vioapic should be NULL when kvm_ioapic_init() fails because the io device cannot be registered. Signed-off-by: Wei Yongjun Signed-off-by: Avi Kivity --- virt/kvm/ioapic.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index a2edfd1..f3d0693 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -393,8 +393,10 @@ int kvm_ioapic_init(struct kvm *kvm) mutex_lock(&kvm->slots_lock); ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, &ioapic->dev); mutex_unlock(&kvm->slots_lock); - if (ret < 0) + if (ret < 0) { + kvm->arch.vioapic = NULL; kfree(ioapic); + } return ret; } -- 1.6.5.3
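The bug this patch fixes is a common error-path pattern. As the commit message implies, the new object is stored in the VM before registration is attempted, so freeing it on failure without clearing the stored pointer leaves later readers with a dangling pointer. Below is a minimal user-space sketch of the corrected shape. The demo_* types and functions are hypothetical stand-ins for struct kvm, struct kvm_ioapic and kvm_io_bus_register_dev().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct kvm and struct kvm_ioapic. */
struct demo_ioapic { int nr_pins; };
struct demo_vm { struct demo_ioapic *vioapic; };

/* Stand-in for kvm_io_bus_register_dev(); 'fail' forces the error path. */
static int demo_register_dev(struct demo_ioapic *dev, int fail)
{
    (void)dev;
    return fail ? -ENOMEM : 0;
}

/*
 * Publish the object, try to register it, and on failure clear the
 * published pointer before freeing, as the patch does.
 */
static int demo_ioapic_init(struct demo_vm *vm, int fail_register)
{
    struct demo_ioapic *ioapic;
    int ret;

    assert(vm != NULL);
    ioapic = calloc(1, sizeof(*ioapic));
    if (!ioapic)
        return -ENOMEM;
    vm->vioapic = ioapic;
    ret = demo_register_dev(ioapic, fail_register);
    if (ret < 0) {
        vm->vioapic = NULL;   /* without this, vm->vioapic would dangle */
        free(ioapic);
    }
    return ret;
}
```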
[PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4)
This is the fourth of four batches of patches for the 2.6.34 merge window. KVM changes for this cycle include: - rdtscp support - powerpc server-class updates - much improved large-guest scaling (now up to 64 vcpus) - improved guest fpu handling - initial Hyper-V emulation - better swapping with EPT - 1GB pages on Intel - x86 emulator fixes as well as the usual assortment of random fixes and improvements. Avi Kivity (2): KVM: MMU: Add tracepoint for guest page aging KVM: Plan obsolescence of kernel allocated slots, paravirt mmu Gleb Natapov (9): KVM: x86 emulator: Add group8 instruction decoding KVM: x86 emulator: Add group9 instruction decoding KVM: x86 emulator: Add Virtual-8086 mode of emulation KVM: x86 emulator: fix memory access during x86 emulation KVM: x86 emulator: Check IOPL level during io instruction emulation KVM: x86 emulator: Fix popf emulation KVM: x86 emulator: Check CPL level during privilege instruction emulation KVM: x86 emulator: Add LOCK prefix validity checking KVM: x86 emulator: disallow opcode 82 in 64-bit mode Jochen Maes (1): KVM: Fix Codestyle in virt/kvm/coalesced_mmio.c Liu Yu (1): KVM: ppc/booke: Set ESR and DEAR when inject interrupt to guest Michael S.
Tsirkin (1): KVM: do not store wqh in irqfd Sheng Yang (1): KVM: VMX: Rename VMX_EPT_IGMT_BIT to VMX_EPT_IPAT_BIT Wei Yongjun (5): KVM: PIT: unregister kvm irq notifier if fail to create pit KVM: kvm->arch.vioapic should be NULL if kvm_ioapic_init() failure KVM: cleanup the failure path of KVM_CREATE_IRQCHIP ioctrl KVM: ia64: destroy ioapic device if fail to setup default irq routing KVM: x86 emulator: code style cleanup Documentation/feature-removal-schedule.txt | 30 +++ arch/ia64/kvm/kvm-ia64.c |2 +- arch/powerpc/include/asm/kvm_host.h|2 + arch/powerpc/kvm/booke.c | 59 -- arch/powerpc/kvm/emulate.c |4 +- arch/x86/include/asm/kvm_emulate.h | 15 ++- arch/x86/include/asm/kvm_host.h|8 +- arch/x86/include/asm/vmx.h |2 +- arch/x86/kvm/emulate.c | 300 +--- arch/x86/kvm/i8254.c |5 +- arch/x86/kvm/i8259.c | 11 + arch/x86/kvm/irq.h |1 + arch/x86/kvm/mmu.c | 28 ++-- arch/x86/kvm/mmu.h |6 + arch/x86/kvm/paging_tmpl.h | 11 +- arch/x86/kvm/vmx.c |4 +- arch/x86/kvm/x86.c | 152 ++ include/trace/events/kvm.h | 22 ++ virt/kvm/coalesced_mmio.c |4 +- virt/kvm/eventfd.c |3 - virt/kvm/ioapic.c | 15 ++- virt/kvm/ioapic.h |1 + 22 files changed, 525 insertions(+), 160 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/20] KVM: ia64: destroy ioapic device if fail to setup default irq routing
From: Wei Yongjun

If KVM_CREATE_IRQCHIP fails in kvm_setup_default_irq_routing(), the ioapic device is not destroyed and kvm->arch.vioapic is not set to NULL, so KVM_GET_IRQCHIP and KVM_SET_IRQCHIP may access unexpected memory.

Signed-off-by: Wei Yongjun
Signed-off-by: Avi Kivity
---
 arch/ia64/kvm/kvm-ia64.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 0618898..26e0e08 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -968,7 +968,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			goto out;
 		r = kvm_setup_default_irq_routing(kvm);
 		if (r) {
-			kfree(kvm->arch.vioapic);
+			kvm_ioapic_destroy(kvm);
 			goto out;
 		}
 		break;
--
1.6.5.3
[PATCH 11/20] KVM: x86 emulator: Add group9 instruction decoding
From: Gleb Natapov

Use the groups mechanism to decode 0F C7 instructions.

Signed-off-by: Gleb Natapov
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity
---
 arch/x86/kvm/emulate.c |    9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 435b1e4..45a4f7c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -88,7 +88,7 @@ enum {
 	Group1_80, Group1_81, Group1_82, Group1_83,
 	Group1A, Group3_Byte, Group3, Group4, Group5, Group7,
-	Group8,
+	Group8, Group9,
 };
 
 static u32 opcode_table[256] = {
@@ -272,7 +272,8 @@ static u32 twobyte_table[256] = {
 	0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
 	    DstReg | SrcMem16 | ModRM | Mov,
 	/* 0xC0 - 0xCF */
-	0, 0, 0, DstMem | SrcReg | ModRM | Mov, 0, 0, 0, ImplicitOps | ModRM,
+	0, 0, 0, DstMem | SrcReg | ModRM | Mov,
+	0, 0, 0, Group | GroupDual | Group9,
 	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0xD0 - 0xDF */
 	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
@@ -328,6 +329,8 @@ static u32 group_table[] = {
 	0, 0, 0, 0,
 	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
 	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
+	[Group9*8] =
+	0, ImplicitOps | ModRM, 0, 0, 0, 0, 0, 0,
 };
 
 static u32 group2_table[] = {
@@ -335,6 +338,8 @@ static u32 group2_table[] = {
 	SrcNone | ModRM, 0, 0, SrcNone | ModRM,
 	SrcNone | ModRM | DstMem | Mov, 0,
 	SrcMem16 | ModRM | Mov, 0,
+	[Group9*8] =
+	0, 0, 0, 0, 0, 0, 0, 0,
 };
 
 /* EFLAGS bit definitions. */
--
1.6.5.3
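With Group | GroupDual, the decoder picks a row entry using the ModRM reg field. For register-form encodings (mod == 3), it switches to group2_table. In the group 9 rows added here, only /1 is populated, and only in the memory table. This matches 0F C7 /1 (CMPXCHG8B), which accepts only a memory operand.

Here is a hedged sketch of that lookup. It uses hypothetical table names and a placeholder flag value in place of the emulator's real decode flags.

```c
#include <stdint.h>

/* Hypothetical flag value standing in for "ImplicitOps | ModRM". */
#define DECODE_CMPXCHG8B 0x1u

/* One row per table, indexed by the ModRM reg field (/0 ... /7). */
static const uint32_t group9_mem[8] = { 0, DECODE_CMPXCHG8B, 0, 0, 0, 0, 0, 0 };
static const uint32_t group9_reg[8] = { 0 };

/*
 * GroupDual lookup: memory forms (mod != 3) use the group_table row,
 * register forms (mod == 3) use the group2_table row.
 */
static uint32_t decode_group9(uint8_t modrm)
{
	unsigned mod = modrm >> 6;
	unsigned reg = (modrm >> 3) & 7;

	return mod == 3 ? group9_reg[reg] : group9_mem[reg];
}
```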
[PATCH 10/20] KVM: x86 emulator: Add group8 instruction decoding
From: Gleb Natapov

Use the groups mechanism to decode 0F BA instructions.

Signed-off-by: Gleb Natapov
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity
---
 arch/x86/kvm/emulate.c |    7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 645b245..435b1e4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -88,6 +88,7 @@ enum {
 	Group1_80, Group1_81, Group1_82, Group1_83,
 	Group1A, Group3_Byte, Group3, Group4, Group5, Group7,
+	Group8,
 };
 
 static u32 opcode_table[256] = {
@@ -267,7 +268,7 @@ static u32 twobyte_table[256] = {
 	0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
 	    DstReg | SrcMem16 | ModRM | Mov,
 	/* 0xB8 - 0xBF */
-	0, 0, DstMem | SrcImmByte | ModRM, DstMem | SrcReg | ModRM | BitOp,
+	0, 0, Group | Group8, DstMem | SrcReg | ModRM | BitOp,
 	0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
 	    DstReg | SrcMem16 | ModRM | Mov,
 	/* 0xC0 - 0xCF */
@@ -323,6 +324,10 @@ static u32 group_table[] = {
 	0, 0, ModRM | SrcMem, ModRM | SrcMem,
 	SrcNone | ModRM | DstMem | Mov, 0,
 	SrcMem16 | ModRM | Mov, SrcMem | ModRM | ByteOp,
+	[Group8*8] =
+	0, 0, 0, 0,
+	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
+	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
 };
 
 static u32 group2_table[] = {
--
1.6.5.3
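Group 8 (0F BA) uses plain Group rather than GroupDual. Register and memory forms therefore share one row, indexed by ModRM bits 5:3. Entries /4 to /7 are BT, BTS, BTR and BTC with an imm8 operand, which is why only the last four slots of the new row are populated.

A small sketch of the reg-field lookup follows. group8_mnemonic() is a hypothetical helper, not part of the emulator.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Map the ModRM reg field of a 0F BA (group 8) instruction to its
 * mnemonic. /0 to /3 are undefined, matching the zero entries in the
 * patch's group table row.
 */
static const char *group8_mnemonic(uint8_t modrm)
{
	static const char *const names[8] = {
		NULL, NULL, NULL, NULL, "bt", "bts", "btr", "btc",
	};

	return names[(modrm >> 3) & 7];
}
```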
[PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation
From: Gleb Natapov

Currently, when the x86 emulator needs to access memory, the page walk is done with the broadest permissions possible. An emulated instruction executed by a userspace process can therefore still access kernel memory. Fix that by passing the correct access type to the page walker during emulation.

Signed-off-by: Gleb Natapov
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity
---
 arch/x86/include/asm/kvm_emulate.h |   14 +++-
 arch/x86/include/asm/kvm_host.h    |    7 ++-
 arch/x86/kvm/emulate.c             |    6 +-
 arch/x86/kvm/mmu.c                 |   17 ++---
 arch/x86/kvm/mmu.h                 |    6 ++
 arch/x86/kvm/paging_tmpl.h         |   11 ++-
 arch/x86/kvm/x86.c                 |  131 +++-
 7 files changed, 142 insertions(+), 50 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 784d7c5..7a6f54f 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -54,13 +54,23 @@ struct x86_emulate_ctxt;
 struct x86_emulate_ops {
 	/*
 	 * read_std: Read bytes of standard (non-emulated/special) memory.
-	 *           Used for instruction fetch, stack operations, and others.
+	 *           Used for descriptor reading.
 	 *  @addr:  [IN ] Linear address from which to read.
 	 *  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
 	 *  @bytes: [IN ] Number of bytes to read from memory.
 	 */
 	int (*read_std)(unsigned long addr, void *val,
-			unsigned int bytes, struct kvm_vcpu *vcpu);
+			unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
+
+	/*
+	 * fetch: Read bytes of standard (non-emulated/special) memory.
+	 *        Used for instruction fetch.
+	 *  @addr:  [IN ] Linear address from which to read.
+	 *  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
+	 *  @bytes: [IN ] Number of bytes to read from memory.
+	 */
+	int (*fetch)(unsigned long addr, void *val,
+			unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
 
 	/*
 	 * read_emulated: Read bytes from emulated/special memory area.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1522337..c07c16f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -243,7 +243,8 @@ struct kvm_mmu {
 	void (*new_cr3)(struct kvm_vcpu *vcpu);
 	int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
 	void (*free)(struct kvm_vcpu *vcpu);
-	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva);
+	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
+			    u32 *error);
 	void (*prefetch_page)(struct kvm_vcpu *vcpu,
 			      struct kvm_mmu_page *page);
 	int (*sync_page)(struct kvm_vcpu *vcpu,
@@ -660,6 +661,10 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
 int kvm_mmu_load(struct kvm_vcpu *vcpu);
 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
+gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
+gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
+gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
+gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e4e2df3..c44b460 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -616,7 +616,7 @@ static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
 
 	if (linear < fc->start || linear >= fc->end) {
 		size = min(15UL, PAGE_SIZE - offset_in_page(linear));
-		rc = ops->read_std(linear, fc->data, size, ctxt->vcpu);
+		rc = ops->fetch(linear, fc->data, size, ctxt->vcpu, NULL);
 		if (rc)
 			return rc;
 		fc->start = linear;
@@ -671,11 +671,11 @@ static int read_descriptor(struct x86_emulate_ctxt *ctxt,
 		op_bytes = 3;
 	*address = 0;
 	rc = ops->read_std((unsigned long)ptr, (unsigned long *)size, 2,
-			   ctxt->vcpu);
+			   ctxt->vcpu, NULL);
 	if (rc)
 		return rc;
 	rc = ops->read_std((unsigned long)ptr + 2, address, op_bytes,
-			   ctxt->vcpu);
+			   ctxt->vcpu, NULL);
 	return rc;
 }
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7397932..741373e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -138,12 +138,6 @@ module_param(oos_shadow, bool, 0644);
 #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \
 			| PT64_NX_MASK)
-#define P
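The new gva_to_gpa() signature carries an access mask, so the guest page walk can enforce the same checks the hardware would. The sketch below shows roughly how such a mask could be built from the access type and CPL, using the architectural page-fault error-code bits (W/R = bit 1, U/S = bit 2, I/D = bit 4).

The following are assumptions for illustration, not the code from the truncated part of this patch:

- the enum and the function name;
- treating "system" accesses (such as descriptor-table reads) as supervisor accesses.

```c
#include <stdint.h>

/* Architectural x86 page-fault error-code bits. */
#define PF_WRITE (1u << 1)
#define PF_USER  (1u << 2)
#define PF_FETCH (1u << 4)

/* Hypothetical access kinds mirroring the read/write/fetch/system helpers. */
enum access_kind { ACCESS_READ, ACCESS_WRITE, ACCESS_FETCH, ACCESS_SYSTEM };

/*
 * Build the mask a guest page walker would check against the guest
 * page-table permission bits. CPL 3 accesses must honour the U/S bit;
 * "system" accesses are assumed here to use supervisor rights.
 */
static uint32_t walk_access(enum access_kind kind, int cpl)
{
	uint32_t access = 0;

	if (kind != ACCESS_SYSTEM && cpl == 3)
		access |= PF_USER;
	if (kind == ACCESS_WRITE)
		access |= PF_WRITE;
	else if (kind == ACCESS_FETCH)
		access |= PF_FETCH;
	return access;
}
```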
[PATCH 12/20] KVM: x86 emulator: Add Virtual-8086 mode of emulation
From: Gleb Natapov

For some instructions, the CPU behaves differently in real mode and in virtual-8086 mode. Let the emulator know which mode the CPU is in, so that it does not have to poke into vcpu state directly.

Signed-off-by: Gleb Natapov
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity
---
 arch/x86/include/asm/kvm_emulate.h |    1 +
 arch/x86/kvm/emulate.c             |   12 +++-
 arch/x86/kvm/x86.c                 |    3 ++-
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 9b697c2..784d7c5 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -168,6 +168,7 @@ struct x86_emulate_ctxt {
 
 /* Execution mode, passed to the emulator. */
 #define X86EMUL_MODE_REAL     0	/* Real mode.             */
+#define X86EMUL_MODE_VM86     1	/* Virtual 8086 mode.     */
 #define X86EMUL_MODE_PROT16   2	/* 16-bit protected mode. */
 #define X86EMUL_MODE_PROT32   4	/* 32-bit protected mode. */
 #define X86EMUL_MODE_PROT64   8	/* 64-bit (long) mode.    */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 45a4f7c..e4e2df3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -899,6 +899,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 
 	switch (mode) {
 	case X86EMUL_MODE_REAL:
+	case X86EMUL_MODE_VM86:
 	case X86EMUL_MODE_PROT16:
 		def_op_bytes = def_ad_bytes = 2;
 		break;
@@ -1525,7 +1526,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 
 	/* syscall is not available in real mode */
 	if (c->lock_prefix || ctxt->mode == X86EMUL_MODE_REAL
-		|| !is_protmode(ctxt->vcpu))
+	    || ctxt->mode == X86EMUL_MODE_VM86)
 		return -1;
 
 	setup_syscalls_segments(ctxt, &cs, &ss);
@@ -1577,8 +1578,8 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
 	if (c->lock_prefix)
 		return -1;
 
-	/* inject #GP if in real mode or paging is disabled */
-	if (ctxt->mode == X86EMUL_MODE_REAL || !is_protmode(ctxt->vcpu)) {
+	/* inject #GP if in real mode */
+	if (ctxt->mode == X86EMUL_MODE_REAL) {
 		kvm_inject_gp(ctxt->vcpu, 0);
 		return -1;
 	}
@@ -1642,8 +1643,9 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 	if (c->lock_prefix)
 		return -1;
 
-	/* inject #GP if in real mode or paging is disabled */
-	if (ctxt->mode == X86EMUL_MODE_REAL || !is_protmode(ctxt->vcpu)) {
+	/* inject #GP if in real mode or Virtual 8086 mode */
+	if (ctxt->mode == X86EMUL_MODE_REAL ||
+	    ctxt->mode == X86EMUL_MODE_VM86) {
 		kvm_inject_gp(ctxt->vcpu, 0);
 		return -1;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2f91b9..a283795 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3348,8 +3348,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 		vcpu->arch.emulate_ctxt.vcpu = vcpu;
 		vcpu->arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu);
 		vcpu->arch.emulate_ctxt.mode =
+			(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
 			(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
-			? X86EMUL_MODE_REAL : cs_l
+			? X86EMUL_MODE_VM86 : cs_l
 			? X86EMUL_MODE_PROT64 : cs_db
 			? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
 
--
1.6.5.3
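The rewritten conditional in emulate_instruction() now checks, in order:

1. protected mode;
2. EFLAGS.VM;
3. the CS.L and CS.D bits.

Here is the same precedence written as a plain function, as a sketch. emul_mode() and the MODE_* names are hypothetical; their values are copied from the X86EMUL_MODE_* definitions in the patch.

```c
#include <stdbool.h>

/* Values match the X86EMUL_MODE_* constants in the patch. */
#define MODE_REAL   0
#define MODE_VM86   1
#define MODE_PROT16 2
#define MODE_PROT32 4
#define MODE_PROT64 8

#define EFLAGS_VM (1ul << 17)	/* virtual-8086 mode flag */

/* Same precedence as the chained ?: expression in emulate_instruction(). */
static int emul_mode(bool protmode, unsigned long eflags, bool cs_l, bool cs_db)
{
	if (!protmode)
		return MODE_REAL;
	if (eflags & EFLAGS_VM)
		return MODE_VM86;
	if (cs_l)
		return MODE_PROT64;
	return cs_db ? MODE_PROT32 : MODE_PROT16;
}
```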
Re: [PATCH 2/2] KVM: SVM: Make stepping out of NMI handlers more robust
On Tue, Feb 16, 2010 at 12:08:58PM +0200, Gleb Natapov wrote:
> > >
> > >> Besides this, proper #DB forwarding to the guest was missing.
> > > During NMI injection? How to reproduce?
> >
> > Inject, e.g., an NMI over code with TF set. A bit harder is placing a
> > guest HW breakpoint at the spot the NMI handler returns to.
> >
> Will try to reproduce.
>
How can I make gdb run the debugged process with TF set? Does this patch fix it:

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 52f78dd..b85b200 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -109,6 +109,7 @@ struct vcpu_svm {
 	struct nested_state nested;
 
 	bool nmi_singlestep;
+	bool nmi_singlestep_tf;
 };
 
 /* enable NPT for AMD64 and X86 with PAE */
@@ -1221,9 +1222,14 @@ static int db_interception(struct vcpu_svm *svm)
 
 	if (svm->nmi_singlestep) {
 		svm->nmi_singlestep = false;
-		if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
+		if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
 			svm->vmcb->save.rflags &=
 				~(X86_EFLAGS_TF | X86_EFLAGS_RF);
+			if (svm->nmi_singlestep_tf) {
+				svm->vmcb->save.rflags |= X86_EFLAGS_TF;
+				kvm_queue_exception(&svm->vcpu, DB_VECTOR);
+			}
+		}
 		update_db_intercept(&svm->vcpu);
 	}
 
@@ -2586,6 +2592,7 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
 	   possible problem (IRET or exception injection or interrupt
 	   shadow) */
 	svm->nmi_singlestep = true;
+	svm->nmi_singlestep_tf = (svm->vmcb->save.rflags | X86_EFLAGS_TF);
 	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
 	update_db_intercept(vcpu);
 }
--
			Gleb.
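One detail in this patch: nmi_singlestep_tf is assigned (svm->vmcb->save.rflags | X86_EFLAGS_TF). OR-ing in a non-zero constant can never produce zero, so that expression is always true regardless of the guest's TF. Recording whether TF was already set needs a bitwise AND instead.

A self-contained illustration follows. The helper names are hypothetical; the TF value (bit 8 of RFLAGS) is architectural.

```c
#include <stdbool.h>

#define X86_EFLAGS_TF (1ul << 8)	/* trap flag, bit 8 of RFLAGS */

/* Always true: OR-ing in a non-zero constant can never yield 0. */
static bool tf_was_set_or(unsigned long rflags)
{
	return (rflags | X86_EFLAGS_TF) != 0;
}

/* Tests the bit itself, which is what saving the guest's TF requires. */
static bool tf_was_set_and(unsigned long rflags)
{
	return (rflags & X86_EFLAGS_TF) != 0;
}
```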
[PATCH 03/20] KVM: VMX: Rename VMX_EPT_IGMT_BIT to VMX_EPT_IPAT_BIT
From: Sheng Yang

Follow the new SDM, which now names this bit "Ignore PAT memory type".

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity
---
 arch/x86/include/asm/vmx.h |    2 +-
 arch/x86/kvm/vmx.c         |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 43f1e9b..fb9a080 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -377,7 +377,7 @@ enum vmcs_field {
 #define VMX_EPT_READABLE_MASK			0x1ull
 #define VMX_EPT_WRITABLE_MASK			0x2ull
 #define VMX_EPT_EXECUTABLE_MASK			0x4ull
-#define VMX_EPT_IGMT_BIT			(1ull << 6)
+#define VMX_EPT_IPAT_BIT			(1ull << 6)
 
 #define VMX_EPT_IDENTITY_PAGETABLE_ADDR		0xfffbc000ul
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b400be0..f82b072 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4001,7 +4001,7 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 	 *    b. VT-d with snooping control feature: snooping control feature of
 	 *       VT-d engine can guarantee the cache correctness. Just set it
 	 *       to WB to keep consistent with host. So the same as item 3.
-	 * 3. EPT without VT-d: always map as WB and set IGMT=1 to keep
+	 * 3. EPT without VT-d: always map as WB and set IPAT=1 to keep
 	 *    consistent with host MTRR
 	 */
 	if (is_mmio)
@@ -4012,7 +4012,7 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 		      VMX_EPT_MT_EPTE_SHIFT;
 	else
 		ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
-			| VMX_EPT_IGMT_BIT;
+			| VMX_EPT_IPAT_BIT;
 
 	return ret;
 }
--
1.6.5.3
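Case 3 of the comment covers EPT without VT-d. There, the entry's memory-type field (bits 5:3) is set to write-back, and bit 6, now called IPAT, tells the CPU to ignore the guest PAT.

A worked sketch of the resulting bits follows. ept_wb_ignore_pat() is a hypothetical helper. MTRR_TYPE_WRBACK = 6 and VMX_EPT_MT_EPTE_SHIFT = 3 do not appear in this patch; they are taken here as the kernel header values.

```c
#include <stdint.h>

/* Assumed kernel header values; only the IPAT bit appears in the patch. */
#define MTRR_TYPE_WRBACK      6
#define VMX_EPT_MT_EPTE_SHIFT 3
#define VMX_EPT_IPAT_BIT      (1ull << 6)

/* EPT entry bits for "write-back, ignore guest PAT": (6 << 3) | (1 << 6). */
static uint64_t ept_wb_ignore_pat(void)
{
	return ((uint64_t)MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) |
	       VMX_EPT_IPAT_BIT;
}
```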
[PATCH 06/20] KVM: cleanup the failure path of KVM_CREATE_IRQCHIP ioctrl
From: Wei Yongjun

If initializing the ioapic device or setting up the default irq routing fails, the devices registered by kvm_create_pic() and kvm_ioapic_init() are never unregistered. This patch unregisters them on the failure path.

Signed-off-by: Wei Yongjun
Signed-off-by: Avi Kivity
---
 arch/x86/kvm/i8259.c |   11 +++
 arch/x86/kvm/irq.h   |    1 +
 arch/x86/kvm/x86.c   |    8
 virt/kvm/ioapic.c    |   11 +++
 virt/kvm/ioapic.h    |    1 +
 5 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index d5753a7..a3711f9 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -543,3 +543,14 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 
 	return s;
 }
+
+void kvm_destroy_pic(struct kvm *kvm)
+{
+	struct kvm_pic *vpic = kvm->arch.vpic;
+
+	if (vpic) {
+		kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &vpic->dev);
+		kvm->arch.vpic = NULL;
+		kfree(vpic);
+	}
+}
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index be399e2..0b71d48 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -75,6 +75,7 @@ struct kvm_pic {
 };
 
 struct kvm_pic *kvm_create_pic(struct kvm *kvm);
+void kvm_destroy_pic(struct kvm *kvm);
 int kvm_pic_read_irq(struct kvm *kvm);
 void kvm_pic_update_irq(struct kvm_pic *s);
 void kvm_pic_clear_isr_ack(struct kvm *kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bd3161c..b2f91b9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2771,6 +2771,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		if (vpic) {
 			r = kvm_ioapic_init(kvm);
 			if (r) {
+				kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS,
+							  &vpic->dev);
 				kfree(vpic);
 				goto create_irqchip_unlock;
 			}
@@ -2782,10 +2784,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = kvm_setup_default_irq_routing(kvm);
 		if (r) {
 			mutex_lock(&kvm->irq_lock);
-			kfree(kvm->arch.vpic);
-			kfree(kvm->arch.vioapic);
-			kvm->arch.vpic = NULL;
-			kvm->arch.vioapic = NULL;
+			kvm_ioapic_destroy(kvm);
+			kvm_destroy_pic(kvm);
 			mutex_unlock(&kvm->irq_lock);
 		}
 	create_irqchip_unlock:
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index f3d0693..3db15a8 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -401,6 +401,17 @@ int kvm_ioapic_init(struct kvm *kvm)
 	return ret;
 }
 
+void kvm_ioapic_destroy(struct kvm *kvm)
+{
+	struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+
+	if (ioapic) {
+		kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS, &ioapic->dev);
+		kvm->arch.vioapic = NULL;
+		kfree(ioapic);
+	}
+}
+
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state)
 {
 	struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index a505ce9..8a751b7 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -72,6 +72,7 @@ int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
 int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2);
 void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode);
 int kvm_ioapic_init(struct kvm *kvm);
+void kvm_ioapic_destroy(struct kvm *kvm);
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level);
 void kvm_ioapic_reset(struct kvm_ioapic *ioapic);
 int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
--
1.6.5.3
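kvm_destroy_pic() and kvm_ioapic_destroy() follow the same NULL-safe teardown shape:

1. unregister the device;
2. clear the published pointer;
3. free it.

Both do nothing if the device was never created. That is what lets the KVM_CREATE_IRQCHIP error path call them unconditionally.

Below is a userspace sketch of the pattern. It uses hypothetical types, and a call counter stands in for kvm_io_bus_unregister_dev().

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct kvm and its PIC/IOAPIC pointers. */
struct dev { int registered; };
struct vm { struct dev *pic; struct dev *ioapic; };

static int unregister_calls;	/* counts calls to the fake unregister hook */

static void unregister_dev(struct dev *d)
{
	d->registered = 0;
	unregister_calls++;
}

/*
 * NULL-safe teardown: unregister, unpublish, free. Calling it on a
 * device that was never created, or calling it twice, is a no-op.
 */
static void destroy_dev(struct dev **slot)
{
	struct dev *d = *slot;

	if (d) {
		unregister_dev(d);
		*slot = NULL;
		free(d);
	}
}
```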
[PATCH 02/20] KVM: MMU: Add tracepoint for guest page aging
Signed-off-by: Avi Kivity
---
 arch/x86/kvm/mmu.c         |   11 ---
 include/trace/events/kvm.h |   22 ++
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b8da671..7397932 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -151,6 +151,9 @@ module_param(oos_shadow, bool, 0644);
 #define ACC_USER_MASK    PT_USER_MASK
 #define ACC_ALL          (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
 
+#include
+
+#undef TRACE_INCLUDE_FILE
 #define CREATE_TRACE_POINTS
 #include "mmutrace.h"
 
@@ -792,6 +795,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 					 unsigned long data))
 {
 	int i, j;
+	int ret;
 	int retval = 0;
 	struct kvm_memslots *slots;
 
@@ -806,16 +810,17 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 		if (hva >= start && hva < end) {
 			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
 
-			retval |= handler(kvm, &memslot->rmap[gfn_offset],
-					  data);
+			ret = handler(kvm, &memslot->rmap[gfn_offset], data);
 
 			for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
 				int idx = gfn_offset;
 				idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j);
-				retval |= handler(kvm,
+				ret |= handler(kvm,
 					&memslot->lpage_info[j][idx].rmap_pde,
 					data);
 			}
+			trace_kvm_age_page(hva, memslot, ret);
+			retval |= ret;
 		}
 	}
 
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 8abdc12..b17d49d 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -164,6 +164,28 @@ TRACE_EVENT(kvm_fpu,
 	TP_printk("%s", __print_symbolic(__entry->load, kvm_fpu_load_symbol))
 );
 
+TRACE_EVENT(kvm_age_page,
+	TP_PROTO(ulong hva, struct kvm_memory_slot *slot, int ref),
+	TP_ARGS(hva, slot, ref),
+
+	TP_STRUCT__entry(
+		__field(	u64,	hva		)
+		__field(	u64,	gfn		)
+		__field(	u8,	referenced	)
+	),
+
+	TP_fast_assign(
+		__entry->hva		= hva;
+		__entry->gfn		=
+		  slot->base_gfn + ((hva - slot->userspace_addr) >> PAGE_SHIFT);
+		__entry->referenced	= ref;
+	),
+
+	TP_printk("hva %llx gfn %llx %s",
+		  __entry->hva, __entry->gfn,
+		  __entry->referenced ? "YOUNG" : "OLD")
+);
+
 #endif /* _TRACE_KVM_MAIN_H */
 
 /* This part must be outside protection */
--
1.6.5.3
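The tracepoint records the guest frame number by offsetting the host virtual address into its memslot. kvm_handle_hva() only reaches it for an hva inside the slot, so the subtraction cannot underflow there.

The arithmetic in isolation, as a sketch: it uses a hypothetical two-field memslot struct and assumes 4 KiB pages.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4 KiB pages assumed */

/* Hypothetical subset of struct kvm_memory_slot used by the tracepoint. */
struct memslot {
	uint64_t base_gfn;
	uint64_t userspace_addr;
};

/* Same arithmetic as the TP_fast_assign block of kvm_age_page. */
static uint64_t hva_to_gfn(const struct memslot *slot, uint64_t hva)
{
	return slot->base_gfn + ((hva - slot->userspace_addr) >> PAGE_SHIFT);
}
```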
[PATCH 04/20] KVM: PIT: unregister kvm irq notifier if fail to create pit
From: Wei Yongjun

If creating the PIT fails, we should unregister the kvm irq notifiers that were registered in kvm_create_pit().

Signed-off-by: Wei Yongjun
Acked-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
---
 arch/x86/kvm/i8254.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 6a74246..c9569f2 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -663,8 +663,9 @@ fail_unregister:
 	kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &pit->dev);
 
 fail:
-	if (pit->irq_source_id >= 0)
-		kvm_free_irq_source_id(kvm, pit->irq_source_id);
+	kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
+	kvm_unregister_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
+	kvm_free_irq_source_id(kvm, pit->irq_source_id);
 
 	kfree(pit);
 	return NULL;
--
1.6.5.3
[PATCH 16/20] KVM: x86 emulator: Check CPL level during privilege instruction emulation
From: Gleb Natapov

Add CPL checking in case the emulator is tricked into emulating a privileged instruction from userspace.

Signed-off-by: Gleb Natapov
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity
---
 arch/x86/kvm/emulate.c |   35 ---
 1 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1782387..d632111 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -76,6 +76,7 @@
 #define GroupDual   (1<<15)     /* Alternate decoding of mod == 3 */
 #define GroupMask   0xff        /* Group number stored in bits 0:7 */
 /* Misc flags */
+#define Priv        (1<<27) /* instruction generates #GP if current CPL != 0 */
 #define No64        (1<<28)
 /* Source 2 operand type */
 #define Src2None    (0<<29)
@@ -211,7 +212,7 @@ static u32 opcode_table[256] = {
 	SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
 	/* 0xF0 - 0xF7 */
 	0, 0, 0, 0,
-	ImplicitOps, ImplicitOps, Group | Group3_Byte, Group | Group3,
+	ImplicitOps | Priv, ImplicitOps, Group | Group3_Byte, Group | Group3,
 	/* 0xF8 - 0xFF */
 	ImplicitOps, 0, ImplicitOps, ImplicitOps,
 	ImplicitOps, ImplicitOps, Group | Group4, Group | Group5,
@@ -219,16 +220,20 @@ static u32 opcode_table[256] = {
 static u32 twobyte_table[256] = {
 	/* 0x00 - 0x0F */
-	0, Group | GroupDual | Group7, 0, 0, 0, ImplicitOps, ImplicitOps, 0,
-	ImplicitOps, ImplicitOps, 0, 0, 0, ImplicitOps | ModRM, 0, 0,
+	0, Group | GroupDual | Group7, 0, 0,
+	0, ImplicitOps, ImplicitOps | Priv, 0,
+	ImplicitOps | Priv, ImplicitOps | Priv, 0, 0,
+	0, ImplicitOps | ModRM, 0, 0,
 	/* 0x10 - 0x1F */
 	0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps | ModRM, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x20 - 0x2F */
-	ModRM | ImplicitOps, ModRM, ModRM | ImplicitOps, ModRM, 0, 0, 0, 0,
+	ModRM | ImplicitOps | Priv, ModRM | Priv,
+	ModRM | ImplicitOps | Priv, ModRM | Priv,
+	0, 0, 0, 0,
 	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x30 - 0x3F */
-	ImplicitOps, 0, ImplicitOps, 0,
-	ImplicitOps, ImplicitOps, 0, 0,
+	ImplicitOps | Priv, 0, ImplicitOps | Priv, 0,
+	ImplicitOps, ImplicitOps | Priv, 0, 0,
 	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x40 - 0x47 */
 	DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
@@ -322,9 +327,9 @@ static u32 group_table[] = {
 	SrcMem | ModRM | Stack, 0,
 	SrcMem | ModRM | Stack, 0, SrcMem | ModRM | Stack, 0,
 	[Group7*8] =
-	0, 0, ModRM | SrcMem, ModRM | SrcMem,
+	0, 0, ModRM | SrcMem | Priv, ModRM | SrcMem | Priv,
 	SrcNone | ModRM | DstMem | Mov, 0,
-	SrcMem16 | ModRM | Mov, SrcMem | ModRM | ByteOp,
+	SrcMem16 | ModRM | Mov | Priv, SrcMem | ModRM | ByteOp | Priv,
 	[Group8*8] =
 	0, 0, 0, 0,
 	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
@@ -335,7 +340,7 @@ static u32 group_table[] = {
 
 static u32 group2_table[] = {
 	[Group7*8] =
-	SrcNone | ModRM, 0, 0, SrcNone | ModRM,
+	SrcNone | ModRM | Priv, 0, 0, SrcNone | ModRM,
 	SrcNone | ModRM | DstMem | Mov, 0,
 	SrcMem16 | ModRM | Mov, 0,
 	[Group9*8] =
@@ -1700,12 +1705,6 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 		return -1;
 	}
 
-	/* sysexit must be called from CPL 0 */
-	if (kvm_x86_ops->get_cpl(ctxt->vcpu) != 0) {
-		kvm_inject_gp(ctxt->vcpu, 0);
-		return -1;
-	}
-
 	setup_syscalls_segments(ctxt, &cs, &ss);
 
 	if ((c->rex_prefix & 0x8) != 0x0)
@@ -1820,6 +1819,12 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
 	saved_eip = c->eip;
 
+	/* Privileged instruction can be executed only in CPL=0 */
+	if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) {
+		kvm_inject_gp(ctxt->vcpu, 0);
+		goto done;
+	}
+
 	if (((c->d & ModRM) && (c->modrm_mod != 3)) || (c->d & MemAbs))
 		memop = c->modrm_ea;
 
--
1.6.5.3
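With the Priv decode flag, the CPL test moves out of individual handlers, such as the removed check in emulate_sysexit(). It now sits in one place at the top of x86_emulate_insn(): any instruction tagged Priv raises #GP(0) unless CPL is 0.

The predicate in isolation, as a sketch: priv_insn_faults() is a hypothetical name, and PRIV reuses the bit the patch assigns.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRIV (1u << 27)	/* same bit the patch assigns to Priv */

/* True when the instruction must raise #GP(0) instead of being emulated. */
static bool priv_insn_faults(uint32_t decode_flags, int cpl)
{
	return (decode_flags & PRIV) && cpl != 0;
}
```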
[PATCH 09/20] KVM: do not store wqh in irqfd
From: Michael S. Tsirkin

wqh is unused, so we do not need to store it in irqfd anymore.

Signed-off-by: Michael S. Tsirkin
Signed-off-by: Avi Kivity
---
 virt/kvm/eventfd.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 486c604..7016319 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -47,7 +47,6 @@ struct _irqfd {
 	int                       gsi;
 	struct list_head          list;
 	poll_table                pt;
-	wait_queue_head_t        *wqh;
 	wait_queue_t              wait;
 	struct work_struct        inject;
 	struct work_struct        shutdown;
@@ -159,8 +158,6 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
 			poll_table *pt)
 {
 	struct _irqfd *irqfd = container_of(pt, struct _irqfd, pt);
-
-	irqfd->wqh = wqh;
 	add_wait_queue(wqh, &irqfd->wait);
 }
 
--
1.6.5.3
[PATCH 14/20] KVM: x86 emulator: Check IOPL level during io instruction emulation
From: Gleb Natapov

Make the emulator check that the vcpu is allowed to execute IN, INS, OUT, OUTS, CLI and STI.

Signed-off-by: Gleb Natapov
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/emulate.c          |   89 +++---
 arch/x86/kvm/x86.c              |   10 ++---
 3 files changed, 87 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c07c16f..f9a2f66 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -678,6 +678,7 @@ void kvm_disable_tdp(void);
 
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
 int complete_pio(struct kvm_vcpu *vcpu);
+bool kvm_check_iopl(struct kvm_vcpu *vcpu);
 
 struct kvm_memory_slot *gfn_to_memslot_unaliased(struct kvm *kvm, gfn_t gfn);
 
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c44b460..296e851 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1698,6 +1698,57 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 	return 0;
 }
 
+static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
+{
+	int iopl;
+	if (ctxt->mode == X86EMUL_MODE_REAL)
+		return false;
+	if (ctxt->mode == X86EMUL_MODE_VM86)
+		return true;
+	iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
+	return kvm_x86_ops->get_cpl(ctxt->vcpu) > iopl;
+}
+
+static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,
+					    struct x86_emulate_ops *ops,
+					    u16 port, u16 len)
+{
+	struct kvm_segment tr_seg;
+	int r;
+	u16 io_bitmap_ptr;
+	u8 perm, bit_idx = port & 0x7;
+	unsigned mask = (1 << len) - 1;
+
+	kvm_get_segment(ctxt->vcpu, &tr_seg, VCPU_SREG_TR);
+	if (tr_seg.unusable)
+		return false;
+	if (tr_seg.limit < 103)
+		return false;
+	r = ops->read_std(tr_seg.base + 102, &io_bitmap_ptr, 2, ctxt->vcpu,
+			  NULL);
+	if (r != X86EMUL_CONTINUE)
+		return false;
+	if (io_bitmap_ptr + port/8 > tr_seg.limit)
+		return false;
+	r = ops->read_std(tr_seg.base + io_bitmap_ptr + port/8, &perm, 1,
+			  ctxt->vcpu, NULL);
+	if (r != X86EMUL_CONTINUE)
+		return false;
+	if ((perm >> bit_idx) & mask)
+		return false;
+	return true;
+}
+
+static bool emulator_io_permited(struct x86_emulate_ctxt *ctxt,
+				 struct x86_emulate_ops *ops,
+				 u16 port, u16 len)
+{
+	if (emulator_bad_iopl(ctxt))
+		if (!emulator_io_port_access_allowed(ctxt, ops, port, len))
+			return false;
+	return true;
+}
+
 int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
@@ -1889,7 +1940,12 @@ special_insn:
 		break;
 	case 0x6c:		/* insb */
 	case 0x6d:		/* insw/insd */
-		 if (kvm_emulate_pio_string(ctxt->vcpu,
+		if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
+					  (c->d & ByteOp) ? 1 : c->op_bytes)) {
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto done;
+		}
+		if (kvm_emulate_pio_string(ctxt->vcpu,
 				1,
 				(c->d & ByteOp) ? 1 : c->op_bytes,
 				c->rep_prefix ?
@@ -1905,6 +1961,11 @@ special_insn:
 		return 0;
 	case 0x6e:		/* outsb */
 	case 0x6f:		/* outsw/outsd */
+		if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
+					  (c->d & ByteOp) ? 1 : c->op_bytes)) {
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto done;
+		}
 		if (kvm_emulate_pio_string(ctxt->vcpu,
 				0,
 				(c->d & ByteOp) ? 1 : c->op_bytes,
@@ -2202,7 +2263,13 @@ special_insn:
 	case 0xef: /* out (e/r)ax,dx */
 		port = c->regs[VCPU_REGS_RDX];
 		io_dir_in = 0;
-	do_io:	if (kvm_emulate_pio(ctxt->vcpu, io_dir_in,
+	do_io:
+		if (!emulator_io_permited(ctxt, ops, port,
+					  (c->d & ByteOp) ? 1 : c->op_bytes)) {
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto done;
+		}
+		if (kvm_emulate_pio(ctxt->vcpu, io_dir_in,
 				   (c->d & ByteOp) ? 1 : c->op_bytes,
 				   port) != 0) {
 			c->eip = saved_eip;
@@ -2227,13 +2294,21 @@ special_insn:
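The permission check has two stages. First, decide whether IOPL is insufficient: CPL > IOPL, or the guest is in virtual-8086 mode. If so, fall back to the I/O permission bitmap in the TSS, where a set bit denies access to that port.

The sketch below implements both stages under two assumptions. The bitmap is an in-memory array rather than guest memory read through read_std(), and the function names are hypothetical.

It also differs deliberately from the patch. The patch reads a single bitmap byte, but a 2- or 4-byte access near the end of a byte (say, port 6 with length 4) also covers bits in the next byte. This sketch therefore combines two bytes, which matches the Intel SDM's description of the processor reading two bytes for this check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitmap stage: each port has one bit and a set bit denies access.
 * An access to ports port .. port+len-1 needs all of those bits clear;
 * they can straddle two bitmap bytes, so two bytes are combined.
 */
static bool io_port_allowed(const uint8_t *bitmap, size_t bitmap_len,
			    uint16_t port, unsigned len)
{
	size_t idx = port / 8;
	unsigned mask = (1u << len) - 1;
	unsigned bits;

	if (idx + 1 >= bitmap_len)	/* both bytes must be in range */
		return false;
	bits = bitmap[idx] | ((unsigned)bitmap[idx + 1] << 8);
	return ((bits >> (port & 7)) & mask) == 0;
}

/* IOPL stage: real mode never needs the bitmap, VM86 always does. */
static bool iopl_insufficient(bool real_mode, bool vm86, int cpl, int iopl)
{
	if (real_mode)
		return false;
	if (vm86)
		return true;
	return cpl > iopl;
}
```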
[PATCH 19/20] KVM: x86 emulator: code style cleanup
From: Wei Yongjun

Just remove redundant semicolon.

Signed-off-by: Wei Yongjun
Signed-off-by: Avi Kivity
---
 arch/x86/kvm/emulate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c2de9f0..dd1b935 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1016,7 +1016,7 @@ done_prefixes:
 	}
 
 	if (mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
-		kvm_report_emulation_failure(ctxt->vcpu, "invalid x86/64 instruction");;
+		kvm_report_emulation_failure(ctxt->vcpu, "invalid x86/64 instruction");
 		return -1;
 	}
 
-- 
1.6.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH 15/20] KVM: x86 emulator: Fix popf emulation
From: Gleb Natapov

POPF behaves differently depending on current CPU mode. Emulate correct
logic to prevent guest from changing flags that it can't change otherwise.

Signed-off-by: Gleb Natapov
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity
---
 arch/x86/kvm/emulate.c |   55 +++-
 1 files changed, 54 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 296e851..1782387 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -343,11 +343,18 @@ static u32 group2_table[] = {
 };
 
 /* EFLAGS bit definitions. */
+#define EFLG_ID (1<<21)
+#define EFLG_VIP (1<<20)
+#define EFLG_VIF (1<<19)
+#define EFLG_AC (1<<18)
 #define EFLG_VM (1<<17)
 #define EFLG_RF (1<<16)
+#define EFLG_IOPL (3<<12)
+#define EFLG_NT (1<<14)
 #define EFLG_OF (1<<11)
 #define EFLG_DF (1<<10)
 #define EFLG_IF (1<<9)
+#define EFLG_TF (1<<8)
 #define EFLG_SF (1<<7)
 #define EFLG_ZF (1<<6)
 #define EFLG_AF (1<<4)
@@ -1214,6 +1221,49 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
 	return rc;
 }
 
+static int emulate_popf(struct x86_emulate_ctxt *ctxt,
+			struct x86_emulate_ops *ops,
+			void *dest, int len)
+{
+	int rc;
+	unsigned long val, change_mask;
+	int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
+	int cpl = kvm_x86_ops->get_cpl(ctxt->vcpu);
+
+	rc = emulate_pop(ctxt, ops, &val, len);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+
+	change_mask = EFLG_CF | EFLG_PF | EFLG_AF | EFLG_ZF | EFLG_SF | EFLG_OF
+		| EFLG_TF | EFLG_DF | EFLG_NT | EFLG_RF | EFLG_AC | EFLG_ID;
+
+	switch(ctxt->mode) {
+	case X86EMUL_MODE_PROT64:
+	case X86EMUL_MODE_PROT32:
+	case X86EMUL_MODE_PROT16:
+		if (cpl == 0)
+			change_mask |= EFLG_IOPL;
+		if (cpl <= iopl)
+			change_mask |= EFLG_IF;
+		break;
+	case X86EMUL_MODE_VM86:
+		if (iopl < 3) {
+			kvm_inject_gp(ctxt->vcpu, 0);
+			return X86EMUL_PROPAGATE_FAULT;
+		}
+		change_mask |= EFLG_IF;
+		break;
+	default: /* real mode */
+		change_mask |= (EFLG_IOPL | EFLG_IF);
+		break;
+	}
+
+	*(unsigned long *)dest =
+		(ctxt->eflags & ~change_mask) | (val & change_mask);
+
+	return rc;
+}
+
 static void emulate_push_sreg(struct x86_emulate_ctxt *ctxt, int seg)
 {
 	struct decode_cache *c = &ctxt->decode;
@@ -2099,7 +2149,10 @@ special_insn:
 		c->dst.type = OP_REG;
 		c->dst.ptr = (unsigned long *) &ctxt->eflags;
 		c->dst.bytes = c->op_bytes;
-		goto pop_instruction;
+		rc = emulate_popf(ctxt, ops, &c->dst.val, c->op_bytes);
+		if (rc != X86EMUL_CONTINUE)
+			goto done;
+		break;
 	case 0xa0 ... 0xa1:	/* mov */
 		c->dst.ptr = (unsigned long *)&c->regs[VCPU_REGS_RAX];
 		c->dst.val = c->src.val;
-- 
1.6.5.3
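The mode-dependent part of `emulate_popf()` comes down to two steps: compute the set of EFLAGS bits that POPF may change, then merge the popped value through that mask. Below is a standalone model of that computation. It assumes a simplified mode enum, and `popf_result()` and the `FL_*`/`MODE_*` names are hypothetical. The VM86 `iopl < 3` case, which the patch turns into #GP(0), is reported here through a `fault` out-parameter. As in the patch, bits outside the mask (VM, VIF, VIP, and IF/IOPL when not permitted) keep their current value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FL_CF   (1u << 0)
#define FL_PF   (1u << 2)
#define FL_AF   (1u << 4)
#define FL_ZF   (1u << 6)
#define FL_SF   (1u << 7)
#define FL_TF   (1u << 8)
#define FL_IF   (1u << 9)
#define FL_DF   (1u << 10)
#define FL_OF   (1u << 11)
#define FL_IOPL (3u << 12)
#define FL_NT   (1u << 14)
#define FL_RF   (1u << 16)
#define FL_AC   (1u << 18)
#define FL_ID   (1u << 21)

enum mode { MODE_REAL, MODE_VM86, MODE_PROT };

/* New EFLAGS after POPF pops 'val'; *fault is set where #GP is raised. */
static uint32_t popf_result(enum mode mode, int cpl, uint32_t eflags,
			    uint32_t val, bool *fault)
{
	int iopl = (int)((eflags & FL_IOPL) >> 12);
	uint32_t change = FL_CF | FL_PF | FL_AF | FL_ZF | FL_SF | FL_OF |
			  FL_TF | FL_DF | FL_NT | FL_RF | FL_AC | FL_ID;

	*fault = false;
	switch (mode) {
	case MODE_PROT:
		if (cpl == 0)
			change |= FL_IOPL;	/* only ring 0 may change IOPL */
		if (cpl <= iopl)
			change |= FL_IF;	/* IF requires CPL <= IOPL */
		break;
	case MODE_VM86:
		if (iopl < 3) {
			*fault = true;		/* patch injects #GP(0) */
			return eflags;
		}
		change |= FL_IF;
		break;
	case MODE_REAL:
		change |= FL_IOPL | FL_IF;
		break;
	}
	/* Bits outside 'change' keep their current value. */
	return (eflags & ~change) | (val & change);
}
```

For example, at CPL 3 with IOPL 0, popping a value with IOPL bits set and IF clear changes CF but leaves both IF and IOPL exactly as they were.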
[PATCH 08/20] KVM: ppc/booke: Set ESR and DEAR when inject interrupt to guest
From: Liu Yu

Old method prematurely sets ESR and DEAR.
Move this part after we decide to inject interrupt,
which is more like hardware behave.

Signed-off-by: Liu Yu
Acked-by: Hollis Blanchard
Acked-by: Alexander Graf
Signed-off-by: Avi Kivity
---
 arch/powerpc/include/asm/kvm_host.h |    2 +
 arch/powerpc/kvm/booke.c            |   59 ++-
 arch/powerpc/kvm/emulate.c          |    4 +-
 3 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 715aa6b..5e5bae7 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -259,6 +259,8 @@ struct kvm_vcpu_arch {
 #endif
 	ulong fault_dear;
 	ulong fault_esr;
+	ulong queued_dear;
+	ulong queued_esr;
 	gpa_t paddr_accessed;
 
 	u8 io_gpr; /* GPR used as IO source/target */
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index e283e44..4d686cc 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -82,9 +82,32 @@ static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
 	set_bit(priority, &vcpu->arch.pending_exceptions);
 }
 
-void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
+static void kvmppc_core_queue_dtlb_miss(struct kvm_vcpu *vcpu,
+                                        ulong dear_flags, ulong esr_flags)
 {
-	/* BookE does flags in ESR, so ignore those we get here */
+	vcpu->arch.queued_dear = dear_flags;
+	vcpu->arch.queued_esr = esr_flags;
+	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DTLB_MISS);
+}
+
+static void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu,
+                                           ulong dear_flags, ulong esr_flags)
+{
+	vcpu->arch.queued_dear = dear_flags;
+	vcpu->arch.queued_esr = esr_flags;
+	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DATA_STORAGE);
+}
+
+static void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu,
+                                           ulong esr_flags)
+{
+	vcpu->arch.queued_esr = esr_flags;
+	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_INST_STORAGE);
+}
+
+void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong esr_flags)
+{
+	vcpu->arch.queued_esr = esr_flags;
 	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
 }
 
@@ -115,14 +138,19 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 {
 	int allowed = 0;
 	ulong msr_mask;
+	bool update_esr = false, update_dear = false;
 
 	switch (priority) {
-	case BOOKE_IRQPRIO_PROGRAM:
 	case BOOKE_IRQPRIO_DTLB_MISS:
-	case BOOKE_IRQPRIO_ITLB_MISS:
-	case BOOKE_IRQPRIO_SYSCALL:
 	case BOOKE_IRQPRIO_DATA_STORAGE:
+		update_dear = true;
+		/* fall through */
 	case BOOKE_IRQPRIO_INST_STORAGE:
+	case BOOKE_IRQPRIO_PROGRAM:
+		update_esr = true;
+		/* fall through */
+	case BOOKE_IRQPRIO_ITLB_MISS:
+	case BOOKE_IRQPRIO_SYSCALL:
 	case BOOKE_IRQPRIO_FP_UNAVAIL:
 	case BOOKE_IRQPRIO_SPE_UNAVAIL:
 	case BOOKE_IRQPRIO_SPE_FP_DATA:
@@ -157,6 +185,10 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 		vcpu->arch.srr0 = vcpu->arch.pc;
 		vcpu->arch.srr1 = vcpu->arch.msr;
 		vcpu->arch.pc = vcpu->arch.ivpr | vcpu->arch.ivor[priority];
+		if (update_esr == true)
+			vcpu->arch.esr = vcpu->arch.queued_esr;
+		if (update_dear == true)
+			vcpu->arch.dear = vcpu->arch.queued_dear;
 		kvmppc_set_msr(vcpu, vcpu->arch.msr & msr_mask);
 
 		clear_bit(priority, &vcpu->arch.pending_exceptions);
@@ -229,8 +261,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		if (vcpu->arch.msr & MSR_PR) {
 			/* Program traps generated by user-level software must be handled
 			 * by the guest kernel. */
-			vcpu->arch.esr = vcpu->arch.fault_esr;
-			kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
+			kvmppc_core_queue_program(vcpu, vcpu->arch.fault_esr);
 			r = RESUME_GUEST;
 			kvmppc_account_exit(vcpu, USR_PR_INST);
 			break;
@@ -286,16 +317,14 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		break;
 
 	case BOOKE_INTERRUPT_DATA_STORAGE:
-		vcpu->arch.dear = vcpu->arch.fault_dear;
-		vcpu->arch.esr = vcpu->arch.fault_esr;
-		kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DATA_STORAGE);
+		kvmppc_core_queue_data_storage(vcpu, vcpu->arch.fault_dear,
+		                               vcpu->arch.fault_esr);
 		kvmppc_acc
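The point of this change is ordering. ESR and DEAR become guest-visible only when the interrupt is actually delivered, not when it is queued. While the interrupt is still masked, the guest therefore keeps seeing the old values. Below is a toy model of that ordering. The types and names (`toy_vcpu`, `queue_data_storage`, `deliver`) are hypothetical and do not follow the real `kvm_vcpu_arch` layout. Following the patch's fall-through, a data-storage interrupt updates both registers, whereas a program interrupt would update only ESR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down vcpu: architected vs. queued registers. */
struct toy_vcpu {
	unsigned long esr, dear;               /* what the guest observes */
	unsigned long queued_esr, queued_dear; /* latched at queue time */
	bool pending;
	bool pending_sets_dear;
};

static void queue_data_storage(struct toy_vcpu *v, unsigned long dear,
			       unsigned long esr)
{
	/* Only remember the values; guest-visible ESR/DEAR stay untouched. */
	v->queued_dear = dear;
	v->queued_esr = esr;
	v->pending = true;
	v->pending_sets_dear = true;
}

/* Deliver the pending interrupt if the guest currently allows it. */
static bool deliver(struct toy_vcpu *v, bool allowed)
{
	if (!v->pending || !allowed)
		return false;
	v->esr = v->queued_esr;
	if (v->pending_sets_dear)
		v->dear = v->queued_dear;
	v->pending = false;
	return true;
}
```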
[PATCH 17/20] KVM: x86 emulator: Add LOCK prefix validity checking
From: Gleb Natapov

Instructions which are not allowed to have LOCK prefix should
generate #UD if one is used.

[avi: fold opcode 82 fix from another patch]

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity
---
 arch/x86/kvm/emulate.c |   97 +++
 1 files changed, 56 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d632111..c2de9f0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -76,6 +76,7 @@
 #define GroupDual   (1<<15)     /* Alternate decoding of mod == 3 */
 #define GroupMask   0xff        /* Group number stored in bits 0:7 */
 /* Misc flags */
+#define Lock        (1<<26) /* lock prefix is allowed for the instruction */
 #define Priv        (1<<27) /* instruction generates #GP if current CPL != 0 */
 #define No64        (1<<28)
 /* Source 2 operand type */
@@ -94,35 +95,35 @@ enum {
 static u32 opcode_table[256] = {
 	/* 0x00 - 0x07 */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
 	ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
 	/* 0x08 - 0x0F */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
 	ImplicitOps | Stack | No64, 0,
 	/* 0x10 - 0x17 */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
 	ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
 	/* 0x18 - 0x1F */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
 	ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
 	/* 0x20 - 0x27 */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	DstAcc | SrcImmByte, DstAcc | SrcImm, 0, 0,
 	/* 0x28 - 0x2F */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	0, 0, 0, 0,
 	/* 0x30 - 0x37 */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	0, 0, 0, 0,
 	/* 0x38 - 0x3F */
@@ -158,7 +159,7 @@ static u32 opcode_table[256] = {
 	Group | Group1_80, Group | Group1_81,
 	Group | Group1_82, Group | Group1_83,
 	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	/* 0x88 - 0x8F */
 	ByteOp | DstMem | SrcReg | ModRM | Mov, DstMem | SrcReg | ModRM | Mov,
 	ByteOp | DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
@@ -263,17 +264,18 @@ static u32 twobyte_table[256] = {
 	DstMem | SrcReg | Src2CL | ModRM, 0, 0,
 	/* 0xA8 - 0xAF */
 	ImplicitOps | Stack, ImplicitOps | Stack,
-	0, DstMem | SrcReg | ModRM | BitOp,
+	0, DstMem | SrcReg | ModRM | BitOp | Lock,
 	DstMem | SrcReg | Src2ImmByte | ModRM,
 	DstMem | SrcReg | Src2CL | ModRM,
 	ModRM, 0,
 	/* 0xB0 - 0xB7 */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM, 0,
-	    DstMem | SrcReg | ModRM | BitOp,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
+	0, DstMem | SrcReg | ModRM | BitOp | Lock,
 	0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
 	    DstReg | SrcMem16 | ModRM | Mov,
 	/* 0xB8 - 0xBF */
-	0, 0, Group | Group8, DstMem | SrcReg | ModRM | BitOp,
+	0, 0,
+	Group | Group8, DstMem | SrcReg | ModRM | BitOp | Lock,
 	0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
 	    DstReg | SrcMem16 | ModRM | Mov,
 	/* 0xC0 - 0xCF */
@@ -290,25 +292,41 @@ static u32 twobyte_table[256] = {
 static u32 group_table[] = {
 	[Group1_80*8] =
-	ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
-	ByteOp | DstMem |
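The `Lock` bit in the decode tables only records which opcodes may take the prefix. The decoder still has to act on it. The hunk that adds that check to `x86_emulate_insn()` is cut off above, so the sketch below is hypothetical: the struct, enum, and function names are illustrative, and only the `Lock` bit position matches the patch. It also assumes the architectural rule that LOCK raises #UD when the destination is not a memory operand, in addition to the opcode not being lockable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define Lock (1u << 26)	/* same bit position as in the patch */

enum op_type { OP_REG, OP_MEM, OP_NONE };

/* Hypothetical summary of a decoded instruction. */
struct decoded {
	uint32_t flags;		/* opcode_table entry */
	bool lock_prefix;	/* 0xF0 seen while decoding prefixes */
	enum op_type dst_type;
};

/* True if the instruction must raise #UD because of its LOCK prefix. */
static bool lock_prefix_ud(const struct decoded *d)
{
	if (!d->lock_prefix)
		return false;
	/* LOCK is legal only on lockable opcodes with a memory destination. */
	return !(d->flags & Lock) || d->dst_type != OP_MEM;
}
```

With this rule, `lock add [mem], reg` is accepted. `lock add reg, reg` is rejected because its destination is a register, and `lock mov [mem], reg` is rejected because MOV has no `Lock` flag.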
[PATCH 20/20] KVM: x86 emulator: disallow opcode 82 in 64-bit mode
From: Gleb Natapov

Instructions with opcode 82 are not valid in 64 bit mode.

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity
---
 arch/x86/kvm/emulate.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dd1b935..c280c23 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -310,14 +310,14 @@ static u32 group_table[] = {
 	DstMem | SrcImm | ModRM | Lock,
 	DstMem | SrcImm | ModRM,
 	[Group1_82*8] =
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64,
 	[Group1_83*8] =
 	DstMem | SrcImmByte | ModRM | Lock,
 	DstMem | SrcImmByte | ModRM | Lock,
-- 
1.6.5.3
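Opcode 0x82 duplicates the 0x80 group (8-bit ALU operation with an 8-bit immediate, selected by ModRM.reg) and is invalid in 64-bit mode. Tagging every `Group1_82` entry `No64` is therefore enough for the decoder's existing `mode == X86EMUL_MODE_PROT64 && (c->d & No64)` test to reject it. The eighth entry (/7, CMP) carries no `Lock` because CMP writes no destination. The model below is hypothetical: it has a reduced mode enum and an illustrative table and helper name, and only the `Lock`/`No64` bit values follow the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define Lock (1u << 26)
#define No64 (1u << 28)

enum mode { MODE_PROT32, MODE_PROT64 };

/* Illustrative Group1_82 row, indexed by ModRM.reg:
 * ADD OR ADC SBB AND SUB XOR CMP. */
static const uint32_t group1_82[8] = {
	No64 | Lock, No64 | Lock, No64 | Lock, No64 | Lock,
	No64 | Lock, No64 | Lock, No64 | Lock, No64, /* CMP: not lockable */
};

/* True if the decoder should refuse this 0x82 instruction in 'mode'. */
static bool reject_in_mode(enum mode mode, uint8_t modrm)
{
	uint32_t flags = group1_82[(modrm >> 3) & 7];

	return mode == MODE_PROT64 && (flags & No64);
}
```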
[PATCH 18/20] KVM: Plan obsolescence of kernel allocated slots, paravirt mmu
These features are unused by modern userspace and can go away. Paravirt
mmu needs to stay a little longer for live migration.

Signed-off-by: Avi Kivity
---
 Documentation/feature-removal-schedule.txt |   30 
 1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 0a46833..47a6554 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -542,3 +542,33 @@ Why:	Duplicate functionality with the gspca_zc3xx driver, zc0301 only
 	sensors) wich are also supported by the gspca_zc3xx driver
 	(which supports 53 USB-ID's in total)
 Who:	Hans de Goede
+
+
+
+What:	KVM memory aliases support
+When:	July 2010
+Why:	Memory aliasing support is used for speeding up guest vga access
+	through the vga windows.
+
+	Modern userspace no longer uses this feature, so it's just bitrotted
+	code and can be removed with no impact.
+Who:	Avi Kivity
+
+
+
+What:	KVM kernel-allocated memory slots
+When:	July 2010
+Why:	Since 2.6.25, kvm supports user-allocated memory slots, which are
+	much more flexible than kernel-allocated slots.  All current userspace
+	supports the newer interface and this code can be removed with no
+	impact.
+Who:	Avi Kivity
+
+
+
+What:	KVM paravirt mmu host support
+When:	January 2011
+Why:	The paravirt mmu host support is slower than non-paravirt mmu, both
+	on newer and older hardware.  It is already not exposed to the guest,
+	and kept only for live migration purposes.
+Who:	Avi Kivity
-- 
1.6.5.3
[PATCH 01/20] KVM: Fix Codestyle in virt/kvm/coalesced_mmio.c
From: Jochen Maes

Fixed 2 codestyle issues in virt/kvm/coalesced_mmio.c

Signed-off-by: Jochen Maes
Signed-off-by: Avi Kivity
---
 virt/kvm/coalesced_mmio.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5de6594..5169736 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -133,7 +133,7 @@ void kvm_coalesced_mmio_free(struct kvm *kvm)
 }
 
 int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
-					struct kvm_coalesced_mmio_zone *zone)
+					 struct kvm_coalesced_mmio_zone *zone)
 {
 	struct kvm_coalesced_mmio_dev *dev = kvm->coalesced_mmio_dev;
 
@@ -166,7 +166,7 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
 	mutex_lock(&kvm->slots_lock);
 
 	i = dev->nb_zones;
-	while(i) {
+	while (i) {
 		z = &dev->zone[i - 1];
 
 		/* unregister all zones
-- 
1.6.5.3
Re: [PATCH v3 0/4] KVM: rework of "Fix x86 emulator's fault propagations"
On 02/12/2010 08:50 AM, Takuya Yoshikawa wrote:
> This is the rework of "Fix x86 emulator's fault propagations".
>   -- http://www.spinics.net/lists/kvm/msg28874.html
>
> I read the review comments from Avi, Marcelo and Gleb and removed
> some parts which should be done with more care: descriptor related
> part and emulator_sys* part.
>
> Now the contents is like this:
>   - patch 1: X86EMUL macro replacements: from do_fetch_insn_byte() to x86_decode_insn()
>   - patch 2: X86EMUL macro replacements: x86_emulate_insn() and its helpers
>   - patch 3: Fix x86_emulate_insn() not to use the variable rc for non-X86EMUL values
>   - patch 4: Tiny fix: remove redundant prototype of load_pdptrs()
>

Applied all, thanks.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP
On Wed, Feb 17, 2010 at 12:23:39PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Wed, Feb 17, 2010 at 01:13:29PM +0200, Avi Kivity wrote:
> >> On 02/17/2010 12:43 PM, Gleb Natapov wrote:
> >>>> And, again: This is an _existing_ user space ABI. We could only provide
> >>>> an alternative, but we have to maintain what is there at least for some
> >>>> longer grace period.
> >>>>
> >>> But it was always broken for SVM and was broken for VMX for a year and
> >>> nobody noticed, so maybe instead of reintroducing old interface we should
> >>> do it right this time?
> >> We need to fix the existing interface first, and then think long and
> >> hard if we want yet another interface, since we're likely to screw
> >> it up as well.
> >>
> >> The more interfaces we introduce, the harder maintenance becomes.
> >>
> > We are in a sad state if we cannot improve interface. The current one
> > outsources part of CPU functionality into userspace. This should be a big
> > no-no.
>
> I still disagree on this. Moving the decision logic to user space
> prevented to re-implement a gdbstub in kernel space. I oversaw that
> re-injecting #BP over older SVM was broken, but it is now fixed for all
> vendors. So moving it back to kernel has actually no long-term reason.
>
There were patches to implement gdbstub in kernel space! And not so long
time ago :) But I want to move only a tiny bit of logic into the kernel
space. And #BP reinjection brokenness is a different issue. It should be
fixed anyway no matter where the decision about reinjection happens.

If maintainers think that we should not have an improved interface and we
should support reinjection of #DB from userspace then this patch should be
applied. I don't have other objections to it. But I, at least, would prefer
the old interface for #DB reinjection (KVM_GUESTDBG_INJECT_DB) and not the
new one. The old one makes it explicit what we are doing, the new one allows
injection of any event and should be used only during migration or CPU
reset. It would even be a good idea to fail setting events if CPU is
running.

--
			Gleb.
Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP
On Wed, Feb 17, 2010 at 12:32:05PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Mon, Feb 15, 2010 at 02:20:31PM +0100, Jan Kiszka wrote:
> >> Jan Kiszka wrote:
> >>> Gleb Natapov wrote:
> >>>> Let's check if SVM works. I can do that if you tell me how.
> >>> - Fire up some Linux guest with gdb installed
> >>> - Attach gdb to gdbstub of the VM
> >>> - Set a soft breakpoint in guest kernel, ideally where it does not
> >>>   immediately trigger, e.g. on sys_reboot (use grep sys_reboot
> >>>   /proc/kallsyms if you don't have symbols for the guest kernel)
> >>> - Start gdb /bin/true in the guest
> >>> - run
> >>>
> >>> As gdb sets some automatic breakpoints, this already exercises the
> >>> reinjection of #BP.
> >> I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
> >> and it just worked.
> >>
> > I tested it on processor without NextRIP and your test case works there too,
> > but it shouldn't have, so I looked deeper into that and what I see is
> > that GDB outsmarts us. It doesn't matter if we inject event before int3
> > inserted by GDB or after it; GDB correctly finds the breakpoint that
> > triggered and restarts the instruction correctly. I assume it doesn't use
> > exact match between rip where int3 was inserted and where exceptions
> > triggers.
>
> At latest when you have two successive breakpoints on single-byte
> instructions, gdb will reach its limits (for it failed earlier, BTW).
> And other debuggers under other OSes may become unhappy as well.

Yes, and that is why I am saying checking with GDB is not a good test.
GDB may work, but it doesn't mean injection works correctly. It took me
some time to write a test that finally confused gdb. It was like this:

1: int main(int argc, char **argv)
2: {
3: 	if (argc == 1)
4: 		goto a;
5: 	asm("cmc");
6: a:
7: 	asm("cmc");
8: 	return 0;
9: }

If you set breakpoints on lines 5 and 7, then when the breakpoint
triggers GDB thinks it is on line 5. So can you run the int3 test below
on master on AMD with NextRIP support? I doubt the result will be
correct.

> > But if I run program below on latest kernel which prints rip
> > where #DB was delivered in dmesg I get different results with and
> > without external breakpoint inserted.
>
> Does applying v2 of my patch correct the picture?
>
Of course, since it now injects #DB at the correct address. If an
exception happens during #DB processing things will go wrong, but we can
do only so much on broken SVM without emulating int3 in software.

> >
> > int main(int argc, char **argv)
> > {
> > 	asm("int3");
> > 	return 0;
> > }
> >

--
			Gleb.
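The one-byte disagreement in this thread comes from how the return RIP is formed. A fault such as #GP is delivered with RIP pointing at the faulting instruction. A software exception raised by INT3 is a trap and is delivered with RIP pointing past the instruction. To reinject an intercepted #BP correctly, the hypervisor must therefore know the instruction length (VMX) or the next RIP (SVM NextRIP). The toy model below uses hypothetical names; on real hardware the length comes from the VM-exit information, not from a helper like this. If the length is lost (0), the guest sees the INT3's own address. A debugger that subtracts one to find the breakpoint then lands on the preceding byte, which plausibly explains what the two-`cmc` test shows.

```c
#include <assert.h>
#include <stdint.h>

enum event_kind { EV_FAULT, EV_SOFT_EXCEPTION };

/* Hypothetical: RIP the guest handler sees for an event raised at 'rip'. */
static uint64_t delivered_rip(enum event_kind kind, uint64_t rip,
			      unsigned inst_len)
{
	/* Traps from INT3/INTO/INT n resume after the instruction. */
	if (kind == EV_SOFT_EXCEPTION)
		return rip + inst_len;
	/* Faults report the faulting instruction itself. */
	return rip;
}
```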
Re: [PATCH] KVM: inject #UD in 64bit mode from instruction that are not valid there
On 02/11/2010 02:43 PM, Gleb Natapov wrote:
> Some instructions are obsolete in long mode. Inject #UD.
>

Applied, thanks.
Re: [PATCH] [uq/master] use eventfd for iothread
On 02/11/2010 01:23 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini
>

Applied, thanks.
Re: [PATCH 0/4] qemu-kvm: prepare for adding eventfd usage to upstream
On 02/11/2010 01:09 AM, Paolo Bonzini wrote:
> This patch series morphs the code in qemu-kvm's eventfd so that it looks
> like the code in upstream qemu.
>
> Patch 4 is not yet in upstream QEMU, I'm submitting it first to qemu-kvm
> to avoid conflicts.
>

Applied, thanks.
Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP
On Wed, Feb 17, 2010 at 12:24:19PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Wed, Feb 17, 2010 at 01:11:36PM +0200, Avi Kivity wrote:
> >> On 02/15/2010 03:30 PM, Gleb Natapov wrote:
> >>>> I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
> >>>> and it just worked.
> >>>>
> >>>> But this is a fairly new processor. Consequently, it reports NextRIP
> >>>> support via cpuid function 0x800A. Looking for an older one too.
> >>>>
> >>>> In the meantime I also browsed a bit more in the manuals, and I don't
> >>>> think stepping over or (what is actually required) into an INT3 will
> >>>> work. We can't step into as the processor clears TF on any event handler
> >>>> entry. And stepping over would cause troubles
> >>>>
> >>>> a) as an unknown amount of code may run without #DB interception
> >>>> b) we would fiddle with TF in code that is already under debugger
> >>>>    control, thus we would very likely run into conflicts.
> >>>>
> >>>> Leaves us with tricky INT3 emulation. Sigh.
> >>>>
> >>> So the question is do we want to support this kind of debugging on older
> >>> AMDs. Maybe we don't.
> >> How much older are they?
> >>
> > Actually I am not sure new AMDs support this correctly. Need one to run
> > tests. GDB is not a good test case, it is too smart.
>
> It works well - and gdb is far from being "smart": one byte off the
> expected INT3 address, and everything falls apart. That's what the VMX
> bug demonstrated.
>
A simple test on AMD shows the one byte off doesn't matter for GDB, at
least as long as this byte still belongs to the same instruction or maybe
the same line of source code. On VMX something else happens. I can't
reproduce the problem on master with VMX since event_exit_inst_len is
always 1 when #DB is reinjected. Maybe in your test we are much more than
1 byte off on VMX?

--
			Gleb.
Re: [PATCH] Build fix for #define KVM_DEBUG
Avi Kivity wrote:
> On 02/17/2010 04:41 AM, Tsuyoshi Ozawa wrote:
>>>> shadow_efer was renamed to efer, so this should be modified rather than deleted.
>>> OK. The new patch uses efer instead of deleting shadow_efer
>>>
>> Excuse me, and what should I do next ?
>>
>
> Copy Jan - he maintains kvm-kmod, and probably didn't see your patch.
>

Yes, I did. Proper subject prefixing can help a lot here...

Could you please repost, avoiding that the patch is line-wrapped and
giving it an up-to-date changelog?

TIA,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP
Gleb Natapov wrote:
> On Mon, Feb 15, 2010 at 02:20:31PM +0100, Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Gleb Natapov wrote:
>>>> Let's check if SVM works. I can do that if you tell me how.
>>> - Fire up some Linux guest with gdb installed
>>> - Attach gdb to gdbstub of the VM
>>> - Set a soft breakpoint in guest kernel, ideally where it does not
>>>   immediately trigger, e.g. on sys_reboot (use grep sys_reboot
>>>   /proc/kallsyms if you don't have symbols for the guest kernel)
>>> - Start gdb /bin/true in the guest
>>> - run
>>>
>>> As gdb sets some automatic breakpoints, this already exercises the
>>> reinjection of #BP.
>> I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
>> and it just worked.
>>
> I tested it on processor without NextRIP and your test case works there too,
> but it shouldn't have, so I looked deeper into that and what I see is
> that GDB outsmarts us. It doesn't matter if we inject event before int3
> inserted by GDB or after it; GDB correctly finds the breakpoint that
> triggered and restarts the instruction correctly. I assume it doesn't use
> exact match between rip where int3 was inserted and where exceptions
> triggers.

At latest when you have two successive breakpoints on single-byte
instructions, gdb will reach its limits (for it failed earlier, BTW).
And other debuggers under other OSes may become unhappy as well.

> But if I run program below on latest kernel which prints rip
> where #DB was delivered in dmesg I get different results with and
> without external breakpoint inserted.

Does applying v2 of my patch correct the picture?

>
> int main(int argc, char **argv)
> {
> 	asm("int3");
> 	return 0;
> }
>

Jan
Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP
Gleb Natapov wrote:
> On Wed, Feb 17, 2010 at 01:11:36PM +0200, Avi Kivity wrote:
>> On 02/15/2010 03:30 PM, Gleb Natapov wrote:
>>>> I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
>>>> and it just worked.
>>>>
>>>> But this is a fairly new processor. Consequently, it reports NextRIP
>>>> support via cpuid function 0x800A. Looking for an older one too.
>>>>
>>>> In the meantime I also browsed a bit more in the manuals, and I don't
>>>> think stepping over or (what is actually required) into an INT3 will
>>>> work. We can't step into as the processor clears TF on any event handler
>>>> entry. And stepping over would cause troubles
>>>>
>>>> a) as an unknown amount of code may run without #DB interception
>>>> b) we would fiddle with TF in code that is already under debugger
>>>>    control, thus we would very likely run into conflicts.
>>>>
>>>> Leaves us with tricky INT3 emulation. Sigh.
>>>>
>>> So the question is do we want to support this kind of debugging on older
>>> AMDs. Maybe we don't.
>> How much older are they?
>>
> Actually I am not sure new AMDs support this correctly. Need one to run
> tests. GDB is not a good test case, it is too smart.

It works well - and gdb is far from being "smart": one byte off the
expected INT3 address, and everything falls apart. That's what the VMX
bug demonstrated.

Jan
Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP
Gleb Natapov wrote:
> On Wed, Feb 17, 2010 at 01:13:29PM +0200, Avi Kivity wrote:
>> On 02/17/2010 12:43 PM, Gleb Natapov wrote:
>>>> And, again: This is an _existing_ user space ABI. We could only provide
>>>> an alternative, but we have to maintain what is there at least for some
>>>> longer grace period.
>>>>
>>> But it was always broken for SVM and was broken for VMX for a year and
>>> nobody noticed, so maybe instead of reintroducing old interface we should
>>> do it right this time?
>> We need to fix the existing interface first, and then think long and
>> hard if we want yet another interface, since we're likely to screw
>> it up as well.
>>
>> The more interfaces we introduce, the harder maintenance becomes.
>>
> We are in a sad state if we cannot improve interface. The current one
> outsources part of CPU functionality into userspace. This should be a big
> no-no.

I still disagree on this. Moving the decision logic to user space
prevented to re-implement a gdbstub in kernel space. I oversaw that
re-injecting #BP over older SVM was broken, but it is now fixed for all
vendors. So moving it back to kernel has actually no long-term reason.

Jan
Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP
On Wed, Feb 17, 2010 at 01:13:29PM +0200, Avi Kivity wrote:
> On 02/17/2010 12:43 PM, Gleb Natapov wrote:
> >>And, again: This is an _existing_ user space ABI. We could only provide
> >>an alternative, but we have to maintain what is there at least for some
> >>longer grace period.
> >>
> >But it was always broken for SVM and was broken for VMX for a year and
> >nobody noticed, so maybe instead of reintroducing old interface we should
> >do it right this time?
>
> We need to fix the existing interface first, and then think long and
> hard if we want yet another interface, since we're likely to screw
> it up as well.
>
> The more interfaces we introduce, the harder maintenance becomes.
>
We are in a sad state if we cannot improve the interface. The current one
outsources part of the CPU functionality into userspace. This should be a
big no-no.

--
Gleb.
Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP
On Wed, Feb 17, 2010 at 01:11:36PM +0200, Avi Kivity wrote:
> On 02/15/2010 03:30 PM, Gleb Natapov wrote:
> >
> >>I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
> >>and it just worked.
> >>
> >>But this is a fairly new processor. Consequently, it reports NextRIP
> >>support via cpuid function 0x8000000A. Looking for an older one too.
> >>
> >>In the meantime I also browsed a bit more in the manuals, and I don't
> >>think stepping over or (what is actually required) into an INT3 will
> >>work. We can't step into as the processor clears TF on any event handler
> >>entry. And stepping over would cause troubles
> >>
> >>a) as an unknown amount of code may run without #DB interception
> >>b) we would fiddle with TF in code that is already under debugger
> >>control, thus we would very likely run into conflicts.
> >>
> >>Leaves us with tricky INT3 emulation. Sigh.
> >>
> >So the question is do we want to support this kind of debugging on older
> >AMDs. Maybe we don't.
>
> How much older are they?
>
Actually I am not sure new AMDs support this correctly. I need one to run
tests. GDB is not a good test case, it is too smart.

--
Gleb.
Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP
On 02/17/2010 12:43 PM, Gleb Natapov wrote:
>> And, again: This is an _existing_ user space ABI. We could only provide
>> an alternative, but we have to maintain what is there at least for some
>> longer grace period.
> But it was always broken for SVM and was broken for VMX for a year and
> nobody noticed, so maybe instead of reintroducing old interface we should
> do it right this time?

We need to fix the existing interface first, and then think long and
hard if we want yet another interface, since we're likely to screw
it up as well.

The more interfaces we introduce, the harder maintenance becomes.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP
On 02/15/2010 03:30 PM, Gleb Natapov wrote:
>> I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
>> and it just worked.
>>
>> But this is a fairly new processor. Consequently, it reports NextRIP
>> support via cpuid function 0x8000000A. Looking for an older one too.
>>
>> In the meantime I also browsed a bit more in the manuals, and I don't
>> think stepping over or (what is actually required) into an INT3 will
>> work. We can't step into as the processor clears TF on any event handler
>> entry. And stepping over would cause troubles
>>
>> a) as an unknown amount of code may run without #DB interception
>> b) we would fiddle with TF in code that is already under debugger
>>    control, thus we would very likely run into conflicts.
>>
>> Leaves us with tricky INT3 emulation. Sigh.
> So the question is do we want to support this kind of debugging on older
> AMDs. Maybe we don't.

How much older are they?

--
error compiling committee.c: too many arguments to function
Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP
On Mon, Feb 15, 2010 at 02:20:31PM +0100, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > Gleb Natapov wrote:
> >> Let's check if SVM works. I can do that if you tell me how.
> >
> > - Fire up some Linux guest with gdb installed
> > - Attach gdb to gdbstub of the VM
> > - Set a soft breakpoint in guest kernel, ideally where it does not
> >   immediately trigger, e.g. on sys_reboot (use grep sys_reboot
> >   /proc/kallsyms if you don't have symbols for the guest kernel)
> > - Start gdb /bin/true in the guest
> > - run
> >
> > As gdb sets some automatic breakpoints, this already exercises the
> > reinjection of #BP.
>
> I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
> and it just worked.
>
I tested it on a processor without NextRIP and your test case works there
too, but it shouldn't have, so I looked deeper into that. What I see is
that GDB outsmarts us: it doesn't matter whether we inject the event before
or after the int3 inserted by GDB, GDB correctly finds the breakpoint that
triggered and restarts the instruction correctly. I assume it doesn't
require an exact match between the rip where int3 was inserted and the rip
where the exception triggers. But if I run the program below on the latest
kernel, which prints the rip where #BP was delivered to dmesg, I get
different results with and without an external breakpoint inserted.

int main(int argc, char **argv)
{
	asm("int3");
	return 0;
}

> But this is a fairly new processor. Consequently, it reports NextRIP
> support via cpuid function 0x8000000A. Looking for an older one too.
>
> In the meantime I also browsed a bit more in the manuals, and I don't
> think stepping over or (what is actually required) into an INT3 will
> work. We can't step into as the processor clears TF on any event handler
> entry. And stepping over would cause troubles
>
> a) as an unknown amount of code may run without #DB interception
> b) we would fiddle with TF in code that is already under debugger
>    control, thus we would very likely run into conflicts.
>
> Leaves us with tricky INT3 emulation. Sigh.
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

--
Gleb.
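Gleb's reproducer depends on where the trap gets reported. The sketch below is a user-space illustration of that point, not the kernel-side printk he used. It assumes x86-64 Linux with glibc and uses a SIGTRAP handler. It executes one INT3 and compares the RIP saved for the handler with the address of the next instruction. A #BP raised by INT3 is a trap, so the saved RIP points just past the one-byte 0xCC, and a debugger subtracts one to locate its breakpoint. That is why an instruction length that is off by one breaks the exact-match case Jan describes. `run_int3` and `on_sigtrap` are hypothetical names.

```c
#define _GNU_SOURCE             /* exposes REG_RIP in <sys/ucontext.h> (glibc) */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

#ifndef REG_RIP
#define REG_RIP 16              /* glibc's x86-64 index of RIP in gregs[] */
#endif

static volatile uintptr_t trap_rip;

/* SIGTRAP handler: record the user RIP saved when #BP was delivered. */
static void on_sigtrap(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)si;
    ucontext_t *uc = ctx;
    trap_rip = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
}

/* Executes a single INT3 and returns the RIP seen by the signal handler.
 * *after_int3 receives the address of the instruction that follows the
 * one-byte INT3. x86-64 Linux only. */
uintptr_t run_int3(uintptr_t *after_int3)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigtrap;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGTRAP, &sa, NULL);

    uintptr_t next;
    /* Load the address of local label 1 (just past INT3), then trap.
     * Execution resumes at label 1 once the handler returns. */
    __asm__ volatile("lea 1f(%%rip), %0\n\t"
                     "int3\n"
                     "1:"
                     : "=r"(next)
                     :
                     : "memory");
    *after_int3 = next;
    return trap_rip;
}
```

Calling `run_int3(&after)` should return the same value it stores in `after`. If an injected #BP were reported one byte early or late, the two would differ.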
Recommended network driver for a windows KVM guest
Hi all,

I need to install several Windows KVM guests (on a fully updated RHEL 5.4
host) for iSCSI boot. The iSCSI servers are Solaris/OpenSolaris storage
servers, and I need to boot the Windows guests (2008R2 and Win7) using
gPXE. Can I use the virtio net driver during the Windows install, or the
e1000 driver?

Many thanks.

--
CL Martinez
carlopmart {at} gmail {d0t} com
Re: [PATCH] Build fix for #define KVM_DEBUG
On 02/17/2010 04:41 AM, Tsuyoshi Ozawa wrote:
>> shadow_efer was renamed to efer, so this should be modified rather than
>> deleted.
> OK. The new patch uses efer instead of deleting shadow_efer.
> Excuse me, and what should I do next?

Copy Jan - he maintains kvm-kmod, and probably didn't see your patch.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP
On Tue, Feb 16, 2010 at 10:11:06AM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Tue, Feb 16, 2010 at 09:05:40AM +0100, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Mon, Feb 15, 2010 at 03:53:04PM +0100, Jan Kiszka wrote:
> >>>> We intercept #BP while in guest debugging mode. As VM exits due to
> >>>> intercepted exceptions do not necessarily come with valid
> >>>> idt_vectoring, we have to update event_exit_inst_len explicitly in such
> >>>> cases. At least in the absence of migration, this ensures that
> >>>> re-injections of #BP will find and use the correct instruction length.
> >>>>
> >>> Thinking about it some more. Why do we exit to userspace at all if we
> >>> intercept wrong #DB? It seems to me unwise to have the ability to inject
> >>> exceptions from userspace. The exception generation mechanism is a part
> >>> of the CPU and we shouldn't outsource part of CPU functionality to
> >>> userspace.
> >> The guest debugging API was designed to avoid maintaining a "countless"
> >> number of breakpoints in kernel space and instead chose to loop over
> >> user space to decide about #DB & #BP. So this part is required even if
> >> we start thinking about an alternative interface in the future.
> >>
> > How much is "countless"? 1? I am sure we can handle this.
>
> We could even handle more. But would have to
>  - handle INT3 injection in kernel space, including step-over on resume
>  - fully parse HW breakpoints in kernel space
>  - probably deal with some more complications that are now handled in
>    user space, part of them even in gdb
>
The first point in this list is needed anyway, no matter who reinjects the
#BP event. About point three: what are those complications? As far as I
see, all the kernel needs to know is a list of cr3:address pairs that have
a breakpoint set. If a #BP intercept happens, we scan this list; if no
match is found we reinject the event into the guest, otherwise we exit to
userspace.

> And, again: This is an _existing_ user space ABI. We could only provide
> an alternative, but we have to maintain what is there at least for some
> longer grace period.
>
But it was always broken for SVM and was broken for VMX for a year and
nobody noticed, so maybe instead of reintroducing the old interface we
should do it right this time?

--
Gleb.
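Gleb's proposal amounts to a lookup on every intercepted #BP. The sketch below illustrates that lookup and is not KVM code. `struct host_sw_bp`, `classify_bp_intercept` and the two actions are hypothetical. It also deliberately leaves out how the INT3 address is derived from the exit state, which depends on the vendor and on NextRIP or instruction-length reporting.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one software breakpoint planted by the host debugger,
 * keyed by the guest page-table root (CR3) and the linear address of
 * the INT3 byte. */
struct host_sw_bp {
    uint64_t cr3;
    uint64_t addr;
};

enum bp_action {
    BP_REINJECT_TO_GUEST,   /* guest's own INT3: deliver #BP to the guest */
    BP_EXIT_TO_USERSPACE    /* debugger's INT3: report to the gdbstub */
};

/* Decide what to do on an intercepted #BP, given the current CR3 and the
 * address of the INT3 that raised it. */
enum bp_action classify_bp_intercept(const struct host_sw_bp *bps, size_t n,
                                     uint64_t cr3, uint64_t int3_addr)
{
    for (size_t i = 0; i < n; i++) {
        if (bps[i].cr3 == cr3 && bps[i].addr == int3_addr)
            return BP_EXIT_TO_USERSPACE;
    }
    return BP_REINJECT_TO_GUEST;
}
```

A linear scan is probably enough for the handful of breakpoints Gleb expects. The sketch leaves one design question open: a breakpoint in the guest kernel, like Jan's sys_reboot example, is reachable from every process, so a pure cr3:address key would likely need a wildcard CR3 for kernel addresses. It also does not cover Jan's other points (INT3 step-over on resume, hardware breakpoints).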
Re: [RFC] QEMU: Balloon support for device assignment
On 02/17/2010 11:43 AM, bor...@il.ibm.com wrote:
> From: Eran Borovik
>
> This patch adds modifications to allow correct balloon operation when a
> virtual guest uses a direct assigned device. The modifications include a
> new interface between qemu and kvm to allow mapping and unmapping the
> pages from the IOMMU as well as pinning and unpinning as needed.

Note, on reset we deflate the balloon completely, since the BIOS and boot
loader (and possibly the OS post-reboot) are not aware of ballooning.

--
error compiling committee.c: too many arguments to function
Re: [RFC] KVM: Balloon support for device assignment
On 02/17/2010 11:43 AM, bor...@il.ibm.com wrote:
> From: Eran Borovik
>
> This patch adds modifications to allow correct balloon operation when a
> virtual guest uses a direct assigned device. The modifications include a
> new interface between qemu and kvm to allow mapping and unmapping the
> pages from the IOMMU as well as pinning and unpinning as needed.

The plan for iommu support is to push it into uio. Instead of kvm managing
the iommu directly, I'd like qemu to open a uio device and set up an iommu
mapping there, which will just happen to match the kvm memory slots.
Similarly, interrupts will be forwarded using irqfds. This will allow using
the iommu without kvm, and reduce the amount of special purpose kvm code.

These patches make the transition more difficult, which worries me. I know
Gerd looked at making the move, but no longer.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 17.02.2010, at 10:47, Avi Kivity wrote:

> On 02/17/2010 11:42 AM, OHMURA Kei wrote:
>>>>> "We think"? I mean - yes, I think so too. But have you actually measured it?
>>>>> How much improvement are we talking here?
>>>>> Is it still faster when a bswap is involved?
>>>> Thanks for pointing out.
>>>> I will post the data for x86 later.
>>>> However, I don't have a test environment to check the impact of bswap.
>>>> Would you please measure the run time between the following section if
>>>> possible?
>>>
>>> It'd make more sense to have a real stand alone test program, no?
>>> I can try to write one today, but I have some really nasty important bugs
>>> to fix first.
>>
>>
>> OK. I will prepare a test code with sample data. Since I found a ppc
>> machine around, I will run the code and post the results of
>> x86 and ppc.
>>
>
> I've applied the patch - I think the x86 results justify it, and I'll be very
> surprised if ppc doesn't show a similar gain. Skipping 7 memory accesses and
> 7 tests must be a win.

Sounds good to me. I don't assume bswap to be horribly slow either. Just
want to be sure.

Alex
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 02/17/2010 11:42 AM, OHMURA Kei wrote:
>>>> "We think"? I mean - yes, I think so too. But have you actually measured it?
>>>> How much improvement are we talking here?
>>>> Is it still faster when a bswap is involved?
>>> Thanks for pointing out.
>>> I will post the data for x86 later.
>>> However, I don't have a test environment to check the impact of bswap.
>>> Would you please measure the run time between the following section if
>>> possible?
>> It'd make more sense to have a real stand alone test program, no?
>> I can try to write one today, but I have some really nasty important bugs
>> to fix first.
> OK. I will prepare a test code with sample data. Since I found a ppc
> machine around, I will run the code and post the results of
> x86 and ppc.

I've applied the patch - I think the x86 results justify it, and I'll be
very surprised if ppc doesn't show a similar gain. Skipping 7 memory
accesses and 7 tests must be a win.

--
error compiling committee.c: too many arguments to function
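Avi's "7 memory accesses and 7 tests" presumably means that a clean 64-bit word of the dirty bitmap is checked once instead of byte by byte. The sketch below shows that word-at-a-time idea. `collect_dirty_pages` is a hypothetical helper, not the qemu-kvm patch. It assumes the bitmap is an array of host-endian `unsigned long` words, so it does not answer the bswap question Alex raised for big-endian hosts. It also uses the GCC/Clang builtin `__builtin_ctzl`.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Store the indices of dirty pages (set bits) into pages[], at most
 * max_pages of them, and return how many were stored.  Bit b of word w
 * stands for page w * BITS_PER_WORD + b. */
size_t collect_dirty_pages(const unsigned long *bitmap, size_t nwords,
                           size_t *pages, size_t max_pages)
{
    size_t n = 0;

    for (size_t w = 0; w < nwords; w++) {
        unsigned long bits = bitmap[w];

        if (bits == 0)
            continue;   /* one load and one test skip a whole word of clean pages */

        while (bits != 0 && n < max_pages) {
            unsigned int b = (unsigned int)__builtin_ctzl(bits);
            pages[n++] = w * BITS_PER_WORD + b;
            bits &= bits - 1;   /* clear the lowest set bit */
        }
        if (n == max_pages)
            break;
    }
    return n;
}
```

With mostly clean memory, the cost is dominated by the `bits == 0` check per word. Set bits are visited in ascending page order.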
[RFC] KVM: Balloon support for device assignment
From: Eran Borovik This patch adds modifications to allow correct balloon operation when a virtual guest uses a direct assigned device. The modifications include a new interface between qemu and kvm to allow mapping and unmapping the pages from the IOMMU as well as pinning and unpinning as needed. Signed-off-by: Eran Borovik --- include/linux/kvm.h |3 ++ include/linux/kvm_host.h |4 ++ virt/kvm/iommu.c | 86 +++-- virt/kvm/kvm_main.c |9 + 4 files changed, 98 insertions(+), 4 deletions(-) diff --git a/include/linux/kvm.h b/include/linux/kvm.h index f8f8900..567f5f8 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -514,6 +514,9 @@ struct kvm_irqfd { struct kvm_userspace_memory_region) #define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47) #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64) +#define KVM_IOMMU_UNMAP_PAGE _IOW(KVMIO, 0x49, __u64) +#define KVM_IOMMU_MAP_PAGE _IOW(KVMIO, 0x50, __u64) + /* Device model IOC */ #define KVM_CREATE_IRQCHIP _IO(KVMIO, 0x60) #define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index b7bbb5d..ad904ec 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -411,6 +411,10 @@ int kvm_assign_device(struct kvm *kvm, struct kvm_assigned_dev_kernel *assigned_dev); int kvm_deassign_device(struct kvm *kvm, struct kvm_assigned_dev_kernel *assigned_dev); +void kvm_iommu_unmap_page(struct kvm *kvm, + gfn_t base_gfn); +int kvm_iommu_map_page(struct kvm *kvm, + gfn_t base_gfn); #else /* CONFIG_IOMMU_API */ static inline int kvm_iommu_map_pages(struct kvm *kvm, gfn_t base_gfn, diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c index 1514758..54cfd33 100644 --- a/virt/kvm/iommu.c +++ b/virt/kvm/iommu.c @@ -190,23 +190,101 @@ static void kvm_iommu_put_pages(struct kvm *kvm, gfn_t gfn = base_gfn; pfn_t pfn; struct iommu_domain *domain = kvm->arch.iommu_domain; - unsigned long i; + unsigned long i, iommu_pages; u64 phys; /* check if iommu exists and in use 
*/ if (!domain) return; - for (i = 0; i < npages; i++) { + for (i = 0, iommu_pages = 0; i < npages; i++, gfn++) { phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn)); + + /*Because of ballooning, there can be holes in the + range. In that case, we simply unmap everything + till now, and continue forward. + */ + if (!phys) { + + /*No consecutive IOMMU pages here*/ + if (iommu_pages == 0) + continue; + iommu_unmap_range(domain, + gfn_to_gpa(base_gfn), + PAGE_SIZE*iommu_pages); + + /*Reset consequtive iommu range counters*/ + base_gfn = gfn + 1; + iommu_pages = 0; + continue; + } pfn = phys >> PAGE_SHIFT; kvm_release_pfn_clean(pfn); - gfn++; + ++iommu_pages; } - iommu_unmap_range(domain, gfn_to_gpa(base_gfn), PAGE_SIZE * npages); + /*Unmap the last iommu range if any*/ + if (iommu_pages != 0) + iommu_unmap_range(domain, + gfn_to_gpa(base_gfn), + PAGE_SIZE * iommu_pages); +} + +/*Called to map a page from IOMMU */ +int kvm_iommu_map_page(struct kvm *kvm, + gfn_t base_gfn) +{ + gfn_t gfn = base_gfn; + pfn_t pfn; + struct iommu_domain *domain = kvm->arch.iommu_domain; + u64 phys; + int rc; + int flags; + + /* check if iommu exists and in use */ + if (!domain) + return 0; + phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn)); + + /*Verify addres is not mapped already*/ + if (phys) + return 0; + flags = IOMMU_READ | IOMMU_WRITE; + if (kvm->arch.iommu_flags & KVM_IOMMU_CACHE_COHERENCY) + flags |= IOMMU_CACHE; + pfn = gfn_to_pfn(kvm, gfn); + rc = iommu_map_range(domain, + gfn_to_gpa(gfn), + pfn_to_hpa(pfn), + PAGE_SIZE, flags); + return rc; +} + + + +/*Called to unmap a page from IOMMU */ +void kvm_iommu_unmap_page(struct kvm *kvm, + gfn_t base_gfn) +{ + gfn_t gfn = base_gfn; + pfn_t pfn; + struct iommu_domain *domain = kvm->arch.iommu_domain; + u64 phys; + + /* check
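The reworked `kvm_iommu_put_pages()` in this patch unmaps only the stretches of still-mapped frames between balloon holes, rather than the whole slot. The sketch below shows that splitting. It assumes the mapping state is given as a `bool` array; in the patch, a hole is detected by `iommu_iova_to_phys()` returning 0. `struct frame_run` and `collect_mapped_runs` are hypothetical. Each run's start is recorded explicitly at its first mapped frame. The posted loop appears to advance `base_gfn` only after flushing a run. If so, a slot that begins with a hole, or that has two adjacent holes, would unmap from a stale base. Recording the start explicitly avoids that.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: a run of consecutive guest frames that are still mapped. */
struct frame_run {
    size_t first;    /* index of the first mapped frame in the run */
    size_t npages;   /* number of consecutive mapped frames */
};

/* Split [0, nframes) into maximal runs of mapped frames, skipping holes
 * left by ballooned-out pages.  Writes at most max_runs runs and returns
 * the number written. */
size_t collect_mapped_runs(const bool *mapped, size_t nframes,
                           struct frame_run *runs, size_t max_runs)
{
    size_t nruns = 0, run_len = 0, run_first = 0;

    for (size_t i = 0; i <= nframes; i++) {
        bool m = (i < nframes) && mapped[i];   /* treat the end as a hole */

        if (m) {
            if (run_len == 0)
                run_first = i;   /* a run starts at its first mapped frame */
            run_len++;
        } else if (run_len != 0) {
            if (nruns < max_runs)
                runs[nruns++] = (struct frame_run){ run_first, run_len };
            run_len = 0;
        }
    }
    return nruns;
}
```

Each returned run would correspond to one `iommu_unmap_range()` call covering `npages * PAGE_SIZE` bytes starting at the run's first frame.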
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 17.02.2010, at 10:42, OHMURA Kei wrote:

>>>> "We think"? I mean - yes, I think so too. But have you actually measured it?
>>>> How much improvement are we talking here?
>>>> Is it still faster when a bswap is involved?
>>> Thanks for pointing out.
>>> I will post the data for x86 later.
>>> However, I don't have a test environment to check the impact of bswap.
>>> Would you please measure the run time between the following section if
>>> possible?
>> It'd make more sense to have a real stand alone test program, no?
>> I can try to write one today, but I have some really nasty important bugs to
>> fix first.
>
>
> OK. I will prepare a test code with sample data. Since I found a ppc
> machine around, I will run the code and post the results of
> x86 and ppc.
>
>
> By the way, the following data is a result of x86 measured in QEMU/KVM.
> This data shows how many times the function is called (#called), runtime of
> original function (orig.), runtime of this patch (patch), speedup ratio (ratio).

That does indeed look promising!

Thanks for doing this micro-benchmark. I just want to be 100% sure that it
doesn't affect performance for big endian badly.

Alex
[RFC] QEMU: Balloon support for device assignment
From: Eran Borovik This patch adds modifications to allow correct balloon operation when a virtual guest uses a direct assigned device. The modifications include a new interface between qemu and kvm to allow mapping and unmapping the pages from the IOMMU as well as pinning and unpinning as needed. Signed-off-by: Eran Borovik --- hw/virtio-balloon.c | 13 ++--- kvm/include/linux/kvm.h |2 ++ kvm/libkvm/libkvm.h |4 qemu-kvm.c | 10 ++ qemu-kvm.h |4 5 files changed, 30 insertions(+), 3 deletions(-) diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c index 3792012..337f717 100644 --- a/hw/virtio-balloon.c +++ b/hw/virtio-balloon.c @@ -132,6 +132,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq) elem.out_sg, elem.out_num) == 4) { ram_addr_t pa; ram_addr_t addr; + bool deflate; pa = (ram_addr_t)ldl_p(&pfn) << VIRTIO_BALLOON_PFN_SHIFT; offset += 4; @@ -139,12 +140,18 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq) addr = cpu_get_physical_page_desc(pa); if ((addr & ~TARGET_PAGE_MASK) != IO_MEM_RAM) continue; + deflate = !!(vq == s->dvq); +# ifdef KVM_CAP_DEVICE_ASSIGNMENT + if (deflate) + kvm_map_pfn(NULL, pfn); + else + kvm_unmap_pfn(NULL, pfn); +# endif /* Using qemu_get_ram_ptr is bending the rules a bit, but should be OK because we only want a single page. 
*/ -balloon_page(qemu_get_ram_ptr(addr), !!(vq == s->dvq)); -} - + balloon_page(qemu_get_ram_ptr(addr), deflate); + } virtqueue_push(vq, &elem, offset); virtio_notify(vdev, vq); } diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h index 6485981..90f7723 100644 --- a/kvm/include/linux/kvm.h +++ b/kvm/include/linux/kvm.h @@ -595,6 +595,8 @@ struct kvm_clock_data { struct kvm_userspace_memory_region) #define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47) #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64) +#define KVM_IOMMU_UNMAP_PAGE _IOW(KVMIO, 0x49, __u64) +#define KVM_IOMMU_MAP_PAGE _IOW(KVMIO, 0x50, __u64) /* Device model IOC */ #define KVM_CREATE_IRQCHIP_IO(KVMIO, 0x60) #define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level) diff --git a/kvm/libkvm/libkvm.h b/kvm/libkvm/libkvm.h index 4821a1e..7fa83b5 100644 --- a/kvm/libkvm/libkvm.h +++ b/kvm/libkvm/libkvm.h @@ -714,6 +714,10 @@ int kvm_s390_store_status(kvm_context_t kvm, int slot, unsigned long addr); int kvm_assign_pci_device(kvm_context_t kvm, struct kvm_assigned_pci_dev *assigned_dev); +int kvm_deflate_pfn(kvm_context_t kvm, uint32_t pfn); + +int kvm_inflate_pfn(kvm_context_t kvm, uint32_t pfn); + /*! 
* \brief Assign IRQ for an assigned device * diff --git a/qemu-kvm.c b/qemu-kvm.c index a305907..a5ca029 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -1081,6 +1081,16 @@ static int kvm_old_assign_irq(kvm_context_t kvm, return kvm_vm_ioctl(kvm_state, KVM_ASSIGN_IRQ, assigned_irq); } +int kvm_unmap_pfn(kvm_context_t kvm, uint32_t pfn) +{ + return kvm_vm_ioctl(kvm_state, KVM_IOMMU_UNMAP_PAGE, pfn); +} + +int kvm_map_pfn(kvm_context_t kvm, uint32_t pfn) +{ + return kvm_vm_ioctl(kvm_state, KVM_IOMMU_MAP_PAGE, pfn); +} + #ifdef KVM_CAP_ASSIGN_DEV_IRQ int kvm_assign_irq(kvm_context_t kvm, struct kvm_assigned_irq *assigned_irq) { diff --git a/qemu-kvm.h b/qemu-kvm.h index 6b3e5a1..861c336 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -691,6 +691,10 @@ int kvm_s390_store_status(kvm_context_t kvm, int slot, unsigned long addr); int kvm_assign_pci_device(kvm_context_t kvm, struct kvm_assigned_pci_dev *assigned_dev); +int kvm_unmap_pfn(kvm_context_t kvm, uint32_t pfn); + +int kvm_map_pfn(kvm_context_t kvm, uint32_t pfn); + /*! * \brief Assign IRQ for an assigned device * -- 1.6.0.4
[RFC] Balloon support for device assignment
Currently device assignment forces pinning the entire guest memory. The
following kernel and qemu patches add balloon support for device
assignment. When the balloon inflates, the corresponding pages are unmapped
from the IOMMU and unpinned; accordingly, they are remapped and pinned when
the balloon deflates.

The kernel patch applies to tag v2.6.32.

Comments appreciated.

Regards,
Eran.
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
>>> "We think"? I mean - yes, I think so too. But have you actually measured it?
>>> How much improvement are we talking here?
>>> Is it still faster when a bswap is involved?
>> Thanks for pointing out.
>> I will post the data for x86 later.
>> However, I don't have a test environment to check the impact of bswap.
>> Would you please measure the run time between the following section if
>> possible?
> It'd make more sense to have a real stand alone test program, no?
> I can try to write one today, but I have some really nasty important bugs to
> fix first.

OK. I will prepare a test code with sample data. Since I found a ppc
machine around, I will run the code and post the results of x86 and ppc.

By the way, the following data are x86 results measured in QEMU/KVM. They
show how many times the function is called (#called), the runtime of the
original function (orig.), the runtime with this patch (patch), and the
speedup ratio (ratio).

Test1: Guest OS reads a 3GB file, which is bigger than memory.
 #called  orig.(msec)  patch(msec)  ratio
     108          1.1          0.1    7.6
     102          1.0          0.1    6.8
     132          1.6          0.2    7.1

Test2: Guest OS reads/writes a 3GB file, which is bigger than memory.
 #called  orig.(msec)  patch(msec)  ratio
    2394           33          7.7    4.3
    2100           29          7.1    4.1
    2832           40          9.9    4.0
Re: [PATCH] qemu-kvm: pcnet APROMWE bit location
On 02/14/2010 09:30 AM, Chris Kilgour wrote:
> I don't subscribe to the list, so please excuse any breach of etiquette.
>
> According to AMD document 21485D p. 141, APROMWE is bit 8 of BCR2.

Please send this to the qemu mailing list, qemu-de...@nongnu.org, as this
code is shared between qemu and qemu-kvm.

Thanks.

--
error compiling committee.c: too many arguments to function
Re: KVM: use desc_ptr struct instead of kvm private descriptor_table
On 02/16/2010 10:51 AM, Gleb Natapov wrote:
> x86 arch defines desc_ptr for idt/gdt pointers, no need to define another
> structure in kvm code.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: KVM: add doc note about PIO/MMIO completion API
On 02/14/2010 10:17 AM, Avi Kivity wrote:
> --- a/Documentation/kvm/api.txt
> +++ b/Documentation/kvm/api.txt
> @@ -820,6 +820,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>  by kvm. The 'data' member contains the written data if 'is_write' is
>  true, and should be filled by application code otherwise.
>
> +NOTE: For KVM_EXIT_IO and KVM_EXIT_MMIO, the corresponding operations
> +are complete (and guest state is consistent) only after userspace has
> +re-entered the kernel with KVM_RUN. The kernel side must first finish
> +uncomplete operations and then check for pending signals.
> +

Well, s/must/will/, the document is written from userspace's point of view.

Applied with this change, thanks.

--
error compiling committee.c: too many arguments to function
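From userspace's side, the note means an I/O exit is only half done when KVM_RUN returns. The sketch below is written against the `struct kvm_run` layout in `<linux/kvm.h>`. It assumes a trivial device model that reads all-ones and discards writes, and `complete_io_exit` is a hypothetical name. It only fills in the data that the kernel will consume. The guest-visible completion happens on the next KVM_RUN.

```c
#include <linux/kvm.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Device-model stub (an assumption for this sketch): every port and MMIO
 * read returns all-ones, every write is discarded. */
void complete_io_exit(struct kvm_run *run)
{
    switch (run->exit_reason) {
    case KVM_EXIT_IO:
        if (run->io.direction == KVM_EXIT_IO_IN) {
            /* IN data lives inside the shared kvm_run mapping. */
            uint8_t *data = (uint8_t *)run + run->io.data_offset;
            memset(data, 0xff, (size_t)run->io.size * run->io.count);
        }
        break;
    case KVM_EXIT_MMIO:
        if (!run->mmio.is_write) {
            size_t len = run->mmio.len;
            if (len > sizeof(run->mmio.data))
                len = sizeof(run->mmio.data);
            memset(run->mmio.data, 0xff, len);
        }
        break;
    default:
        break;
    }
    /* Nothing is complete yet: the caller must issue
     * ioctl(vcpu_fd, KVM_RUN, 0) again, even with a signal pending,
     * before guest state is consistent. */
}
```

In a real run loop, this would sit between one `ioctl(vcpu_fd, KVM_RUN, 0)` and the next. Checking for pending signals (for example, before saving state for migration) then belongs after that re-entry, not before it.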
Re: [PATCH 2/3] KVM: x86: Save&restore interrupt shadow mask
On Wed, Feb 17, 2010 at 10:03:58AM +0100, Jan Kiszka wrote:
> >
> > Also, as Avi mentioned it would be better to avoid this. Is it not
> > possible to disallow migration while interrupt shadow is present?
>
> Which means disallowing user space exits while the shadow is set? Or
> should we introduce some flag for user space that tells it "do not
> migrate now, resume the guest till next exit"?
>
I think disabling migration is a slippery slope. The guest may abuse it.
That may be hard to do with the interrupt shadow, but the mechanism will be
used for other cases too. I remember there was an argument that we should
not migrate while the vcpu is in nested guest mode.

--
Gleb.
Re: [PATCH 2/3] KVM: x86: Save&restore interrupt shadow mask
Zachary Amsden wrote:
> On 02/16/2010 02:39 PM, Marcelo Tosatti wrote:
>> On Mon, Feb 15, 2010 at 10:45:42AM +0100, Jan Kiszka wrote:
>>
>>> The interrupt shadow created by STI or MOV-SS-like operations is part of
>>> the VCPU state and must be preserved across migration. Transfer it in
>>> the spare padding field of kvm_vcpu_events.interrupt.
>
> STI and MOV-SS interrupt shadow are both treated differently by
> hardware. Any attempt to unify them into a single field is wrong,
> especially so in a hardware virtualization context, where they are
> actually represented by different fields in the undocumented but
> nevertheless extant format that can be inferred from the hardware
> virtualization context used by specific vendors.

Someone should ask AMD why they thought differently about this while
designing SVM...

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [PATCH 2/3] KVM: x86: Save&restore interrupt shadow mask
On Tue, Feb 16, 2010 at 10:06:12PM -1000, Zachary Amsden wrote:
> On 02/16/2010 02:39 PM, Marcelo Tosatti wrote:
> >On Mon, Feb 15, 2010 at 10:45:42AM +0100, Jan Kiszka wrote:
> >>The interrupt shadow created by STI or MOV-SS-like operations is part of
> >>the VCPU state and must be preserved across migration. Transfer it in
> >>the spare padding field of kvm_vcpu_events.interrupt.
>
> STI and MOV-SS interrupt shadow are both treated differently by
> hardware. Any attempt to unify them into a single field is wrong,
> especially so in a hardware virtualization context, where they are
> actually represented by different fields in the undocumented but
> nevertheless extant format that can be inferred from the hardware
> virtualization context used by specific vendors.
>
The problem is that SVM doesn't distinguish between those two. But we
shouldn't design our interfaces based on SVM brokenness.

--
Gleb.
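Zach's and Gleb's point is that VMX keeps two separate blocking bits where SVM has only one. The sketch below folds a combined shadow mask into the VMX guest-interruptibility field. It mirrors the MOV-SS-first rule in Jan's "Save&restore interrupt shadow mask" patch in this thread. It is not the vmx.c code, and the macros are local stand-ins with assumed values. Preferring MOV SS ensures the field never has both bits set, which, per Jan's patch description, would make the next VM entry fail.

```c
#include <stdint.h>

/* Local stand-ins with assumed values for KVM's generic shadow mask bits
 * and for the VMX guest-interruptibility-state bits. */
#define SHADOW_INT_MOV_SS      0x1u  /* shadow from MOV SS / POP SS */
#define SHADOW_INT_STI         0x2u  /* shadow from STI */
#define VMX_INTR_STATE_STI     0x1u  /* VMX: blocking by STI */
#define VMX_INTR_STATE_MOV_SS  0x2u  /* VMX: blocking by MOV SS */

/* Fold a generic shadow mask into the VMX interruptibility field.  Other
 * bits (e.g. NMI blocking) are preserved.  If both shadow kinds are
 * requested, only MOV SS is kept. */
uint32_t shadow_to_vmx_interruptibility(uint32_t interruptibility, uint32_t mask)
{
    interruptibility &= ~(VMX_INTR_STATE_STI | VMX_INTR_STATE_MOV_SS);

    if (mask & SHADOW_INT_MOV_SS)
        interruptibility |= VMX_INTR_STATE_MOV_SS;
    else if (mask & SHADOW_INT_STI)
        interruptibility |= VMX_INTR_STATE_STI;

    return interruptibility;
}
```

On SVM the same mask would collapse into a single "interrupt shadow" flag. That loss of information on the SVM side is what the exchange above is about.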
Re: [PATCH 2/3] KVM: x86: Save&restore interrupt shadow mask
Marcelo Tosatti wrote:
> On Mon, Feb 15, 2010 at 10:45:42AM +0100, Jan Kiszka wrote:
>> The interrupt shadow created by STI or MOV-SS-like operations is part of
>> the VCPU state and must be preserved across migration. Transfer it in
>> the spare padding field of kvm_vcpu_events.interrupt.
>>
>> As a side effect we now have to make vmx_set_interrupt_shadow robust
>> against both shadow types being set. Give MOV SS a higher priority and
>> skip STI in that case to avoid that VMX throws a fault on next entry.
>>
>> Signed-off-by: Jan Kiszka
>> ---
>>  Documentation/kvm/api.txt  |   11 ++-
>>  arch/x86/include/asm/kvm.h |    3 ++-
>>  arch/x86/kvm/vmx.c         |    2 +-
>>  arch/x86/kvm/x86.c         |   12 ++--
>>  include/linux/kvm.h        |    1 +
>>  5 files changed, 24 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
>> index c6416a3..8770b67 100644
>> --- a/Documentation/kvm/api.txt
>> +++ b/Documentation/kvm/api.txt
>> @@ -656,6 +656,7 @@ struct kvm_clock_data {
>>  4.29 KVM_GET_VCPU_EVENTS
>>
>>  Capability: KVM_CAP_VCPU_EVENTS
>> +Extended by: KVM_CAP_INTR_SHADOW
>>  Architectures: x86
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (out)
>> @@ -676,7 +677,7 @@ struct kvm_vcpu_events {
>>  		__u8 injected;
>>  		__u8 nr;
>>  		__u8 soft;
>> -		__u8 pad;
>> +		__u8 shadow;
>>  	} interrupt;
>>  	struct {
>>  		__u8 injected;
>> @@ -688,9 +689,13 @@ struct kvm_vcpu_events {
>>  	__u32 flags;
>>  };
>>
>> +KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
>> +interrupt.shadow contains a valid state. Otherwise, this field is undefined.
>> +
>>  4.30 KVM_SET_VCPU_EVENTS
>>
>>  Capability: KVM_CAP_VCPU_EVENTS
>> +Extended by: KVM_CAP_INTR_SHADOW
>>  Architectures: x86
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (in)
>> @@ -709,6 +714,10 @@ current in-kernel state. The bits are:
>>  KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
>>  KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
>>
>> +If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
>> +the flags field to signal that interrupt.shadow contains a valid state and
>> +shall be written into the VCPU.
>> +
>>
>>  5. The kvm_run structure
>>
>> diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
>> index f46b79f..dc6cd24 100644
>> --- a/arch/x86/include/asm/kvm.h
>> +++ b/arch/x86/include/asm/kvm.h
>> @@ -257,6 +257,7 @@ struct kvm_reinject_control {
>>  /* When set in flags, include corresponding fields on KVM_SET_VCPU_EVENTS */
>>  #define KVM_VCPUEVENT_VALID_NMI_PENDING	0x0001
>>  #define KVM_VCPUEVENT_VALID_SIPI_VECTOR	0x0002
>> +#define KVM_VCPUEVENT_VALID_SHADOW	0x0004
>>
>>  /* for KVM_GET/SET_VCPU_EVENTS */
>>  struct kvm_vcpu_events {
>> @@ -271,7 +272,7 @@ struct kvm_vcpu_events {
>>  		__u8 injected;
>>  		__u8 nr;
>>  		__u8 soft;
>> -		__u8 pad;
>> +		__u8 shadow;
>>  	} interrupt;
>>  	struct {
>>  		__u8 injected;
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index f82b072..0fa74d0 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -854,7 +854,7 @@ static void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
>>
>>  	if (mask & X86_SHADOW_INT_MOV_SS)
>>  		interruptibility |= GUEST_INTR_STATE_MOV_SS;
>> -	if (mask & X86_SHADOW_INT_STI)
>> +	else if (mask & X86_SHADOW_INT_STI)
>>  		interruptibility |= GUEST_INTR_STATE_STI;
>>
>>  	if ((interruptibility != interruptibility_old))
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 50d1d2a..60e6341 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2132,6 +2132,9 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>>  		vcpu->arch.interrupt.pending && !vcpu->arch.interrupt.soft;
>>  	events->interrupt.nr = vcpu->arch.interrupt.nr;
>>  	events->interrupt.soft = 0;
>> +	events->interrupt.shadow =
>> +		!!kvm_x86_ops->get_interrupt_shadow(vcpu,
>> +			X86_SHADOW_INT_MOV_SS | X86_SHADOW_INT_STI);
>>
>>  	events->nmi.injected = vcpu->arch.nmi_injected;
>>  	events->nmi.pending = vcpu->arch.nmi_pending;
>> @@ -2140,7 +2143,8 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>>  	events->sipi_vector = vcpu->arch.sipi_vector;
>>
>>  	events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
>> -			 | KVM_VCPUEVENT_VALID_SIPI_VECTOR);
>> +			 | KVM_VCPUEVENT_VALID_SIPI_VECTOR
>> +			 | KVM_VCPUEVENT_VALID_SHADOW);
>>
>>  	vcpu_put(vcpu);
>>  }
>> @@ -2149,7 +2153,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>>
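The restore side of the patch can be modeled the same way: KVM_SET_VCPU_EVENTS should only touch the shadow when KVM_VCPUEVENT_VALID_SHADOW is set, as the api.txt hunk documents, and the VMX backend must never set both blocking bits, which is why the vmx.c hunk gives MOV SS priority over STI via `else if`. The x86.c set-path hunk is cut off in this message, so forwarding interrupt.shadow straight to the backend is an assumption of this sketch. struct events_model, set_interrupt_shadow() and apply_events() are hypothetical names, and every bit value except KVM_VCPUEVENT_VALID_SHADOW (0x0004, taken from the patch) is assumed for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 0x0004 mirrors KVM_VCPUEVENT_VALID_SHADOW from the patch; the other
 * values are assumptions for this sketch. */
#define VALID_SHADOW      0x0004u
#define SHADOW_MOV_SS     0x1u
#define SHADOW_STI        0x2u
#define INTR_STATE_STI    0x1u
#define INTR_STATE_MOV_SS 0x2u
#define INTR_STATE_NMI    0x8u	/* unrelated bit that must survive */

/* Hypothetical, trimmed-down stand-in for struct kvm_vcpu_events. */
struct events_model {
	uint8_t  shadow;	/* interrupt.shadow in the patch */
	uint32_t flags;
};

/* Models the patched vmx_set_interrupt_shadow(): clear both shadow
 * bits, then set MOV SS if requested, otherwise STI. */
static uint32_t set_interrupt_shadow(uint32_t old, unsigned mask)
{
	uint32_t state = old & ~(INTR_STATE_STI | INTR_STATE_MOV_SS);

	if (mask & SHADOW_MOV_SS)
		state |= INTR_STATE_MOV_SS;
	else if (mask & SHADOW_STI)
		state |= INTR_STATE_STI;

	/* Both blocking bits at once would be rejected on VM entry. */
	assert(!((state & INTR_STATE_STI) && (state & INTR_STATE_MOV_SS)));
	return state;
}

/* Models the documented rule: interrupt.shadow is applied only when
 * VALID_SHADOW marks it as holding a valid state. */
static uint32_t apply_events(uint32_t old, const struct events_model *ev)
{
	if (!(ev->flags & VALID_SHADOW))
		return old;	/* field undefined: leave the state alone */
	return set_interrupt_shadow(old, ev->shadow);
}
```

For example, set_interrupt_shadow(0, SHADOW_MOV_SS | SHADOW_STI) yields only INTR_STATE_MOV_SS, and an events_model without VALID_SHADOW in flags leaves the old state untouched.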
Re: [PATCH 2/3] KVM: x86: Save&restore interrupt shadow mask
On 02/16/2010 02:39 PM, Marcelo Tosatti wrote:
>On Mon, Feb 15, 2010 at 10:45:42AM +0100, Jan Kiszka wrote:
>>The interrupt shadow created by STI or MOV-SS-like operations is part of
>>the VCPU state and must be preserved across migration. Transfer it in
>>the spare padding field of kvm_vcpu_events.interrupt.

STI and MOV-SS interrupt shadow are both treated differently by
hardware. Any attempt to unify them into a single field is wrong,
especially so in a hardware virtualization context, where they are
actually represented by different fields in the undocumented but
nevertheless extant format that can be inferred from the hardware
virtualization context used by specific vendors.

Zach