Re: [PATCH 2/2] KVM: SVM: Make stepping out of NMI handlers more robust

2010-02-17 Thread Gleb Natapov
On Wed, Feb 17, 2010 at 08:16:45PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Tue, Feb 16, 2010 at 12:08:58PM +0200, Gleb Natapov wrote:
> > Besides this, proper #DB forwarding to the guest was missing.
>  During NMI injection? How to reproduce?
> >>> Inject, e.g., an NMI over code with TF set. A bit harder is placing a
> >>> guest HW breakpoint at the spot the NMI handler returns to.
> >>>
> >> Will try to reproduce.
> >>
> > How can I make gdb run the debugged process with TF set? Does this patch
> > fix it:
> > 
> > 
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 52f78dd..b85b200 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -109,6 +109,7 @@ struct vcpu_svm {
> > struct nested_state nested;
> >  
> > bool nmi_singlestep;
> > +   bool nmi_singlestep_tf;
> >  };
> >  
> >  /* enable NPT for AMD64 and X86 with PAE */
> > @@ -1221,9 +1222,14 @@ static int db_interception(struct vcpu_svm *svm)
> >  
> > if (svm->nmi_singlestep) {
> > svm->nmi_singlestep = false;
> > -   if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
> > +   if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
> > svm->vmcb->save.rflags &=
> > ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
> > +   if (svm->nmi_singlestep_tf) {
> > +   svm->vmcb->save.rflags |= X86_EFLAGS_TF;
> > +   kvm_queue_exception(&svm->vcpu, DB_VECTOR);
> > +   }
> > +   }
> > update_db_intercept(&svm->vcpu);
> > }
> >  
> > @@ -2586,6 +2592,7 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
> >possible problem (IRET or exception injection or interrupt
> >shadow) */
> > svm->nmi_singlestep = true;
> > +   svm->nmi_singlestep_tf = (svm->vmcb->save.rflags & X86_EFLAGS_TF);
> > svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> > update_db_intercept(vcpu);
> >  }
> 
> That's closer. However, I've a version here that restores TF&RF only if
> you did not execute an IRET but stepped over the shadow (which is still
> not correct either, e.g. when stepping popf). I will break up my patch
> into parts that fix the issues separately so that we can decide what to
> merge.
> 
I am not sure what you mean here. Why should we restore RF? It is
cleared after each instruction executes, and popf is not special in this
regard; the SDM says so explicitly.
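The TF capture in `enable_nmi_window()` must test the bit with `&`. The expression `rflags | X86_EFLAGS_TF` is always non-zero, so it would record TF as set even when the guest never set it. The standalone sketch below models only the RFLAGS bookkeeping of the proposed patch, using the architectural bit values (TF is bit 8, RF is bit 16). The struct and function names are hypothetical, no real VMCB state is touched, and the `KVM_GUESTDBG_SINGLESTEP` check is left out. In line with the point above that RF is cleared after every instruction, the model restores only TF.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_TF 0x00000100u  /* trap flag, bit 8 */
#define X86_EFLAGS_RF 0x00010000u  /* resume flag, bit 16 */

/* Hypothetical model of the fields the patch touches. */
struct nmi_step_model {
    uint64_t rflags;
    bool nmi_singlestep;
    bool nmi_singlestep_tf;  /* did the guest have TF set before we forced it? */
    bool db_queued;          /* would a #DB be forwarded to the guest? */
};

/* Models enable_nmi_window(): remember the guest's TF, then force TF|RF. */
static void model_enable_nmi_window(struct nmi_step_model *m)
{
    m->nmi_singlestep = true;
    m->nmi_singlestep_tf = (m->rflags & X86_EFLAGS_TF) != 0;  /* '&', not '|' */
    m->rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
}

/* Models the #DB intercept path when host-side single-stepping is off. */
static void model_db_interception(struct nmi_step_model *m)
{
    if (!m->nmi_singlestep)
        return;
    m->nmi_singlestep = false;
    /* Cast before '~' so the upper 32 bits of rflags are preserved. */
    m->rflags &= ~(uint64_t)(X86_EFLAGS_TF | X86_EFLAGS_RF);
    if (m->nmi_singlestep_tf) {
        m->rflags |= X86_EFLAGS_TF;  /* give the guest its own TF back */
        m->db_queued = true;         /* and forward the single-step #DB */
    }
}
```

With `|` in place of `&`, the second branch would run for every guest. A spurious #DB would then be queued even when the guest was not single-stepping.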

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/18] KVM: PPC: Virtualize Gekko guests

2010-02-17 Thread Avi Kivity

On 02/17/2010 08:07 PM, Alexander Graf wrote:

On 17.02.2010, at 17:34, Avi Kivity wrote:

   

On 02/17/2010 06:23 PM, Alexander Graf wrote:
 

On 17.02.2010, at 17:03, Avi Kivity wrote:


   

On 02/17/2010 04:56 PM, Alexander Graf wrote:

 

So I changed the code according to your input by making all FPU calls explicit,
getting rid of all binary patching.

On the PowerStation again I'm running this code (simplified to the important 
instructions) using kvmctl:

 li  r2, 0x1234
 std r2, 0(r1)
 lfd f3, 0(r1)
 lfd f4, 0(r1)
do_mul:
 fmul f0, f3, f4
 b   do_mul


With the following kvm_stat output:

  dec                2236       53
  exits          60797802  1171403
  ext_intr            379        4
  halt_wakeup           0        0
  inst_emu       60795247  1171344
  ld             60795132  1171348

So I'm getting 1171403 fmul operations per second. And that's even with 
non-optimized instruction fetching. Not bad.


   

It's a large number, but won't real hardware be three orders of magnitude 
faster?

 

Yes, it would. But we don't have to care. The only thing we need to worry about is 
being fast enough to emulate enough FPU instructions actually used in normal 
guests so the guest runs at full speed. And 1000k > 250k, so we can do that
apparently, leaving some spare cycles for non-fpu instructions.

   

I'm sure 250k isn't representative of a floating point intensive program (but 
maybe there aren't fpu intensive applications on that cpu).
 

Now you made me check how fast the real hw is. I get about 65,000,000 fmul 
operations per second on it.

   


That's surprisingly low.


So we're 65x slower on a PowerStation. And that's for a tight FPU only loop. 
I'm still not convinced we're running into major problems.
   


Well, it's up to you.  I just hope we don't end up underperforming due 
to this.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Gleb Natapov
On Wed, Feb 17, 2010 at 08:17:28PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Wed, Feb 17, 2010 at 12:23:39PM +0100, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Wed, Feb 17, 2010 at 01:13:29PM +0200, Avi Kivity wrote:
>  On 02/17/2010 12:43 PM, Gleb Natapov wrote:
> >> And, again: This is an _existing_ user space ABI. We could only provide
> >> an alternative, but we have to maintain what is there at least for some
> >> longer grace period.
> >>
> > But it was always broken for SVM and was broken for VMX for a year and
> > nobody noticed, so maybe instead of reintroducing the old interface we
> > should do it right this time?
>  We need to fix the existing interface first, and then think long and
>  hard if we want yet another interface, since we're likely to screw
>  it up as well.
> 
>  The more interfaces we introduce, the harder maintenance becomes.
> 
> >>> We are in a sad state if we cannot improve the interface. The current one
> >>> outsources part of CPU functionality to userspace. This should be a big
> >>> no-no.
> >> I still disagree on this. Moving the decision logic to user space
> >> avoided re-implementing a gdbstub in kernel space. I overlooked that
> >> re-injecting #BP over older SVM was broken, but it is now fixed for all
> >> vendors. So there is no long-term reason to move it back to the kernel.
> >>
> > There were patches to implement a gdbstub in kernel space! And not so long
> > ago :)
> 
> Yes, a good reason to implement yet another one. :)
> 
We can unify them later :). But seriously, I am not proposing
anything like a gdbstub in the kernel, just tracking inserted breakpoints in
the kernel.

> > But I want to move only a tiny bit of logic into the kernel space.
> > And #BP reinjection brokenness is a different issue. It should be fixed
> > anyway, no matter where the decision about reinjection happens.
> > 
> > If maintainers think that we should not have an improved interface and we
> > should support reinjection of #DB from userspace, then this patch should
> > be applied. I don't have other objections to it. But I, at least, would
> > prefer the old interface for #DB reinjection (KVM_GUESTDBG_INJECT_DB)
> > and not the new one. The old one makes it explicit what we are doing;
> > the new one allows injection of any event and should be used only during
> > migration or CPU reset. It would even be a good idea to fail setting
> > events while the CPU is running.
> 
> Event injection is well supported by both vendors (except for those
> software-triggered events). Just because QEMU mostly uses it for reset
> and migration doesn't mean we have to restrict other users to only those
> cases as well.
Yes we have to! QEMU implements the device model, and the way devices
communicate with the CPU is well defined and called interrupts, so we have
a way to inject interrupts (KVM_IRQ_LINE/KVM_INTERRUPT). Input is
validated and passed into the VCPU at the right time; we do not inject
interrupts directly into the VCPU using event injection. Exceptions, on the
other hand, are a completely internal CPU thing. QEMU shouldn't be a part
of CPU emulation.

> 
> And as we have true event injection now, and as it naturally conflicts
Now we have a bug that should be fixed ASAP. We should allow setting of
some VCPU state only when the VCPU is stopped and only for migration/reset
purposes.

> with the special KVM_SET_GUEST_DEBUG interface, I have a patch that
> consolidates this usage for QEMU: use the old interface of
> SET_GUEST_DEBUG for pre-2.6.33 kernels, switch to SET_VCPU_EVENTS on
> recent ones.
Don't do that please, this will encourage use of SET_VCPU_EVENTS for
something it shouldn't be used for.

--
Gleb.


Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-17 Thread OHMURA Kei

"We think"? I mean - yes, I think so too. But have you actually measured it?
How much improvement are we talking here?
Is it still faster when a bswap is involved?

Thanks for pointing this out.
I will post the data for x86 later.
However, I don't have a test environment to check the impact of bswap.
Would you please measure the run time between the following section if possible?

It'd make more sense to have a real stand alone test program, no?
I can try to write one today, but I have some really nasty important bugs to 
fix first.


OK.  I will prepare a test code with sample data.  Since I found a ppc machine 
around, I will run the code and post the results of
x86 and ppc.


By the way, the following data is a result of x86 measured in QEMU/KVM.  
This data shows how many times the function is called (#called), the runtime of the original function (orig.), the runtime with this patch (patch), and the speedup ratio (ratio).


That does indeed look promising!

Thanks for doing this micro-benchmark. I just want to be 100% sure that it 
doesn't affect performance for big endian badly.



I measured runtime of the test code with sample data.  My test environment 
and results are described below.


x86 Test Environment:
CPU: 4x Intel Xeon Quad Core 2.66GHz
Mem size: 6GB

ppc Test Environment:
CPU: 2x Dual Core PPC970MP
Mem size: 2GB

The sample data of dirty bitmap was produced by QEMU/KVM while the guest OS
was live migrating.  To measure the runtime I copied cpu_get_real_ticks() of
QEMU to my test program.
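The harness above copied QEMU's cpu_get_real_ticks(), which on x86 reads the time-stamp counter. Below is a portable alternative, offered as an assumption and not as what was actually used. It times the section under test with POSIX clock_gettime(CLOCK_MONOTONIC) and reports milliseconds, the unit of the results below. The function-pointer interface and the byte-summing workload are hypothetical placeholders for the bitmap scan.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Run fn(arg) 'iterations' times and return the average wall-clock time
 * per call in milliseconds, measured with the monotonic clock. */
static double time_ms_per_call(void (*fn)(void *), void *arg, int iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++)
        fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ms = (double)(end.tv_sec - start.tv_sec) * 1e3 +
                        (double)(end.tv_nsec - start.tv_nsec) / 1e6;
    return iterations > 0 ? elapsed_ms / iterations : 0.0;
}

/* Hypothetical workload: sum a byte buffer (stands in for a bitmap scan). */
struct byte_buf {
    const unsigned char *data;
    size_t len;
    unsigned long sum;
};

static void sum_bytes(void *arg)
{
    struct byte_buf *b = arg;

    b->sum = 0;
    for (size_t i = 0; i < b->len; i++)
        b->sum += b->data[i];
}
```

Averaging over many iterations smooths out timer granularity for sections as short as the 0.1 ms x86 case.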


Experimental results:
Test1: Guest OS read 3GB file, which is bigger than memory.
       orig.(msec)  patch(msec)  ratio
x86        0.3          0.1       6.4
ppc        7.9          2.7       3.0

Test2: Guest OS read/write 3GB file, which is bigger than memory.
       orig.(msec)  patch(msec)  ratio
x86       12.0          3.2       3.7
ppc      251.1        123         2.0



I also measured the runtime of bswap itself on ppc, and found it was only
0.3% to 0.7% of the runtime described above.
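The patch itself is not quoted in this thread, so what follows shows only the general kind of traversal being measured, not the patch's code. The idea is to walk the dirty bitmap one 64-bit word at a time, skip words that are zero, and pull set bits out with a count-trailing-zeros operation. The optional byte swap marks where the big-endian (bswap) cost discussed above would be paid. The function name, the `swap_bytes` flag and the bit-to-page mapping (bit 0 of word 0 is page 0) are assumptions. `__builtin_ctzll` and `__builtin_bswap64` are GCC/Clang builtins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Collect indices of set bits (dirty pages) in 'bitmap'. Zero words are
 * skipped with one comparison, which is where most of the savings come from
 * when few pages are dirty. If 'swap_bytes' is true, each non-zero word is
 * byte-swapped first, as a big-endian host reading a little-endian bitmap
 * layout would need to do. Returns the total number of dirty pages found;
 * at most 'max_pages' indices are stored in 'pages'. */
static size_t walk_dirty_bitmap(const uint64_t *bitmap, size_t nwords,
                                bool swap_bytes,
                                size_t *pages, size_t max_pages)
{
    size_t n = 0;

    for (size_t i = 0; i < nwords; i++) {
        uint64_t word = bitmap[i];

        if (word == 0)
            continue;                         /* no dirty pages here */
        if (swap_bytes)
            word = __builtin_bswap64(word);
        while (word != 0) {
            int bit = __builtin_ctzll(word);  /* lowest set bit */
            if (n < max_pages)
                pages[n] = i * 64 + (size_t)bit;
            n++;
            word &= word - 1;                 /* clear lowest set bit */
        }
    }
    return n;
}
```

Because the swap is applied only to non-zero words, its cost scales with the number of dirty words, not with bitmap size. That fits the small bswap share measured on ppc.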




[patch uq/master 3/4] qemu: kvm: consume internal signal with sigtimedwait

2010-02-17 Thread Marcelo Tosatti
Change the way the internal qemu signal, used for communication between 
iothread and vcpus, is handled.

Block and consume it with sigtimedwait on the outer vcpu loop, which
allows more precise timing control.

Change from standard signal (SIGUSR1) to real-time one, so multiple
signals are not collapsed.

Set the signal number on KVM's in-kernel allowed sigmask.

Signed-off-by: Marcelo Tosatti 


Index: qemu-kvm/vl.c
===
--- qemu-kvm.orig/vl.c
+++ qemu-kvm/vl.c
@@ -271,6 +271,12 @@ uint8_t qemu_uuid[16];
 static QEMUBootSetHandler *boot_set_handler;
 static void *boot_set_opaque;
 
+#ifdef SIGRTMIN
+#define SIG_IPI (SIGRTMIN+4)
+#else
+#define SIG_IPI SIGUSR1
+#endif
+
 static int default_serial = 1;
 static int default_parallel = 1;
 static int default_virtcon = 1;
@@ -3379,7 +3385,8 @@ static QemuCond qemu_cpu_cond;
 static QemuCond qemu_system_cond;
 static QemuCond qemu_pause_cond;
 
-static void block_io_signals(void);
+static void tcg_block_io_signals(void);
+static void kvm_block_io_signals(CPUState *env);
 static void unblock_io_signals(void);
 static int tcg_has_work(void);
 static int cpu_has_work(CPUState *env);
@@ -3431,11 +3438,36 @@ static void qemu_wait_io_event(CPUState 
 qemu_wait_io_event_common(env);
 }
 
+static void qemu_kvm_eat_signal(CPUState *env, int timeout)
+{
+struct timespec ts;
+int r, e;
+siginfo_t siginfo;
+sigset_t waitset;
+
+ts.tv_sec = timeout / 1000;
+ts.tv_nsec = (timeout % 1000) * 1000000;
+
+sigemptyset(&waitset);
+sigaddset(&waitset, SIG_IPI);
+
+qemu_mutex_unlock(&qemu_global_mutex);
+r = sigtimedwait(&waitset, &siginfo, &ts);
+e = errno;
+qemu_mutex_lock(&qemu_global_mutex);
+
+if (r == -1 && !(e == EAGAIN || e == EINTR)) {
+fprintf(stderr, "sigtimedwait: %s\n", strerror(e));
+exit(1);
+}
+}
+
 static void qemu_kvm_wait_io_event(CPUState *env)
 {
 while (!cpu_has_work(env))
 qemu_cond_timedwait(env->halt_cond, &qemu_global_mutex, 1000);
 
+qemu_kvm_eat_signal(env, 0);
 qemu_wait_io_event_common(env);
 }
 
@@ -3445,11 +3477,12 @@ static void *kvm_cpu_thread_fn(void *arg
 {
 CPUState *env = arg;
 
-block_io_signals();
 qemu_thread_self(env->thread);
 if (kvm_enabled())
 kvm_init_vcpu(env);
 
+kvm_block_io_signals(env);
+
 /* signal CPU creation */
 qemu_mutex_lock(&qemu_global_mutex);
 env->created = 1;
@@ -3474,7 +3507,7 @@ static void *tcg_cpu_thread_fn(void *arg
 {
 CPUState *env = arg;
 
-block_io_signals();
+tcg_block_io_signals();
 qemu_thread_self(env->thread);
 
 /* signal CPU creation */
@@ -3500,7 +3533,7 @@ void qemu_cpu_kick(void *_env)
 CPUState *env = _env;
 qemu_cond_broadcast(env->halt_cond);
 if (kvm_enabled())
-qemu_thread_signal(env->thread, SIGUSR1);
+qemu_thread_signal(env->thread, SIG_IPI);
 }
 
 int qemu_cpu_self(void *_env)
@@ -3519,7 +3552,7 @@ static void cpu_signal(int sig)
 cpu_exit(cpu_single_env);
 }
 
-static void block_io_signals(void)
+static void tcg_block_io_signals(void)
 {
 sigset_t set;
 struct sigaction sigact;
@@ -3532,12 +3565,44 @@ static void block_io_signals(void)
 pthread_sigmask(SIG_BLOCK, &set, NULL);
 
 sigemptyset(&set);
-sigaddset(&set, SIGUSR1);
+sigaddset(&set, SIG_IPI);
 pthread_sigmask(SIG_UNBLOCK, &set, NULL);
 
 memset(&sigact, 0, sizeof(sigact));
 sigact.sa_handler = cpu_signal;
-sigaction(SIGUSR1, &sigact, NULL);
+sigaction(SIG_IPI, &sigact, NULL);
+}
+
+static void dummy_signal(int sig)
+{
+}
+
+static void kvm_block_io_signals(CPUState *env)
+{
+int r;
+sigset_t set;
+struct sigaction sigact;
+
+sigemptyset(&set);
+sigaddset(&set, SIGUSR2);
+sigaddset(&set, SIGIO);
+sigaddset(&set, SIGALRM);
+sigaddset(&set, SIGCHLD);
+sigaddset(&set, SIG_IPI);
+pthread_sigmask(SIG_BLOCK, &set, NULL);
+
+pthread_sigmask(SIG_BLOCK, NULL, &set);
+sigdelset(&set, SIG_IPI);
+
+memset(&sigact, 0, sizeof(sigact));
+sigact.sa_handler = dummy_signal;
+sigaction(SIG_IPI, &sigact, NULL);
+
+r = kvm_set_signal_mask(env, &set);
+if (r) {
+fprintf(stderr, "kvm_set_signal_mask: %s\n", strerror(-r));
+exit(1);
+}
 }
 
 static void unblock_io_signals(void)
@@ -3551,7 +3616,7 @@ static void unblock_io_signals(void)
 pthread_sigmask(SIG_UNBLOCK, &set, NULL);
 
 sigemptyset(&set);
-sigaddset(&set, SIGUSR1);
+sigaddset(&set, SIG_IPI);
 pthread_sigmask(SIG_BLOCK, &set, NULL);
 }
 
@@ -3560,7 +3625,7 @@ static void qemu_signal_lock(unsigned in
 qemu_mutex_lock(&qemu_fair_mutex);
 
 while (qemu_mutex_trylock(&qemu_global_mutex)) {
-qemu_thread_signal(tcg_cpu_thread, SIGUSR1);
+qemu_thread_signal(tcg_cpu_thread, SIG_IPI);
 if (!qemu_mutex_timedlock(&qemu_global_mutex, msecs))
 break;
 }
@@ -3601,7 +3
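The signal handling in patch 3/4 can be tried outside QEMU. The sketch below assumes a single-threaded process, so sigprocmask() stands in for the per-thread pthread_sigmask() the patch uses. It does three things:
- blocks SIG_IPI;
- builds the mask a vCPU would hand to KVM, which is the currently blocked set minus SIG_IPI, as kvm_block_io_signals() computes;
- consumes a pending instance with sigtimedwait(), treating EAGAIN and EINTR as non-fatal as qemu_kvm_eat_signal() does.

It also shows the millisecond-to-timespec conversion (1 ms = 1,000,000 ns). The helper names are illustrative, not QEMU's.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

#ifdef SIGRTMIN
#define SIG_IPI (SIGRTMIN + 4)   /* real-time: repeated sends are queued */
#else
#define SIG_IPI SIGUSR1
#endif

/* Convert a timeout in milliseconds to a struct timespec. */
static struct timespec ms_to_timespec(int timeout_ms)
{
    struct timespec ts;

    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (long)(timeout_ms % 1000) * 1000000L;
    return ts;
}

/* Block 'sig' so it stays pending instead of running a handler. */
static int block_signal(int sig)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, sig);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Currently blocked set minus 'sig': the mask to use while the guest runs,
 * so that 'sig' can kick the vCPU out of the in-kernel run loop. */
static int guest_run_mask(int sig, sigset_t *out)
{
    if (sigprocmask(SIG_BLOCK, NULL, out) != 0)
        return -1;
    return sigdelset(out, sig);
}

/* Consume one pending 'sig', waiting at most timeout_ms.
 * Returns the signal number, 0 on timeout or interruption, -1 on error. */
static int eat_signal(int sig, int timeout_ms)
{
    sigset_t waitset;
    siginfo_t info;
    struct timespec ts = ms_to_timespec(timeout_ms);
    int r;

    sigemptyset(&waitset);
    sigaddset(&waitset, sig);
    r = sigtimedwait(&waitset, &info, &ts);
    if (r == -1)
        return (errno == EAGAIN || errno == EINTR) ? 0 : -1;
    return r;
}
```

The signal is blocked while userspace code runs, so it can only become pending, never delivered to a handler. It is then drained synchronously, which gives the precise timing control the patch description mentions.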

[patch uq/master 4/4] qemu: kvm: remove pre-entry exit_request check with iothread enabled

2010-02-17 Thread Marcelo Tosatti
With SIG_IPI blocked, vcpu loop exit notification happens via -EAGAIN
from KVM_RUN.

Signed-off-by: Marcelo Tosatti 

Index: qemu/kvm-all.c
===
--- qemu.orig/kvm-all.c
+++ qemu/kvm-all.c
@@ -753,11 +753,13 @@ int kvm_cpu_exec(CPUState *env)
 dprintf("kvm_cpu_exec()\n");
 
 do {
+#ifndef CONFIG_IOTHREAD
 if (env->exit_request) {
 dprintf("interrupt exit requested\n");
 ret = 0;
 break;
 }
+#endif
 
 if (env->kvm_vcpu_dirty) {
 kvm_arch_put_registers(env);




[patch uq/master 2/4] qemu: kvm specific wait_io_event

2010-02-17 Thread Marcelo Tosatti
In KVM mode the global mutex is released when vcpus are executing,
which means acquiring the fairness mutex is not required.

Also for KVM there is one thread per vcpu, so tcg_has_work is meaningless.

Add a new qemu_wait_io_event_common function to hold common code
between TCG/KVM.

Signed-off-by: Marcelo Tosatti 

Index: qemu/vl.c
===
--- qemu.orig/vl.c
+++ qemu/vl.c
@@ -3382,6 +3382,7 @@ static QemuCond qemu_pause_cond;
 static void block_io_signals(void);
 static void unblock_io_signals(void);
 static int tcg_has_work(void);
+static int cpu_has_work(CPUState *env);
 
 static int qemu_init_main_loop(void)
 {
@@ -3402,6 +3403,15 @@ static int qemu_init_main_loop(void)
 return 0;
 }
 
+static void qemu_wait_io_event_common(CPUState *env)
+{
+if (env->stop) {
+env->stop = 0;
+env->stopped = 1;
+qemu_cond_signal(&qemu_pause_cond);
+}
+}
+
 static void qemu_wait_io_event(CPUState *env)
 {
 while (!tcg_has_work())
@@ -3418,11 +3428,15 @@ static void qemu_wait_io_event(CPUState 
 qemu_mutex_unlock(&qemu_fair_mutex);
 
 qemu_mutex_lock(&qemu_global_mutex);
-if (env->stop) {
-env->stop = 0;
-env->stopped = 1;
-qemu_cond_signal(&qemu_pause_cond);
-}
+qemu_wait_io_event_common(env);
+}
+
+static void qemu_kvm_wait_io_event(CPUState *env)
+{
+while (!cpu_has_work(env))
+qemu_cond_timedwait(env->halt_cond, &qemu_global_mutex, 1000);
+
+qemu_wait_io_event_common(env);
 }
 
 static int qemu_cpu_exec(CPUState *env);
@@ -3448,7 +3462,7 @@ static void *kvm_cpu_thread_fn(void *arg
 while (1) {
 if (cpu_can_run(env))
 qemu_cpu_exec(env);
-qemu_wait_io_event(env);
+qemu_kvm_wait_io_event(env);
 }
 
 return NULL;




[patch uq/master 0/4] uq/master: iothread consume signals via sigtimedwait and cleanups

2010-02-17 Thread Marcelo Tosatti
See individual patches for details.




[patch uq/master 1/4] qemu: block SIGCHLD in vcpu thread(s)

2010-02-17 Thread Marcelo Tosatti
Otherwise a vcpu thread can run the SIGCHLD handler, causing
waitpid() in the iothread to fail.

Signed-off-by: Marcelo Tosatti 

Index: qemu/vl.c
===
--- qemu.orig/vl.c
+++ qemu/vl.c
@@ -3514,6 +3514,7 @@ static void block_io_signals(void)
 sigaddset(&set, SIGUSR2);
 sigaddset(&set, SIGIO);
 sigaddset(&set, SIGALRM);
+sigaddset(&set, SIGCHLD);
 pthread_sigmask(SIG_BLOCK, &set, NULL);
 
 sigemptyset(&set);




Re: [PATCH] qemu-kvm Set kvm_features name for kvm_cr3_cache

2010-02-17 Thread Marcelo Tosatti
On Wed, Feb 17, 2010 at 10:26:56PM +0100, Jes Sorensen wrote:
> On 02/17/10 22:08, Marcelo Tosatti wrote:
> >The KVM_CAP_CR3_CACHE reference can be removed since the feature
> >was never implemented/included.
> 
> Ok that works too. Would you rather have a patch to remove all references
> to it, or leave it in in case someone decides to pick it up later?

I'd say remove all references; it's obsolete due to EPT/NPT.

> 
> Cheers,
> Jes
> 


Re: [PATCH] qemu-kvm Set kvm_features name for kvm_cr3_cache

2010-02-17 Thread Jes Sorensen

On 02/17/10 22:08, Marcelo Tosatti wrote:

The KVM_CAP_CR3_CACHE reference can be removed since the feature
was never implemented/included.


Ok that works too. Would you rather have a patch to remove all references
to it, or leave it in in case someone decides to pick it up later?

Cheers,
Jes




Re: [PATCH] qemu-kvm Set kvm_features name for kvm_cr3_cache

2010-02-17 Thread Marcelo Tosatti
On Wed, Feb 17, 2010 at 06:44:12PM +0100, Jes Sorensen wrote:
> Hi,
> 
> Comparing the features tested for in get_para_features() with the
> kvm_feature_names in target-i386/helper.c, I noticed that we didn't
> list the cr3_cache feature in the real name table.
> 
> I presume this is unintentional so here's a patch to correct it.
> 
> Cheers,
> Jes
> 

The KVM_CAP_CR3_CACHE reference can be removed since the feature 
was never implemented/included.



Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Jan Kiszka
Gleb Natapov wrote:
> On Wed, Feb 17, 2010 at 12:23:39PM +0100, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Wed, Feb 17, 2010 at 01:13:29PM +0200, Avi Kivity wrote:
 On 02/17/2010 12:43 PM, Gleb Natapov wrote:
>> And, again: This is an _existing_ user space ABI. We could only provide
>> an alternative, but we have to maintain what is there at least for some
>> longer grace period.
>>
> But it was always broken for SVM and was broken for VMX for a year and
> nobody noticed, so maybe instead of reintroducing the old interface we should
> do it right this time?
 We need to fix the existing interface first, and then think long and
 hard if we want yet another interface, since we're likely to screw
 it up as well.

 The more interfaces we introduce, the harder maintenance becomes.

>>> We are in a sad state if we cannot improve the interface. The current one
>>> outsources part of CPU functionality to userspace. This should be a big
>>> no-no.
>> I still disagree on this. Moving the decision logic to user space
>> avoided re-implementing a gdbstub in kernel space. I overlooked that
>> re-injecting #BP over older SVM was broken, but it is now fixed for all
>> vendors. So there is no long-term reason to move it back to the kernel.
>>
> There were patches to implement a gdbstub in kernel space! And not so long
> ago :)

Yes, a good reason to implement yet another one. :)

> But I want to move only a tiny bit of logic into the kernel space.
> And #BP reinjection brokenness is a different issue. It should be fixed
> anyway, no matter where the decision about reinjection happens.
> 
> If maintainers think that we should not have an improved interface and we
> should support reinjection of #DB from userspace, then this patch should
> be applied. I don't have other objections to it. But I, at least, would
> prefer the old interface for #DB reinjection (KVM_GUESTDBG_INJECT_DB)
> and not the new one. The old one makes it explicit what we are doing;
> the new one allows injection of any event and should be used only during
> migration or CPU reset. It would even be a good idea to fail setting
> events while the CPU is running.

Event injection is well supported by both vendors (except for those
software-triggered events). Just because QEMU mostly uses it for reset
and migration doesn't mean we have to restrict other users to only those
cases as well.

And as we have true event injection now, and as it naturally conflicts
with the special KVM_SET_GUEST_DEBUG interface, I have a patch that
consolidates this usage for QEMU: use the old interface of
SET_GUEST_DEBUG for pre-2.6.33 kernels, switch to SET_VCPU_EVENTS on
recent ones.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH 2/2] KVM: SVM: Make stepping out of NMI handlers more robust

2010-02-17 Thread Jan Kiszka
Gleb Natapov wrote:
> On Tue, Feb 16, 2010 at 12:08:58PM +0200, Gleb Natapov wrote:
> Besides this, proper #DB forwarding to the guest was missing.
 During NMI injection? How to reproduce?
>>> Inject, e.g., an NMI over code with TF set. A bit harder is placing a
>>> guest HW breakpoint at the spot the NMI handler returns to.
>>>
>> Will try to reproduce.
>>
> How can I make gdb run the debugged process with TF set? Does this patch
> fix it:
> 
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 52f78dd..b85b200 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -109,6 +109,7 @@ struct vcpu_svm {
>   struct nested_state nested;
>  
>   bool nmi_singlestep;
> + bool nmi_singlestep_tf;
>  };
>  
>  /* enable NPT for AMD64 and X86 with PAE */
> @@ -1221,9 +1222,14 @@ static int db_interception(struct vcpu_svm *svm)
>  
>   if (svm->nmi_singlestep) {
>   svm->nmi_singlestep = false;
> - if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
> + if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
>   svm->vmcb->save.rflags &=
>   ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
> + if (svm->nmi_singlestep_tf) {
> + svm->vmcb->save.rflags |= X86_EFLAGS_TF;
> + kvm_queue_exception(&svm->vcpu, DB_VECTOR);
> + }
> + }
>   update_db_intercept(&svm->vcpu);
>   }
>  
> @@ -2586,6 +2592,7 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
>  possible problem (IRET or exception injection or interrupt
>  shadow) */
>   svm->nmi_singlestep = true;
> + svm->nmi_singlestep_tf = (svm->vmcb->save.rflags & X86_EFLAGS_TF);
>   svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
>   update_db_intercept(vcpu);
>  }

That's closer. However, I've a version here that restores TF&RF only if
you did not execute an IRET but stepped over the shadow (which is still
not correct either, e.g. when stepping popf). I will break up my patch
into parts that fix the issues separately so that we can decide what to
merge.

Jan



[ kvm-Bugs-2915201 ] Nested kvm (SVM)

2010-02-17 Thread SourceForge.net
Bugs item #2915201, was opened at 2009-12-15 17:35
Message generated for change (Comment added) made by alex_williamson
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2915201&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: amd
Group: v1.0 (example)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: jbl001 (jbl001)
Assigned to: Nobody/Anonymous (nobody)
Summary: Nested kvm (SVM)

Initial Comment:
I have seen a couple of messages where people have stated that nested SVM works
properly, but I cannot replicate it. I first attempted to use the following 
configurations:

Hardware:
desktop system: Gigabyte MA785GM board with Athlon X2 4400
server system: Tyan h2000M board with Opteron 2354

Software:
Host OS: Ubuntu 9.10 with production kernel
Host KVM: tried kmod 2.6.32 with qemu 0.11.1 and qemu 0.12.0rc2, also tried 
git-tip
Guest VMM Host OS: Ubuntu 9.10
Guest VMM KVM: tried kmod 2.6.32 with qemu 0.11.1 and qemu 0.12.0rc2, also 
tried git-tip
True guest: tried Slackware 10.2, 64-bit Ubuntu 8.10, 64-bit Ubuntu 9.10, and 
32-bit XP

All configurations result in the true guest not booting, but the Slackware 10.2 
true guest is the easiest to analyze. It hangs at various places during boot 
with the most common being the "calibrating delay loop", "testing HLT 
instruction", mounting the hard disks, or starting the INIT processes. It seems 
it is losing interrupts. 

I also tried an older host (64-bit Ubuntu 8.10) and guest VMM (64-bit Ubuntu 
8.10) with the KVM-88 release. With this configuration, the Slackware 10.2 true 
guest will usually boot, but will then get a constant flow of "hda: lost 
interrupt" and "hda: dma_timer_expiry: dma status == 0x24". Again, it seems to 
be losing interrupts.

I have ensured that nested=1 is passed to the module and that
enable-nesting is passed to qemu. It obviously works for some time and I've
tried printing out exit reasons in the handle_exit() function of the guest VMM, 
but it consistently fails in some form or another across all the hardware and 
software I have to try it on.


--

Comment By: Alex Williamson (alex_williamson)
Date: 2010-02-17 12:02

Message:
Try reverting cd3ff653ae0b45bac7a19208e9c75034fcacc85f from kvm-kmod
(kvm-svm).  I ran into trouble with nested kvm about a month ago and
bisected it back to this change.  I alerted Joerg, but he might need
another poke if this fixes nesting for you too.

--

Comment By: jbl001 (jbl001)
Date: 2010-02-17 10:47

Message:
I tried this again with qemu-0.12.2 and kvm-kmod-2.6.32.3 while passing
no-kvmclock to both the host and guest VMM kernels. It did not help the
problem of lost interrupts in the true guest, however.


--

Comment By: Brian Jackson (iggy_cav)
Date: 2010-02-09 12:07

Message:
Can you try disabling kvmclock in both guests?



Re: [PATCH] kvm-kmod: Build fix for #define KVM_DEBUG

2010-02-17 Thread Jan Kiszka
Tsuyoshi Ozawa wrote:
>>> Copy Jan - he maintains kvm-kmod, and probably didn't see your patch.
>>>
>> Yes, I did. Proper subject prefixing can help a lot here...
>>
> 
> I'm sorry I forgot to prefix "kvm-kmod", and thank you for telling me.
> I will keep this in mind from now on.
> 
>> Could you please repost, avoiding that the patch is line-wrapped and
>> giving it an up-to-date changelog?
> 
> Yes, this new patch for the newest commit passed checkpatch.

Thanks, merged.
[And as your original changelog was even better, I included it as well.]

Jan



Re: [PATCH] kvm-kmod: Build fix for #define KVM_DEBUG

2010-02-17 Thread Tsuyoshi Ozawa
>> Copy Jan - he maintains kvm-kmod, and probably didn't see your patch.
>>
>
> Yes, I did. Proper subject prefixing can help a lot here...
>

I'm sorry I forgot to prefix "kvm-kmod", and thank you for telling me.
I will keep this in mind from now on.

> Could you please repost, avoiding that the patch is line-wrapped and
> giving it an up-to-date changelog?

Yes, this new patch for the newest commit passed checkpatch.


0001-Build-fix-for-define-KVM_DEBUG.patch
Description: Binary data


Re: [PATCH 00/18] KVM: PPC: Virtualize Gekko guests

2010-02-17 Thread Alexander Graf

On 17.02.2010, at 17:34, Avi Kivity wrote:

> On 02/17/2010 06:23 PM, Alexander Graf wrote:
>> On 17.02.2010, at 17:03, Avi Kivity wrote:
>> 
>>   
>>> On 02/17/2010 04:56 PM, Alexander Graf wrote:
>>> 
 So I changed the code according to your input by making all FPU calls
 explicit, getting rid of all binary patching.
 
 On the PowerStation again I'm running this code (simplified to the 
 important instructions) using kvmctl:
 
 li  r2, 0x1234
 std r2, 0(r1)
 lfd f3, 0(r1)
 lfd f4, 0(r1)
 do_mul:
 fmulf0, f3, f4
 b   do_mul
 
 
 With the following kvm_stat output:
 
  dec   2236  53
  exits 60797802 1171403
  ext_intr   379   4
  halt_wakeup  0   0
  inst_emu  60795247 1171344
  ld60795132 1171348
 
 So I'm getting 1171403 fmul operations per second. And that's even with 
 non-optimized instruction fetching. Not bad.
 
   
>>> It's a large number, but won't real hardware be three orders of magnitude 
>>> faster?
>>> 
>> Yes, it would. But we don't have to care. The only thing we need to worry
>> about is being fast enough to emulate enough FPU instructions actually used
>> in normal guests so the guest runs at full speed. And 1000k > 250k, so we
>> can do that apparently, leaving some spare cycles for non-fpu instructions.
>>   
> 
> I'm sure 250k isn't representative of a floating point intensive program (but 
> maybe there aren't fpu intensive applications on that cpu).

Now you made me check how fast the real hw is. I get about 65,000,000 fmul 
operations per second on it.

So we're 65x slower on a PowerStation. And that's for a tight FPU only loop. 
I'm still not convinced we're running into major problems.
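
A quick check of the arithmetic behind these figures, as a sketch using only the numbers quoted in this thread: 1171403 exits/s from kvm_stat (treated, as the thread does, as one exit per emulated fmul), about 65,000,000 native fmul/s, and 250k as the upper end of the estimate for a typical Gekko application. The macro and function names are illustrative only.

```c
/* Rates quoted in the thread, in operations per second. */
#define EMULATED_FMUL_PER_SEC 1171403.0  /* kvm_stat "exits" rate for the fmul loop */
#define NATIVE_FMUL_PER_SEC   65000000.0 /* "about 65,000,000" on real hardware */
#define GEKKO_APP_FPU_PER_SEC 250000.0   /* upper end of the 200k-250k estimate */

/* How many times slower the emulated loop runs than native hardware. */
double fmul_slowdown(void)
{
    return NATIVE_FMUL_PER_SEC / EMULATED_FMUL_PER_SEC;
}

/* Emulation capacity relative to the FPU rate of a typical Gekko application. */
double fpu_headroom(void)
{
    return EMULATED_FMUL_PER_SEC / GEKKO_APP_FPU_PER_SEC;
}
```

With the measured rate, fmul_slowdown() comes out at roughly 55.5; the 65x figure corresponds to rounding the emulated rate down to 1000k. fpu_headroom() is roughly 4.7, which is the margin the "1000k > 250k" comparison relies on.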


Alex


Re: [RFC] KVM: Balloon support for device assignment

2010-02-17 Thread Muli Ben-Yehuda
On Wed, Feb 17, 2010 at 12:27:09PM +0200, Avi Kivity wrote:
> On 02/17/2010 11:43 AM, bor...@il.ibm.com wrote:
> >From: Eran Borovik
> >
> >This patch adds modifications to allow correct
> >balloon operation when a virtual guest uses a direct assigned device.
> >The modifications include a new interface between qemu and kvm to allow
> >mapping and unmapping the pages from the IOMMU as well as pinning and 
> >unpinning as needed.
> 
> The plan for iommu support is to push it into uio.  Instead of kvm
> managing the iommu directly, I'd like qemu to open a uio device and
> set up an iommu mapping there, which will just happen to match the
> kvm memory slots.  Similarly, interrupts will be forwarded using
> irqfds.  This will allow using the iommu without kvm, and reduce the
> amount of special purpose kvm code.
> 
> These patches make the transition more difficult which worries me.

That's a fair point, but they also address a real shortcoming of the
current device assignment code, which pins all of the guest's memory
unconditionally. Unless the uio effort is in progress and expected to
complete shortly, I would think the benefit of these simple patches
trumps the cost.

Cheers,
Muli


[ kvm-Bugs-2915201 ] Nested kvm (SVM)

2010-02-17 Thread SourceForge.net
Bugs item #2915201, was opened at 2009-12-15 16:35
Message generated for change (Comment added) made by jbl001
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2915201&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: amd
Group: v1.0 (example)
>Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: jbl001 (jbl001)
Assigned to: Nobody/Anonymous (nobody)
Summary: Nested kvm (SVM)

Initial Comment:
I have seen a couple of messages where people have stated that nested SVM works
properly, but I cannot replicate it. I first attempted to use the following 
configurations:

Hardware:
desktop system: Gigabyte MA785GM board with Athlon X2 4400
server system: Tyan h2000M board with Opteron 2354

Software:
Host OS: Ubuntu 9.10 with production kernel
Host KVM: tried kmod 2.6.32 with qemu 0.11.1 and qemu 0.12.0rc2, also tried 
git-tip
Guest VMM Host OS: Ubuntu 9.10
Guest VMM KVM: tried kmod 2.6.32 with qemu 0.11.1 and qemu 0.12.0rc2, also 
tried git-tip
True guest: tried Slackware 10.2, 64-bit Ubuntu 8.10, 64-bit Ubuntu 9.10, and 
32-bit XP

All configurations result in the true guest not booting, but the Slackware 10.2 
true guest is the easiest to analyze. It hangs at various places during boot 
with the most common being the "calibrating delay loop", "testing HLT 
instruction", mounting the hard disks, or starting the INIT processes. It seems
to be losing interrupts.

I also tried an older host (64-bit Ubuntu 8.10) and guest VMM (64-bit Ubuntu 
8.10) with the KVM-88 release. With this configuration, the Slackware 10.2 true 
guest will usually boot, but will then get a constant flow of "hda: lost 
interrupt" and "hda: dma_timer_expiry: dma status == 0x24". Again, it seems to 
be losing interrupts.

I have ensured that nested=1 is passed to the module and that
enable-nesting is passed to qemu. It obviously works for some time, and I've
tried printing out exit reasons in the handle_exit() function of the guest VMM,
but it consistently fails in some form or another across all the hardware and
software I have available to try it on.


--

>Comment By: jbl001 (jbl001)
Date: 2010-02-17 09:47

Message:
I tried this again with qemu-0.12.2 and kvm-kmod-2.6.32.3 while passing
no-kvmclock to both the host and guest VMM kernels. It did not help the
problem of lost interrupts in the true guest, however.


--

Comment By: Brian Jackson (iggy_cav)
Date: 2010-02-09 11:07

Message:
Can you try disabling kvmclock in both guests?

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2915201&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] qemu-kvm Set kvm_features name for kvm_cr3_cache

2010-02-17 Thread Jes Sorensen

Hi,

Comparing the features tested for in get_para_features() with the
kvm_feature_names in target-i386/helper.c, I noticed that we didn't
list the cr3_cache feature in the real name table.

I presume this is unintentional so here's a patch to correct it.

Cheers,
Jes

commit 39cb576d15a6ffbbcade3c4f282c2f3e76e3098a
Author: Jes Sorensen 
Date:   Wed Feb 17 18:03:37 2010 +0100

Add kvm_cr3_cache to the list of KVM features.

This is to match the features automatically added by
target-i386/kvm.c:get_para_features()

Signed-off-by: Jes Sorensen 

diff --git a/target-i386/helper.c b/target-i386/helper.c
index f9d63f6..2cd3dca 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -61,7 +61,8 @@ static const char *ext3_feature_name[] = {
 };
 
 static const char *kvm_feature_name[] = {
-"kvmclock", "kvm_nopiodelay", "kvm_mmu", NULL, NULL, NULL, NULL, NULL,
+"kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvm_cr3_cache",
+NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
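
For readers unfamiliar with these tables: each entry's index is the bit position in the corresponding feature word, and a NULL slot means that bit has no name. A minimal sketch of name-to-bit and bit-to-name lookup over a table shaped like the patched one; the table and function names are hypothetical, not qemu's own (qemu presumably does a similar lookup when handling named CPU feature flags).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical copy of the patched table: index = feature bit number.
 * Slots not listed are NULL, i.e. that bit has no name. */
static const char *const kvm_feature_name_sketch[32] = {
    "kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvm_cr3_cache",
};

/* Return the bit number for a feature name, or -1 if no slot carries it. */
int kvm_feature_bit_sketch(const char *name)
{
    for (size_t i = 0; i < 32; i++) {
        const char *entry = kvm_feature_name_sketch[i];
        if (entry != NULL && strcmp(entry, name) == 0)
            return (int)i;
    }
    return -1;
}

/* Return the name of a feature bit, or NULL if the bit is unnamed. */
const char *kvm_feature_name_for_bit_sketch(unsigned int bit)
{
    return bit < 32 ? kvm_feature_name_sketch[bit] : NULL;
}
```

Before the patch, slot 3 was NULL, so a lookup like this would return -1 for "kvm_cr3_cache" even though get_para_features() tests for that feature.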


buildbot failure in qemu-kvm on default_x86_64_out_of_tree

2010-02-17 Thread qemu-kvm
The Buildbot has detected a new failure of default_x86_64_out_of_tree on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_out_of_tree/builds/218

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Amit Shah, Anthony Liguori, Artyom Tarasenko, Aurelien Jarno,
Avi Kivity, Blue Swirl, Brian Jackson, Christian Krause, Christoph Hellwig,
David S. Ahern, Dirk Ullrich, Edgar E. Iglesias, Evgeniy Dushistov,
Isaku Yamahata, Jan Kiszka, Jim Meyering, Kevin Wolf, Liran Schour,
Loïc Minier, Luiz Capitulino, Marcelo Tosatti, Markus Armbruster,
Michael S. Tsirkin, OHMURA Kei, Paolo Bonzini, Richard Henderson,
Riku Voipio, Roy Tam, Scott Tsai, Sheng Yang, Stefan Weil, TeLeMan,
Tom Lendacky, h...@lst.de, malc

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on default_i386_out_of_tree

2010-02-17 Thread qemu-kvm
The Buildbot has detected a new failure of default_i386_out_of_tree on qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_out_of_tree/builds/216

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Amit Shah, Anthony Liguori, Artyom Tarasenko, Aurelien Jarno,
Avi Kivity, Blue Swirl, Brian Jackson, Christian Krause, Christoph Hellwig,
David S. Ahern, Dirk Ullrich, Edgar E. Iglesias, Evgeniy Dushistov,
Isaku Yamahata, Jan Kiszka, Jim Meyering, Kevin Wolf, Liran Schour,
Loïc Minier, Luiz Capitulino, Marcelo Tosatti, Markus Armbruster,
Michael S. Tsirkin, OHMURA Kei, Paolo Bonzini, Richard Henderson,
Riku Voipio, Roy Tam, Scott Tsai, Sheng Yang, Stefan Weil, TeLeMan,
Tom Lendacky, h...@lst.de, malc

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on default_i386_debian_5_0

2010-02-17 Thread qemu-kvm
The Buildbot has detected a new failure of default_i386_debian_5_0 on qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_debian_5_0/builds/279

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Amit Shah, Anthony Liguori, Artyom Tarasenko, Aurelien Jarno,
Avi Kivity, Blue Swirl, Brian Jackson, Christian Krause, Christoph Hellwig,
David S. Ahern, Dirk Ullrich, Edgar E. Iglesias, Evgeniy Dushistov,
Isaku Yamahata, Jan Kiszka, Jim Meyering, Kevin Wolf, Liran Schour,
Loïc Minier, Luiz Capitulino, Marcelo Tosatti, Markus Armbruster,
Michael S. Tsirkin, OHMURA Kei, Paolo Bonzini, Richard Henderson,
Riku Voipio, Roy Tam, Scott Tsai, Sheng Yang, Stefan Weil, TeLeMan,
Tom Lendacky, h...@lst.de, malc

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



Re: [PATCH 00/18] KVM: PPC: Virtualize Gekko guests

2010-02-17 Thread Avi Kivity

On 02/17/2010 06:23 PM, Alexander Graf wrote:
> On 17.02.2010, at 17:03, Avi Kivity wrote:
>
>> On 02/17/2010 04:56 PM, Alexander Graf wrote:
>>
>>> So I changed the code according to your input by making all FPU calls
>>> explicit, getting rid of all binary patching.
>>>
>>> On the PowerStation again I'm running this code (simplified to the
>>> important instructions) using kvmctl:
>>>
>>>     li      r2, 0x1234
>>>     std     r2, 0(r1)
>>>     lfd     f3, 0(r1)
>>>     lfd     f4, 0(r1)
>>> do_mul:
>>>     fmul    f0, f3, f4
>>>     b       do_mul
>>>
>>>
>>> With the following kvm_stat output:
>>>
>>>  dec             2236        53
>>>  exits       60797802   1171403
>>>  ext_intr         379         4
>>>  halt_wakeup        0         0
>>>  inst_emu    60795247   1171344
>>>  ld          60795132   1171348
>>>
>>> So I'm getting 1171403 fmul operations per second. And that's even with
>>> non-optimized instruction fetching. Not bad.
>>>
>> It's a large number, but won't real hardware be three orders of magnitude
>> faster?
>>
> Yes, it would. But we don't have to care. The only thing we need to worry
> about is being fast enough to emulate enough FPU instructions actually used
> in normal guests so the guest runs at full speed. And 1000k > 250k, so we
> can do that apparently, leaving some spare cycles for non-fpu instructions.
>

I'm sure 250k isn't representative of a floating point intensive program
(but maybe there aren't fpu intensive applications on that cpu).


--
error compiling committee.c: too many arguments to function



buildbot failure in qemu-kvm on disable_kvm_i386_debian_5_0

2010-02-17 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_i386_debian_5_0 on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_debian_5_0/builds/268

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Amit Shah, Anthony Liguori, Artyom Tarasenko, Aurelien Jarno,
Avi Kivity, Blue Swirl, Brian Jackson, Christian Krause, Christoph Hellwig,
David S. Ahern, Dirk Ullrich, Edgar E. Iglesias, Evgeniy Dushistov,
Isaku Yamahata, Jan Kiszka, Jim Meyering, Kevin Wolf, Liran Schour,
Loïc Minier, Luiz Capitulino, Marcelo Tosatti, Markus Armbruster,
Michael S. Tsirkin, OHMURA Kei, Paolo Bonzini, Richard Henderson,
Riku Voipio, Roy Tam, Scott Tsai, Sheng Yang, Stefan Weil, TeLeMan,
Tom Lendacky, h...@lst.de, malc

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on disable_kvm_x86_64_out_of_tree

2010-02-17 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_x86_64_out_of_tree on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_out_of_tree/builds/216

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Amit Shah, Anthony Liguori, Artyom Tarasenko, Aurelien Jarno,
Avi Kivity, Blue Swirl, Brian Jackson, Christian Krause, Christoph Hellwig,
David S. Ahern, Dirk Ullrich, Edgar E. Iglesias, Evgeniy Dushistov,
Isaku Yamahata, Jan Kiszka, Jim Meyering, Kevin Wolf, Liran Schour,
Loïc Minier, Luiz Capitulino, Marcelo Tosatti, Markus Armbruster,
Michael S. Tsirkin, OHMURA Kei, Paolo Bonzini, Richard Henderson,
Riku Voipio, Roy Tam, Scott Tsai, Sheng Yang, Stefan Weil, TeLeMan,
Tom Lendacky, h...@lst.de, malc

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on disable_kvm_x86_64_debian_5_0

2010-02-17 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_x86_64_debian_5_0 on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_debian_5_0/builds/267

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Amit Shah, Anthony Liguori, Artyom Tarasenko, Aurelien Jarno,
Avi Kivity, Blue Swirl, Brian Jackson, Christian Krause, Christoph Hellwig,
David S. Ahern, Dirk Ullrich, Edgar E. Iglesias, Evgeniy Dushistov,
Isaku Yamahata, Jan Kiszka, Jim Meyering, Kevin Wolf, Liran Schour,
Loïc Minier, Luiz Capitulino, Marcelo Tosatti, Markus Armbruster,
Michael S. Tsirkin, OHMURA Kei, Paolo Bonzini, Richard Henderson,
Riku Voipio, Roy Tam, Scott Tsai, Sheng Yang, Stefan Weil, TeLeMan,
Tom Lendacky, h...@lst.de, malc

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on default_x86_64_debian_5_0

2010-02-17 Thread qemu-kvm
The Buildbot has detected a new failure of default_x86_64_debian_5_0 on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_debian_5_0/builds/277

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Amit Shah, Anthony Liguori, Artyom Tarasenko, Aurelien Jarno,
Avi Kivity, Blue Swirl, Brian Jackson, Christian Krause, Christoph Hellwig,
David S. Ahern, Dirk Ullrich, Edgar E. Iglesias, Evgeniy Dushistov,
Isaku Yamahata, Jan Kiszka, Jim Meyering, Kevin Wolf, Liran Schour,
Loïc Minier, Luiz Capitulino, Marcelo Tosatti, Markus Armbruster,
Michael S. Tsirkin, OHMURA Kei, Paolo Bonzini, Richard Henderson,
Riku Voipio, Roy Tam, Scott Tsai, Sheng Yang, Stefan Weil, TeLeMan,
Tom Lendacky, h...@lst.de, malc

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on disable_kvm_i386_out_of_tree

2010-02-17 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_i386_out_of_tree on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_out_of_tree/builds/216

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Amit Shah, Anthony Liguori, Artyom Tarasenko, Aurelien Jarno,
Avi Kivity, Blue Swirl, Brian Jackson, Christian Krause, Christoph Hellwig,
David S. Ahern, Dirk Ullrich, Edgar E. Iglesias, Evgeniy Dushistov,
Isaku Yamahata, Jan Kiszka, Jim Meyering, Kevin Wolf, Liran Schour,
Loïc Minier, Luiz Capitulino, Marcelo Tosatti, Markus Armbruster,
Michael S. Tsirkin, OHMURA Kei, Paolo Bonzini, Richard Henderson,
Riku Voipio, Roy Tam, Scott Tsai, Sheng Yang, Stefan Weil, TeLeMan,
Tom Lendacky, h...@lst.de, malc

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



Re: [PATCH 00/18] KVM: PPC: Virtualize Gekko guests

2010-02-17 Thread Alexander Graf

On 17.02.2010, at 17:03, Avi Kivity wrote:

> On 02/17/2010 04:56 PM, Alexander Graf wrote:
>> 
>> So I changed the code according to your input by making all FPU calls
>> explicit, getting rid of all binary patching.
>>
>> On the PowerStation again I'm running this code (simplified to the important
>> instructions) using kvmctl:
>>
>>     li      r2, 0x1234
>>     std     r2, 0(r1)
>>     lfd     f3, 0(r1)
>>     lfd     f4, 0(r1)
>> do_mul:
>>     fmul    f0, f3, f4
>>     b       do_mul
>>
>>
>> With the following kvm_stat output:
>>
>>  dec             2236        53
>>  exits       60797802   1171403
>>  ext_intr         379         4
>>  halt_wakeup        0         0
>>  inst_emu    60795247   1171344
>>  ld          60795132   1171348
>>
>> So I'm getting 1171403 fmul operations per second. And that's even with
>> non-optimized instruction fetching. Not bad.
>>   
> 
> It's a large number, but won't real hardware be three orders of magnitude 
> faster?

Yes, it would. But we don't have to care. The only thing we need to worry about 
is being fast enough to emulate enough FPU instructions actually used in normal 
guests so the guest runs at full speed. And 1000k > 250k, so we can do that
apparently, leaving some spare cycles for non-fpu instructions.

The kernel on my PS3 is still compiling. Let's see how fast I get there.

Alex


Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Gleb Natapov
On Wed, Feb 17, 2010 at 04:13:11PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Wed, Feb 17, 2010 at 12:32:05PM +0100, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Mon, Feb 15, 2010 at 02:20:31PM +0100, Jan Kiszka wrote:
>  Jan Kiszka wrote:
> > Gleb Natapov wrote:
> >> Let's check if SVM works. I can do that if you tell me how.
> > - Fire up some Linux guest with gdb installed
> > - Attach gdb to gdbstub of the VM
> > - Set a soft breakpoint in guest kernel, ideally where it does not
> >   immediately trigger, e.g. on sys_reboot (use grep sys_reboot
> >   /proc/kallsyms if you don't have symbols for the guest kernel)
> > - Start gdb /bin/true in the guest
> > - run
> >
> > As gdb sets some automatic breakpoints, this already exercises the
> > reinjection of #BP.
>  I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
>  and it just worked.
> 
> >>> I tested it on a processor without NextRIP and your test case works
> >>> there too, but it shouldn't have, so I looked deeper into that, and what
> >>> I see is that GDB outsmarts us. It doesn't matter whether we inject the
> >>> event before the int3 inserted by GDB or after it: GDB correctly finds
> >>> the breakpoint that triggered and restarts the instruction correctly. I
> >>> assume it doesn't require an exact match between the RIP where int3 was
> >>> inserted and where the exception triggers.
> >> At latest when you have two successive breakpoints on single-byte
> >> instructions, gdb will reach its limits (for it failed earlier, BTW).
> >> And other debuggers under other OSes may become unhappy as well.
> > Yes, and that is why I am saying checking with GDB is not a good test.
> > GDB may work, but it doesn't mean injection works correctly. It took me
> > some time to write a test that finally confused gdb. It was like this:
> > 
> > 1: int main(int argc, char **argv)
> > 2: {
> > 3:     if (argc == 1)
> > 4:             goto a;
> > 5:     asm("cmc");
> > 6: a:
> > 7:     asm("cmc");
> > 8:     return 0;
> > 9: }
> > 
> > If you set breakpoints on lines 5 and 7, then when the breakpoint
> > triggers GDB thinks it is on line 5.
> > 
> > So can you run the int3 test below on master on AMD with NextRIP support?
> > I doubt the result will be correct.
> 
> If you meant your test above: Works out of the box with unpatched kvm on
> modern AMD CPUs, i.e. gdb always stops at line 7 even if host debugging
> is active.
> 
I meant the test that executes asm("int3") and checks that the RIP it
reports with and without host debugging active is the same and points after
the int3. But I guess if the program above works correctly, the int3 test
should work too. Thanks.
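
The int3 check described above can be approximated in user space. This is a sketch only, limited to Linux/x86-64, with a hypothetical function name; it is not the test Gleb refers to. It executes an int3, catches the resulting SIGTRAP, and compares the RIP saved in the signal context with the address of the byte following the int3. Run inside a guest with and without host debugging active, both runs should report the same result.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for REG_RIP in <sys/ucontext.h> */
#endif
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__linux__)
#include <signal.h>
#include <ucontext.h>

#ifdef REG_RIP
#define SAVED_RIP_INDEX REG_RIP
#else
#define SAVED_RIP_INDEX 16 /* glibc/musl value of REG_RIP, if _GNU_SOURCE came too late */
#endif

static volatile uintptr_t trapped_rip;

/* SIGTRAP handler: record the RIP the kernel saved for the trap. */
static void on_sigtrap(int sig, siginfo_t *info, void *uctx)
{
    const ucontext_t *uc = (const ucontext_t *)uctx;
    (void)sig;
    (void)info;
    trapped_rip = (uintptr_t)uc->uc_mcontext.gregs[SAVED_RIP_INDEX];
}
#endif

/* Returns 1 if the saved RIP equals the address just past the int3,
 * 0 if it differs, and -1 where this sketch does not apply. */
int int3_rip_points_after_insn(void)
{
#if defined(__x86_64__) && defined(__linux__)
    struct sigaction sa, old;
    uintptr_t expected;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigtrap;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTRAP, &sa, &old) != 0)
        return -1;

    /* Load the address of local label 1 (the byte after int3), then trap. */
    __asm__ volatile("lea 1f(%%rip), %0\n\t"
                     "int3\n"
                     "1:\n"
                     : "=r"(expected)
                     :
                     : "memory");

    sigaction(SIGTRAP, &old, NULL);
    return trapped_rip == expected ? 1 : 0;
#else
    return -1;
#endif
}
```

int3 is a trap, so the architecturally correct saved RIP is the address after the instruction. A reinjection bug of the kind discussed here would show up as a mismatch.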

--
Gleb.


Re: [PATCH 00/18] KVM: PPC: Virtualize Gekko guests

2010-02-17 Thread Avi Kivity

On 02/17/2010 04:56 PM, Alexander Graf wrote:
>
> So I changed the code according to your input by making all FPU calls
> explicit, getting rid of all binary patching.
>
> On the PowerStation again I'm running this code (simplified to the
> important instructions) using kvmctl:
>
>     li      r2, 0x1234
>     std     r2, 0(r1)
>     lfd     f3, 0(r1)
>     lfd     f4, 0(r1)
> do_mul:
>     fmul    f0, f3, f4
>     b       do_mul
>
>
> With the following kvm_stat output:
>
>  dec             2236        53
>  exits       60797802   1171403
>  ext_intr         379         4
>  halt_wakeup        0         0
>  inst_emu    60795247   1171344
>  ld          60795132   1171348
>
> So I'm getting 1171403 fmul operations per second. And that's even with
> non-optimized instruction fetching. Not bad.
>

It's a large number, but won't real hardware be three orders of
magnitude faster?




Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Jan Kiszka
Gleb Natapov wrote:
> On Wed, Feb 17, 2010 at 12:32:05PM +0100, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Mon, Feb 15, 2010 at 02:20:31PM +0100, Jan Kiszka wrote:
 Jan Kiszka wrote:
> Gleb Natapov wrote:
>> Let's check if SVM works. I can do that if you tell me how.
> - Fire up some Linux guest with gdb installed
> - Attach gdb to gdbstub of the VM
> - Set a soft breakpoint in guest kernel, ideally where it does not
>   immediately trigger, e.g. on sys_reboot (use grep sys_reboot
>   /proc/kallsyms if you don't have symbols for the guest kernel)
> - Start gdb /bin/true in the guest
> - run
>
> As gdb sets some automatic breakpoints, this already exercises the
> reinjection of #BP.
 I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
 and it just worked.

>>> I tested it on a processor without NextRIP and your test case works
>>> there too, but it shouldn't have, so I looked deeper into that, and what
>>> I see is that GDB outsmarts us. It doesn't matter whether we inject the
>>> event before the int3 inserted by GDB or after it: GDB correctly finds
>>> the breakpoint that triggered and restarts the instruction correctly. I
>>> assume it doesn't require an exact match between the RIP where int3 was
>>> inserted and where the exception triggers.
>> At latest when you have two successive breakpoints on single-byte
>> instructions, gdb will reach its limits (for it failed earlier, BTW).
>> And other debuggers under other OSes may become unhappy as well.
> Yes, and that is why I am saying checking with GDB is not a good test.
> GDB may work, but it doesn't mean injection works correctly. It took me
> some time to write a test that finally confused gdb. It was like this:
> 
> 1: int main(int argc, char **argv)
> 2: {
> 3:     if (argc == 1)
> 4:             goto a;
> 5:     asm("cmc");
> 6: a:
> 7:     asm("cmc");
> 8:     return 0;
> 9: }
> 
> If you set breakpoints on lines 5 and 7, then when the breakpoint
> triggers GDB thinks it is on line 5.
> 
> So can you run the int3 test below on master on AMD with NextRIP support?
> I doubt the result will be correct.

If you meant your test above: Works out of the box with unpatched kvm on
modern AMD CPUs, i.e. gdb always stops at line 7 even if host debugging
is active.

Jan



Re: [PATCH 00/18] KVM: PPC: Virtualize Gekko guests

2010-02-17 Thread Alexander Graf

On 09.02.2010, at 13:27, Avi Kivity wrote:

> On 02/09/2010 01:13 PM, Alexander Graf wrote:
>> Avi Kivity wrote:
>>   
>>> On 02/09/2010 01:00 PM, Alexander Graf wrote:
>>> 
   
> That's pretty impressive (never saw x86 with this exit rate) but it's
> more than 1000 times slower than the hardware, assuming 1 fpu IPC (and
> the processor can probably do more).  An fpu intensive application
> will slow to a crawl.
> 
> 
 Measuring a typical Gekko application, I get about 200k-250k of fpu
 (incl. paired singles) instructions per second.
 
   
>>> Virtualized, yes?  What's the rate on bare metal?
>>> 
>> 
>> Emulated. I can't measure anything on bare metal.
>>   
> 
> Well, then, the rate may be low due to virtualization overhead.  Any way to 
> compare absolute performance?

So I changed the code according to your input by making all FPU calls explicit,
getting rid of all binary patching.

On the PowerStation again I'm running this code (simplified to the important
instructions) using kvmctl:

    li      r2, 0x1234
    std     r2, 0(r1)
    lfd     f3, 0(r1)
    lfd     f4, 0(r1)
do_mul:
    fmul    f0, f3, f4
    b       do_mul


With the following kvm_stat output:

 dec             2236        53
 exits       60797802   1171403
 ext_intr         379         4
 halt_wakeup        0         0
 inst_emu    60795247   1171344
 ld          60795132   1171348

So I'm getting 1171403 fmul operations per second. And that's even with
non-optimized instruction fetching. Not bad.


Alex


Re: [PATCH 2/3] KVM: x86: Save&restore interrupt shadow mask

2010-02-17 Thread Marcelo Tosatti
On Wed, Feb 17, 2010 at 11:10:07AM +0200, Gleb Natapov wrote:
> On Wed, Feb 17, 2010 at 10:03:58AM +0100, Jan Kiszka wrote:
> > > 
> > > Also, as Avi mentioned it would be better to avoid this. Is it not
> > > possible to disallow migration while interrupt shadow is present?
> > 
> > Which means disallowing user space exits while the shadow is set? Or
> > should we introduce some flag for user space that tells it "do not
> > migrate now, resume the guest till next exit"?
> > 
> I think disabling migration is a slippery slope. Guest may abuse it. Maybe
> it will be hard to do with interrupt shadow, but the mechanism will be
> used for other cases too. I remember there was an argument that we
> should not migrate while vcpu is in a nested guest mode.

Agree that guest may abuse it. Better to save/restore
blocking-by-sti/by-mov-ss individually.

I was thinking the writeback of interrupt shadow / interruptibility state
would be too complicated (e.g. necessary to care about ordering, etc.), but
now I see it's handled in the kernel (inject_pending_event and friends).
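
Saving the two kinds of blocking individually, rather than as a single "shadow active" flag, can be sketched as a small encode/decode pair. The layout is an assumption for illustration: bit 0 is blocking by STI and bit 1 is blocking by MOV SS, matching the low two bits of the VMX guest interruptibility-state field. The names are hypothetical and do not describe KVM's user space ABI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of a saved interrupt-shadow byte. */
#define SHADOW_BLOCK_BY_STI    (1u << 0)
#define SHADOW_BLOCK_BY_MOV_SS (1u << 1)

struct shadow_state_sketch {
    bool by_sti;     /* shadow left by STI */
    bool by_mov_ss;  /* shadow left by MOV SS or POP SS */
};

/* Pack both sources of blocking so neither is lost across save/restore. */
uint8_t shadow_save_sketch(struct shadow_state_sketch s)
{
    uint8_t bits = 0;

    if (s.by_sti)
        bits |= SHADOW_BLOCK_BY_STI;
    if (s.by_mov_ss)
        bits |= SHADOW_BLOCK_BY_MOV_SS;
    return bits;
}

/* Unpack on the destination; bits outside the two defined ones are ignored. */
struct shadow_state_sketch shadow_restore_sketch(uint8_t bits)
{
    struct shadow_state_sketch s = {
        .by_sti = (bits & SHADOW_BLOCK_BY_STI) != 0,
        .by_mov_ss = (bits & SHADOW_BLOCK_BY_MOV_SS) != 0,
    };
    return s;
}
```

Keeping the bits separate matters because the two shadows are not equivalent. For example, blocking by MOV SS also suppresses certain debug exceptions on the following instruction boundary, so a single merged flag could not be restored faithfully.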


[patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

2010-02-17 Thread Thomas Gleixner
The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert
them to raw_spinlock. No change for !RT kernels.

Signed-off-by: Thomas Gleixner 

---
 arch/x86/kvm/i8254.c |   10 +-
 arch/x86/kvm/i8254.h |2 +-
 arch/x86/kvm/i8259.c |   31 ---
 arch/x86/kvm/irq.h   |2 +-
 arch/x86/kvm/x86.c   |8 
 5 files changed, 27 insertions(+), 26 deletions(-)

Index: linux-2.6-tip/arch/x86/kvm/i8254.c
===
--- linux-2.6-tip.orig/arch/x86/kvm/i8254.c
+++ linux-2.6-tip/arch/x86/kvm/i8254.c
@@ -242,11 +242,11 @@ static void kvm_pit_ack_irq(struct kvm_i
 {
struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
 irq_ack_notifier);
-   spin_lock(&ps->inject_lock);
+   raw_spin_lock(&ps->inject_lock);
if (atomic_dec_return(&ps->pit_timer.pending) < 0)
atomic_inc(&ps->pit_timer.pending);
ps->irq_ack = 1;
-   spin_unlock(&ps->inject_lock);
+   raw_spin_unlock(&ps->inject_lock);
 }
 
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
@@ -624,7 +624,7 @@ struct kvm_pit *kvm_create_pit(struct kv
 
mutex_init(&pit->pit_state.lock);
mutex_lock(&pit->pit_state.lock);
-   spin_lock_init(&pit->pit_state.inject_lock);
+   raw_spin_lock_init(&pit->pit_state.inject_lock);
 
kvm->arch.vpit = pit;
pit->kvm = kvm;
@@ -723,12 +723,12 @@ void kvm_inject_pit_timer_irqs(struct kv
/* Try to inject pending interrupts when
 * last one has been acked.
 */
-   spin_lock(&ps->inject_lock);
+   raw_spin_lock(&ps->inject_lock);
if (atomic_read(&ps->pit_timer.pending) && ps->irq_ack) {
ps->irq_ack = 0;
inject = 1;
}
-   spin_unlock(&ps->inject_lock);
+   raw_spin_unlock(&ps->inject_lock);
if (inject)
__inject_pit_timer_intr(kvm);
}
Index: linux-2.6-tip/arch/x86/kvm/i8254.h
===
--- linux-2.6-tip.orig/arch/x86/kvm/i8254.h
+++ linux-2.6-tip/arch/x86/kvm/i8254.h
@@ -27,7 +27,7 @@ struct kvm_kpit_state {
u32 speaker_data_on;
struct mutex lock;
struct kvm_pit *pit;
-   spinlock_t inject_lock;
+   raw_spinlock_t inject_lock;
unsigned long irq_ack;
struct kvm_irq_ack_notifier irq_ack_notifier;
 };
Index: linux-2.6-tip/arch/x86/kvm/i8259.c
===
--- linux-2.6-tip.orig/arch/x86/kvm/i8259.c
+++ linux-2.6-tip/arch/x86/kvm/i8259.c
@@ -44,18 +44,19 @@ static void pic_clear_isr(struct kvm_kpi
 * Other interrupt may be delivered to PIC while lock is dropped but
 * it should be safe since PIC state is already updated at this stage.
 */
-   spin_unlock(&s->pics_state->lock);
+   raw_spin_unlock(&s->pics_state->lock);
kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq);
-   spin_lock(&s->pics_state->lock);
+   raw_spin_lock(&s->pics_state->lock);
 }
 
 void kvm_pic_clear_isr_ack(struct kvm *kvm)
 {
struct kvm_pic *s = pic_irqchip(kvm);
-   spin_lock(&s->lock);
+
+   raw_spin_lock(&s->lock);
s->pics[0].isr_ack = 0xff;
s->pics[1].isr_ack = 0xff;
-   spin_unlock(&s->lock);
+   raw_spin_unlock(&s->lock);
 }
 
 /*
@@ -156,9 +157,9 @@ static void pic_update_irq(struct kvm_pi
 
 void kvm_pic_update_irq(struct kvm_pic *s)
 {
-   spin_lock(&s->lock);
+   raw_spin_lock(&s->lock);
pic_update_irq(s);
-   spin_unlock(&s->lock);
+   raw_spin_unlock(&s->lock);
 }
 
 int kvm_pic_set_irq(void *opaque, int irq, int level)
@@ -166,14 +167,14 @@ int kvm_pic_set_irq(void *opaque, int ir
struct kvm_pic *s = opaque;
int ret = -1;
 
-   spin_lock(&s->lock);
+   raw_spin_lock(&s->lock);
if (irq >= 0 && irq < PIC_NUM_PINS) {
ret = pic_set_irq1(&s->pics[irq >> 3], irq & 7, level);
pic_update_irq(s);
trace_kvm_pic_set_irq(irq >> 3, irq & 7, s->pics[irq >> 3].elcr,
  s->pics[irq >> 3].imr, ret == 0);
}
-   spin_unlock(&s->lock);
+   raw_spin_unlock(&s->lock);
 
return ret;
 }
@@ -203,7 +204,7 @@ int kvm_pic_read_irq(struct kvm *kvm)
int irq, irq2, intno;
struct kvm_pic *s = pic_irqchip(kvm);
 
-   spin_lock(&s->lock);
+   raw_spin_lock(&s->lock);
irq = pic_get_irq(&s->pics[0]);
if (irq >= 0) {
pic_intack(&s->pics[0], irq);
@@ -228,7 +229,7 @@ int kvm_pic_read_irq(struct kvm *kvm)
intno = s->pics[0].irq_base + irq;
}
pic_update_irq(s);
-   spin_unlock(&s->lock);
+   raw_spin

[PATCH 05/20] KVM: kvm->arch.vioapic should be NULL if kvm_ioapic_init() failure

2010-02-17 Thread Avi Kivity
From: Wei Yongjun 

kvm->arch.vioapic should be NULL in case of kvm_ioapic_init() failure
due to failure to register the io device.

Signed-off-by: Wei Yongjun 
Signed-off-by: Avi Kivity 
---
 virt/kvm/ioapic.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)
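The fix relies on a common C error-path rule: if an init function publishes a pointer (here kvm->arch.vioapic, which the patch implies is set before kvm_io_bus_register_dev() is called) and then frees the object on failure, it must unpublish the pointer first. A minimal sketch of that rule, using hypothetical stand-in types (struct vm, struct dev) and a fake bus_register() rather than the real KVM structures:

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct kvm and struct kvm_ioapic. */
struct dev { int registered; };
struct vm  { struct dev *vioapic; };

/* Hypothetical bus registration; fails when 'fail' is non-zero. */
static int bus_register(struct dev *d, int fail)
{
	if (fail)
		return -1;
	d->registered = 1;
	return 0;
}

/* Publishes the device pointer before registering it, and clears it
 * again if registration fails so no dangling pointer is left behind. */
static int dev_init(struct vm *vm, int fail_register)
{
	struct dev *d = calloc(1, sizeof(*d));
	int ret;

	if (!d)
		return -1;
	vm->vioapic = d;
	ret = bus_register(d, fail_register);
	if (ret < 0) {
		vm->vioapic = NULL;	/* unpublish before freeing */
		free(d);
	}
	return ret;
}
```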

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index a2edfd1..f3d0693 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -393,8 +393,10 @@ int kvm_ioapic_init(struct kvm *kvm)
mutex_lock(&kvm->slots_lock);
ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, &ioapic->dev);
mutex_unlock(&kvm->slots_lock);
-   if (ret < 0)
+   if (ret < 0) {
+   kvm->arch.vioapic = NULL;
kfree(ioapic);
+   }
 
return ret;
 }
-- 
1.6.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4)

2010-02-17 Thread Avi Kivity
This is the fourth of four batches of patches for the 2.6.34 merge window.  KVM
changes for this cycle include:

 - rdtscp support
 - powerpc server-class updates
 - much improved large-guest scaling (now up to 64 vcpus)
 - improved guest fpu handling
 - initial Hyper-V emulation
 - better swapping with EPT
 - 1GB pages on Intel
 - x86 emulator fixes

as well as the usual assortment of random fixes and improvements.

Avi Kivity (2):
  KVM: MMU: Add tracepoint for guest page aging
  KVM: Plan obsolescence of kernel allocated slots, paravirt mmu

Gleb Natapov (9):
  KVM: x86 emulator: Add group8 instruction decoding
  KVM: x86 emulator: Add group9 instruction decoding
  KVM: x86 emulator: Add Virtual-8086 mode of emulation
  KVM: x86 emulator: fix memory access during x86 emulation
  KVM: x86 emulator: Check IOPL level during io instruction emulation
  KVM: x86 emulator: Fix popf emulation
  KVM: x86 emulator: Check CPL level during privilege instruction emulation
  KVM: x86 emulator: Add LOCK prefix validity checking
  KVM: x86 emulator: disallow opcode 82 in 64-bit mode

Jochen Maes (1):
  KVM: Fix Codestyle in virt/kvm/coalesced_mmio.c

Liu Yu (1):
  KVM: ppc/booke: Set ESR and DEAR when inject interrupt to guest

Michael S. Tsirkin (1):
  KVM: do not store wqh in irqfd

Sheng Yang (1):
  KVM: VMX: Rename VMX_EPT_IGMT_BIT to VMX_EPT_IPAT_BIT

Wei Yongjun (5):
  KVM: PIT: unregister kvm irq notifier if fail to create pit
  KVM: kvm->arch.vioapic should be NULL if kvm_ioapic_init() failure
  KVM: cleanup the failure path of KVM_CREATE_IRQCHIP ioctrl
  KVM: ia64: destroy ioapic device if fail to setup default irq routing
  KVM: x86 emulator: code style cleanup

 Documentation/feature-removal-schedule.txt |   30 +++
 arch/ia64/kvm/kvm-ia64.c   |2 +-
 arch/powerpc/include/asm/kvm_host.h|2 +
 arch/powerpc/kvm/booke.c   |   59 --
 arch/powerpc/kvm/emulate.c |4 +-
 arch/x86/include/asm/kvm_emulate.h |   15 ++-
 arch/x86/include/asm/kvm_host.h|8 +-
 arch/x86/include/asm/vmx.h |2 +-
 arch/x86/kvm/emulate.c |  300 +---
 arch/x86/kvm/i8254.c   |5 +-
 arch/x86/kvm/i8259.c   |   11 +
 arch/x86/kvm/irq.h |1 +
 arch/x86/kvm/mmu.c |   28 ++--
 arch/x86/kvm/mmu.h |6 +
 arch/x86/kvm/paging_tmpl.h |   11 +-
 arch/x86/kvm/vmx.c |4 +-
 arch/x86/kvm/x86.c |  152 ++
 include/trace/events/kvm.h |   22 ++
 virt/kvm/coalesced_mmio.c  |4 +-
 virt/kvm/eventfd.c |3 -
 virt/kvm/ioapic.c  |   15 ++-
 virt/kvm/ioapic.h  |1 +
 22 files changed, 525 insertions(+), 160 deletions(-)



[PATCH 07/20] KVM: ia64: destroy ioapic device if fail to setup default irq routing

2010-02-17 Thread Avi Kivity
From: Wei Yongjun 

If KVM_CREATE_IRQCHIP fails in kvm_setup_default_irq_routing(), the
ioapic device is not destroyed and kvm->arch.vioapic is not set to
NULL; this may cause KVM_GET_IRQCHIP and KVM_SET_IRQCHIP to access
unexpected memory.

Signed-off-by: Wei Yongjun 
Signed-off-by: Avi Kivity 
---
 arch/ia64/kvm/kvm-ia64.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 0618898..26e0e08 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -968,7 +968,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
goto out;
r = kvm_setup_default_irq_routing(kvm);
if (r) {
-   kfree(kvm->arch.vioapic);
+   kvm_ioapic_destroy(kvm);
goto out;
}
break;
-- 
1.6.5.3



[PATCH 11/20] KVM: x86 emulator: Add group9 instruction decoding

2010-02-17 Thread Avi Kivity
From: Gleb Natapov 

Use the groups mechanism to decode 0F C7 instructions.

Signed-off-by: Gleb Natapov 
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)
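With GroupDual, the decoder takes the group entry from group_table when the ModRM operand is memory (mod != 3) and from group2_table when it is a register (mod == 3). For 0F C7 the patch fills only /1 of group_table (CMPXCHG8B m64) and leaves every group2_table slot at 0, so register forms are not emulated. A simplified sketch of that lookup, with hypothetical flag values and a one-group table instead of the real emulate.c tables:

```c
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define IMPLICIT_OPS (1u << 0)
#define MODRM        (1u << 1)

#define GROUP9 0	/* index of the only group in these tiny tables */

/* Used when the ModRM operand is memory (mod != 3). */
static const uint32_t group_table[8] = {
	0, IMPLICIT_OPS | MODRM, 0, 0, 0, 0, 0, 0,	/* /1 = CMPXCHG8B m64 */
};

/* Used instead when GroupDual is set and mod == 3 (register operand). */
static const uint32_t group2_table[8] = { 0 };

/* Returns decode flags for 0F C7 given its ModRM byte; 0 = not emulated. */
static uint32_t decode_group9(uint8_t modrm)
{
	unsigned mod = modrm >> 6;
	unsigned reg = (modrm >> 3) & 7;
	unsigned idx = GROUP9 * 8 + reg;

	return (mod == 3) ? group2_table[idx] : group_table[idx];
}
```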

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 435b1e4..45a4f7c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -88,7 +88,7 @@
 enum {
Group1_80, Group1_81, Group1_82, Group1_83,
Group1A, Group3_Byte, Group3, Group4, Group5, Group7,
-   Group8,
+   Group8, Group9,
 };
 
 static u32 opcode_table[256] = {
@@ -272,7 +272,8 @@ static u32 twobyte_table[256] = {
0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
DstReg | SrcMem16 | ModRM | Mov,
/* 0xC0 - 0xCF */
-   0, 0, 0, DstMem | SrcReg | ModRM | Mov, 0, 0, 0, ImplicitOps | ModRM,
+   0, 0, 0, DstMem | SrcReg | ModRM | Mov,
+   0, 0, 0, Group | GroupDual | Group9,
0, 0, 0, 0, 0, 0, 0, 0,
/* 0xD0 - 0xDF */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
@@ -328,6 +329,8 @@ static u32 group_table[] = {
0, 0, 0, 0,
DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
+   [Group9*8] =
+   0, ImplicitOps | ModRM, 0, 0, 0, 0, 0, 0,
 };
 
 static u32 group2_table[] = {
@@ -335,6 +338,8 @@ static u32 group2_table[] = {
SrcNone | ModRM, 0, 0, SrcNone | ModRM,
SrcNone | ModRM | DstMem | Mov, 0,
SrcMem16 | ModRM | Mov, 0,
+   [Group9*8] =
+   0, 0, 0, 0, 0, 0, 0, 0,
 };
 
 /* EFLAGS bit definitions. */
-- 
1.6.5.3



[PATCH 10/20] KVM: x86 emulator: Add group8 instruction decoding

2010-02-17 Thread Avi Kivity
From: Gleb Natapov 

Use the groups mechanism to decode 0F BA instructions.

Signed-off-by: Gleb Natapov 
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)
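For a group opcode the ModRM reg field (bits 5:3) selects the entry inside the group, so the patch indexes group_table at Group8*8 + reg. For 0F BA ib only /4 to /7 (BT, BTS, BTR, BTC) are defined, which is why the first four entries are 0. A hypothetical mnemonic table showing the same reg-field lookup:

```c
#include <stddef.h>
#include <stdint.h>

/* Mnemonics for 0F BA ib, indexed by the ModRM reg field (/0 to /7).
 * /0 to /3 are undefined, which the patch encodes as 0 in group_table. */
static const char *const group8_names[8] = {
	NULL, NULL, NULL, NULL, "bt", "bts", "btr", "btc",
};

/* Returns the mnemonic selected by the reg field, or NULL if undefined. */
static const char *decode_group8(uint8_t modrm)
{
	return group8_names[(modrm >> 3) & 7];
}
```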

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 645b245..435b1e4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -88,6 +88,7 @@
 enum {
Group1_80, Group1_81, Group1_82, Group1_83,
Group1A, Group3_Byte, Group3, Group4, Group5, Group7,
+   Group8,
 };
 
 static u32 opcode_table[256] = {
@@ -267,7 +268,7 @@ static u32 twobyte_table[256] = {
0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
DstReg | SrcMem16 | ModRM | Mov,
/* 0xB8 - 0xBF */
-   0, 0, DstMem | SrcImmByte | ModRM, DstMem | SrcReg | ModRM | BitOp,
+   0, 0, Group | Group8, DstMem | SrcReg | ModRM | BitOp,
0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
DstReg | SrcMem16 | ModRM | Mov,
/* 0xC0 - 0xCF */
@@ -323,6 +324,10 @@ static u32 group_table[] = {
0, 0, ModRM | SrcMem, ModRM | SrcMem,
SrcNone | ModRM | DstMem | Mov, 0,
SrcMem16 | ModRM | Mov, SrcMem | ModRM | ByteOp,
+   [Group8*8] =
+   0, 0, 0, 0,
+   DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
+   DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
 };
 
 static u32 group2_table[] = {
-- 
1.6.5.3



[PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation

2010-02-17 Thread Avi Kivity
From: Gleb Natapov 

Currently, when the x86 emulator needs to access memory, the page walk is
done with the broadest permissions possible, so an instruction emulated on
behalf of a userspace process can still access kernel memory. Fix that by
passing the correct access type to the page walker during emulation.

Signed-off-by: Gleb Natapov 
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |   14 +++-
 arch/x86/include/asm/kvm_host.h|7 ++-
 arch/x86/kvm/emulate.c |6 +-
 arch/x86/kvm/mmu.c |   17 ++---
 arch/x86/kvm/mmu.h |6 ++
 arch/x86/kvm/paging_tmpl.h |   11 ++-
 arch/x86/kvm/x86.c |  131 +++-
 7 files changed, 142 insertions(+), 50 deletions(-)
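The idea is that the page walker checks the guest's PTE bits against the kind of access being emulated (user vs. supervisor, write, instruction fetch), expressed with the architectural x86 page-fault error-code bits; an access mask of 0 corresponds to the old "broadest permission" behaviour. A simplified sketch of a leaf-PTE check under those assumptions (single level, behaves as if CR0.WP=1, no SMEP). access_allowed() and read_access() are hypothetical helpers, not the kvm_mmu_gva_to_gpa_*() functions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error-code bits. */
#define PF_WRITE (1u << 1)
#define PF_USER  (1u << 2)
#define PF_FETCH (1u << 4)

/* Architectural 64-bit/PAE PTE bits. */
#define PTE_PRESENT (1ull << 0)
#define PTE_WRITE   (1ull << 1)
#define PTE_USER    (1ull << 2)
#define PTE_NX      (1ull << 63)

/* Returns true if an access of the given kind is allowed by this PTE.
 * Simplified: one level only, acts as if CR0.WP=1, ignores SMEP. */
static bool access_allowed(uint64_t pte, uint32_t access)
{
	if (!(pte & PTE_PRESENT))
		return false;
	if ((access & PF_USER) && !(pte & PTE_USER))
		return false;
	if ((access & PF_WRITE) && !(pte & PTE_WRITE))
		return false;
	if ((access & PF_FETCH) && (pte & PTE_NX))
		return false;
	return true;
}

/* Hypothetical: access mask for a data read, user bit derived from CPL. */
static uint32_t read_access(int cpl)
{
	return cpl == 3 ? PF_USER : 0;
}
```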

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 784d7c5..7a6f54f 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -54,13 +54,23 @@ struct x86_emulate_ctxt;
 struct x86_emulate_ops {
/*
 * read_std: Read bytes of standard (non-emulated/special) memory.
-*   Used for instruction fetch, stack operations, and others.
+*   Used for descriptor reading.
 *  @addr:  [IN ] Linear address from which to read.
 *  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
 *  @bytes: [IN ] Number of bytes to read from memory.
 */
int (*read_std)(unsigned long addr, void *val,
-   unsigned int bytes, struct kvm_vcpu *vcpu);
+   unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
+
+   /*
+* fetch: Read bytes of standard (non-emulated/special) memory.
+*Used for instruction fetch.
+*  @addr:  [IN ] Linear address from which to read.
+*  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
+*  @bytes: [IN ] Number of bytes to read from memory.
+*/
+   int (*fetch)(unsigned long addr, void *val,
+   unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
 
/*
 * read_emulated: Read bytes from emulated/special memory area.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1522337..c07c16f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -243,7 +243,8 @@ struct kvm_mmu {
void (*new_cr3)(struct kvm_vcpu *vcpu);
int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
void (*free)(struct kvm_vcpu *vcpu);
-   gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva);
+   gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
+   u32 *error);
void (*prefetch_page)(struct kvm_vcpu *vcpu,
  struct kvm_mmu_page *page);
int (*sync_page)(struct kvm_vcpu *vcpu,
@@ -660,6 +661,10 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
 int kvm_mmu_load(struct kvm_vcpu *vcpu);
 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
+gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
+gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
+gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
+gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e4e2df3..c44b460 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -616,7 +616,7 @@ static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
 
if (linear < fc->start || linear >= fc->end) {
size = min(15UL, PAGE_SIZE - offset_in_page(linear));
-   rc = ops->read_std(linear, fc->data, size, ctxt->vcpu);
+   rc = ops->fetch(linear, fc->data, size, ctxt->vcpu, NULL);
if (rc)
return rc;
fc->start = linear;
@@ -671,11 +671,11 @@ static int read_descriptor(struct x86_emulate_ctxt *ctxt,
op_bytes = 3;
*address = 0;
rc = ops->read_std((unsigned long)ptr, (unsigned long *)size, 2,
-  ctxt->vcpu);
+  ctxt->vcpu, NULL);
if (rc)
return rc;
rc = ops->read_std((unsigned long)ptr + 2, address, op_bytes,
-  ctxt->vcpu);
+  ctxt->vcpu, NULL);
return rc;
 }
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7397932..741373e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -138,12 +138,6 @@ module_param(oos_shadow, bool, 0644);
 #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \
| PT64_NX_MASK)
 
-#define P

[PATCH 12/20] KVM: x86 emulator: Add Virtual-8086 mode of emulation

2010-02-17 Thread Avi Kivity
From: Gleb Natapov 

For some instructions the CPU behaves differently in real mode and
virtual-8086 mode. Let the emulator know which mode the CPU is in, so it
does not need to poke into vcpu state directly.

Signed-off-by: Gleb Natapov 
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |   12 +++-
 arch/x86/kvm/x86.c |3 ++-
 3 files changed, 10 insertions(+), 6 deletions(-)
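The new mode selection in emulate_instruction() reads as a chain of checks: not in protected mode means real mode, EFLAGS.VM (bit 17) means virtual-8086 mode, then CS.L and CS.D choose between 64-, 32- and 16-bit protected mode. A stand-alone sketch of that ternary chain, with the mode constants copied from the patch and emul_mode() as a hypothetical helper that takes the inputs as plain arguments:

```c
#include <stdbool.h>

/* Mode values as defined in kvm_emulate.h by the patch. */
#define X86EMUL_MODE_REAL   0
#define X86EMUL_MODE_VM86   1
#define X86EMUL_MODE_PROT16 2
#define X86EMUL_MODE_PROT32 4
#define X86EMUL_MODE_PROT64 8

#define EFLAGS_VM (1ul << 17)	/* architectural EFLAGS.VM bit */

/* Same order of tests as the nested ternary in emulate_instruction(). */
static int emul_mode(bool protmode, unsigned long eflags, bool cs_l, bool cs_db)
{
	if (!protmode)
		return X86EMUL_MODE_REAL;
	if (eflags & EFLAGS_VM)
		return X86EMUL_MODE_VM86;
	if (cs_l)
		return X86EMUL_MODE_PROT64;
	return cs_db ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
}
```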

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 9b697c2..784d7c5 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -168,6 +168,7 @@ struct x86_emulate_ctxt {
 
 /* Execution mode, passed to the emulator. */
 #define X86EMUL_MODE_REAL     0 /* Real mode. */
+#define X86EMUL_MODE_VM86     1 /* Virtual 8086 mode. */
 #define X86EMUL_MODE_PROT16   2 /* 16-bit protected mode. */
 #define X86EMUL_MODE_PROT32   4 /* 32-bit protected mode. */
 #define X86EMUL_MODE_PROT64   8 /* 64-bit (long) mode.*/
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 45a4f7c..e4e2df3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -899,6 +899,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 
switch (mode) {
case X86EMUL_MODE_REAL:
+   case X86EMUL_MODE_VM86:
case X86EMUL_MODE_PROT16:
def_op_bytes = def_ad_bytes = 2;
break;
@@ -1525,7 +1526,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 
/* syscall is not available in real mode */
if (c->lock_prefix || ctxt->mode == X86EMUL_MODE_REAL
-   || !is_protmode(ctxt->vcpu))
+   || ctxt->mode == X86EMUL_MODE_VM86)
return -1;
 
setup_syscalls_segments(ctxt, &cs, &ss);
@@ -1577,8 +1578,8 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
if (c->lock_prefix)
return -1;
 
-   /* inject #GP if in real mode or paging is disabled */
-   if (ctxt->mode == X86EMUL_MODE_REAL || !is_protmode(ctxt->vcpu)) {
+   /* inject #GP if in real mode */
+   if (ctxt->mode == X86EMUL_MODE_REAL) {
kvm_inject_gp(ctxt->vcpu, 0);
return -1;
}
@@ -1642,8 +1643,9 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
if (c->lock_prefix)
return -1;
 
-   /* inject #GP if in real mode or paging is disabled */
-   if (ctxt->mode == X86EMUL_MODE_REAL || !is_protmode(ctxt->vcpu)) {
+   /* inject #GP if in real mode or Virtual 8086 mode */
+   if (ctxt->mode == X86EMUL_MODE_REAL ||
+   ctxt->mode == X86EMUL_MODE_VM86) {
kvm_inject_gp(ctxt->vcpu, 0);
return -1;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2f91b9..a283795 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3348,8 +3348,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
vcpu->arch.emulate_ctxt.vcpu = vcpu;
vcpu->arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu);
vcpu->arch.emulate_ctxt.mode =
+   (!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
-   ? X86EMUL_MODE_REAL : cs_l
+   ? X86EMUL_MODE_VM86 : cs_l
? X86EMUL_MODE_PROT64 : cs_db
? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
 
-- 
1.6.5.3



Re: [PATCH 2/2] KVM: SVM: Make stepping out of NMI handlers more robust

2010-02-17 Thread Gleb Natapov
On Tue, Feb 16, 2010 at 12:08:58PM +0200, Gleb Natapov wrote:
> > > 
> > >> Besides this, proper #DB forwarding to the guest was missing.
> > > During NMI injection? How to reproduce?
> > 
> > Inject, e.g., an NMI over code with TF set. A bit harder is placing a
> > guest HW breakpoint at the spot the NMI handler returns to.
> > 
> Will try to reproduce.
> 
How can I make gdb run the debugged process with TF set? Does this patch
fix it:


diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 52f78dd..b85b200 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -109,6 +109,7 @@ struct vcpu_svm {
struct nested_state nested;
 
bool nmi_singlestep;
+   bool nmi_singlestep_tf;
 };
 
 /* enable NPT for AMD64 and X86 with PAE */
@@ -1221,9 +1222,14 @@ static int db_interception(struct vcpu_svm *svm)
 
if (svm->nmi_singlestep) {
svm->nmi_singlestep = false;
-   if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
+   if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
svm->vmcb->save.rflags &=
~(X86_EFLAGS_TF | X86_EFLAGS_RF);
+   if (svm->nmi_singlestep_tf) {
+   svm->vmcb->save.rflags |= X86_EFLAGS_TF;
+   kvm_queue_exception(&svm->vcpu, DB_VECTOR);
+   }
+   }
update_db_intercept(&svm->vcpu);
}
 
@@ -2586,6 +2592,7 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
   possible problem (IRET or exception injection or interrupt
   shadow) */
svm->nmi_singlestep = true;
+   svm->nmi_singlestep_tf = (svm->vmcb->save.rflags & X86_EFLAGS_TF);
svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
update_db_intercept(vcpu);
 }
--
Gleb.
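The intent of nmi_singlestep_tf is to remember whether the guest itself had TF set before KVM forces TF|RF to step out of the NMI window, so that the guest's TF (and its #DB) can be restored afterwards. Recording it needs a mask, rflags & X86_EFLAGS_TF; rflags | X86_EFLAGS_TF is always non-zero. A simplified model of the save/restore with hypothetical names; it ignores KVM_GUESTDBG_SINGLESTEP and does not distinguish stepping over an IRET from stepping over an interrupt shadow:

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_TF (1ull << 8)	/* architectural trap flag */
#define EFLAGS_RF (1ull << 16)	/* architectural resume flag */

/* Hypothetical model of the per-vcpu single-step state. */
struct step_state {
	uint64_t rflags;
	bool nmi_singlestep;
	bool guest_tf;		/* did the guest itself have TF set? */
};

/* Arm a single step to find the end of the NMI-blocking window. */
static void arm_step(struct step_state *s)
{
	s->nmi_singlestep = true;
	s->guest_tf = (s->rflags & EFLAGS_TF) != 0;	/* mask, not OR */
	s->rflags |= EFLAGS_TF | EFLAGS_RF;
}

/* Handle the resulting #DB: drop the forced TF/RF, restore the guest's
 * own TF. Returns true if the #DB should be forwarded to the guest. */
static bool on_debug_trap(struct step_state *s)
{
	if (!s->nmi_singlestep)
		return true;
	s->nmi_singlestep = false;
	s->rflags &= ~(EFLAGS_TF | EFLAGS_RF);
	if (s->guest_tf) {
		s->rflags |= EFLAGS_TF;
		return true;
	}
	return false;
}
```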


[PATCH 03/20] KVM: VMX: Rename VMX_EPT_IGMT_BIT to VMX_EPT_IPAT_BIT

2010-02-17 Thread Avi Kivity
From: Sheng Yang 

Follow the new SDM, in which the bit is named "Ignore PAT memory type".

Signed-off-by: Sheng Yang 
Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/vmx.h |2 +-
 arch/x86/kvm/vmx.c |4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
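In an EPT entry the memory type occupies bits 5:3 and bit 6 is the IPAT ("ignore PAT") flag, which is what the new name reflects. For the "EPT without VT-d" case vmx_get_mt_mask() returns write-back with IPAT set, i.e. (6 << 3) | (1 << 6) = 0x70. A sketch of just that computation, assuming VMX_EPT_MT_EPTE_SHIFT is 3 and the architectural MTRR encodings (UC = 0, WB = 6); ept_mt_mask() is a hypothetical simplification that omits the VT-d cases:

```c
#include <stdbool.h>
#include <stdint.h>

#define MTRR_TYPE_UNCACHABLE  0
#define MTRR_TYPE_WRBACK      6
#define VMX_EPT_MT_EPTE_SHIFT 3		/* memory type in EPTE bits 5:3 */
#define VMX_EPT_IPAT_BIT      (1ull << 6)

/* MMIO is assumed to map as uncacheable; everything else as WB with
 * IPAT=1, so the guest PAT cannot override the host's caching. */
static uint64_t ept_mt_mask(bool is_mmio)
{
	if (is_mmio)
		return (uint64_t)MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
	return ((uint64_t)MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
		| VMX_EPT_IPAT_BIT;
}
```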

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 43f1e9b..fb9a080 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -377,7 +377,7 @@ enum vmcs_field {
 #define VMX_EPT_READABLE_MASK  0x1ull
 #define VMX_EPT_WRITABLE_MASK  0x2ull
 #define VMX_EPT_EXECUTABLE_MASK  0x4ull
-#define VMX_EPT_IGMT_BIT   (1ull << 6)
+#define VMX_EPT_IPAT_BIT   (1ull << 6)
 
 #define VMX_EPT_IDENTITY_PAGETABLE_ADDR  0xfffbc000ul
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b400be0..f82b072 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4001,7 +4001,7 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 *   b. VT-d with snooping control feature: snooping control feature of
 *  VT-d engine can guarantee the cache correctness. Just set it
 *  to WB to keep consistent with host. So the same as item 3.
-* 3. EPT without VT-d: always map as WB and set IGMT=1 to keep
+* 3. EPT without VT-d: always map as WB and set IPAT=1 to keep
 *consistent with host MTRR
 */
if (is_mmio)
@@ -4012,7 +4012,7 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
  VMX_EPT_MT_EPTE_SHIFT;
else
ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
-   | VMX_EPT_IGMT_BIT;
+   | VMX_EPT_IPAT_BIT;
 
return ret;
 }
-- 
1.6.5.3



[PATCH 06/20] KVM: cleanup the failure path of KVM_CREATE_IRQCHIP ioctrl

2010-02-17 Thread Avi Kivity
From: Wei Yongjun 

If we fail to initialize the ioapic device or fail to set up the default
irq routing, the devices registered by kvm_create_pic() and
kvm_ioapic_init() are never unregistered. This patch fixes that.

Signed-off-by: Wei Yongjun 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/i8259.c |   11 +++
 arch/x86/kvm/irq.h   |1 +
 arch/x86/kvm/x86.c   |8 
 virt/kvm/ioapic.c|   11 +++
 virt/kvm/ioapic.h|1 +
 5 files changed, 28 insertions(+), 4 deletions(-)
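Both new helpers, kvm_destroy_pic() and kvm_ioapic_destroy(), follow the same shape: tolerate a NULL pointer, unregister from the I/O bus before freeing, and clear the published pointer so that a second call (or a later KVM_GET_IRQCHIP) cannot touch freed memory. A sketch of that shape with hypothetical stand-in types, where a counter plays the role of the bus registration:

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the PIC/IOAPIC devices and the I/O bus. */
struct dev { int id; };
struct vm {
	struct dev *vpic;
	struct dev *vioapic;
	int registered;		/* devices currently on the fake bus */
};

static void bus_unregister(struct vm *vm, struct dev *d)
{
	(void)d;
	vm->registered--;
}

/* Safe on NULL, unregisters before freeing, and clears the pointer
 * so that calling it twice is a no-op. */
static void dev_destroy(struct vm *vm, struct dev **slot)
{
	struct dev *d = *slot;

	if (d) {
		bus_unregister(vm, d);
		*slot = NULL;
		free(d);
	}
}
```

As in the patch, teardown runs in reverse creation order (ioapic first, then PIC).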

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index d5753a7..a3711f9 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -543,3 +543,14 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 
return s;
 }
+
+void kvm_destroy_pic(struct kvm *kvm)
+{
+   struct kvm_pic *vpic = kvm->arch.vpic;
+
+   if (vpic) {
+   kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &vpic->dev);
+   kvm->arch.vpic = NULL;
+   kfree(vpic);
+   }
+}
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index be399e2..0b71d48 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -75,6 +75,7 @@ struct kvm_pic {
 };
 
 struct kvm_pic *kvm_create_pic(struct kvm *kvm);
+void kvm_destroy_pic(struct kvm *kvm);
 int kvm_pic_read_irq(struct kvm *kvm);
 void kvm_pic_update_irq(struct kvm_pic *s);
 void kvm_pic_clear_isr_ack(struct kvm *kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bd3161c..b2f91b9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2771,6 +2771,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
if (vpic) {
r = kvm_ioapic_init(kvm);
if (r) {
+   kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS,
+ &vpic->dev);
kfree(vpic);
goto create_irqchip_unlock;
}
@@ -2782,10 +2784,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = kvm_setup_default_irq_routing(kvm);
if (r) {
mutex_lock(&kvm->irq_lock);
-   kfree(kvm->arch.vpic);
-   kfree(kvm->arch.vioapic);
-   kvm->arch.vpic = NULL;
-   kvm->arch.vioapic = NULL;
+   kvm_ioapic_destroy(kvm);
+   kvm_destroy_pic(kvm);
mutex_unlock(&kvm->irq_lock);
}
create_irqchip_unlock:
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index f3d0693..3db15a8 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -401,6 +401,17 @@ int kvm_ioapic_init(struct kvm *kvm)
return ret;
 }
 
+void kvm_ioapic_destroy(struct kvm *kvm)
+{
+   struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+
+   if (ioapic) {
+   kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS, &ioapic->dev);
+   kvm->arch.vioapic = NULL;
+   kfree(ioapic);
+   }
+}
+
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state)
 {
struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index a505ce9..8a751b7 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -72,6 +72,7 @@ int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
 int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2);
 void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode);
 int kvm_ioapic_init(struct kvm *kvm);
+void kvm_ioapic_destroy(struct kvm *kvm);
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level);
 void kvm_ioapic_reset(struct kvm_ioapic *ioapic);
 int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
-- 
1.6.5.3



[PATCH 02/20] KVM: MMU: Add tracepoint for guest page aging

2010-02-17 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/mmu.c |   11 ---
 include/trace/events/kvm.h |   22 ++
 2 files changed, 30 insertions(+), 3 deletions(-)
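The new tracepoint derives the guest frame number from the host virtual address: take the offset of hva inside the memslot's userspace mapping, convert it to pages, and add the slot's base_gfn. A worked sketch of that arithmetic, assuming 4 KiB pages (PAGE_SHIFT = 12) and a hypothetical two-field subset of struct kvm_memory_slot. With userspace_addr = 0x7f0000000000 and base_gfn = 0x100, hva 0x7f0000003abc lies 0x3abc bytes in, i.e. in page 3, so gfn = 0x103:

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumes 4 KiB pages, as on x86 */

/* Hypothetical subset of struct kvm_memory_slot. */
struct memslot {
	uint64_t base_gfn;
	uint64_t userspace_addr;
};

/* Same expression as the tracepoint's TP_fast_assign for 'gfn'. */
static uint64_t hva_to_gfn(const struct memslot *slot, uint64_t hva)
{
	return slot->base_gfn + ((hva - slot->userspace_addr) >> PAGE_SHIFT);
}
```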

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b8da671..7397932 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -151,6 +151,9 @@ module_param(oos_shadow, bool, 0644);
 #define ACC_USER_MASK  PT_USER_MASK
 #define ACC_ALL  (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
 
+#include 
+
+#undef TRACE_INCLUDE_FILE
 #define CREATE_TRACE_POINTS
 #include "mmutrace.h"
 
@@ -792,6 +795,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 unsigned long data))
 {
int i, j;
+   int ret;
int retval = 0;
struct kvm_memslots *slots;
 
@@ -806,16 +810,17 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
if (hva >= start && hva < end) {
gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
 
-   retval |= handler(kvm, &memslot->rmap[gfn_offset],
- data);
+   ret = handler(kvm, &memslot->rmap[gfn_offset], data);
 
for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
int idx = gfn_offset;
idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j);
-   retval |= handler(kvm,
+   ret |= handler(kvm,
&memslot->lpage_info[j][idx].rmap_pde,
data);
}
+   trace_kvm_age_page(hva, memslot, ret);
+   retval |= ret;
}
}
 
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 8abdc12..b17d49d 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -164,6 +164,28 @@ TRACE_EVENT(kvm_fpu,
TP_printk("%s", __print_symbolic(__entry->load, kvm_fpu_load_symbol))
 );
 
+TRACE_EVENT(kvm_age_page,
+   TP_PROTO(ulong hva, struct kvm_memory_slot *slot, int ref),
+   TP_ARGS(hva, slot, ref),
+
+   TP_STRUCT__entry(
+   __field(u64,hva )
+   __field(u64,gfn )
+   __field(u8, referenced  )
+   ),
+
+   TP_fast_assign(
+   __entry->hva= hva;
+   __entry->gfn=
+ slot->base_gfn + ((hva - slot->userspace_addr) >> PAGE_SHIFT);
+   __entry->referenced = ref;
+   ),
+
+   TP_printk("hva %llx gfn %llx %s",
+ __entry->hva, __entry->gfn,
+ __entry->referenced ? "YOUNG" : "OLD")
+);
+
 #endif /* _TRACE_KVM_MAIN_H */
 
 /* This part must be outside protection */
-- 
1.6.5.3



[PATCH 04/20] KVM: PIT: unregister kvm irq notifier if fail to create pit

2010-02-17 Thread Avi Kivity
From: Wei Yongjun 

If creating the PIT fails, we should unregister the kvm irq notifiers
that were registered in kvm_create_pit().

Signed-off-by: Wei Yongjun 
Acked-by: Marcelo Tosatti 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/i8254.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)
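The failure path in kvm_create_pit() uses the usual kernel goto-unwinding idiom, and the patch adds the missing undo steps for the two notifiers registered earlier. A generic sketch of the idiom, not the real kvm_create_pit(): each registration that succeeded is undone, in reverse order, when a later step fails. reg(), unreg() and the live counter are hypothetical:

```c
/* Hypothetical resources: each successful reg() bumps 'live' and each
 * unreg() drops it, so a leak shows up as a non-zero count. */
static int live;

static int reg(int fail)
{
	if (fail)
		return -1;
	live++;
	return 0;
}

static void unreg(void)
{
	live--;
}

/* 'fail_at' selects which step (1..3) fails; 0 means all succeed. */
static int create(int fail_at)
{
	if (reg(fail_at == 1))
		goto fail;
	if (reg(fail_at == 2))
		goto fail_unreg1;
	if (reg(fail_at == 3))
		goto fail_unreg2;
	return 0;

fail_unreg2:
	unreg();	/* undo step 2 */
fail_unreg1:
	unreg();	/* undo step 1 */
fail:
	return -1;
}
```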

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 6a74246..c9569f2 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -663,8 +663,9 @@ fail_unregister:
kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &pit->dev);
 
 fail:
-   if (pit->irq_source_id >= 0)
-   kvm_free_irq_source_id(kvm, pit->irq_source_id);
+   kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
+   kvm_unregister_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
+   kvm_free_irq_source_id(kvm, pit->irq_source_id);
 
kfree(pit);
return NULL;
-- 
1.6.5.3



[PATCH 16/20] KVM: x86 emulator: Check CPL level during privilege instruction emulation

2010-02-17 Thread Avi Kivity
From: Gleb Natapov 

Add CPL checking in case the emulator is tricked into emulating a
privileged instruction from userspace.

Signed-off-by: Gleb Natapov 
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |   35 ---
 1 files changed, 20 insertions(+), 15 deletions(-)
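With the Priv decode flag, the CPL test moves from individual handlers such as emulate_sysexit() into a single check in x86_emulate_insn(): if the decoded instruction is marked Priv and CPL is not 0, #GP(0) is injected instead of emulating it. In the patch's opcode_table, HLT at 0xF4 is one of the entries marked Priv. A sketch of that predicate, with Priv copied from the patch and priv_check_ok() as a hypothetical helper:

```c
#include <stdbool.h>
#include <stdint.h>

#define Priv (1<<27)	/* as defined in the patch */

/* Returns true if the instruction may be emulated at this CPL, false if
 * the emulator should inject #GP(0) instead. */
static bool priv_check_ok(uint32_t decode_flags, int cpl)
{
	return !((decode_flags & Priv) && cpl != 0);
}
```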

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1782387..d632111 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -76,6 +76,7 @@
 #define GroupDual   (1<<15) /* Alternate decoding of mod == 3 */
 #define GroupMask   0xff/* Group number stored in bits 0:7 */
 /* Misc flags */
+#define Priv        (1<<27) /* instruction generates #GP if current CPL != 0 */
 #define No64   (1<<28)
 /* Source 2 operand type */
 #define Src2None    (0<<29)
@@ -211,7 +212,7 @@ static u32 opcode_table[256] = {
SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
/* 0xF0 - 0xF7 */
0, 0, 0, 0,
-   ImplicitOps, ImplicitOps, Group | Group3_Byte, Group | Group3,
+   ImplicitOps | Priv, ImplicitOps, Group | Group3_Byte, Group | Group3,
/* 0xF8 - 0xFF */
ImplicitOps, 0, ImplicitOps, ImplicitOps,
ImplicitOps, ImplicitOps, Group | Group4, Group | Group5,
@@ -219,16 +220,20 @@ static u32 opcode_table[256] = {
 
 static u32 twobyte_table[256] = {
/* 0x00 - 0x0F */
-   0, Group | GroupDual | Group7, 0, 0, 0, ImplicitOps, ImplicitOps, 0,
-   ImplicitOps, ImplicitOps, 0, 0, 0, ImplicitOps | ModRM, 0, 0,
+   0, Group | GroupDual | Group7, 0, 0,
+   0, ImplicitOps, ImplicitOps | Priv, 0,
+   ImplicitOps | Priv, ImplicitOps | Priv, 0, 0,
+   0, ImplicitOps | ModRM, 0, 0,
/* 0x10 - 0x1F */
0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps | ModRM, 0, 0, 0, 0, 0, 0, 0,
/* 0x20 - 0x2F */
-   ModRM | ImplicitOps, ModRM, ModRM | ImplicitOps, ModRM, 0, 0, 0, 0,
+   ModRM | ImplicitOps | Priv, ModRM | Priv,
+   ModRM | ImplicitOps | Priv, ModRM | Priv,
+   0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
/* 0x30 - 0x3F */
-   ImplicitOps, 0, ImplicitOps, 0,
-   ImplicitOps, ImplicitOps, 0, 0,
+   ImplicitOps | Priv, 0, ImplicitOps | Priv, 0,
+   ImplicitOps, ImplicitOps | Priv, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
/* 0x40 - 0x47 */
DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
@@ -322,9 +327,9 @@ static u32 group_table[] = {
SrcMem | ModRM | Stack, 0,
SrcMem | ModRM | Stack, 0, SrcMem | ModRM | Stack, 0,
[Group7*8] =
-   0, 0, ModRM | SrcMem, ModRM | SrcMem,
+   0, 0, ModRM | SrcMem | Priv, ModRM | SrcMem | Priv,
SrcNone | ModRM | DstMem | Mov, 0,
-   SrcMem16 | ModRM | Mov, SrcMem | ModRM | ByteOp,
+   SrcMem16 | ModRM | Mov | Priv, SrcMem | ModRM | ByteOp | Priv,
[Group8*8] =
0, 0, 0, 0,
DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
@@ -335,7 +340,7 @@ static u32 group_table[] = {
 
 static u32 group2_table[] = {
[Group7*8] =
-   SrcNone | ModRM, 0, 0, SrcNone | ModRM,
+   SrcNone | ModRM | Priv, 0, 0, SrcNone | ModRM,
SrcNone | ModRM | DstMem | Mov, 0,
SrcMem16 | ModRM | Mov, 0,
[Group9*8] =
@@ -1700,12 +1705,6 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
return -1;
}
 
-   /* sysexit must be called from CPL 0 */
-   if (kvm_x86_ops->get_cpl(ctxt->vcpu) != 0) {
-   kvm_inject_gp(ctxt->vcpu, 0);
-   return -1;
-   }
-
setup_syscalls_segments(ctxt, &cs, &ss);
 
if ((c->rex_prefix & 0x8) != 0x0)
@@ -1820,6 +1819,12 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
saved_eip = c->eip;
 
+   /* Privileged instruction can be executed only in CPL=0 */
+   if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) {
+   kvm_inject_gp(ctxt->vcpu, 0);
+   goto done;
+   }
+
if (((c->d & ModRM) && (c->modrm_mod != 3)) || (c->d & MemAbs))
memop = c->modrm_ea;
 
-- 
1.6.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

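With the Priv bit in the decode tables, the CPL check becomes table data. One test at the top of x86_emulate_insn() covers every flagged opcode, which is why the open-coded CPL check in emulate_sysexit() can go. Below is a minimal userspace sketch of that idea. The `sketch_*` names and the cut-down table are hypothetical. Only the bit position (1<<27) and the flagged two-byte opcodes (CLTS 0x06, MOV to/from CR/DR 0x20-0x23, WRMSR 0x30, RDMSR 0x32, SYSEXIT 0x35, with SYSENTER 0x34 left unflagged) are read off the hunks above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit position as the emulator's Priv flag; the table below is a
 * hypothetical stand-in, not the real kvm decode table. */
#define SKETCH_PRIV (1u << 27)

/* Cut-down two-byte (0x0F xx) table holding only the Priv bit. */
static const uint32_t sketch_twobyte[256] = {
	[0x06] = SKETCH_PRIV,				/* clts */
	[0x20] = SKETCH_PRIV, [0x21] = SKETCH_PRIV,	/* mov from cr/dr */
	[0x22] = SKETCH_PRIV, [0x23] = SKETCH_PRIV,	/* mov to cr/dr */
	[0x30] = SKETCH_PRIV,				/* wrmsr */
	[0x32] = SKETCH_PRIV,				/* rdmsr */
	[0x35] = SKETCH_PRIV,				/* sysexit; sysenter (0x34) stays 0 */
};

/* True when emulation must stop and #GP(0) be injected: the opcode is
 * privileged and the guest is not running at CPL 0. */
static bool sketch_priv_gp(uint8_t opcode, int cpl)
{
	assert(cpl >= 0 && cpl <= 3);
	return (sketch_twobyte[opcode] & SKETCH_PRIV) && cpl != 0;
}
```

Keeping the rule in the table means a newly decoded privileged opcode only needs its flag set, not another hand-written CPL test in its handler.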

[PATCH 09/20] KVM: do not store wqh in irqfd

2010-02-17 Thread Avi Kivity
From: Michael S. Tsirkin 

wqh is unused, so we do not need to store it in irqfd anymore

Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Avi Kivity 
---
 virt/kvm/eventfd.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 486c604..7016319 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -47,7 +47,6 @@ struct _irqfd {
int                       gsi;
struct list_head          list;
poll_table                pt;
-   wait_queue_head_t        *wqh;
wait_queue_t              wait;
struct work_struct        inject;
struct work_struct        shutdown;
@@ -159,8 +158,6 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
poll_table *pt)
 {
struct _irqfd *irqfd = container_of(pt, struct _irqfd, pt);
-
-   irqfd->wqh = wqh;
add_wait_queue(wqh, &irqfd->wait);
 }
 
-- 
1.6.5.3

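irqfd_ptable_queue_proc() receives wqh as an argument and needs it only for the add_wait_queue() call, so nothing has to be kept after that call returns. The irqfd itself is recovered from the embedded poll_table through container_of(). The following is a simplified userspace sketch of that pointer arithmetic. It assumes hypothetical cut-down structures, and the kernel's container_of() additionally type-checks `ptr`, which this version does not.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_poll_table { int dummy; };

/* Hypothetical stand-in for struct _irqfd: only what the example needs. */
struct sketch_irqfd {
	int gsi;
	struct sketch_poll_table pt;
};

/* Mirrors the first line of irqfd_ptable_queue_proc(). */
static struct sketch_irqfd *sketch_irqfd_from_pt(struct sketch_poll_table *pt)
{
	return sketch_container_of(pt, struct sketch_irqfd, pt);
}
```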


[PATCH 14/20] KVM: x86 emulator: Check IOPL level during io instruction emulation

2010-02-17 Thread Avi Kivity
From: Gleb Natapov 

Make the emulator check that the vcpu is allowed to execute IN, INS, OUT,
OUTS, CLI and STI.

Signed-off-by: Gleb Natapov 
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/emulate.c  |   89 +++---
 arch/x86/kvm/x86.c  |   10 ++---
 3 files changed, 87 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c07c16f..f9a2f66 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -678,6 +678,7 @@ void kvm_disable_tdp(void);
 
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
 int complete_pio(struct kvm_vcpu *vcpu);
+bool kvm_check_iopl(struct kvm_vcpu *vcpu);
 
 struct kvm_memory_slot *gfn_to_memslot_unaliased(struct kvm *kvm, gfn_t gfn);
 
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c44b460..296e851 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1698,6 +1698,57 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
return 0;
 }
 
+static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
+{
+   int iopl;
+   if (ctxt->mode == X86EMUL_MODE_REAL)
+   return false;
+   if (ctxt->mode == X86EMUL_MODE_VM86)
+   return true;
+   iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
+   return kvm_x86_ops->get_cpl(ctxt->vcpu) > iopl;
+}
+
+static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,
+   struct x86_emulate_ops *ops,
+   u16 port, u16 len)
+{
+   struct kvm_segment tr_seg;
+   int r;
+   u16 io_bitmap_ptr;
+   u8 perm, bit_idx = port & 0x7;
+   unsigned mask = (1 << len) - 1;
+
+   kvm_get_segment(ctxt->vcpu, &tr_seg, VCPU_SREG_TR);
+   if (tr_seg.unusable)
+   return false;
+   if (tr_seg.limit < 103)
+   return false;
+   r = ops->read_std(tr_seg.base + 102, &io_bitmap_ptr, 2, ctxt->vcpu,
+ NULL);
+   if (r != X86EMUL_CONTINUE)
+   return false;
+   if (io_bitmap_ptr + port/8 > tr_seg.limit)
+   return false;
+   r = ops->read_std(tr_seg.base + io_bitmap_ptr + port/8, &perm, 1,
+ ctxt->vcpu, NULL);
+   if (r != X86EMUL_CONTINUE)
+   return false;
+   if ((perm >> bit_idx) & mask)
+   return false;
+   return true;
+}
+
+static bool emulator_io_permited(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops,
+u16 port, u16 len)
+{
+   if (emulator_bad_iopl(ctxt))
+   if (!emulator_io_port_access_allowed(ctxt, ops, port, len))
+   return false;
+   return true;
+}
+
 int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
@@ -1889,7 +1940,12 @@ special_insn:
break;
case 0x6c:  /* insb */
case 0x6d:  /* insw/insd */
-if (kvm_emulate_pio_string(ctxt->vcpu,
+   if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
+ (c->d & ByteOp) ? 1 : c->op_bytes)) {
+   kvm_inject_gp(ctxt->vcpu, 0);
+   goto done;
+   }
+   if (kvm_emulate_pio_string(ctxt->vcpu,
1,
(c->d & ByteOp) ? 1 : c->op_bytes,
c->rep_prefix ?
@@ -1905,6 +1961,11 @@ special_insn:
return 0;
case 0x6e:  /* outsb */
case 0x6f:  /* outsw/outsd */
+   if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
+ (c->d & ByteOp) ? 1 : c->op_bytes)) {
+   kvm_inject_gp(ctxt->vcpu, 0);
+   goto done;
+   }
if (kvm_emulate_pio_string(ctxt->vcpu,
0,
(c->d & ByteOp) ? 1 : c->op_bytes,
@@ -2202,7 +2263,13 @@ special_insn:
case 0xef: /* out (e/r)ax,dx */
port = c->regs[VCPU_REGS_RDX];
io_dir_in = 0;
-   do_io:  if (kvm_emulate_pio(ctxt->vcpu, io_dir_in,
+   do_io:
+   if (!emulator_io_permited(ctxt, ops, port,
+ (c->d & ByteOp) ? 1 : c->op_bytes)) {
+   kvm_inject_gp(ctxt->vcpu, 0);
+   goto done;
+   }
+   if (kvm_emulate_pio(ctxt->vcpu, io_dir_in,
   (c->d & ByteOp) ? 1 : c->op_bytes,
   port) != 0) {
c->eip = saved_eip;
@@ -2227,13 +2294,21 @@ special_insn:
 

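The two helpers implement the usual x86 rule. IN/OUT and their string forms are allowed outright when CPL <= IOPL, which always holds in real mode and never in virtual-8086 mode. Otherwise the access is allowed only if the port's bits in the TSS I/O permission bitmap are all clear. The sketch below restates that rule over a flat byte array standing in for the TSS, and all `sketch_*` names are hypothetical. It differs from the patch in one deliberate way: it reads 16 bits of the bitmap, as the processor does. That way a multi-byte access whose bits straddle a byte boundary is fully checked, whereas emulator_io_port_access_allowed() in the patch reads a single byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_mode { SKETCH_REAL, SKETCH_VM86, SKETCH_PROT };

/* Mirrors emulator_bad_iopl(): true when IOPL alone does not permit I/O. */
static bool sketch_bad_iopl(enum sketch_mode mode, int cpl, int iopl)
{
	if (mode == SKETCH_REAL)
		return false;
	if (mode == SKETCH_VM86)
		return true;
	return cpl > iopl;
}

/* tss points at limit + 1 readable bytes of a 32-bit TSS image. */
static bool sketch_bitmap_allows(const uint8_t *tss, uint32_t limit,
				 uint16_t port, unsigned int len)
{
	uint32_t byte;
	uint16_t base, perm;
	unsigned int mask = (1u << len) - 1;

	assert(len == 1 || len == 2 || len == 4);
	if (limit < 103)	/* I/O map base is the word at offset 102 */
		return false;
	base = (uint16_t)(tss[102] | (tss[103] << 8));
	byte = (uint32_t)base + port / 8;
	if (byte + 1 > limit)	/* both bitmap bytes must lie inside the TSS */
		return false;
	perm = (uint16_t)(tss[byte] | (tss[byte + 1] << 8));
	return ((perm >> (port & 7)) & mask) == 0;
}

/* Mirrors emulator_io_permited(): bitmap consulted only if IOPL fails. */
static bool sketch_io_permitted(enum sketch_mode mode, int cpl, int iopl,
				const uint8_t *tss, uint32_t limit,
				uint16_t port, unsigned int len)
{
	if (!sketch_bad_iopl(mode, cpl, iopl))
		return true;
	return sketch_bitmap_allows(tss, limit, port, len);
}
```

For example, with port 0x10 denied, a 2-byte access at port 0x0F covers bit 7 of one bitmap byte and bit 0 of the next. Reading 16 bits catches it.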
[PATCH 19/20] KVM: x86 emulator: code style cleanup

2010-02-17 Thread Avi Kivity
From: Wei Yongjun 

Just remove a redundant semicolon.

Signed-off-by: Wei Yongjun 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c2de9f0..dd1b935 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1016,7 +1016,7 @@ done_prefixes:
}
 
if (mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
-   kvm_report_emulation_failure(ctxt->vcpu, "invalid x86/64 instruction");;
+   kvm_report_emulation_failure(ctxt->vcpu, "invalid x86/64 instruction");
return -1;
}
 
-- 
1.6.5.3



[PATCH 15/20] KVM: x86 emulator: Fix popf emulation

2010-02-17 Thread Avi Kivity
From: Gleb Natapov 

POPF behaves differently depending on the current CPU mode. Emulate the correct
logic to prevent the guest from changing flags that it could not change otherwise.

Signed-off-by: Gleb Natapov 
Cc: sta...@kernel.org
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |   55 +++-
 1 files changed, 54 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 296e851..1782387 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -343,11 +343,18 @@ static u32 group2_table[] = {
 };
 
 /* EFLAGS bit definitions. */
+#define EFLG_ID (1<<21)
+#define EFLG_VIP (1<<20)
+#define EFLG_VIF (1<<19)
+#define EFLG_AC (1<<18)
 #define EFLG_VM (1<<17)
 #define EFLG_RF (1<<16)
+#define EFLG_IOPL (3<<12)
+#define EFLG_NT (1<<14)
 #define EFLG_OF (1<<11)
 #define EFLG_DF (1<<10)
 #define EFLG_IF (1<<9)
+#define EFLG_TF (1<<8)
 #define EFLG_SF (1<<7)
 #define EFLG_ZF (1<<6)
 #define EFLG_AF (1<<4)
@@ -1214,6 +1221,49 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
return rc;
 }
 
+static int emulate_popf(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
+  void *dest, int len)
+{
+   int rc;
+   unsigned long val, change_mask;
+   int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
+   int cpl = kvm_x86_ops->get_cpl(ctxt->vcpu);
+
+   rc = emulate_pop(ctxt, ops, &val, len);
+   if (rc != X86EMUL_CONTINUE)
+   return rc;
+
+   change_mask = EFLG_CF | EFLG_PF | EFLG_AF | EFLG_ZF | EFLG_SF | EFLG_OF
+   | EFLG_TF | EFLG_DF | EFLG_NT | EFLG_RF | EFLG_AC | EFLG_ID;
+
+   switch(ctxt->mode) {
+   case X86EMUL_MODE_PROT64:
+   case X86EMUL_MODE_PROT32:
+   case X86EMUL_MODE_PROT16:
+   if (cpl == 0)
+   change_mask |= EFLG_IOPL;
+   if (cpl <= iopl)
+   change_mask |= EFLG_IF;
+   break;
+   case X86EMUL_MODE_VM86:
+   if (iopl < 3) {
+   kvm_inject_gp(ctxt->vcpu, 0);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
+   change_mask |= EFLG_IF;
+   break;
+   default: /* real mode */
+   change_mask |= (EFLG_IOPL | EFLG_IF);
+   break;
+   }
+
+   *(unsigned long *)dest =
+   (ctxt->eflags & ~change_mask) | (val & change_mask);
+
+   return rc;
+}
+
 static void emulate_push_sreg(struct x86_emulate_ctxt *ctxt, int seg)
 {
struct decode_cache *c = &ctxt->decode;
@@ -2099,7 +2149,10 @@ special_insn:
c->dst.type = OP_REG;
c->dst.ptr = (unsigned long *) &ctxt->eflags;
c->dst.bytes = c->op_bytes;
-   goto pop_instruction;
+   rc = emulate_popf(ctxt, ops, &c->dst.val, c->op_bytes);
+   if (rc != X86EMUL_CONTINUE)
+   goto done;
+   break;
case 0xa0 ... 0xa1: /* mov */
c->dst.ptr = (unsigned long *)&c->regs[VCPU_REGS_RAX];
c->dst.val = c->src.val;
-- 
1.6.5.3

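emulate_popf() builds a mask of the flags the guest may change in its current mode and privilege level, then merges the popped value through that mask. CPL 0 may change IOPL, and CPL <= IOPL may change IF. Virtual-8086 mode with IOPL < 3 faults, and real mode may change both IOPL and IF. The standalone restatement below lets the rules be tested in isolation. The `sketch_*`/`F_*` names are hypothetical, the bit positions follow the EFLG_* definitions in the patch, and the base mask mirrors the patch's change_mask.

```c
#include <assert.h>
#include <stdint.h>

/* EFLAGS bits, same positions as the EFLG_* definitions in the patch. */
#define F_CF	(1u << 0)
#define F_PF	(1u << 2)
#define F_AF	(1u << 4)
#define F_ZF	(1u << 6)
#define F_SF	(1u << 7)
#define F_TF	(1u << 8)
#define F_IF	(1u << 9)
#define F_DF	(1u << 10)
#define F_OF	(1u << 11)
#define F_IOPL	(3u << 12)
#define F_NT	(1u << 14)
#define F_RF	(1u << 16)
#define F_VM	(1u << 17)
#define F_AC	(1u << 18)
#define F_ID	(1u << 21)

enum sketch_mode { SKETCH_REAL, SKETCH_VM86, SKETCH_PROT };

/* Merges the popped value into the current flags like emulate_popf().
 * Returns 0 on success, -1 where the emulator would inject #GP. */
static int sketch_popf(enum sketch_mode mode, int cpl, uint32_t cur,
		       uint32_t popped, uint32_t *out)
{
	int iopl = (int)((cur & F_IOPL) >> 12);
	uint32_t mask = F_CF | F_PF | F_AF | F_ZF | F_SF | F_OF |
			F_TF | F_DF | F_NT | F_RF | F_AC | F_ID;

	switch (mode) {
	case SKETCH_PROT:
		if (cpl == 0)
			mask |= F_IOPL;		/* only the kernel may move IOPL */
		if (cpl <= iopl)
			mask |= F_IF;		/* IF needs sufficient I/O privilege */
		break;
	case SKETCH_VM86:
		if (iopl < 3)
			return -1;		/* #GP(0) */
		mask |= F_IF;
		break;
	case SKETCH_REAL:
		mask |= F_IOPL | F_IF;
		break;
	}
	*out = (cur & ~mask) | (popped & mask);	/* VM and others are preserved */
	return 0;
}
```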


[PATCH 08/20] KVM: ppc/booke: Set ESR and DEAR when inject interrupt to guest

2010-02-17 Thread Avi Kivity
From: Liu Yu 

The old method sets ESR and DEAR prematurely.
Move this to after we decide to inject the interrupt,
which more closely matches hardware behavior.

Signed-off-by: Liu Yu 
Acked-by: Hollis Blanchard 
Acked-by: Alexander Graf 
Signed-off-by: Avi Kivity 
---
 arch/powerpc/include/asm/kvm_host.h |2 +
 arch/powerpc/kvm/booke.c|   59 ++-
 arch/powerpc/kvm/emulate.c  |4 +-
 3 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 715aa6b..5e5bae7 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -259,6 +259,8 @@ struct kvm_vcpu_arch {
 #endif
ulong fault_dear;
ulong fault_esr;
+   ulong queued_dear;
+   ulong queued_esr;
gpa_t paddr_accessed;
 
u8 io_gpr; /* GPR used as IO source/target */
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index e283e44..4d686cc 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -82,9 +82,32 @@ static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
set_bit(priority, &vcpu->arch.pending_exceptions);
 }
 
-void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
+static void kvmppc_core_queue_dtlb_miss(struct kvm_vcpu *vcpu,
+ulong dear_flags, ulong esr_flags)
 {
-   /* BookE does flags in ESR, so ignore those we get here */
+   vcpu->arch.queued_dear = dear_flags;
+   vcpu->arch.queued_esr = esr_flags;
+   kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DTLB_MISS);
+}
+
+static void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu,
+   ulong dear_flags, ulong esr_flags)
+{
+   vcpu->arch.queued_dear = dear_flags;
+   vcpu->arch.queued_esr = esr_flags;
+   kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DATA_STORAGE);
+}
+
+static void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu,
+   ulong esr_flags)
+{
+   vcpu->arch.queued_esr = esr_flags;
+   kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_INST_STORAGE);
+}
+
+void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong esr_flags)
+{
+   vcpu->arch.queued_esr = esr_flags;
kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
 }
 
@@ -115,14 +138,19 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 {
int allowed = 0;
ulong msr_mask;
+   bool update_esr = false, update_dear = false;
 
switch (priority) {
-   case BOOKE_IRQPRIO_PROGRAM:
case BOOKE_IRQPRIO_DTLB_MISS:
-   case BOOKE_IRQPRIO_ITLB_MISS:
-   case BOOKE_IRQPRIO_SYSCALL:
case BOOKE_IRQPRIO_DATA_STORAGE:
+   update_dear = true;
+   /* fall through */
case BOOKE_IRQPRIO_INST_STORAGE:
+   case BOOKE_IRQPRIO_PROGRAM:
+   update_esr = true;
+   /* fall through */
+   case BOOKE_IRQPRIO_ITLB_MISS:
+   case BOOKE_IRQPRIO_SYSCALL:
case BOOKE_IRQPRIO_FP_UNAVAIL:
case BOOKE_IRQPRIO_SPE_UNAVAIL:
case BOOKE_IRQPRIO_SPE_FP_DATA:
@@ -157,6 +185,10 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
vcpu->arch.srr0 = vcpu->arch.pc;
vcpu->arch.srr1 = vcpu->arch.msr;
vcpu->arch.pc = vcpu->arch.ivpr | vcpu->arch.ivor[priority];
+   if (update_esr == true)
+   vcpu->arch.esr = vcpu->arch.queued_esr;
+   if (update_dear == true)
+   vcpu->arch.dear = vcpu->arch.queued_dear;
kvmppc_set_msr(vcpu, vcpu->arch.msr & msr_mask);
 
clear_bit(priority, &vcpu->arch.pending_exceptions);
@@ -229,8 +261,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
if (vcpu->arch.msr & MSR_PR) {
/* Program traps generated by user-level software must be handled
 * by the guest kernel. */
-   vcpu->arch.esr = vcpu->arch.fault_esr;
-   kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
+   kvmppc_core_queue_program(vcpu, vcpu->arch.fault_esr);
r = RESUME_GUEST;
kvmppc_account_exit(vcpu, USR_PR_INST);
break;
@@ -286,16 +317,14 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
break;
 
case BOOKE_INTERRUPT_DATA_STORAGE:
-   vcpu->arch.dear = vcpu->arch.fault_dear;
-   vcpu->arch.esr = vcpu->arch.fault_esr;
-   kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DATA_STORAGE);
+   kvmppc_core_queue_data_storage(vcpu, vcpu->arch.fault_dear,
+  vcpu->arch.fault_esr);
kvmppc_acc

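The change splits exception delivery into two steps. The exit handler only stages the ESR/DEAR values in queued_esr and queued_dear. kvmppc_booke_irqprio_deliver() copies them into the architected registers when, and only if, that interrupt is actually delivered. Below is a reduced sketch of both steps, including the delivery-side fall-through. It assumes a hypothetical cut-down vCPU structure and priority enum rather than the real kvm_vcpu_arch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the BookE interrupt priorities. */
enum sketch_prio { P_DTLB_MISS, P_DATA_STORAGE, P_INST_STORAGE,
		   P_PROGRAM, P_ITLB_MISS, P_SYSCALL };

struct sketch_vcpu {
	unsigned long esr, dear;		/* registers visible to the guest */
	unsigned long queued_esr, queued_dear;	/* staged at queue time */
};

/* Queue side: stage values only; the guest-visible registers are untouched. */
static void sketch_queue_data_storage(struct sketch_vcpu *v,
				      unsigned long dear, unsigned long esr)
{
	v->queued_dear = dear;
	v->queued_esr = esr;
}

/* Delivery side: the same fall-through shape as the patched switch. */
static void sketch_deliver(struct sketch_vcpu *v, enum sketch_prio prio)
{
	bool update_esr = false, update_dear = false;

	switch (prio) {
	case P_DTLB_MISS:
	case P_DATA_STORAGE:
		update_dear = true;
		/* fall through */
	case P_INST_STORAGE:
	case P_PROGRAM:
		update_esr = true;
		/* fall through */
	case P_ITLB_MISS:
	case P_SYSCALL:
		break;
	}
	if (update_esr)
		v->esr = v->queued_esr;
	if (update_dear)
		v->dear = v->queued_dear;
}
```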
[PATCH 17/20] KVM: x86 emulator: Add LOCK prefix validity checking

2010-02-17 Thread Avi Kivity
From: Gleb Natapov 

Instructions that do not allow a LOCK prefix should
generate #UD if one is used.

[avi: fold opcode 82 fix from another patch]

Signed-off-by: Gleb Natapov 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |   97 +++
 1 files changed, 56 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d632111..c2de9f0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -76,6 +76,7 @@
 #define GroupDual   (1<<15) /* Alternate decoding of mod == 3 */
 #define GroupMask   0xff    /* Group number stored in bits 0:7 */
 /* Misc flags */
+#define Lock        (1<<26) /* lock prefix is allowed for the instruction */
 #define Priv        (1<<27) /* instruction generates #GP if current CPL != 0 */
 #define No64        (1<<28)
 /* Source 2 operand type */
@@ -94,35 +95,35 @@ enum {
 
 static u32 opcode_table[256] = {
/* 0x00 - 0x07 */
-   ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+   ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
/* 0x08 - 0x0F */
-   ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+   ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
ImplicitOps | Stack | No64, 0,
/* 0x10 - 0x17 */
-   ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+   ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
/* 0x18 - 0x1F */
-   ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+   ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
/* 0x20 - 0x27 */
-   ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+   ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
DstAcc | SrcImmByte, DstAcc | SrcImm, 0, 0,
/* 0x28 - 0x2F */
-   ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+   ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
0, 0, 0, 0,
/* 0x30 - 0x37 */
-   ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+   ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
0, 0, 0, 0,
/* 0x38 - 0x3F */
@@ -158,7 +159,7 @@ static u32 opcode_table[256] = {
Group | Group1_80, Group | Group1_81,
Group | Group1_82, Group | Group1_83,
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
-   ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+   ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
/* 0x88 - 0x8F */
ByteOp | DstMem | SrcReg | ModRM | Mov, DstMem | SrcReg | ModRM | Mov,
ByteOp | DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
@@ -263,17 +264,18 @@ static u32 twobyte_table[256] = {
DstMem | SrcReg | Src2CL | ModRM, 0, 0,
/* 0xA8 - 0xAF */
ImplicitOps | Stack, ImplicitOps | Stack,
-   0, DstMem | SrcReg | ModRM | BitOp,
+   0, DstMem | SrcReg | ModRM | BitOp | Lock,
DstMem | SrcReg | Src2ImmByte | ModRM,
DstMem | SrcReg | Src2CL | ModRM,
ModRM, 0,
/* 0xB0 - 0xB7 */
-   ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM, 0,
-   DstMem | SrcReg | ModRM | BitOp,
+   ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
+   0, DstMem | SrcReg | ModRM | BitOp | Lock,
0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
DstReg | SrcMem16 | ModRM | Mov,
/* 0xB8 - 0xBF */
-   0, 0, Group | Group8, DstMem | SrcReg | ModRM | BitOp,
+   0, 0,
+   Group | Group8, DstMem | SrcReg | ModRM | BitOp | Lock,
0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
DstReg | SrcMem16 | ModRM | Mov,
/* 0xC0 - 0xCF */
@@ -290,25 +292,41 @@ static u32 twobyte_table[256] = {
 
 static u32 group_table[] = {
[Group1_80*8] =
-   ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
-   ByteOp | DstMem |

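The Lock bit marks the opcodes that accept a LOCK prefix. The sketch below shows the resulting decode-time rule. It also assumes, as on real hardware, that a LOCK prefix is invalid when the destination is not a memory operand. The table is a hypothetical four-entry excerpt that uses the same bit position as the patch (1<<26), not the emulator's table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LOCK (1u << 26)

/* Excerpt: add r/m,reg (0x00/0x01) accept LOCK; add reg,r/m (0x02/0x03) do not. */
static const uint32_t sketch_onebyte[256] = {
	[0x00] = SKETCH_LOCK,
	[0x01] = SKETCH_LOCK,
	[0x02] = 0,
	[0x03] = 0,
};

/* True when the instruction must raise #UD instead of being emulated. */
static bool sketch_lock_ud(uint8_t opcode, bool lock_prefix, bool dst_is_mem)
{
	if (!lock_prefix)
		return false;
	return !(sketch_onebyte[opcode] & SKETCH_LOCK) || !dst_is_mem;
}
```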
[PATCH 20/20] KVM: x86 emulator: disallow opcode 82 in 64-bit mode

2010-02-17 Thread Avi Kivity
From: Gleb Natapov 

Instructions with opcode 82 are not valid in 64-bit mode.

Signed-off-by: Gleb Natapov 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dd1b935..c280c23 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -310,14 +310,14 @@ static u32 group_table[] = {
DstMem | SrcImm | ModRM | Lock,
DstMem | SrcImm | ModRM,
[Group1_82*8] =
-   ByteOp | DstMem | SrcImm | ModRM | Lock,
-   ByteOp | DstMem | SrcImm | ModRM | Lock,
-   ByteOp | DstMem | SrcImm | ModRM | Lock,
-   ByteOp | DstMem | SrcImm | ModRM | Lock,
-   ByteOp | DstMem | SrcImm | ModRM | Lock,
-   ByteOp | DstMem | SrcImm | ModRM | Lock,
-   ByteOp | DstMem | SrcImm | ModRM | Lock,
-   ByteOp | DstMem | SrcImm | ModRM,
+   ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+   ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+   ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+   ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+   ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+   ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+   ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+   ByteOp | DstMem | SrcImm | ModRM | No64,
[Group1_83*8] =
DstMem | SrcImmByte | ModRM | Lock,
DstMem | SrcImmByte | ModRM | Lock,
-- 
1.6.5.3

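Opcode 0x82 is an alias of 0x80 (the byte-sized group-1 ALU operations with an immediate) that is undefined in 64-bit mode. Every Group1_82 entry therefore gains No64, and the decoder rejects No64 instructions when running in X86EMUL_MODE_PROT64. The minimal sketch below simplifies this to an opcode-indexed table, whereas the emulator sets No64 on each of the eight Group1_82 entries. The `sketch_*` names are hypothetical, and only the bit position (1<<28) comes from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NO64 (1u << 28)

enum sketch_mode { SKETCH_PROT32, SKETCH_PROT64 };

/* Hypothetical excerpt covering only the group-1 immediate opcodes. */
static const uint32_t sketch_group1[256] = {
	[0x80] = 0,
	[0x81] = 0,
	[0x82] = SKETCH_NO64,	/* alias of 0x80, undefined in 64-bit mode */
	[0x83] = 0,
};

/* True when the decoder must reject the instruction in the current mode. */
static bool sketch_no64_invalid(uint8_t opcode, enum sketch_mode mode)
{
	return mode == SKETCH_PROT64 && (sketch_group1[opcode] & SKETCH_NO64);
}
```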


[PATCH 18/20] KVM: Plan obsolescence of kernel allocated slots, paravirt mmu

2010-02-17 Thread Avi Kivity
These features are unused by modern userspace and can go away.  Paravirt
mmu needs to stay a little longer for live migration.

Signed-off-by: Avi Kivity 
---
 Documentation/feature-removal-schedule.txt |   30 
 1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 0a46833..47a6554 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -542,3 +542,33 @@ Why:   Duplicate functionality with the gspca_zc3xx driver, zc0301 only
sensors) wich are also supported by the gspca_zc3xx driver
(which supports 53 USB-ID's in total)
 Who:   Hans de Goede 
+
+
+
+What:  KVM memory aliases support
+When:  July 2010
+Why:   Memory aliasing support is used for speeding up guest vga access
+   through the vga windows.
+
+   Modern userspace no longer uses this feature, so it's just bitrotted
+   code and can be removed with no impact.
+Who:   Avi Kivity 
+
+
+
+What:  KVM kernel-allocated memory slots
+When:  July 2010
+Why:   Since 2.6.25, kvm supports user-allocated memory slots, which are
+   much more flexible than kernel-allocated slots.  All current userspace
+   supports the newer interface and this code can be removed with no
+   impact.
+Who:   Avi Kivity 
+
+
+
+What:  KVM paravirt mmu host support
+When:  January 2011
+Why:   The paravirt mmu host support is slower than non-paravirt mmu, both
+   on newer and older hardware.  It is already not exposed to the guest,
+   and kept only for live migration purposes.
+Who:   Avi Kivity 
-- 
1.6.5.3



[PATCH 01/20] KVM: Fix Codestyle in virt/kvm/coalesced_mmio.c

2010-02-17 Thread Avi Kivity
From: Jochen Maes 

Fixed two coding style issues in virt/kvm/coalesced_mmio.c

Signed-off-by: Jochen Maes 
Signed-off-by: Avi Kivity 
---
 virt/kvm/coalesced_mmio.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5de6594..5169736 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -133,7 +133,7 @@ void kvm_coalesced_mmio_free(struct kvm *kvm)
 }
 
 int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
-struct kvm_coalesced_mmio_zone *zone)
+struct kvm_coalesced_mmio_zone *zone)
 {
struct kvm_coalesced_mmio_dev *dev = kvm->coalesced_mmio_dev;
 
@@ -166,7 +166,7 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
mutex_lock(&kvm->slots_lock);
 
i = dev->nb_zones;
-   while(i) {
+   while (i) {
z = &dev->zone[i - 1];
 
/* unregister all zones
-- 
1.6.5.3



Re: [PATCH v3 0/4] KVM: rework of "Fix x86 emulator's fault propagations"

2010-02-17 Thread Avi Kivity

On 02/12/2010 08:50 AM, Takuya Yoshikawa wrote:

> This is the rework of "Fix x86 emulator's fault propagations".
>    -- http://www.spinics.net/lists/kvm/msg28874.html
>
> I read the review comments from Avi, Marcelo and Gleb and removed
> some parts which should be done with more care: the descriptor-related
> part and the emulator_sys* part.
>
> Now the contents are as follows:
>    - patch 1: X86EMUL macro replacements: from do_fetch_insn_byte()
>               to x86_decode_insn()
>    - patch 2: X86EMUL macro replacements: x86_emulate_insn() and its
>               helpers
>    - patch 3: Fix x86_emulate_insn() not to use the variable rc for
>               non-X86EMUL values
>    - patch 4: Tiny fix: remove redundant prototype of load_pdptrs()

Applied all, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Gleb Natapov
On Wed, Feb 17, 2010 at 12:23:39PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Wed, Feb 17, 2010 at 01:13:29PM +0200, Avi Kivity wrote:
> >> On 02/17/2010 12:43 PM, Gleb Natapov wrote:
>  And, again: This is an _existing_ user space ABI. We could only provide
>  an alternative, but we have to maintain what is there at least for some
>  longer grace period.
> 
> >>> But it was always broken for SVM and was broken for VMX for a year and
> >>> nobody noticed, so maybe instead of reintroducing the old interface we should
> >>> do it right this time?
> >> We need to fix the existing interface first, and then think long and
> >> hard if we want yet another interface, since we're likely to screw
> >> it up as well.
> >>
> >> The more interfaces we introduce, the harder maintenance becomes.
> >>
> > We are in a sad state if we cannot improve the interface. The current one
> > outsources part of the CPU functionality into userspace. This should be a big
> > no-no.
> 
> I still disagree on this. Moving the decision logic to user space
> avoided having to re-implement a gdbstub in kernel space. I overlooked that
> re-injecting #BP on older SVM was broken, but it is now fixed for all
> vendors. So there is actually no long-term reason to move it back into the kernel.
> 
There were patches to implement a gdbstub in kernel space! And not so long
ago :) But I want to move only a tiny bit of logic into kernel space.
And the #BP reinjection brokenness is a different issue. It should be fixed
anyway, no matter where the decision about reinjection happens.

If the maintainers think that we should not have an improved interface and we
should support reinjection of #DB from userspace, then this patch should
be applied. I don't have other objections to it. But I, at least, would
prefer the old interface for #DB reinjection (KVM_GUESTDBG_INJECT_DB)
over the new one. The old one makes it explicit what we are doing;
the new one allows injection of any event and should be used only during
migration or CPU reset. It would even be a good idea to fail setting
events if the CPU is running.

--
Gleb.


Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Gleb Natapov
On Wed, Feb 17, 2010 at 12:32:05PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Mon, Feb 15, 2010 at 02:20:31PM +0100, Jan Kiszka wrote:
> >> Jan Kiszka wrote:
> >>> Gleb Natapov wrote:
>  Let's check if SVM works. I can do that if you tell me how.
> >>> - Fire up some Linux guest with gdb installed
> >>> - Attach gdb to gdbstub of the VM
> >>> - Set a soft breakpoint in guest kernel, ideally where it does not
> >>>   immediately trigger, e.g. on sys_reboot (use grep sys_reboot
> >>>   /proc/kallsyms if you don't have symbols for the guest kernel)
> >>> - Start gdb /bin/true in the guest
> >>> - run
> >>>
> >>> As gdb sets some automatic breakpoints, this already exercises the
> >>> reinjection of #BP.
> >> I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
> >> and it just worked.
> >>
> > I tested it on a processor without NextRIP and your test case works there too,
> > but it shouldn't have, so I looked deeper into that, and what I see is
> > that GDB outsmarts us. It doesn't matter whether we inject the event before
> > or after the int3 inserted by GDB; GDB correctly finds the breakpoint that
> > triggered and restarts the instruction correctly. I assume it doesn't require
> > an exact match between the rip where int3 was inserted and where the exception
> > triggers.
> 
> At the latest when you have two successive breakpoints on single-byte
> instructions, gdb will reach its limits (for it failed earlier, BTW).
> And other debuggers under other OSes may become unhappy as well.
Yes, and that is why I am saying that checking with GDB is not a good test.
GDB may work, but that doesn't mean injection works correctly. It took me
some time to write a test that finally confused gdb. It was like this:

1: int main(int argc, char **argv)
2: {
3:  if (argc == 1)
4:  goto a;
5:  asm("cmc");
6: a:
7:  asm("cmc");
8:  return 0;
9: }

If you set breakpoints on lines 5 and 7, then when the breakpoint triggers,
GDB thinks it is on line 5.

So can you run the int3 test below on master on AMD with NextRIP support?
I doubt the result will be correct.

> 
> > But if I run the program below on the latest kernel, which prints the rip
> > where #DB was delivered to dmesg, I get different results with and
> > without an external breakpoint inserted.
> 
> Does applying v2 of my patch corrects the picture?
> 
Of course, since it now injects #DB at the correct address. If an exception
happens during #DB processing, things will go wrong, but we can do
only so much on broken SVM without emulating int3 in software.

> > 
> > int main(int argc, char **argv)
> > {
> > asm("int3");
> > return 0;
> > }
> > 

--
Gleb.


Re: [PATCH] KVM: inject #UD in 64bit mode from instruction that are not valid there

2010-02-17 Thread Avi Kivity

On 02/11/2010 02:43 PM, Gleb Natapov wrote:

> Some instructions are obsolete in long mode. Inject #UD.

Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] [uq/master] use eventfd for iothread

2010-02-17 Thread Avi Kivity

On 02/11/2010 01:23 AM, Paolo Bonzini wrote:

> Signed-off-by: Paolo Bonzini

Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/4] qemu-kvm: prepare for adding eventfd usage to upstream

2010-02-17 Thread Avi Kivity

On 02/11/2010 01:09 AM, Paolo Bonzini wrote:

> This patch series morphs the code in qemu-kvm's eventfd so that it looks
> like the code in upstream qemu.  Patch 4 is not yet in upstream QEMU,
> I'm submitting it first to qemu-kvm to avoid conflicts.

Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Gleb Natapov
On Wed, Feb 17, 2010 at 12:24:19PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Wed, Feb 17, 2010 at 01:11:36PM +0200, Avi Kivity wrote:
> >> On 02/15/2010 03:30 PM, Gleb Natapov wrote:
>  I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
>  and it just worked.
> 
>  But this is a fairly new processor. Consequently, it reports NextRIP
>  support via cpuid function 0x800A. Looking for an older one too.
> 
>  In the meantime I also browsed a bit more in the manuals, and I don't
>  think stepping over or (what is actually required) into an INT3 will
>  work. We can't step into it, as the processor clears TF on any event handler
>  entry. And stepping over would cause trouble:
> 
>  a) as an unknown amount of code may run without #DB interception
>  b) we would fiddle with TF in code that is already under debugger
> control, thus we would very likely run into conflicts.
> 
>  Leaves us with tricky INT3 emulation. Sigh.
> 
> >>> So the question is: do we want to support this kind of debugging on older
> >>> AMDs? Maybe we don't.
> >> How much older are they?
> >>
> > Actually I am not sure new AMDs support this correctly. Need one to run
> > tests. GDB is not a good test case, it is too smart.
> 
> It works well - and gdb is far from being "smart": one byte off the
> expected INT3 address, and everything falls apart. That's what the VMX
> bug demonstrated.
> 
A simple test on AMD shows that being one byte off doesn't matter for GDB, at least
as long as this byte still belongs to the same instruction or maybe the same line of
source code. On VMX something else happens. I can't reproduce the problem on
master with VMX since event_exit_inst_len is always 1 when #DB is
reinjected. Maybe in your test we are much more than 1 byte off on VMX?

--
Gleb.


Re: [PATCH] Build fix for #define KVM_DEBUG

2010-02-17 Thread Jan Kiszka
Avi Kivity wrote:
> On 02/17/2010 04:41 AM, Tsuyoshi Ozawa wrote:
 shadow_efer was renamed to efer, so this should be modified rather than 
 deleted.

>>> OK. The new patch uses efer instead of deleting shadow_efer
>>>  
>> Excuse me, and what should I do next?
>>
> 
> Copy Jan - he maintains kvm-kmod, and probably didn't see your patch.
> 

Indeed, I missed it. Proper subject prefixing can help a lot here...

Could you please repost, making sure the patch is not line-wrapped and
giving it an up-to-date changelog?

TIA,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Jan Kiszka
Gleb Natapov wrote:
> On Mon, Feb 15, 2010 at 02:20:31PM +0100, Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Gleb Natapov wrote:
 Let's check if SVM works. I can do that if you tell me how.
>>> - Fire up some Linux guest with gdb installed
>>> - Attach gdb to gdbstub of the VM
>>> - Set a soft breakpoint in guest kernel, ideally where it does not
>>>   immediately trigger, e.g. on sys_reboot (use grep sys_reboot
>>>   /proc/kallsyms if you don't have symbols for the guest kernel)
>>> - Start gdb /bin/true in the guest
>>> - run
>>>
>>> As gdb sets some automatic breakpoints, this already exercises the
>>> reinjection of #BP.
>> I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
>> and it just worked.
>>
> I tested it on a processor without NextRIP and your test case works there too,
> but it shouldn't have, so I looked deeper into that and what I see is
> that GDB outsmarts us. Whether we inject the event before the int3
> inserted by GDB or after it, GDB correctly finds the breakpoint that
> triggered and restarts the instruction correctly. I assume it doesn't use an
> exact match between the rip where int3 was inserted and where the exception
> triggers.

gdb will reach its limits at the latest when you have two successive
breakpoints on single-byte instructions (it failed for me earlier, BTW).
And other debuggers under other OSes may become unhappy as well.

> But if I run the program below on the latest kernel, which prints the rip
> where #DB was delivered in dmesg, I get different results with and
> without an external breakpoint inserted.

Does applying v2 of my patch correct the picture?

> 
> int main(int argc, char **argv)
> {
> asm("int3");
> return 0;
> }
> 

Jan



Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Jan Kiszka
Gleb Natapov wrote:
> On Wed, Feb 17, 2010 at 01:11:36PM +0200, Avi Kivity wrote:
>> On 02/15/2010 03:30 PM, Gleb Natapov wrote:
 I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
 and it just worked.

 But this is a fairly new processor. Consequently, it reports NextRIP
 support via cpuid function 0x800A. Looking for an older one too.

 In the meantime I also browsed a bit more in the manuals, and I don't
 think stepping over or (what is actually required) into an INT3 will
 work. We can't step into as the processor clears TF on any event handler
 entry. And stepping over would cause troubles

 a) as an unknown amount of code may run without #DB interception
 b) we would fiddle with TF in code that is already under debugger
control, thus we would very likely run into conflicts.

 Leaves us with tricky INT3 emulation. Sigh.

>>> So the question is: do we want to support this kind of debugging on older
>>> AMDs? Maybe we don't.
>> How much older are they?
>>
> Actually I am not sure new AMDs support this correctly. Need one to run
> tests. GDB is not a good test case, it is too smart.

It works well - and gdb is far from being "smart": one byte off the
expected INT3 address, and everything falls apart. That's what the VMX
bug demonstrated.
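To make the one-byte argument concrete, here is a minimal C sketch (not KVM code; all names are hypothetical) of what a guest-side debugger sees when the hypervisor re-injects an intercepted INT3 as a software exception. The reported trap RIP is the INT3 address plus whatever instruction length the hypervisor supplies, and a debugger that subtracts the 1-byte INT3 length and then looks for an exact match only finds its breakpoint when that length is right.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: RIP seen by the guest's #BP handler after the
 * hypervisor re-injects an intercepted INT3 as a software exception.
 * Software exceptions report the address of the *next* instruction,
 * so the hypervisor has to supply the correct instruction length. */
static uint64_t reinjected_trap_rip(uint64_t int3_rip, unsigned int inst_len)
{
    return int3_rip + inst_len;
}

/* Hypothetical exact-match debugger: INT3 (0xCC) is one byte long, so the
 * breakpoint address is taken to be trap_rip - 1. */
static bool debugger_matches(uint64_t bp_addr, uint64_t trap_rip)
{
    return trap_rip - 1 == bp_addr;
}
```

With the correct length of 1 the lookup succeeds; with a stale length of 0 (or 2) the debugger sees a trap one byte away from any breakpoint it planted.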

Jan



Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Jan Kiszka
Gleb Natapov wrote:
> On Wed, Feb 17, 2010 at 01:13:29PM +0200, Avi Kivity wrote:
>> On 02/17/2010 12:43 PM, Gleb Natapov wrote:
 And, again: This is an _existing_ user space ABI. We could only provide
 an alternative, but we have to maintain what is there at least for some
 longer grace period.

>>> But it was always broken for SVM and was broken for VMX for a year and
>>> nobody noticed, so maybe instead of reintroducing the old interface we should
>>> do it right this time?
>> We need to fix the existing interface first, and then think long and
>> hard if we want yet another interface, since we're likely to screw
>> it up as well.
>>
>> The more interfaces we introduce, the harder maintenance becomes.
>>
> We are in a sad state if we cannot improve the interface. The current one
> outsources part of the CPU functionality to userspace. This should be a big
> no-no.

I still disagree on this. Moving the decision logic to user space
avoided having to re-implement a gdbstub in kernel space. I overlooked that
re-injecting #BP on older SVM was broken, but it is now fixed for all
vendors. So there is actually no long-term reason to move it back into the kernel.

Jan



Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Gleb Natapov
On Wed, Feb 17, 2010 at 01:13:29PM +0200, Avi Kivity wrote:
> On 02/17/2010 12:43 PM, Gleb Natapov wrote:
> >>And, again: This is an _existing_ user space ABI. We could only provide
> >>an alternative, but we have to maintain what is there at least for some
> >>longer grace period.
> >>
> >But it was always broken for SVM and was broken for VMX for a year and
> >nobody noticed, so maybe instead of reintroducing the old interface we should
> >do it right this time?
> 
> We need to fix the existing interface first, and then think long and
> hard if we want yet another interface, since we're likely to screw
> it up as well.
> 
> The more interfaces we introduce, the harder maintenance becomes.
> 
We are in a sad state if we cannot improve the interface. The current one
outsources part of the CPU functionality to userspace. This should be a big
no-no.

--
Gleb.


Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Gleb Natapov
On Wed, Feb 17, 2010 at 01:11:36PM +0200, Avi Kivity wrote:
> On 02/15/2010 03:30 PM, Gleb Natapov wrote:
> >
> >>I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
> >>and it just worked.
> >>
> >>But this is a fairly new processor. Consequently, it reports NextRIP
> >>support via cpuid function 0x800A. Looking for an older one too.
> >>
> >>In the meantime I also browsed a bit more in the manuals, and I don't
> >>think stepping over or (what is actually required) into an INT3 will
> >>work. We can't step into as the processor clears TF on any event handler
> >>entry. And stepping over would cause troubles
> >>
> >>a) as an unknown amount of code may run without #DB interception
> >>b) we would fiddle with TF in code that is already under debugger
> >>control, thus we would very likely run into conflicts.
> >>
> >>Leaves us with tricky INT3 emulation. Sigh.
> >>
> >So the question is: do we want to support this kind of debugging on older
> >AMDs? Maybe we don't.
> 
> How much older are they?
> 
Actually I am not sure new AMDs support this correctly. Need one to run
tests. GDB is not a good test case, it is too smart.

--
Gleb.


Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Avi Kivity

On 02/17/2010 12:43 PM, Gleb Natapov wrote:

And, again: This is an _existing_ user space ABI. We could only provide
an alternative, but we have to maintain what is there at least for some
longer grace period.

 

But it was always broken for SVM and was broken for VMX for a year and
nobody noticed, so maybe instead of reintroducing the old interface we should
do it right this time?
   


We need to fix the existing interface first, and then think long and 
hard if we want yet another interface, since we're likely to screw it up 
as well.


The more interfaces we introduce, the harder maintenance becomes.



Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Avi Kivity

On 02/15/2010 03:30 PM, Gleb Natapov wrote:



I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
and it just worked.

But this is a fairly new processor. Consequently, it reports NextRIP
support via cpuid function 0x800A. Looking for an older one too.
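For reference, the NextRIP capability mentioned here is the SVM "NRIP save" feature, which AMD documents as CPUID Fn8000_000A EDX bit 3 (written 0x800A above, i.e. the extended leaf 0x8000000A). A minimal user-space check, assuming GCC or Clang on x86 (where `<cpuid.h>` and `__get_cpuid()` are provided by the compiler), could look like this. Inside a guest without nested SVM it will normally report false.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* True if the CPU advertises SVM NRIP save ("NextRIP"). */
static bool cpu_has_svm_nrips(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* SVM itself: CPUID Fn8000_0001 ECX bit 2. __get_cpuid() returns 0
     * when the requested leaf is above the highest supported one. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 2)))
        return false;

    /* SVM feature leaf: CPUID Fn8000_000A, NRIPS is EDX bit 3. */
    if (!__get_cpuid(0x8000000Au, &eax, &ebx, &ecx, &edx))
        return false;
    return (edx & (1u << 3)) != 0;
#else
    return false;   /* not an x86 host */
#endif
}
```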

In the meantime I also browsed a bit more in the manuals, and I don't
think stepping over or (what is actually required) into an INT3 will
work. We can't step into as the processor clears TF on any event handler
entry. And stepping over would cause troubles

a) as an unknown amount of code may run without #DB interception
b) we would fiddle with TF in code that is already under debugger
control, thus we would very likely run into conflicts.

Leaves us with tricky INT3 emulation. Sigh.

 

So the question is: do we want to support this kind of debugging on older
AMDs? Maybe we don't.
   


How much older are they?



Re: [PATCH] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Gleb Natapov
On Mon, Feb 15, 2010 at 02:20:31PM +0100, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > Gleb Natapov wrote:
> >> Let's check if SVM works. I can do that if you tell me how.
> > 
> > - Fire up some Linux guest with gdb installed
> > - Attach gdb to gdbstub of the VM
> > - Set a soft breakpoint in guest kernel, ideally where it does not
> >   immediately trigger, e.g. on sys_reboot (use grep sys_reboot
> >   /proc/kallsyms if you don't have symbols for the guest kernel)
> > - Start gdb /bin/true in the guest
> > - run
> > 
> > As gdb sets some automatic breakpoints, this already exercises the
> > reinjection of #BP.
> 
> I just did this on our primary AMD platform (Embedded Opteron, 13KS EE),
> and it just worked.
> 
I tested it on a processor without NextRIP and your test case works there too,
but it shouldn't have, so I looked deeper into that and what I see is
that GDB outsmarts us. Whether we inject the event before the int3
inserted by GDB or after it, GDB correctly finds the breakpoint that
triggered and restarts the instruction correctly. I assume it doesn't use an
exact match between the rip where int3 was inserted and where the exception
triggers. But if I run the program below on the latest kernel, which prints the rip
where #DB was delivered in dmesg, I get different results with and
without an external breakpoint inserted.

int main(int argc, char **argv)
{
asm("int3");
return 0;
}
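A user-space variant of the same experiment, sketched here under the assumption of x86-64 Linux with GCC or Clang, records the RIP the kernel reports in the SIGTRAP signal context and compares it with the address of the INT3 instruction itself. Since INT3 is a trap, the expected difference is 1; any other value would point to the kind of instruction-length mismatch discussed in this thread. The helper name is hypothetical, and on other platforms it just returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__linux__)
#include <ucontext.h>

static volatile uintptr_t trap_rip;

/* SIGTRAP handler: remember the RIP saved in the signal context. */
static void on_sigtrap(int sig, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = ctx;

    (void)sig;
    (void)info;
    trap_rip = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
}
#endif

/* Returns (reported RIP - address of the INT3), or -1 if unsupported. */
static long int3_rip_delta(void)
{
#if defined(__x86_64__) && defined(__linux__)
    struct sigaction sa, old;
    uintptr_t site;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigtrap;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTRAP, &sa, &old) != 0)
        return -1;

    /* Load the address of local label 1, then execute INT3 at that label. */
    __asm__ volatile("lea 1f(%%rip), %0\n\t"
                     "1: int3"
                     : "=r"(site)
                     :
                     : "memory");

    sigaction(SIGTRAP, &old, NULL);   /* restore the previous handler */
    return (long)(trap_rip - site);
#else
    return -1;
#endif
}
```

Because the saved RIP already points past the INT3, returning from the handler simply resumes after the breakpoint.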

> But this is a fairly new processor. Consequently, it reports NextRIP
> support via cpuid function 0x800A. Looking for an older one too.
> 
> In the meantime I also browsed a bit more in the manuals, and I don't
> think stepping over or (what is actually required) into an INT3 will
> work. We can't step into as the processor clears TF on any event handler
> entry. And stepping over would cause troubles
> 
> a) as an unknown amount of code may run without #DB interception
> b) we would fiddle with TF in code that is already under debugger
> control, thus we would very likely run into conflicts.
> 
> Leaves us with tricky INT3 emulation. Sigh.
> 
> Jan
> 

--
Gleb.


Recommended network driver for a windows KVM guest

2010-02-17 Thread carlopmart

Hi all,

 I need to install several Windows KVM guests (RHEL 5.4 host, fully updated) for
iSCSI boot. The iSCSI servers are Solaris/OpenSolaris storage servers, and I need to boot
the Windows guests (2008R2 and Win7) using gPXE. Can I use the virtio net driver during
the Windows install, or should I use the e1000 driver?


Many thanks.
--
CL Martinez
carlopmart {at} gmail {d0t} com


Re: [PATCH] Build fix for #define KVM_DEBUG

2010-02-17 Thread Avi Kivity

On 02/17/2010 04:41 AM, Tsuyoshi Ozawa wrote:

shadow_efer was renamed to efer, so this should be modified rather than deleted.
   

OK. The new patch uses efer instead of deleting shadow_efer
 

Excuse me, and what should I do next?
   


Copy Jan - he maintains kvm-kmod, and probably didn't see your patch.



Re: [PATCH v2] KVM: VMX: Update instruction length on intercepted BP

2010-02-17 Thread Gleb Natapov
On Tue, Feb 16, 2010 at 10:11:06AM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Tue, Feb 16, 2010 at 09:05:40AM +0100, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Mon, Feb 15, 2010 at 03:53:04PM +0100, Jan Kiszka wrote:
>  We intercept #BP while in guest debugging mode. As VM exits due to
>  intercepted exceptions do not necessarily come with valid
>  idt_vectoring, we have to update event_exit_inst_len explicitly in such
>  cases. At least in the absence of migration, this ensures that
>  re-injections of #BP will find and use the correct instruction length.
> 
> >>> Thinking about it some more. Why do we exit to userspace at all if we
> >>> intercept a wrong #DB? It seems unwise to me to be able to inject
> >>> exceptions from userspace. The exception generation mechanism is part of
> >>> the CPU and we shouldn't outsource part of CPU functionality to userspace.
> >> The guest debugging API was designed to avoid maintaining a "countless"
> >> number of breakpoints in kernel space and instead chose to loop over
> >> user space to decide about #DB & #BP. So this part is required even if
> >> we start thinking about an alternative interface in the future.
> >>
> > How much is "countless"? 1? I am sure we can handle this.
> 
> We could even handle more. But we would have to
>  - handle INT3 injection in kernel space, including step-over on resume
>  - fully parse HW breakpoints in kernel space
>  - probably deal with some more complications that are now handled in
>    user space, part of them even in gdb
> 
The first point in this list is needed anyway, no matter who reinjects the
#BP event. About point three: what are those complications? As far as
I can see, all we need to know in the kernel is a list of cr3:address pairs that
have a breakpoint set. If a #BP intercept happens, we scan this list; if no
match is found, we reinject the event into the guest, otherwise we exit to
userspace.
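A minimal sketch of the in-kernel filter proposed here, assuming a flat table of (cr3, address) pairs registered by the host debugger. The struct, enum, and function names are hypothetical and not part of the KVM API, and `int3_rip` is assumed to be the address of the INT3 instruction itself. It only shows the decision: an INT3 that matches a registered pair belongs to the host debugger and exits to userspace; anything else is reinjected into the guest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: a software breakpoint planted by the host debugger. */
struct host_sw_bp {
    uint64_t cr3;    /* address space the breakpoint belongs to */
    uint64_t addr;   /* guest virtual address of the INT3 */
};

enum bp_verdict {
    BP_REINJECT_TO_GUEST,   /* guest's own INT3: deliver #BP to the guest */
    BP_EXIT_TO_USERSPACE,   /* host debugger's INT3: report to userspace */
};

/* Decide what to do with an intercepted #BP raised by the INT3 at int3_rip. */
static enum bp_verdict classify_bp_intercept(const struct host_sw_bp *bps,
                                             size_t nbps, uint64_t cr3,
                                             uint64_t int3_rip)
{
    for (size_t i = 0; i < nbps; i++) {
        if (bps[i].cr3 == cr3 && bps[i].addr == int3_rip)
            return BP_EXIT_TO_USERSPACE;
    }
    return BP_REINJECT_TO_GUEST;
}
```

A real implementation would also have to decide how to treat guest-kernel breakpoints that are valid under every cr3; this sketch ignores that case.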

> And, again: This is an _existing_ user space ABI. We could only provide
> an alternative, but we have to maintain what is there at least for some
> longer grace period.
> 
But it was always broken for SVM and was broken for VMX for a year and
nobody noticed, so maybe instead of reintroducing the old interface we should
do it right this time?

--
Gleb.


Re: [RFC] QEMU: Balloon support for device assignment

2010-02-17 Thread Avi Kivity

On 02/17/2010 11:43 AM, bor...@il.ibm.com wrote:

From: Eran Borovik

This patch adds modifications to allow correct
balloon operation when a virtual guest uses a direct assigned device.
The modifications include a new interface between qemu and kvm to allow
mapping and unmapping the pages from the IOMMU as well as pinning and unpinning 
as needed.
   


Note, on reset we deflate the balloon completely, since the BIOS and 
boot loader (and possibly the OS post-reboot) are not aware of ballooning.




Re: [RFC] KVM: Balloon support for device assignment

2010-02-17 Thread Avi Kivity

On 02/17/2010 11:43 AM, bor...@il.ibm.com wrote:

From: Eran Borovik

This patch adds modifications to allow correct
balloon operation when a virtual guest uses a direct assigned device.
The modifications include a new interface between qemu and kvm to allow
mapping and unmapping the pages from the IOMMU as well as pinning and unpinning 
as needed.
   


The plan for iommu support is to push it into uio.  Instead of kvm 
managing the iommu directly, I'd like qemu to open a uio device and set 
up an iommu mapping there, which will just happen to match the kvm 
memory slots.  Similarly, interrupts will be forwarded using irqfds.  
This will allow using the iommu without kvm, and reduce the amount of 
special purpose kvm code.


These patches make the transition more difficult, which worries me.  I 
know Gerd looked at making the move, but he is no longer working on it.




Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-17 Thread Alexander Graf

On 17.02.2010, at 10:47, Avi Kivity wrote:

> On 02/17/2010 11:42 AM, OHMURA Kei wrote:
> "We think"? I mean - yes, I think so too. But have you actually measured 
> it?
> How much improvement are we talking here?
> Is it still faster when a bswap is involved?
 Thanks for pointing out.
 I will post the data for x86 later.
 However, I don't have a test environment to check the impact of bswap.
 Would you please measure the run time between the following section if 
 possible?
>>> 
>>> It'd make more sense to have a real stand alone test program, no?
>>> I can try to write one today, but I have some really nasty important bugs 
>>> to fix first.
>> 
>> 
>> OK.  I will prepare a test code with sample data.  Since I found a ppc 
>> machine around, I will run the code and post the results of
>> x86 and ppc.
>> 
> 
> I've applied the patch - I think the x86 results justify it, and I'll be very 
> surprised if ppc doesn't show a similar gain.  Skipping 7 memory accesses and 
> 7 tests must be a win.

Sounds good to me. I don't expect bswap to be horribly slow either. I just want 
to be sure.


Alex


Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-17 Thread Avi Kivity

On 02/17/2010 11:42 AM, OHMURA Kei wrote:
"We think"? I mean - yes, I think so too. But have you actually 
measured it?

How much improvement are we talking here?
Is it still faster when a bswap is involved?

Thanks for pointing out.
I will post the data for x86 later.
However, I don't have a test environment to check the impact of bswap.
Would you please measure the run time between the following section 
if possible?


It'd make more sense to have a real stand alone test program, no?
I can try to write one today, but I have some really nasty important 
bugs to fix first.



OK.  I will prepare test code with sample data.  Since I found a ppc 
machine around, I will run the code and post the results for
x86 and ppc.



I've applied the patch - I think the x86 results justify it, and I'll be 
very surprised if ppc doesn't show a similar gain.  Skipping 7 memory 
accesses and 7 tests must be a win.
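The gain described here comes from testing a whole 64-bit word of the dirty bitmap at once instead of eight individual bytes. The following is a generic C sketch of that technique, not the patch itself. It assumes a bitmap whose bit order is defined byte-wise (bit i of byte j marks page j*8+i), uses the GCC/Clang builtins `__builtin_bswap64` and `__builtin_ctzll`, and shows where the bswap asked about for big-endian hosts such as ppc would be paid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bitmap bytes as one integer whose bit k is bit (k % 8) of byte (k / 8). */
static uint64_t load_bitmap_word(const uint8_t *p)
{
    uint64_t w;

    memcpy(&w, p, sizeof(w));          /* one (possibly unaligned) load */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    w = __builtin_bswap64(w);          /* extra cost only on big-endian hosts */
#endif
    return w;
}

/* Store the indices of dirty pages into pages[] (at most max_pages);
 * returns how many were stored. */
static size_t collect_dirty_pages(const uint8_t *bitmap, size_t nbytes,
                                  size_t *pages, size_t max_pages)
{
    size_t n = 0;
    size_t i = 0;

    /* Word at a time: one load and one test per 8 bytes of bitmap. */
    for (; i + 8 <= nbytes; i += 8) {
        uint64_t w = load_bitmap_word(bitmap + i);

        if (w == 0)
            continue;                  /* typical case: nothing dirty here */
        while (w != 0 && n < max_pages) {
            unsigned int bit = (unsigned int)__builtin_ctzll(w);

            pages[n++] = i * 8 + bit;
            w &= w - 1;                /* clear the lowest set bit */
        }
    }

    /* Leftover bytes that do not fill a whole word. */
    for (; i < nbytes; i++) {
        for (unsigned int bit = 0; bit < 8; bit++) {
            if (((bitmap[i] >> bit) & 1u) && n < max_pages)
                pages[n++] = i * 8 + bit;
        }
    }
    return n;
}
```

For a mostly clean bitmap, the `w == 0` test replaces eight byte loads and eight comparisons with one of each, which is where the "7 memory accesses and 7 tests" come from.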





[RFC] KVM: Balloon support for device assignment

2010-02-17 Thread borove
From: Eran Borovik 

This patch adds modifications to allow correct
balloon operation when a virtual guest uses a direct assigned device.
The modifications include a new interface between qemu and kvm to allow
mapping and unmapping the pages from the IOMMU as well as pinning and unpinning 
as needed.
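The hole handling added to kvm_iommu_put_pages() below amounts to splitting a gfn range into maximal runs of still-mapped pages, so each run can be torn down with a single iommu_unmap_range() call. A self-contained C sketch of that run detection could look like this. A plain bool array stands in for iommu_iova_to_phys(), and all names are hypothetical. Note that it moves the run start past every hole, including holes that do not end a run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One maximal run of consecutive gfns that are still IOMMU-mapped. */
struct gfn_run {
    unsigned long start;
    unsigned long npages;
};

/* mapped[i] stands in for "iommu_iova_to_phys() is non-zero for base + i".
 * Writes at most max_runs runs to runs[] and returns how many it wrote. */
static size_t find_mapped_runs(unsigned long base, const bool *mapped,
                               size_t npages, struct gfn_run *runs,
                               size_t max_runs)
{
    size_t nruns = 0;
    unsigned long run_start = base;
    unsigned long run_len = 0;

    for (size_t i = 0; i < npages; i++) {
        if (!mapped[i]) {
            /* A hole (e.g. a ballooned-out page) ends the current run. */
            if (run_len != 0 && nruns < max_runs)
                runs[nruns++] = (struct gfn_run){ run_start, run_len };
            /* The next run can start no earlier than the gfn after the hole. */
            run_start = base + i + 1;
            run_len = 0;
            continue;
        }
        run_len++;
    }
    /* Flush the final run if the range ended on mapped pages. */
    if (run_len != 0 && nruns < max_runs)
        runs[nruns++] = (struct gfn_run){ run_start, run_len };
    return nruns;
}
```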

Signed-off-by: Eran Borovik 
---
 include/linux/kvm.h  |3 ++
 include/linux/kvm_host.h |4 ++
 virt/kvm/iommu.c |   86 +++--
 virt/kvm/kvm_main.c  |9 +
 4 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index f8f8900..567f5f8 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -514,6 +514,9 @@ struct kvm_irqfd {
struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR  _IO(KVMIO, 0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64)
+#define KVM_IOMMU_UNMAP_PAGE  _IOW(KVMIO, 0x49, __u64)
+#define KVM_IOMMU_MAP_PAGE  _IOW(KVMIO, 0x50, __u64)
+
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP   _IO(KVMIO,  0x60)
 #define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b7bbb5d..ad904ec 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -411,6 +411,10 @@ int kvm_assign_device(struct kvm *kvm,
  struct kvm_assigned_dev_kernel *assigned_dev);
 int kvm_deassign_device(struct kvm *kvm,
struct kvm_assigned_dev_kernel *assigned_dev);
+void kvm_iommu_unmap_page(struct kvm *kvm,
+ gfn_t base_gfn);
+int kvm_iommu_map_page(struct kvm *kvm,
+  gfn_t base_gfn);
 #else /* CONFIG_IOMMU_API */
 static inline int kvm_iommu_map_pages(struct kvm *kvm,
  gfn_t base_gfn,
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 1514758..54cfd33 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -190,23 +190,101 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
gfn_t gfn = base_gfn;
pfn_t pfn;
struct iommu_domain *domain = kvm->arch.iommu_domain;
-   unsigned long i;
+   unsigned long i, iommu_pages;
u64 phys;
 
/* check if iommu exists and in use */
if (!domain)
return;
 
-   for (i = 0; i < npages; i++) {
+   for (i = 0, iommu_pages = 0; i < npages; i++, gfn++) {
phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn));
+
+   /*Because of ballooning, there can be holes in the
+ range. In that case, we simply unmap everything
+ till now, and continue forward.
+   */
+   if (!phys) {
+
+   /*Flush the run of mapped pages seen so far, if any*/
+   if (iommu_pages != 0)
+   iommu_unmap_range(domain,
+  gfn_to_gpa(base_gfn),
+  PAGE_SIZE*iommu_pages);
+
+   /*Reset the consecutive iommu range counters; the
+     next run starts no earlier than gfn + 1*/
+   base_gfn = gfn + 1;
+   iommu_pages = 0;
+   continue;
+   }
pfn = phys >> PAGE_SHIFT;
kvm_release_pfn_clean(pfn);
-   gfn++;
+   ++iommu_pages;
}
 
-   iommu_unmap_range(domain, gfn_to_gpa(base_gfn), PAGE_SIZE * npages);
+   /*Unmap the last iommu range if any*/
+   if (iommu_pages != 0)
+   iommu_unmap_range(domain,
+  gfn_to_gpa(base_gfn),
+  PAGE_SIZE * iommu_pages);
+}
+
+/*Called to map a page from IOMMU */
+int kvm_iommu_map_page(struct kvm *kvm,
+  gfn_t base_gfn)
+{
+   gfn_t gfn = base_gfn;
+   pfn_t pfn;
+   struct iommu_domain *domain = kvm->arch.iommu_domain;
+   u64 phys;
+   int rc;
+   int flags;
+
+   /* check if iommu exists and in use */
+   if (!domain)
+   return 0;
+   phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn));
+
+   /*Verify address is not mapped already*/
+   if (phys)
+   return 0;
+   flags = IOMMU_READ | IOMMU_WRITE;
+   if (kvm->arch.iommu_flags & KVM_IOMMU_CACHE_COHERENCY)
+   flags |= IOMMU_CACHE;
+   pfn = gfn_to_pfn(kvm, gfn);
+   rc = iommu_map_range(domain,
+   gfn_to_gpa(gfn),
+   pfn_to_hpa(pfn),
+   PAGE_SIZE, flags);
+   return rc;
+}
+
+
+
+/*Called to unmap a page from IOMMU */
+void kvm_iommu_unmap_page(struct kvm *kvm,
+ gfn_t base_gfn)
+{
+   gfn_t gfn = base_gfn;
+   pfn_t pfn;
+   struct iommu_domain *domain = kvm->arch.iommu_domain;
+   u64 phys;
+
+   /* check

Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-17 Thread Alexander Graf

On 17.02.2010, at 10:42, OHMURA Kei wrote:

 "We think"? I mean - yes, I think so too. But have you actually measured 
 it?
 How much improvement are we talking here?
 Is it still faster when a bswap is involved?
>>> Thanks for pointing out.
>>> I will post the data for x86 later.
>>> However, I don't have a test environment to check the impact of bswap.
>>> Would you please measure the run time between the following section if 
>>> possible?
>> It'd make more sense to have a real stand alone test program, no?
>> I can try to write one today, but I have some really nasty important bugs to 
>> fix first.
> 
> 
> OK.  I will prepare a test code with sample data.  Since I found a ppc 
> machine around, I will run the code and post the results of
> x86 and ppc.
> 
> 
> By the way, the following data are x86 results measured in QEMU/KVM.
> They show how many times the function is called (#called), the runtime of
> the original function (orig.), the runtime with this patch (patch), and the speedup ratio (ratio).

That does indeed look promising!

Thanks for doing this micro-benchmark. I just want to be 100% sure that it 
doesn't badly affect performance on big-endian hosts.


Alex


[RFC] QEMU: Balloon support for device assignment

2010-02-17 Thread borove
From: Eran Borovik 

This patch adds modifications to allow correct
balloon operation when a virtual guest uses a direct assigned device.
The modifications include a new interface between qemu and kvm to allow
mapping and unmapping the pages from the IOMMU as well as pinning and unpinning 
as needed.

Signed-off-by: Eran Borovik 
---
 hw/virtio-balloon.c |   13 ++---
 kvm/include/linux/kvm.h |2 ++
 kvm/libkvm/libkvm.h |4 
 qemu-kvm.c  |   10 ++
 qemu-kvm.h  |4 
 5 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index 3792012..337f717 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -132,6 +132,7 @@ static void virtio_balloon_handle_output(VirtIODevice 
*vdev, VirtQueue *vq)
 elem.out_sg, elem.out_num) == 4) {
 ram_addr_t pa;
 ram_addr_t addr;
+   bool deflate;
 
 pa = (ram_addr_t)ldl_p(&pfn) << VIRTIO_BALLOON_PFN_SHIFT;
 offset += 4;
@@ -139,12 +140,18 @@ static void virtio_balloon_handle_output(VirtIODevice 
*vdev, VirtQueue *vq)
 addr = cpu_get_physical_page_desc(pa);
 if ((addr & ~TARGET_PAGE_MASK) != IO_MEM_RAM)
 continue;
+   deflate = !!(vq == s->dvq);
+#  ifdef KVM_CAP_DEVICE_ASSIGNMENT
+   if (deflate)
+   kvm_map_pfn(NULL, pfn);
+   else
+   kvm_unmap_pfn(NULL, pfn);
+#  endif
 
 /* Using qemu_get_ram_ptr is bending the rules a bit, but
should be OK because we only want a single page.  */
-balloon_page(qemu_get_ram_ptr(addr), !!(vq == s->dvq));
-}
-
+   balloon_page(qemu_get_ram_ptr(addr), deflate);
+   }
 virtqueue_push(vq, &elem, offset);
 virtio_notify(vdev, vq);
 }
diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index 6485981..90f7723 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -595,6 +595,8 @@ struct kvm_clock_data {
struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR  _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_IOMMU_UNMAP_PAGE  _IOW(KVMIO, 0x49, __u64)
+#define KVM_IOMMU_MAP_PAGE  _IOW(KVMIO, 0x50, __u64)
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP_IO(KVMIO,   0x60)
 #define KVM_IRQ_LINE  _IOW(KVMIO,  0x61, struct kvm_irq_level)
diff --git a/kvm/libkvm/libkvm.h b/kvm/libkvm/libkvm.h
index 4821a1e..7fa83b5 100644
--- a/kvm/libkvm/libkvm.h
+++ b/kvm/libkvm/libkvm.h
@@ -714,6 +714,10 @@ int kvm_s390_store_status(kvm_context_t kvm, int slot, 
unsigned long addr);
 int kvm_assign_pci_device(kvm_context_t kvm,
  struct kvm_assigned_pci_dev *assigned_dev);
 
+int kvm_deflate_pfn(kvm_context_t kvm, uint32_t pfn);
+
+int kvm_inflate_pfn(kvm_context_t kvm, uint32_t pfn);
+
 /*!
  * \brief Assign IRQ for an assigned device
  *
diff --git a/qemu-kvm.c b/qemu-kvm.c
index a305907..a5ca029 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1081,6 +1081,16 @@ static int kvm_old_assign_irq(kvm_context_t kvm,
 return kvm_vm_ioctl(kvm_state, KVM_ASSIGN_IRQ, assigned_irq);
 }
 
+int kvm_unmap_pfn(kvm_context_t kvm, uint32_t pfn)
+{
+   return kvm_vm_ioctl(kvm_state, KVM_IOMMU_UNMAP_PAGE, pfn);
+}
+
+int kvm_map_pfn(kvm_context_t kvm, uint32_t pfn)
+{
+   return kvm_vm_ioctl(kvm_state, KVM_IOMMU_MAP_PAGE, pfn);
+}
+
 #ifdef KVM_CAP_ASSIGN_DEV_IRQ
 int kvm_assign_irq(kvm_context_t kvm, struct kvm_assigned_irq *assigned_irq)
 {
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 6b3e5a1..861c336 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -691,6 +691,10 @@ int kvm_s390_store_status(kvm_context_t kvm, int slot, 
unsigned long addr);
 int kvm_assign_pci_device(kvm_context_t kvm,
   struct kvm_assigned_pci_dev *assigned_dev);
 
+int kvm_unmap_pfn(kvm_context_t kvm, uint32_t pfn);
+
+int kvm_map_pfn(kvm_context_t kvm, uint32_t pfn);
+
 /*!
  * \brief Assign IRQ for an assigned device
  *
-- 
1.6.0.4



[RFC] Balloon support for device assignment

2010-02-17 Thread borove
Currently, device assignment forces the entire guest memory to be pinned. The
following kernel and qemu patches add balloon support for device assignment.
When the balloon inflates, the corresponding pages are unmapped from the IOMMU
and unpinned; accordingly, they are remapped and pinned when the balloon
deflates.
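The inflate/deflate bookkeeping described above can be modeled in a few lines of C. This is a minimal sketch, not code from the patches: `struct guest_mem`, `guest_mem_init()`, `balloon_inflate_pfn()`, `balloon_deflate_pfn()` and the two stub helpers are hypothetical. The stubs only flip a flag where real code would unmap/unpin or pin/map the page (the qemu-kvm patch quoted above wraps `KVM_IOMMU_UNMAP_PAGE` and `KVM_IOMMU_MAP_PAGE` for that purpose).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest with a tiny amount of memory, for illustration only. */
#define NR_GUEST_PAGES 1024u

struct guest_mem {
    bool pinned[NR_GUEST_PAGES]; /* true: page is pinned and IOMMU-mapped */
    size_t nr_pinned;
};

/* Without balloon support, device assignment pins all guest memory up front. */
static void guest_mem_init(struct guest_mem *m)
{
    for (size_t i = 0; i < NR_GUEST_PAGES; i++)
        m->pinned[i] = true;
    m->nr_pinned = NR_GUEST_PAGES;
}

/* Stub: real code would first remove the IOMMU mapping (so the assigned
 * device can no longer DMA into the page) and only then unpin it. */
static void unmap_and_unpin(struct guest_mem *m, uint32_t pfn)
{
    assert(m->pinned[pfn]);
    m->pinned[pfn] = false;
    m->nr_pinned--;
}

/* Stub: the reverse order, pin the page first, then map it in the IOMMU. */
static void pin_and_map(struct guest_mem *m, uint32_t pfn)
{
    assert(!m->pinned[pfn]);
    m->pinned[pfn] = true;
    m->nr_pinned++;
}

/* Balloon inflates: the guest hands pfn back to the host. */
static int balloon_inflate_pfn(struct guest_mem *m, uint32_t pfn)
{
    if (pfn >= NR_GUEST_PAGES || !m->pinned[pfn])
        return -1; /* out of range or already ballooned */
    unmap_and_unpin(m, pfn);
    return 0;
}

/* Balloon deflates: the guest gets pfn back and the device may DMA to it. */
static int balloon_deflate_pfn(struct guest_mem *m, uint32_t pfn)
{
    if (pfn >= NR_GUEST_PAGES || m->pinned[pfn])
        return -1; /* out of range or not ballooned */
    pin_and_map(m, pfn);
    return 0;
}
```

Tracking the state per pfn is what lets memory reclaimed by the balloon actually be released, instead of staying pinned for the lifetime of the VM.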

The kernel patch applies to tag v2.6.32.
Comments appreciated.

Regards,
Eran.



Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-17 Thread OHMURA Kei

>>> "We think"? I mean - yes, I think so too. But have you actually measured it?
>>> How much improvement are we talking here?
>>> Is it still faster when a bswap is involved?
>>
>> Thanks for pointing this out.
>> I will post the data for x86 later.
>> However, I don't have a test environment to check the impact of bswap.
>> Would you please measure the run time between the following section if possible?
>
> It'd make more sense to have a real stand alone test program, no?
> I can try to write one today, but I have some really nasty important bugs to
> fix first.

OK. I will prepare test code with sample data. Since I found a ppc machine
around, I will run the code and post the results for x86 and ppc.


By the way, the following data are results measured on x86 with QEMU/KVM.

The tables show how many times the function was called (#called), the runtime
of the original function (orig.), the runtime with this patch (patch), and the
speedup ratio (ratio).


Test1: Guest OS reads a 3GB file, which is bigger than memory.
#called  orig.(msec)  patch(msec)  ratio
    108          1.1          0.1    7.6
    102          1.0          0.1    6.8
    132          1.6          0.2    7.1

Test2: Guest OS reads/writes a 3GB file, which is bigger than memory.
#called  orig.(msec)  patch(msec)  ratio
   2394           33          7.7    4.3
   2100           29          7.1    4.1
   2832           40          9.9    4.0
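For context, the general technique behind this kind of speedup is to scan the dirty bitmap one machine word at a time and skip all-zero words, instead of testing every bit. The sketch below contrasts the two approaches; it is not the patch itself, and the function names and `WORD_BITS` are hypothetical. It assumes the bitmap is stored as host-endian `unsigned long` words; a bitmap with a fixed little-endian word layout would need each word byte-swapped first on a big-endian host such as ppc, which is the bswap cost asked about above. `__builtin_ctzl` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Baseline: test each bit; stores dirty page indices in out[], returns count. */
static size_t walk_dirty_bitwise(const unsigned long *bitmap, size_t nbits,
                                 size_t *out)
{
    size_t n = 0;

    assert(out != NULL);
    for (size_t i = 0; i < nbits; i++)
        if (bitmap[i / WORD_BITS] & (1UL << (i % WORD_BITS)))
            out[n++] = i;
    return n;
}

/* Word-at-a-time: skip clean words entirely, then visit only the set bits. */
static size_t walk_dirty_wordwise(const unsigned long *bitmap, size_t nbits,
                                  size_t *out)
{
    size_t n = 0;
    size_t nwords = (nbits + WORD_BITS - 1) / WORD_BITS;

    assert(out != NULL);
    for (size_t w = 0; w < nwords; w++) {
        unsigned long word = bitmap[w];

        while (word != 0) {
            size_t page = w * WORD_BITS + (size_t)__builtin_ctzl(word);

            if (page >= nbits)
                break;        /* ignore padding bits past the end */
            out[n++] = page;
            word &= word - 1; /* clear the lowest set bit */
        }
    }
    return n;
}
```

On a mostly clean bitmap the word-wise loop touches each zero word once instead of `WORD_BITS` times, which is where a speedup would come from.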



Re: [PATCH] qemu-kvm: pcnet APROMWE bit location

2010-02-17 Thread Avi Kivity

On 02/14/2010 09:30 AM, Chris Kilgour wrote:

> I don't subscribe to the list, so please excuse any breach of etiquette.
>
> According to AMD document 21485D pp.141, APROMWE is bit 8 of BCR2.


Please send this to the qemu mailing list, qemu-de...@nongnu.org, as 
this code is shared between qemu and qemu-kvm.  Thanks.
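In code, the bit position cited in the report corresponds to mask 0x0100 in the 16-bit BCR2 value. A minimal sketch; the macro and function names are hypothetical and not taken from qemu's pcnet emulation, and the expansion of APROMWE as "Address PROM Write Enable" follows the AMD PCnet naming as commonly documented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* APROMWE (Address PROM Write Enable) is bit 8 of BCR2, per AMD
 * document 21485D as cited in the report above. */
#define BCR2_APROMWE_BIT  8
#define BCR2_APROMWE_MASK (1u << BCR2_APROMWE_BIT)

static_assert(BCR2_APROMWE_MASK == 0x0100, "APROMWE must be bit 8");

/* Returns true if writes to the address PROM are enabled. */
static bool bcr2_aprom_write_enabled(uint16_t bcr2)
{
    return (bcr2 & BCR2_APROMWE_MASK) != 0;
}
```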


--
error compiling committee.c: too many arguments to function



Re: KVM: use desc_ptr struct instead of kvm private descriptor_table

2010-02-17 Thread Avi Kivity

On 02/16/2010 10:51 AM, Gleb Natapov wrote:

> x86 arch defines desc_ptr for idt/gdt pointers, no need to define
> another structure in kvm code.


Applied, thanks.
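Why the swap is mechanical: both types describe the same pseudo-descriptor that LGDT/LIDT load and SGDT/SIDT store, a 16-bit limit immediately followed by the base address (6 bytes on 32-bit, 10 bytes on x86-64). The structs below are mirrored for illustration only; the field names are recalled from the kernel sources of that era and should be treated as assumptions rather than copies of the headers.

```c
#include <assert.h>
#include <stddef.h>

/* Mirror of KVM's former private type (field names assumed). */
struct descriptor_table {
    unsigned short limit;
    unsigned long base;
} __attribute__((packed));

/* Mirror of the x86 arch type that replaces it (field names assumed). */
struct desc_ptr {
    unsigned short size;
    unsigned long address;
} __attribute__((packed));

/* Same size and same field offsets, so one can stand in for the other. */
static_assert(sizeof(struct descriptor_table) == sizeof(struct desc_ptr),
              "identical size");
static_assert(offsetof(struct descriptor_table, base) ==
              offsetof(struct desc_ptr, address),
              "base/address at the same offset");
static_assert(offsetof(struct desc_ptr, address) == sizeof(unsigned short),
              "packed: the base follows the 16-bit limit directly");
```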

--
error compiling committee.c: too many arguments to function



Re: KVM: add doc note about PIO/MMIO completion API

2010-02-17 Thread Avi Kivity

On 02/14/2010 10:17 AM, Avi Kivity wrote:

--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -820,6 +820,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
  by kvm.  The 'data' member contains the written data if 'is_write' is
  true, and should be filled by application code otherwise.

+NOTE: For KVM_EXIT_IO and KVM_EXIT_MMIO, the corresponding operations
+are complete (and guest state is consistent) only after userspace has
+re-entered the kernel with KVM_RUN. The kernel side must first finish
+incomplete operations and then check for pending signals.
+



Well, s/must/will/, the document is written from userspace's point of 
view.




Applied with this change, thanks.
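From userspace's side, the note means an exit handler only stages the result: for an MMIO read it fills `run->mmio.data`, and the guest sees that value when the vcpu thread calls `KVM_RUN` again. A minimal sketch using the `<linux/kvm.h>` definition of `struct kvm_run`; the single-register device at `DEMO_MMIO_ADDR`, `handle_mmio_exit()` and `run_vcpu()` are hypothetical, and the byte copy assumes a little-endian host such as x86.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical device: one 32-bit register at a made-up guest address. */
#define DEMO_MMIO_ADDR 0xd0000000ULL
static uint32_t demo_reg;

/* Handle a KVM_EXIT_MMIO. For a read, the value is only staged in
 * run->mmio.data; the kernel completes the guest instruction on the
 * next KVM_RUN. Returns 0 if handled, -1 otherwise. */
static int handle_mmio_exit(struct kvm_run *run)
{
    if (run->exit_reason != KVM_EXIT_MMIO ||
        run->mmio.phys_addr != DEMO_MMIO_ADDR ||
        run->mmio.len == 0 || run->mmio.len > sizeof(demo_reg))
        return -1;

    if (run->mmio.is_write) {
        uint32_t v = 0;

        memcpy(&v, run->mmio.data, run->mmio.len); /* little-endian host assumed */
        demo_reg = v;
    } else {
        memcpy(run->mmio.data, &demo_reg, run->mmio.len);
    }
    return 0;
}

/* Sketch of a vcpu loop. Guest state is consistent only after re-entry:
 * when KVM_RUN fails with EINTR, any I/O or MMIO operation left pending by
 * the previous exit has already been completed, because the kernel finishes
 * such operations before it checks for pending signals. */
static int run_vcpu(int vcpu_fd, struct kvm_run *run)
{
    for (;;) {
        if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
            return errno == EINTR ? 0 : -1;
        if (handle_mmio_exit(run) != 0)
            return -1; /* exit reason not handled in this sketch */
    }
}
```

A migration path built on this would therefore re-enter `KVM_RUN` once after an I/O or MMIO exit before reading vcpu state.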

--
error compiling committee.c: too many arguments to function



Re: [PATCH 2/3] KVM: x86: Save&restore interrupt shadow mask

2010-02-17 Thread Gleb Natapov
On Wed, Feb 17, 2010 at 10:03:58AM +0100, Jan Kiszka wrote:
> > 
> > Also, as Avi mentioned it would be better to avoid this. Is it not
> > possible to disallow migration while interrupt shadow is present?
> 
> Which means disallowing user space exits while the shadow is set? Or
> should we introduce some flag for user space that tells it "do not
> migrate now, resume the guest till next exit"?
> 
I think disabling migration is a slippery slope. The guest may abuse it.
Maybe it will be hard to do with the interrupt shadow, but the mechanism
will be used for other cases too. I remember there was an argument that we
should not migrate while the vcpu is in nested guest mode.

--
Gleb.


Re: [PATCH 2/3] KVM: x86: Save&restore interrupt shadow mask

2010-02-17 Thread Jan Kiszka
Zachary Amsden wrote:
> On 02/16/2010 02:39 PM, Marcelo Tosatti wrote:
>> On Mon, Feb 15, 2010 at 10:45:42AM +0100, Jan Kiszka wrote:
>>
>>> The interrupt shadow created by STI or MOV-SS-like operations is part of
>>> the VCPU state and must be preserved across migration. Transfer it in
>>> the spare padding field of kvm_vcpu_events.interrupt.
> 
> STI and MOV-SS interrupt shadow are both treated differently by 
> hardware.  Any attempt to unify them into a single field is wrong, 
> especially so in a hardware virtualization context, where they are 
> actually represented by different fields in the undocumented but 
> nevertheless extant format that can be inferred from the hardware 
> virtualization context used by specific vendors.

Someone should ask AMD why they thought differently about this while
designing SVM...

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH 2/3] KVM: x86: Save&restore interrupt shadow mask

2010-02-17 Thread Gleb Natapov
On Tue, Feb 16, 2010 at 10:06:12PM -1000, Zachary Amsden wrote:
> On 02/16/2010 02:39 PM, Marcelo Tosatti wrote:
> >On Mon, Feb 15, 2010 at 10:45:42AM +0100, Jan Kiszka wrote:
> >>The interrupt shadow created by STI or MOV-SS-like operations is part of
> >>the VCPU state and must be preserved across migration. Transfer it in
> >>the spare padding field of kvm_vcpu_events.interrupt.
> 
> STI and MOV-SS interrupt shadow are both treated differently by
> hardware.  Any attempt to unify them into a single field is wrong,
> especially so in a hardware virtualization context, where they are
> actually represented by different fields in the undocumented but
> nevertheless extant format that can be inferred from the hardware
> virtualization context used by specific vendors.
> 
The problem is SVM doesn't distinguish between those two. But we shouldn't
design our interfaces based on SVM brokenness.

--
Gleb.


Re: [PATCH 2/3] KVM: x86: Save&restore interrupt shadow mask

2010-02-17 Thread Jan Kiszka
Marcelo Tosatti wrote:
> On Mon, Feb 15, 2010 at 10:45:42AM +0100, Jan Kiszka wrote:
>> The interrupt shadow created by STI or MOV-SS-like operations is part of
>> the VCPU state and must be preserved across migration. Transfer it in
>> the spare padding field of kvm_vcpu_events.interrupt.
>>
>> As a side effect we now have to make vmx_set_interrupt_shadow robust
>> against both shadow types being set. Give MOV SS a higher priority and
>> skip STI in that case so that VMX does not throw a fault on the next entry.
>>
>> Signed-off-by: Jan Kiszka 
>> ---
>>  Documentation/kvm/api.txt  |   11 ++-
>>  arch/x86/include/asm/kvm.h |    3 ++-
>>  arch/x86/kvm/vmx.c         |    2 +-
>>  arch/x86/kvm/x86.c         |   12 ++--
>>  include/linux/kvm.h        |    1 +
>>  5 files changed, 24 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
>> index c6416a3..8770b67 100644
>> --- a/Documentation/kvm/api.txt
>> +++ b/Documentation/kvm/api.txt
>> @@ -656,6 +656,7 @@ struct kvm_clock_data {
>>  4.29 KVM_GET_VCPU_EVENTS
>>  
>>  Capability: KVM_CAP_VCPU_EVENTS
>> +Extended by: KVM_CAP_INTR_SHADOW
>>  Architectures: x86
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (out)
>> @@ -676,7 +677,7 @@ struct kvm_vcpu_events {
>>  		__u8 injected;
>>  		__u8 nr;
>>  		__u8 soft;
>> -		__u8 pad;
>> +		__u8 shadow;
>>  	} interrupt;
>>  	struct {
>>  		__u8 injected;
>> @@ -688,9 +689,13 @@ struct kvm_vcpu_events {
>>  	__u32 flags;
>>  };
>>  
>> +KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
>> +interrupt.shadow contains a valid state. Otherwise, this field is undefined.
>> +
>>  4.30 KVM_SET_VCPU_EVENTS
>>  
>>  Capability: KVM_CAP_VCPU_EVENTS
>> +Extended by: KVM_CAP_INTR_SHADOW
>>  Architectures: x86
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (in)
>> @@ -709,6 +714,10 @@ current in-kernel state. The bits are:
>>  KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
>>  KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
>>  
>> +If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
>> +the flags field to signal that interrupt.shadow contains a valid state and
>> +shall be written into the VCPU.
>> +
>>  
>>  5. The kvm_run structure
>>  
>> diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
>> index f46b79f..dc6cd24 100644
>> --- a/arch/x86/include/asm/kvm.h
>> +++ b/arch/x86/include/asm/kvm.h
>> @@ -257,6 +257,7 @@ struct kvm_reinject_control {
>>  /* When set in flags, include corresponding fields on KVM_SET_VCPU_EVENTS */
>>  #define KVM_VCPUEVENT_VALID_NMI_PENDING 0x0001
>>  #define KVM_VCPUEVENT_VALID_SIPI_VECTOR 0x0002
>> +#define KVM_VCPUEVENT_VALID_SHADOW  0x0004
>>  
>>  /* for KVM_GET/SET_VCPU_EVENTS */
>>  struct kvm_vcpu_events {
>> @@ -271,7 +272,7 @@ struct kvm_vcpu_events {
>>  		__u8 injected;
>>  		__u8 nr;
>>  		__u8 soft;
>> -		__u8 pad;
>> +		__u8 shadow;
>>  	} interrupt;
>>  	struct {
>>  		__u8 injected;
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index f82b072..0fa74d0 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -854,7 +854,7 @@ static void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
>>  
>>  	if (mask & X86_SHADOW_INT_MOV_SS)
>>  		interruptibility |= GUEST_INTR_STATE_MOV_SS;
>> -	if (mask & X86_SHADOW_INT_STI)
>> +	else if (mask & X86_SHADOW_INT_STI)
>>  		interruptibility |= GUEST_INTR_STATE_STI;
>>  
>>  	if ((interruptibility != interruptibility_old))
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 50d1d2a..60e6341 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2132,6 +2132,9 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>>  		vcpu->arch.interrupt.pending && !vcpu->arch.interrupt.soft;
>>  	events->interrupt.nr = vcpu->arch.interrupt.nr;
>>  	events->interrupt.soft = 0;
>> +	events->interrupt.shadow =
>> +		!!kvm_x86_ops->get_interrupt_shadow(vcpu,
>> +			X86_SHADOW_INT_MOV_SS | X86_SHADOW_INT_STI);
>>  
>>  	events->nmi.injected = vcpu->arch.nmi_injected;
>>  	events->nmi.pending = vcpu->arch.nmi_pending;
>> @@ -2140,7 +2143,8 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>>  	events->sipi_vector = vcpu->arch.sipi_vector;
>>  
>>  	events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
>> -			 | KVM_VCPUEVENT_VALID_SIPI_VECTOR);
>> +			 | KVM_VCPUEVENT_VALID_SIPI_VECTOR
>> +			 | KVM_VCPUEVENT_VALID_SHADOW);
>>  
>>  	vcpu_put(vcpu);
>>  }
>> @@ -2149,7 +2153,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>>  
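The vmx.c hunk quoted above encodes a simple priority rule when restoring the shadow: if both kinds are requested, keep MOV SS and drop STI, so the guest interruptibility state never has both blocking bits set. A standalone sketch of that rule follows. The enum names and bit values are local stand-ins, not the kernel's `X86_SHADOW_INT_*` / `GUEST_INTR_STATE_*` definitions, and clearing both bits of the old value first is an assumption about surrounding code that the quoted hunk does not show. `shadow_flag()` mirrors the `!!` on the get side, which collapses both kinds into one flag, the loss of information objected to elsewhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the two shadow kinds (values illustrative). */
enum { SHADOW_MOV_SS = 1u << 0, SHADOW_STI = 1u << 1 };
/* Local stand-ins for the VMX interruptibility-state bits (illustrative). */
enum { INTR_STATE_STI = 1u << 0, INTR_STATE_MOV_SS = 1u << 1 };

static_assert((SHADOW_MOV_SS & SHADOW_STI) == 0, "distinct shadow bits");

/* Map a requested shadow mask onto an interruptibility value, keeping any
 * unrelated bits of the old value. MOV SS takes priority over STI. */
static uint32_t apply_shadow(uint32_t old, unsigned mask)
{
    uint32_t v = old & ~(uint32_t)(INTR_STATE_STI | INTR_STATE_MOV_SS);

    if (mask & SHADOW_MOV_SS)
        v |= INTR_STATE_MOV_SS;
    else if (mask & SHADOW_STI)
        v |= INTR_STATE_STI;
    return v;
}

/* Get side: both kinds collapse into a single 0/1 flag. */
static uint8_t shadow_flag(unsigned mask)
{
    return !!(mask & (SHADOW_MOV_SS | SHADOW_STI));
}
```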

Re: [PATCH 2/3] KVM: x86: Save&restore interrupt shadow mask

2010-02-17 Thread Zachary Amsden

On 02/16/2010 02:39 PM, Marcelo Tosatti wrote:
> On Mon, Feb 15, 2010 at 10:45:42AM +0100, Jan Kiszka wrote:
>> The interrupt shadow created by STI or MOV-SS-like operations is part of
>> the VCPU state and must be preserved across migration. Transfer it in
>> the spare padding field of kvm_vcpu_events.interrupt.


STI and MOV-SS interrupt shadow are both treated differently by 
hardware.  Any attempt to unify them into a single field is wrong, 
especially so in a hardware virtualization context, where they are 
actually represented by different fields in the undocumented but 
nevertheless extant format that can be inferred from the hardware 
virtualization context used by specific vendors.


Zach