Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-27 Thread Kevin Wolf
Am 25.01.2012 17:00, schrieb Joerg Roedel:
 On Tue, Jan 24, 2012 at 06:23:50PM +0200, Gleb Natapov wrote:
 On Tue, Jan 24, 2012 at 03:24:50PM +0100, Kevin Wolf wrote:
 
 However, task_switch_interception() itself does some more based on the
 value of reason, for example it decides whether or not to call
 skip_emulated_instruction().

 Joerg need to help us here. If intercept of task switch happens before
 rip is advanced past instruction that cause it we have to know somehow
 that task switch was caused by instruction. It is not enough that HW
 checks permission, we still lack essential info.
 
 Hmm, the RIP in the VMCB points to the instruction causing the task
 switch. This is also true for lcall and ljmp. But in my experiments I
 have seen exit_int_info.valid = 1 for task-switches that went through
 the IDT. But I havn't tested the VM86 case, though.
 
 Kevin, can you please re-verify that exit_int_info.valid is always 0 in
 your experiment? On what hardware have you tested this?

I just retried. I use kvm-kmod and kvm.git with HEAD at ff92e9b5 plus
the tree patches of this series plus a printk to output exit_int_info in
task_switch_intercept(). I ran taskswitch2 from kvm-unittests and got
two failures and my VM86 unit test which hung when trying to return from
VM86. I also ran the kernel that made me aware of the issue initially.
All debug messages show exit_int_info = 0.

This is the /proc/cpuinfo snippet for the first core:

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 107
model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
stepping: 2
cpu MHz : 1800.000
cache size  : 512 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl extd_apicid pni cx16
lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch lbrv
bogomips: 3592.64
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-27 Thread Joerg Roedel
On Fri, Jan 27, 2012 at 01:58:38PM +0100, Kevin Wolf wrote:
 Am 25.01.2012 17:00, schrieb Joerg Roedel:

 I just retried. I use kvm-kmod and kvm.git with HEAD at ff92e9b5 plus
 the tree patches of this series plus a printk to output exit_int_info in
 task_switch_intercept(). I ran taskswitch2 from kvm-unittests and got
 two failures and my VM86 unit test which hung when trying to return from
 VM86. I also ran the kernel that made me aware of the issue initially.
 All debug messages show exit_int_info = 0.

Okay, you are testing on a K8 which has exactly this bug. As I just
found out it is documented as erratum 701. The good news is that this
only happens on K8 and Fam11h, any later AMD processor doesn't have this
bug.


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Adjusting kvm-clock more then 11% (9315508 vs 9311354)

2012-01-27 Thread Sasha Levin
Hi all,

I've started getting the following warnings while running a vm under load, 
starting with 3.3-rc1:

[ 5367.103017] [ cut here ]
[ 5367.103082] WARNING: at kernel/time/timekeeping.c:863 do_timer+0x360/0x4d0()
[ 5367.103082] Adjusting kvm-clock more then 11% (9315508 vs 9311354)
[ 5367.103082] Pid: 2775, comm: trinity Not tainted 
3.3.0-rc1-next-20120125-sasha-3-gb7da216 #148
[ 5367.103082] Call Trace:
[ 5367.103082]  IRQ  [810ac265] warn_slowpath_common+0x75/0xb0
[ 5367.103082]  [810ac301] warn_slowpath_fmt+0x41/0x50
[ 5367.103082]  [81100690] do_timer+0x360/0x4d0
[ 5367.103082]  [811077c0] ? tick_nohz_handler+0x100/0x100
[ 5367.103082]  [811077c0] ? tick_nohz_handler+0x100/0x100
[ 5367.103082]  [81107665] tick_do_update_jiffies64+0x75/0xd0
[ 5367.103082]  [81107870] tick_sched_timer+0xb0/0xc0
[ 5367.103082]  [810d4161] __run_hrtimer.clone.24+0x81/0x130
[ 5367.103082]  [810d4cbf] hrtimer_interrupt+0xef/0x220
[ 5367.103082]  [8106d693] smp_apic_timer_interrupt+0x63/0xa0
[ 5367.103082]  [8263e833] apic_timer_interrupt+0x73/0x80
[ 5367.103082]  EOI  [81053943] ? sched_clock+0x13/0x20
[ 5367.103082]  [8110d7eb] ? lock_acquired+0x13b/0x320
[ 5367.103082]  [81238033] ? task_dumpable+0x23/0x70
[ 5367.103082]  [8263c3b7] _raw_spin_lock+0x67/0x70
[ 5367.103082]  [81238033] ? task_dumpable+0x23/0x70
[ 5367.103082]  [81238033] task_dumpable+0x23/0x70
[ 5367.103082]  [8123bf2c] proc_pid_make_inode+0x6c/0x1f0
[ 5367.103082]  [8123c18f] proc_pident_instantiate+0x1f/0xc0
[ 5367.103082]  [8123ccad] proc_fill_cache+0x13d/0x170
[ 5367.103082]  [8118ff1e] ? might_fault+0x4e/0xa0
[ 5367.103082]  [8123c170] ? proc_pid_instantiate+0xc0/0xc0
[ 5367.103082]  [811e5010] ? filldir64+0xf0/0xf0
[ 5367.103082]  [811e5010] ? filldir64+0xf0/0xf0
[ 5367.103082]  [8123cdbb] proc_pident_readdir+0xdb/0x1c0
[ 5367.103082]  [811e5010] ? filldir64+0xf0/0xf0
[ 5367.103082]  [8123cef6] proc_tgid_base_readdir+0x16/0x20
[ 5367.103082]  [811e5270] vfs_readdir+0xb0/0xd0
[ 5367.103082]  [811d216f] ? fget+0xaf/0x2d0
[ 5367.103082]  [811d20fc] ? fget+0x3c/0x2d0
[ 5367.103082]  [811e5374] sys_getdents+0x84/0xf0
[ 5367.103082]  [8263dcf9] system_call_fastpath+0x16/0x1b
[ 5367.103082] ---[ end trace aeeafe4ebf839531 ]---

Usually the lower part of the stack changes and is usually the result of 
calling some syscall, but the part from the EOI and up is the same.

-- 

Sasha.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-27 Thread Kevin Wolf
Am 27.01.2012 14:34, schrieb Joerg Roedel:
 On Fri, Jan 27, 2012 at 01:58:38PM +0100, Kevin Wolf wrote:
 Am 25.01.2012 17:00, schrieb Joerg Roedel:
 
 I just retried. I use kvm-kmod and kvm.git with HEAD at ff92e9b5 plus
 the tree patches of this series plus a printk to output exit_int_info in
 task_switch_intercept(). I ran taskswitch2 from kvm-unittests and got
 two failures and my VM86 unit test which hung when trying to return from
 VM86. I also ran the kernel that made me aware of the issue initially.
 All debug messages show exit_int_info = 0.
 
 Okay, you are testing on a K8 which has exactly this bug. As I just
 found out it is documented as erratum 701. The good news is that this
 only happens on K8 and Fam11h, any later AMD processor doesn't have this
 bug.

Meh. Unless you give me a newer processor, this doesn't really help
me... Doesn't look like there's any way to get a workaround, is there? I
guess I'll have to hack it locally and possibly break other guests with
the hacked module.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-27 Thread Joerg Roedel
On Fri, Jan 27, 2012 at 02:55:12PM +0100, Kevin Wolf wrote:
 Am 27.01.2012 14:34, schrieb Joerg Roedel:
  On Fri, Jan 27, 2012 at 01:58:38PM +0100, Kevin Wolf wrote:
  Am 25.01.2012 17:00, schrieb Joerg Roedel:
  
  I just retried. I use kvm-kmod and kvm.git with HEAD at ff92e9b5 plus
  the tree patches of this series plus a printk to output exit_int_info in
  task_switch_intercept(). I ran taskswitch2 from kvm-unittests and got
  two failures and my VM86 unit test which hung when trying to return from
  VM86. I also ran the kernel that made me aware of the issue initially.
  All debug messages show exit_int_info = 0.
  
  Okay, you are testing on a K8 which has exactly this bug. As I just
  found out it is documented as erratum 701. The good news is that this
  only happens on K8 and Fam11h, any later AMD processor doesn't have this
  bug.
 
 Meh. Unless you give me a newer processor, this doesn't really help
 me... Doesn't look like there's any way to get a workaround, is there? I
 guess I'll have to hack it locally and possibly break other guests with
 the hacked module.

No, unfortunatly there is no workaround for this problem. How do you
plan to hack around it?


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-27 Thread Kevin Wolf
Am 27.01.2012 15:17, schrieb Joerg Roedel:
 On Fri, Jan 27, 2012 at 02:55:12PM +0100, Kevin Wolf wrote:
 Am 27.01.2012 14:34, schrieb Joerg Roedel:
 On Fri, Jan 27, 2012 at 01:58:38PM +0100, Kevin Wolf wrote:
 Am 25.01.2012 17:00, schrieb Joerg Roedel:

 I just retried. I use kvm-kmod and kvm.git with HEAD at ff92e9b5 plus
 the tree patches of this series plus a printk to output exit_int_info in
 task_switch_intercept(). I ran taskswitch2 from kvm-unittests and got
 two failures and my VM86 unit test which hung when trying to return from
 VM86. I also ran the kernel that made me aware of the issue initially.
 All debug messages show exit_int_info = 0.

 Okay, you are testing on a K8 which has exactly this bug. As I just
 found out it is documented as erratum 701. The good news is that this
 only happens on K8 and Fam11h, any later AMD processor doesn't have this
 bug.

 Meh. Unless you give me a newer processor, this doesn't really help
 me... Doesn't look like there's any way to get a workaround, is there? I
 guess I'll have to hack it locally and possibly break other guests with
 the hacked module.
 
 No, unfortunatly there is no workaround for this problem. How do you
 plan to hack around it?

I know that my guest only uses iret and exceptions for task switches, so
I think in my case I can assume that any TASK_SWITCH_CALL is really a
TASK_SWITCH_GATE and I don't have to skip an instruction.

Not quite upstreamable, obviously.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-27 Thread Gleb Natapov
On Fri, Jan 27, 2012 at 04:02:30PM +0100, Kevin Wolf wrote:
 Am 27.01.2012 15:17, schrieb Joerg Roedel:
  On Fri, Jan 27, 2012 at 02:55:12PM +0100, Kevin Wolf wrote:
  Am 27.01.2012 14:34, schrieb Joerg Roedel:
  On Fri, Jan 27, 2012 at 01:58:38PM +0100, Kevin Wolf wrote:
  Am 25.01.2012 17:00, schrieb Joerg Roedel:
 
  I just retried. I use kvm-kmod and kvm.git with HEAD at ff92e9b5 plus
  the tree patches of this series plus a printk to output exit_int_info in
  task_switch_intercept(). I ran taskswitch2 from kvm-unittests and got
  two failures and my VM86 unit test which hung when trying to return from
  VM86. I also ran the kernel that made me aware of the issue initially.
  All debug messages show exit_int_info = 0.
 
  Okay, you are testing on a K8 which has exactly this bug. As I just
  found out it is documented as erratum 701. The good news is that this
  only happens on K8 and Fam11h, any later AMD processor doesn't have this
  bug.
 
  Meh. Unless you give me a newer processor, this doesn't really help
  me... Doesn't look like there's any way to get a workaround, is there? I
  guess I'll have to hack it locally and possibly break other guests with
  the hacked module.
  
  No, unfortunatly there is no workaround for this problem. How do you
  plan to hack around it?
 
 I know that my guest only uses iret and exceptions for task switches, so
 I think in my case I can assume that any TASK_SWITCH_CALL is really a
 TASK_SWITCH_GATE and I don't have to skip an instruction.
 
You still need to know what exception caused task switch. Some of them
require you to skip an instruction.

 Not quite upstreamable, obviously.
 
 Kevin

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/3] Fix task switches into/out of VM86

2012-01-27 Thread Kevin Wolf
I believe this should work with both VMX and SVM now. Gleb, Jörg, can one of
you test this with SVM? I did some testing on my buggy processor and it looks
as good as it gets, but it would be better if you could confirm.

Kevin Wolf (3):
  KVM: x86 emulator: Fix task switch privilege checks
  KVM: x86 emulator: VM86 segments must have DPL 3
  KVM: x86 emulator: Allow PM/VM86 switch during task switch

 arch/x86/include/asm/kvm_emulate.h |3 +-
 arch/x86/include/asm/kvm_host.h|4 +-
 arch/x86/kvm/emulate.c |   79 ---
 arch/x86/kvm/svm.c |5 ++-
 arch/x86/kvm/vmx.c |8 ++-
 arch/x86/kvm/x86.c |   12 -
 6 files changed, 94 insertions(+), 17 deletions(-)

-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-27 Thread Kevin Wolf
Currently, all task switches check privileges against the DPL of the
TSS. This is only correct for jmp/call to a TSS. If a task gate is used,
the DPL of this take gate is used for the check instead. Exceptions,
external interrupts and iret shouldn't perform any check.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |2 +-
 arch/x86/include/asm/kvm_host.h|4 +-
 arch/x86/kvm/emulate.c |   51 +++-
 arch/x86/kvm/svm.c |5 +++-
 arch/x86/kvm/vmx.c |8 +++--
 arch/x86/kvm/x86.c |6 ++--
 6 files changed, 59 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index ab4092e..c8a9cf3 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -372,7 +372,7 @@ bool x86_page_table_writing_insn(struct x86_emulate_ctxt 
*ctxt);
 #define EMULATION_INTERCEPTED 2
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
 int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
-u16 tss_selector, int reason,
+u16 tss_selector, int idt_index, int reason,
 bool has_error_code, u32 error_code);
 int emulate_int_real(struct x86_emulate_ctxt *ctxt, int irq);
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 52d6640..0533fc4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -741,8 +741,8 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
 
-int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
-   bool has_error_code, u32 error_code);
+int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
+   int reason, bool has_error_code, u32 error_code);
 
 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..1b98a2c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1151,6 +1151,22 @@ static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
return 1;
 }
 
+static int read_interrupt_descriptor(struct x86_emulate_ctxt *ctxt,
+u16 index, struct kvm_desc_struct *desc)
+{
+   struct kvm_desc_ptr dt;
+   ulong addr;
+
+   ctxt-ops-get_idt(ctxt, dt);
+
+   if (dt.size  index * 8 + 7)
+   return emulate_gp(ctxt, index  3 | 0x2);
+
+   addr = dt.address + index * 8;
+   return ctxt-ops-read_std(ctxt, addr, desc, sizeof *desc,
+  ctxt-exception);
+}
+
 static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
 u16 selector, struct desc_ptr *dt)
 {
@@ -2350,7 +2366,7 @@ static int task_switch_32(struct x86_emulate_ctxt *ctxt,
 }
 
 static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
-  u16 tss_selector, int reason,
+  u16 tss_selector, int idt_index, int reason,
   bool has_error_code, u32 error_code)
 {
struct x86_emulate_ops *ops = ctxt-ops;
@@ -2360,6 +2376,7 @@ static int emulator_do_task_switch(struct 
x86_emulate_ctxt *ctxt,
ulong old_tss_base =
ops-get_cached_segment_base(ctxt, VCPU_SREG_TR);
u32 desc_limit;
+   int dpl;
 
/* FIXME: old_tss_base == ~0 ? */
 
@@ -2372,12 +2389,32 @@ static int emulator_do_task_switch(struct 
x86_emulate_ctxt *ctxt,
 
/* FIXME: check that next_tss_desc is tss */
 
-   if (reason != TASK_SWITCH_IRET) {
-   if ((tss_selector  3)  next_tss_desc.dpl ||
-   ops-cpl(ctxt)  next_tss_desc.dpl)
-   return emulate_gp(ctxt, 0);
+   /*
+* Check privileges. The three cases are task switch caused by...
+*
+* 1. Software interrupt: Check against DPL of the task gate
+* 2. Exception/IRQ/iret: No check is performed
+* 3. jmp/call: Check agains DPL of the TSS
+*/
+   dpl = -1;
+   if (reason == TASK_SWITCH_GATE) {
+   if (idt_index != -1) {
+   struct kvm_desc_struct task_gate_desc;
+
+   ret = read_interrupt_descriptor(ctxt, idt_index,
+   task_gate_desc);
+   if (ret != X86EMUL_CONTINUE)
+   return ret;
+
+   dpl = task_gate_desc.dpl;
+   }
+   } else if (reason != TASK_SWITCH_IRET) {
+   dpl = 

[PATCH v2 2/3] KVM: x86 emulator: VM86 segments must have DPL 3

2012-01-27 Thread Kevin Wolf
Setting the segment DPL to 0 for at least the VM86 code segment makes
the VM entry fail on VMX.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/kvm/emulate.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1b98a2c..833969e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1243,6 +1243,8 @@ static int load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
seg_desc.type = 3;
seg_desc.p = 1;
seg_desc.s = 1;
+   if (ctxt-mode == X86EMUL_MODE_VM86)
+   seg_desc.dpl = 3;
goto load;
}
 
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/3] KVM: x86 emulator: Allow PM/VM86 switch during task switch

2012-01-27 Thread Kevin Wolf
Task switches can switch between Protected Mode and VM86. The current
mode must be updated during the task switch emulation so that the new
segment selectors are interpreted correctly and privilege checks
succeed.

VMX code calculates the CPL from the code segment selector and rflags,
so it needs rflags to be updated in the vcpu struct. SVM stores the DPL
of the code segment instead, so we must be sure to give the right one
when updating the selector.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |   26 ++
 arch/x86/kvm/x86.c |6 ++
 3 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index c8a9cf3..4a21c7d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -176,6 +176,7 @@ struct x86_emulate_ops {
void (*set_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr);
int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val);
+   void (*set_rflags)(struct x86_emulate_ctxt *ctxt, ulong val);
int (*cpl)(struct x86_emulate_ctxt *ctxt);
int (*get_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong *dest);
int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 833969e..143ce8e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -552,6 +552,14 @@ static void set_segment_selector(struct x86_emulate_ctxt 
*ctxt, u16 selector,
struct desc_struct desc;
 
ctxt-ops-get_segment(ctxt, dummy, desc, base3, seg);
+
+   if (ctxt-mode == X86EMUL_MODE_REAL)
+   desc.dpl = 0;
+   else if (ctxt-mode == X86EMUL_MODE_VM86)
+   desc.dpl = 3;
+   else
+   desc.dpl = selector  0x3;
+
ctxt-ops-set_segment(ctxt, selector, desc, base3, seg);
 }
 
@@ -2273,6 +2281,24 @@ static int load_state_from_tss32(struct x86_emulate_ctxt 
*ctxt,
return emulate_gp(ctxt, 0);
ctxt-_eip = tss-eip;
ctxt-eflags = tss-eflags | 2;
+
+   /*
+* If we're switching between Protected Mode and VM86, we need to make
+* sure to update the mode before loading the segment descriptors so
+* that the selectors are interpreted correctly.
+*
+* Need to get it to the vcpu struct immediately because it influences
+* the CPL which is checked at least when loading the segment
+* descriptors and when pushing an error code to the new kernel stack.
+*/
+   if (ctxt-eflags  X86_EFLAGS_VM)
+   ctxt-mode = X86EMUL_MODE_VM86;
+   else
+   ctxt-mode = X86EMUL_MODE_PROT32;
+
+   ctxt-ops-set_rflags(ctxt, ctxt-eflags);
+
+   /* General purpose registers */
ctxt-regs[VCPU_REGS_RAX] = tss-eax;
ctxt-regs[VCPU_REGS_RCX] = tss-ecx;
ctxt-regs[VCPU_REGS_RDX] = tss-edx;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dc3e945..502b5c3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4040,6 +4040,11 @@ static int emulator_set_cr(struct x86_emulate_ctxt 
*ctxt, int cr, ulong val)
return res;
 }
 
+static void emulator_set_rflags(struct x86_emulate_ctxt *ctxt, ulong val)
+{
+   kvm_set_rflags(emul_to_vcpu(ctxt), val);
+}
+
 static int emulator_get_cpl(struct x86_emulate_ctxt *ctxt)
 {
return kvm_x86_ops-get_cpl(emul_to_vcpu(ctxt));
@@ -4199,6 +4204,7 @@ static struct x86_emulate_ops emulate_ops = {
.set_idt = emulator_set_idt,
.get_cr  = emulator_get_cr,
.set_cr  = emulator_set_cr,
+   .set_rflags  = emulator_set_rflags,
.cpl = emulator_get_cpl,
.get_dr  = emulator_get_dr,
.set_dr  = emulator_set_dr,
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kemari

2012-01-27 Thread Vinod Chegu


Thanks for the pointers Mitsuru !

Vinod



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] Fix task switches into/out of VM86

2012-01-27 Thread Gleb Natapov
On Fri, Jan 27, 2012 at 08:23:33PM +0100, Kevin Wolf wrote:
 I believe this should work with both VMX and SVM now. Gleb, Jörg, can one of
 you test this with SVM? I did some testing on my buggy processor and it looks
 as good as it gets, but it would be better if you could confirm.
 
You forgot to set cpl to 3 in vmcb in svm_set_rflags() when vm86 is enabled, no?

 Kevin Wolf (3):
   KVM: x86 emulator: Fix task switch privilege checks
   KVM: x86 emulator: VM86 segments must have DPL 3
   KVM: x86 emulator: Allow PM/VM86 switch during task switch
 
  arch/x86/include/asm/kvm_emulate.h |3 +-
  arch/x86/include/asm/kvm_host.h|4 +-
  arch/x86/kvm/emulate.c |   79 ---
  arch/x86/kvm/svm.c |5 ++-
  arch/x86/kvm/vmx.c |8 ++-
  arch/x86/kvm/x86.c |   12 -
  6 files changed, 94 insertions(+), 17 deletions(-)
 
 -- 
 1.7.6.5

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


cpu_set n+1 online causing a guest to hang.

2012-01-27 Thread Vinod Chegu

Hello,

Wanted to check about the current status of the cpu hotplug support in KVM 
guests.

Pl. excuse me if the following is a known issue (pl. point me to 
the appropriate status/issue/bug-report if it is). 

I have an RHEL6.2 (x86_64)+ KVM host with a KVM guest running RHEL 6.2.

The guest is up and configured with 8 vCPUs (0-7) 

[root@testvm1 cpu]# ls
cpu0  cpu2  cpu4  cpu6  cpufreq  kernel_max  onlinepresent
cpu1  cpu3  cpu5  cpu7  cpuidle  offline possible


(I had set the maximum vcpus to 12).

I tried to hotplug add a new cpu and so I tried the following command from my 
host. 

[root@host ~]# virsh qemu-monitor-command testvm1 --hmp cpu_set 8 online


The guest hung and was non responsive. 

Did I miss any step or is some functionality missing?

Any help/pointers would be appreciated. 

Thanks
Vinod

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4 V10] Avoid soft lockup message when KVM is stopped by host

2012-01-27 Thread Eric B Munson
On Tue, 17 Jan 2012, Eric B Munson wrote:

 Changes from V9:
 Use kvm_for_each_vcpu to iterate online vcpu's
 
 Changes from V8:
 Make KVM_GUEST_PAUSED a per vm ioctl instead of per vcpu
 
 Changes from V7:
 Define KVM_CAP_GUEST_PAUSED and support check
 Call mark_page_dirty () after setting PVCLOCK_GUEST_STOPPED
 
 Changes from V6:
 Use __this_cpu_and when clearing the PVCLOCK_GUEST_STOPPED flag
 
 Changes from V5:
 Collapse generic check_and_clear_guest_stopped into patch 2
 Include check_and_clear_guest_stopped defintion to ia64, s390, and powerpc
 Change check_and_clear_guest_stopped to use __get_cpu_var instead of taking 
 the
  cpuid arg.
 Protect check_and_clear_guest_stopped declaration with CONFIG_KVM_CLOCK check
 
 Changes from V4:
 Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED
 Add description of KVMCLOCK_GUEST_PAUSED ioctl to api.txt
 
 Changes from V3:
 Include CC's on patch 3
 Drop clear flag ioctl and have the watchdog clear the flag when it is reset
 
 Changes from V2:
 A new kvm functions defined in kvm_para.h, the only change to pvclock is the
 initial flag definition
 
 Changes from V1:
 (Thanks Marcelo)
 Host code has all been moved to arch/x86/kvm/x86.c
 KVM_PAUSE_GUEST was renamed to KVM_GUEST_PAUSED
 
 Eric B Munson (4):
   Add flag to indicate that a vm was stopped by the host
   Add functions to check if the host has stopped the vm
   Add ioctl for KVMCLOCK_GUEST_STOPPED
   Add check for suspended vm in softlockup detector
 
  Documentation/virtual/kvm/api.txt   |   13 +
  arch/ia64/include/asm/kvm_para.h|5 +
  arch/powerpc/include/asm/kvm_para.h |5 +
  arch/s390/include/asm/kvm_para.h|5 +
  arch/x86/include/asm/kvm_para.h |8 
  arch/x86/include/asm/pvclock-abi.h  |1 +
  arch/x86/kernel/kvmclock.c  |   21 +
  arch/x86/kvm/x86.c  |   25 +
  include/asm-generic/kvm_para.h  |   14 ++
  include/linux/kvm.h |3 +++
  kernel/watchdog.c   |   12 
  11 files changed, 112 insertions(+), 0 deletions(-)
  create mode 100644 include/asm-generic/kvm_para.h
 

Just poking to make sure this set hasn't fallen through the cracks.

Eric


signature.asc
Description: Digital signature


Re: [PATCH V7] Guest stop notification

2012-01-27 Thread Eric B Munson
On Tue, 17 Jan 2012, Eric B Munson wrote:

 Often when a guest is stopped from the qemu console, it will report spurious
 soft lockup warnings on resume.  There are kernel patches being discussed that
 will give the host the ability to tell the guest that it is being stopped and
 should ignore the soft lockup warning that generates.  This patch uses the 
 qemu
 Notifier system to tell the guest it is about to be stopped.
 
 Signed-off-by: Eric B Munson emun...@mgebm.net

Just poking to make sure this doesn't fall through the cracks.

Eric


signature.asc
Description: Digital signature


Re: [Qemu-devel] [PATCH V7] Guest stop notification

2012-01-27 Thread Anthony Liguori

On 01/17/2012 12:27 PM, Eric B Munson wrote:

Often when a guest is stopped from the qemu console, it will report spurious
soft lockup warnings on resume.  There are kernel patches being discussed that
will give the host the ability to tell the guest that it is being stopped and
should ignore the soft lockup warning that generates.  This patch uses the qemu
Notifier system to tell the guest it is about to be stopped.

Signed-off-by: Eric B Munsonemun...@mgebm.net

Cc: Avi Kivitya...@redhat.com
Cc: Marcelo Tosattimtosa...@redhat.com
Cc: Jan Kiszkajan.kis...@siemens.com
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: kvm@vger.kernel.org
---
Changes from V6:
  Remove unnecessary include

Changes from V5:
  KVM_GUEST_PAUSED is now a per vm ioctl instead of per vcpu

Changes from V4:
  Test if the guest paused capability is available before use

Changes from V3:
  Collapse new state change notification function into existsing function.
  Correct whitespace issues
  Change ioctl name to KVMCLOCK_GUEST_PAUSED
  Use for loop to iterate vpcu's

Changes from V2:
  Move ioctl into hw/kvmclock.c so as other arches can use it as it is
implemented

Changes from V1:
  Remove unnecessary encapsulating function

  hw/kvmclock.c |   11 +++
  1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/hw/kvmclock.c b/hw/kvmclock.c
index 3b9fb20..ad79f52 100644
--- a/hw/kvmclock.c
+++ b/hw/kvmclock.c
@@ -64,10 +64,21 @@ static int kvmclock_post_load(void *opaque, int version_id)
  static void kvmclock_vm_state_change(void *opaque, int running,
   RunState state)
  {
+int ret;
  KVMClockState *s = opaque;
+int cap_guest_paused = kvm_check_extension(kvm_state, 
KVM_CAP_GUEST_PAUSED);

  if (running) {
  s-clock_valid = false;
+
+if (!cap_guest_paused) {
+return;
+}
+
+ret = kvm_vm_ioctl(kvm_state, KVMCLOCK_GUEST_PAUSED, 0);
+if (ret) {
+fprintf(stderr, kvmclock_vm_state_change: %s\n, strerror(-ret));
+}
  }
  }



This change looks harmless enough.  What's the state of the kernel bits?  I 
would expect this through uq/master.


Regards,

Anthony Liguori





--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merging kvm-apic into qemu-kvm

2012-01-27 Thread Jan Kiszka
On 2012-01-26 16:49, Avi Kivity wrote:
 On 01/26/2012 05:45 PM, Jan Kiszka wrote:

 I merged the upstream patches one by one, resolving the mechanical and
 logical conflicts in each step. Was done for that backend/frontend
 concept, but the adjustments should basically be the same now. Want me
 to prepare a branch or will you do this?

 It's much more likely that you'll get it right - I started to do this
 but backed out.

 btw, the branch doesn't appear to be merges, so I'll still have huge
 conflicts at the end.  If you do this with real merges, git will
 recognize it and just adopt your version.

 I will try to use your concept: pull in upstream commits into a merge
 branch as long as there is a mechanical or logical conflict. 
 
 That's what I do in my upstream merges.  I use bisect to find the first
 conflict, but in this case I imagine there will be a conflict in every
 merge except the memory.c one.
 
 Will then
 publish the branch for pulling. Can I start at the current 'next' head?
 
 Yes please.  It's halfway through autotest and looks good.  Even if I
 have to change it, we can 'git rebase -p --onto' your branch (though I
 doubt it will be necessary).

Done, see

git://git.kiszka.org/qemu-kvm.git queues/qemu-merge

Only moderately tested so far, I'm counting on your machinery.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] vhost-net: Acquire device lock when releasing device

2012-01-27 Thread Sasha Levin
I just noticed that it happened again, and that this patch didn't make
it's way in.

The patch below indeed fixes the problem for me. Please push it in.

On Sun, 2011-11-27 at 19:06 +0200, Michael S. Tsirkin wrote:
 On Sun, Nov 27, 2011 at 06:49:27PM +0200, Michael S. Tsirkin wrote:
  On Fri, Nov 18, 2011 at 11:19:42AM +0200, Sasha Levin wrote:
   Device lock should be held when releasing a device, and specifically
   when calling vhost_dev_cleanup(). Otherwise, RCU complains about it:
   
   [ 2025.642835] ===
   [ 2025.643838] [ INFO: suspicious RCU usage. ]
   [ 2025.645182] ---
   [ 2025.645927] drivers/vhost/vhost.c:475 suspicious 
   rcu_dereference_protected() usage!
   [ 2025.647329]
   [ 2025.647330] other info that might help us debug this:
   [ 2025.647331]
   [ 2025.649042]
   [ 2025.649043] rcu_scheduler_active = 1, debug_locks = 1
   [ 2025.650235] no locks held by trinity/21042.
   [ 2025.650971]
   [ 2025.650972] stack backtrace:
   [ 2025.651789] Pid: 21042, comm: trinity Not tainted 
   3.2.0-rc2-sasha-00057-ga9098b3 #5
   [ 2025.653342] Call Trace:
   [ 2025.653792]  [810b4a6a] lockdep_rcu_suspicious+0xaf/0xb9
   [ 2025.654916]  [818d4c2c] vhost_dev_cleanup+0x342/0x3ac
   [ 2025.655985]  [818d4f18] vhost_net_release+0x48/0x7f
   [ 2025.657247]  [811416e3] fput+0x11e/0x1dc
   [ 2025.658091]  [8113f1ec] filp_close+0x6e/0x79
   [ 2025.658964]  [81089ed7] put_files_struct+0xcc/0x196
   [ 2025.659971]  [8108a034] exit_files+0x46/0x4f
   [ 2025.660865]  [8108a716] do_exit+0x264/0x75c
   [ 2025.662201]  [8113f490] ? fsnotify_modify+0x60/0x68
   [ 2025.663260]  [81bbdbca] ? sysret_check+0x2e/0x69
   [ 2025.664269]  [8108acc1] do_group_exit+0x83/0xb1
   [ 2025.665448]  [8108ad01] sys_exit_group+0x12/0x16
   [ 2025.666396]  [81bbdb92] system_call_fastpath+0x16/0x1b
   
   Cc: Michael S. Tsirkin m...@redhat.com
   Cc: kvm@vger.kernel.org
   Cc: virtualizat...@lists.linux-foundation.org
   Cc: net...@vger.kernel.org
   Signed-off-by: Sasha Levin levinsasha...@gmail.com
   ---
drivers/vhost/net.c |2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
   
   diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
   index 882a51f..c9be601 100644
   --- a/drivers/vhost/net.c
   +++ b/drivers/vhost/net.c
   @@ -586,6 +586,7 @@ static int vhost_net_release(struct inode *inode, 
   struct file *f)
 struct socket *tx_sock;
 struct socket *rx_sock;

   + mutex_lock(n-dev.mutex);
 vhost_net_stop(n, tx_sock, rx_sock);
 vhost_net_flush(n);
 vhost_dev_cleanup(n-dev);
   @@ -596,6 +597,7 @@ static int vhost_net_release(struct inode *inode, 
   struct file *f)
 /* We do an extra flush before freeing memory,
  * since jobs can re-queue themselves. */
 vhost_net_flush(n);
   + mutex_unlock(n-dev.mutex);
 kfree(n);
 return 0;
}
  
  This calls fput fom release under lock which is generally a bad idea,
  as locks become nested then. For example, consider what would happen if this
  somehow triggers a release which in turn needs the same mutex.
  
  And, we are releasing the device, so it seems better to check
  that no one has the lock.
  How about the following instead? What do you think?
  
  --
  
  vhost: fix release path lockdep checks
  
  We shouldn't hold any locks on release path. Pass a flag to
  vhost_dev_cleanup to use the lockdep info correctly.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  
  ---
 
 Sorry, this got cut short. Here's the full patch (1st chunk was
 missing).  Does this solve the problem for you?
 
 
 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index 8b16d16..9bba4b3 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -598,7 +598,7 @@ static int vhost_net_release(struct inode *inode, struct 
 file *f)
  
   vhost_net_stop(n, tx_sock, rx_sock);
   vhost_net_flush(n);
 - vhost_dev_cleanup(n-dev);
 + vhost_dev_cleanup(n-dev, false);
   if (tx_sock)
   fput(tx_sock-file);
   if (rx_sock)
 diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
 index a8c95ef..96f9769 100644
 --- a/drivers/vhost/vhost.c
 +++ b/drivers/vhost/vhost.c
 @@ -424,7 +424,7 @@ long vhost_dev_reset_owner(struct vhost_dev *dev)
   if (!memory)
   return -ENOMEM;
  
 - vhost_dev_cleanup(dev);
 + vhost_dev_cleanup(dev, true);
  
   memory-nregions = 0;
   RCU_INIT_POINTER(dev-memory, memory);
 @@ -455,8 +455,8 @@ int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
   return j;
  }
  
 -/* Caller should have device mutex */
 -void vhost_dev_cleanup(struct vhost_dev *dev)
 +/* Caller should have device mutex if and only if locked is set */
 +void vhost_dev_cleanup(struct vhost_dev *dev, bool locked)
  {
   int i;
  
 @@ -493,7 +493,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
   

Re: [Qemu-devel] [PATCH V7] Guest stop notification

2012-01-27 Thread Alexander Graf


On 27.01.2012, at 22:49, Jan Kiszka jan.kis...@web.de wrote:

 On 2012-01-27 21:48, Anthony Liguori wrote:
 On 01/17/2012 12:27 PM, Eric B Munson wrote:
 Often when a guest is stopped from the qemu console, it will report
 spurious
 soft lockup warnings on resume.  There are kernel patches being
 discussed that
 will give the host the ability to tell the guest that it is being
 stopped and
 should ignore the soft lockup warning that generates.  This patch uses
 the qemu
 Notifier system to tell the guest it is about to be stopped.
 
 Signed-off-by: Eric B Munsonemun...@mgebm.net
 
 Cc: Avi Kivitya...@redhat.com
 Cc: Marcelo Tosattimtosa...@redhat.com
 Cc: Jan Kiszkajan.kis...@siemens.com
 Cc: ry...@linux.vnet.ibm.com
 Cc: aligu...@us.ibm.com
 Cc: kvm@vger.kernel.org
 ---
 Changes from V6:
  Remove unnecessary include
 
 Changes from V5:
  KVM_GUEST_PAUSED is now a per vm ioctl instead of per vcpu
 
 Changes from V4:
  Test if the guest paused capability is available before use
 
 Changes from V3:
  Collapse new state change notification function into existsing
 function.
  Correct whitespace issues
  Change ioctl name to KVMCLOCK_GUEST_PAUSED
  Use for loop to iterate vpcu's
 
 Changes from V2:
  Move ioctl into hw/kvmclock.c so as other arches can use it as it is
 implemented
 
 Changes from V1:
  Remove unnecessary encapsulating function
 
  hw/kvmclock.c |   11 +++
  1 files changed, 11 insertions(+), 0 deletions(-)
 
 diff --git a/hw/kvmclock.c b/hw/kvmclock.c
 index 3b9fb20..ad79f52 100644
 --- a/hw/kvmclock.c
 +++ b/hw/kvmclock.c
 @@ -64,10 +64,21 @@ static int kvmclock_post_load(void *opaque, int
 version_id)
  static void kvmclock_vm_state_change(void *opaque, int running,
   RunState state)
  {
 +int ret;
  KVMClockState *s = opaque;
 +int cap_guest_paused = kvm_check_extension(kvm_state,
 KVM_CAP_GUEST_PAUSED);
 
  if (running) {
  s-clock_valid = false;
 +
 +if (!cap_guest_paused) {
 +return;
 +}
 +
 +ret = kvm_vm_ioctl(kvm_state, KVMCLOCK_GUEST_PAUSED, 0);
 +if (ret) {
 +fprintf(stderr, kvmclock_vm_state_change: %s\n,
 strerror(-ret));
 +}
  }
  }
 
 
 This change looks harmless enough.
 
 Yep, but needs to be redirected after the file renaming in upstream.
 
 What's the state of the kernel
 bits?  I would expect this through uq/master.
 
 Kernel bits aren't merged yet. And the kernel headers will have to be
 updated in a previous step.
 
 BTW, this series [1] would be nice to have. Dunno if there is a pull
 planned, but anyone trying to extend the KVM interface current runs
 against this.

Yeah, I was mostly waiting for acks/nacks. Since I didn't receive any, expect a 
pullreq next week.

Alex

 
 Jan
 
 [1] http://thread.gmane.org/gmane.comp.emulators.qemu/132962
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html