Re: [PATCH 2/2] add sysctl for kvm wallclock sync

2009-09-01 Thread Chris Lalancette
Glauber Costa wrote:
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index fc409e9..3a4e1bd 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -27,7 +27,7 @@
>  #define KVM_SCALE 22
>  
>  static int kvmclock = 1;
> -static unsigned int kvm_wall_update_interval = 5;
> +unsigned int kvm_wall_update_interval = 5;

I think the overall idea is very interesting, but I also think that it should be
disabled by default.  Because of the problems with time in virtualization,
people are already conditioned to run ntpd inside their guests, and this
kvmclock change will "fight" with ntpd.  Also, the command "# date 09091323" (or
whatever) ceases to work like it does on bare-metal, so I think it has to be an
opt-in feature.

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] vmxcap: Add tool to query vmx capabilities

2009-09-01 Thread Amit Shah
On (Tue) Sep 01 2009 [19:18:05], Avi Kivity wrote:
> This should really be in the kernel, but there are too many of them.
> 
> Signed-off-by: Avi Kivity 
> ---
> 
> v2: try /dev/cpu/0/msr first
> 
>  kvm/scripts/vmxcap |  154 
> 

Why not call it cpucap and add svm support later?

Amit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-01 Thread Xin, Xiaohui
>* Code is easier to review than bullet points.

Yes. We'd send the code soon.

>* Direct I/O has to be safe when page is shared by multiple threads,
> and has to be non-blocking since network I/O can take indeterminately
> long (think big queue's, tunneling, ...)

In the situation, one queue pair NIC is assigned to only one guest, the pages 
are locked and a KVM guest will not swapped out.


>* In the past attempts at Direct I/O on network have always had SMP
> TLB issues. The page has to be flipped or marked as COW on all CPU's
> and the cost of the Inter Processor Interrupt to steal the page has
> been slower than copying

It may be, we have not thought about this more . Thanks.

Thanks
Xiaohui

-Original Message-
From: Stephen Hemminger [mailto:shemmin...@vyatta.com] 
Sent: Wednesday, September 02, 2009 12:05 AM
To: Xin, Xiaohui
Cc: m...@redhat.com; net...@vger.kernel.org; 
virtualizat...@lists.linux-foundation.org; kvm@vger.kernel.org; 
linux-ker...@vger.kernel.org; mi...@elte.hu; linux...@kvack.org; 
a...@linux-foundation.org; h...@zytor.com; gregory.hask...@gmail.com
Subject: Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

On Tue, 1 Sep 2009 14:58:19 +0800
"Xin, Xiaohui"  wrote:

>   [RFC] Virtual Machine Device Queues (VMDq) support on KVM
> 
> Network adapter with VMDq technology presents multiple pairs of tx/rx queues,
> and renders network L2 sorting mechanism based on MAC addresses and VLAN tags
> for each tx/rx queue pair. Here we present a generic framework, in which 
> network
> traffic to/from a tx/rx queue pair can be directed from/to a KVM guest without
> any software copy.
> 
> Actually this framework can apply to traditional network adapters which have
> just one tx/rx queue pair. And applications using the same user/kernel 
> interface
> can utilize this framework to send/receive network traffic directly thru a 
> tx/rx
> queue pair in a network adapter.
> 
> We use virtio-net architecture to illustrate the framework.
> 
> 
> || pop   add_buf||
> |Qemu process|  <-TX   <--  | Guest Kernel   |
> ||  -> -->  ||
> |Virtio-net  | push  get_buf||
> |  (Backend service) |  ->RX   -->  |  Virtio-net|
> ||  <- <--  |driver  |
> || push  get_buf||
> ||  ||
>|
>|
>| AIO (read & write) combined with Direct I/O
>|   (which substitute synced file operations)
> |---|
> | Host kernel  | read: copy-less with directly mapped user  |
> |  |   space to kernel, payload directly DMAed  |
> |  |   into user space  |
> |  | write: copy-less with directly mapped user |
> |  |   space to kernel, payload directly hooked |
> |  |   to a skb |
> |  ||
> |  (a likely   ||
> |   queue pair ||
> |   instance)  ||
> |  |   ||
> | NIC driver <-->  TUN/TAP driver   |
> |---|
>|
>|
>traditional adapter or a tx/rx queue pair
> 
> The basic idea is to utilize the kernel Asynchronous I/O combined with Direct
> I/O to implements copy-less TUN/TAP device. AIO and Direct I/O is not new to
> kernel, we still can see it in SCSI tape driver.
> 
> With traditional file operations, a copying of payload contents from/to the
> kernel DMA address to/from a user buffer is needed. That's what the copying we
> want to save.
> 
> The proposed framework is like this:
> A TUN/TAP device is bound to a traditional NIC adapter or a tx/rx queue pair 
> in
> host side. KVM virto-net Backend service, the user space program submits
> asynchronous read/write I/O requests to the host kernel through TUN/TAP 
> device.
> The requests are corresponding to the vqueue elements include both 
> transmission
> & receive. They can be queued in one AIO request and later, the completion 
> will
> be notified through the underlying packets tx/rx processing of the rx/tx queue
> pair.
> 
> Detailed path:
> 
> To guest Virtio-net driver, packets receive corresponding to asynchronous read
> I/O re

Re: RFC: shadow page table reclaim

2009-09-01 Thread Max Laier
On Monday 31 August 2009 14:40:29 Avi Kivity wrote:
> On 08/31/2009 03:09 PM, Max Laier wrote:
> >>> As you can see there is less saw-
> >>> toothing in the after plot and also fewer changes overall (because we
> >>> don't zap mappings that are still in use as often).  This is with a
> >>> limit of 64 for the shadow page table to increase the effect and
> >>> vmx/ept.
> >>>
> >>> I realize that the list_move and parent walk are quite expensive and
> >>> that kvm_mmu_alloc_page is only half the story.  It should really be
> >>> done every time a new guest page table is mapped - maybe via rmap_add. 
> >>> This would obviously completely kill performance-wise, though.
> >>>
> >>> Another idea would be to improve the reclaim logic in a way that it
> >>> prefers "old" PT_PAGE_TABLE_LEVEL over directories.  Though I'm not
> >>> sure how to code that up sensibly, either.
> >>>
> >>> As I said, this is proof-of-concept and RFC.  So any comments welcome.
> >>> For my use case the proof-of-concept diff seems to do well enough,
> >>> though.
> >>
> >> Given that reclaim is fairly rare, we should try to move the cost
> >> there.  So how about this:
> >>
> >> - add an 'accessed' flag to struct kvm_mmu_page
> >> - when reclaiming, try to evict pages that were not recently accessed
> >> (but don't overscan - if you scan many recently accessed pages, evict
> >> some of them anyway)
> >
> > - prefer page table level pages over directory level pages in the face of
> > overscan.
>
> I'm hoping that overscan will only occur when we start to feel memory
> pressure, and that once we do a full scan we'll get accurate recency
> information.
>
> >> - when scanning, update the accessed flag with the accessed bit of all
> >> parent_ptes
> >
> > I might be misunderstanding, but I think it should be the other way
> > 'round. i.e. a page is accessed if any of it's children have been
> > accessed.
>
> They're both true, but looking at the parents is much more efficient.
> Note we need to look at the accessed bit of the parent_ptes, not parent
> kvm_mmu_pages.
>
> >> - when dropping an spte, update the accessed flag of the kvm_mmu_page it
> >> points to
> >> - when reloading cr3, mark the page as accessed (since it has no
> >> parent_ptes)
> >>
> >> This should introduce some LRU-ness that depends not only on fault
> >> behaviour but also on long-term guest access behaviour (which is
> >> important for long-running processes and kernel pages).
> >
> > I'll try to come up with a patch for this, later tonight.  Unless you
> > already have something in the making.  Thanks.
>
> Please do, it's an area that need attention.

Okay ... I have /something/, but I'm certainly not there yet as I have to 
admit that I don't understand your idea completely.  In addition it seems that 
EPT doesn't have an accessed bit :-\  Any idea for this?

Regardless, testing the attached with EPT, it turns out that not zapping 
shadow pages with root_count != 0 already makes much difference.  After all we 
don't really zap these pages anyways, but just mark them invalid after zapping 
the children.  So this could be a first improvement.

In any case, I clearly don't have the right idea here, yet.  Plus I don't 
really have time to look into this further right now.  And my hack is "good 
enough"[tm] for my testing ... so if anyone more knowledgeable would like to 
continue - much appreciated.  Maybe some of this can at least serve as food 
for thoughts.  Sorry.

-- 
/"\  Best regards,  | mla...@freebsd.org
\ /  Max Laier  | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mla...@efnet
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a3f637f..089ad0e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -394,6 +394,7 @@ struct kvm_arch{
 	 * Hash table of struct kvm_mmu_page.
 	 */
 	struct list_head active_mmu_pages;
+	struct kvm_mmu_page *scan_hand;
 	struct list_head assigned_dev_head;
 	struct iommu_domain *iommu_domain;
 	int iommu_flags;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f76d086..3715242 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -869,6 +869,8 @@ static int is_empty_shadow_page(u64 *spt)
 static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	ASSERT(is_empty_shadow_page(sp->spt));
+	if (kvm->arch.scan_hand == sp)
+		kvm->arch.scan_hand = NULL;
 	list_del(&sp->link);
 	__free_page(virt_to_page(sp->spt));
 	__free_page(virt_to_page(sp->gfns));
@@ -1490,6 +1492,71 @@ static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	return ret;
 }
 
+static int kvm_mmu_test_and_clear_pte_active(struct kvm_mmu_page *sp)
+{
+	struct kvm_pte_chain *pte_chain;
+	struct hlist_node *node;
+	int i, accessed = 0;
+
+	if (!sp->multimapped) {
+		if (!sp->parent_pte) {
+			if (!sp->root_count)
+return 0;
+			else
+

[PATCH] Add pusha/popa instructions to the realmode test harness

2009-09-01 Thread Mohammed Gamal
This adds tests for pusha/popa to the test harness. Added some new typedefs
as well.

Signed-off-by: Mohammed Gamal 
---
 kvm/user/test/x86/realmode.c |   37 +
 1 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/kvm/user/test/x86/realmode.c b/kvm/user/test/x86/realmode.c
index 0db09b8..20d038e 100644
--- a/kvm/user/test/x86/realmode.c
+++ b/kvm/user/test/x86/realmode.c
@@ -4,6 +4,10 @@ typedef unsigned char u8;
 typedef unsigned short u16;
 typedef unsigned u32;
 typedef unsigned long long u64;
+typedef signed char s8;
+typedef signed short s16;
+typedef signed s32;
+typedef signed long long s64;
 
 void test_function(void);
 
@@ -470,6 +474,7 @@ void test_long_jmp()
 void test_push_pop()
 {
struct regs inregs = { 0 }, outregs;
+
MK_INSN(push32, "mov $0x12345678, %eax\n\t"
"push %eax\n\t"
"pop %ebx\n\t");
@@ -499,6 +504,10 @@ void test_push_pop()
 "mov %fs, %ebx\n\t"
 "pop %fs\n\t"
);
+   MK_INSN(pusha,  "pushaw\n\t");
+   MK_INSN(pushad, "pushal\n\t");
+   MK_INSN(popa, "popaw\n\t");
+   MK_INSN(popad, "popal\n\t");
 
exec_in_big_real_mode(&inregs, &outregs,
  insn_push32,
@@ -539,6 +548,34 @@ void test_push_pop()
 
if (!regs_equal(&inregs, &outregs, R_AX|R_BX) || outregs.ebx != 
outregs.eax)
print_serial("Push/Pop Test 6: FAIL\n");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+insn_pusha,
+insn_pusha_end - insn_pusha);
+   if (!regs_equal(&inregs, &outregs, R_SP) 
+   || (s16)outregs.esp != (s16)(inregs.esp - 16))
+   print_serial("Push/Pop Test 7: FAIL\n");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+ insn_popa,
+ insn_popa_end - insn_popa);
+   if (outregs.esp != (inregs.esp + 16))
+   print_serial("Push/Pop Test 8: FAIL\n");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+ insn_pushad,
+ insn_pushad_end - insn_pushad);
+
+   if (!regs_equal(&inregs, &outregs, R_SP)
+   || (s16)outregs.esp != (s16)(inregs.esp - 32))
+   print_serial("Push/Pop Test 9: FAIL\n");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+ insn_popad,
+ insn_popad_end - insn_popad);
+
+   if (outregs.esp != (inregs.esp + 32))
+   print_serial("Push/Pop Test 10: FAIL\n");
 }
 
 void test_null(void)
-- 
1.6.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Open vSwitch

2009-09-01 Thread Anthony Liguori
In case anyone hasn't seen it.  Is interesting to consider in the 
context of something like vhost-net.


http://openvswitch.org/

Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Avi Kivity

On 09/01/2009 09:12 PM, Andrew Theurer wrote:

Here's a run from branch debugreg with thread debugreg storage +
conditionally reload dr6:

user  nice  system   irq  softirq guest   idle  iowait
5.79  0.009.28  0.08 1.00 20.81  58.784.26
total busy: 36.97

Previous run that had avoided calling adjust_vmx_controls twice:

user  nice  system   irq  softirq guest   idle  iowait
5.81  0.009.48  0.081.04  21.32  57.864.41
total busy: 37.73

A relative reduction CPU cycles of 2%
   


That was an wasy fruit to pick.  To bad it was a regression that we 
introduced.



new oprofile:

   

samples  %app name symbol name
876648   54.1555  kvm-intel.ko vmx_vcpu_run
37595 2.3225  qemu-system-x86_64   cpu_physical_memory_rw
35623 2.2006  qemu-system-x86_64   phys_page_find_alloc
24874 1.5366  
vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 
native_write_msr_safe
17710 1.0940  libc-2.5.so  memcpy
14664 0.9059  kvm.ko   kvm_arch_vcpu_ioctl_run
14577 0.9005  qemu-system-x86_64   qemu_get_ram_ptr
12528 0.7739  
vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 
native_read_msr_safe
10979 0.6782  
vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 
copy_user_generic_string
9979  0.6165  qemu-system-x86_64   virtqueue_get_head
9371  0.5789  
vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 schedule
8333  0.5148  qemu-system-x86_64   virtqueue_avail_bytes
7899  0.4880  
vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 fget_light
7289  0.4503  qemu-system-x86_64   main_loop_wait
7217  0.4458  qemu-system-x86_64   lduw_phys
 


This is almost entirely host virtio.  I can reduce native_write_msr_safe 
by a bit, but not much.



6821  0.4214  
vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 
audit_syscall_exit
6749  0.4169  
vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 do_select
5919  0.3657  
vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 
audit_syscall_entry
5466  0.3377  
vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 kfree
4887  0.3019  
vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 fput
4689  0.2897  
vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 __switch_to
4636  0.2864  
vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 mwait_idle
 


Still not idle=poll, it may shave off 0.2%.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Andrew Theurer
On Tue, 2009-09-01 at 12:47 +0300, Avi Kivity wrote:
> On 09/01/2009 12:44 PM, Avi Kivity wrote:
> > Instead of saving the debug registers from the processor to a kvm data
> > structure, rely in the debug registers stored in the thread structure.
> > This allows us not to save dr6 and dr7.
> >
> > Reduces lightweight vmexit cost by 350 cycles, or 11 percent.
> >
> 
> Andrew, this is now available as the 'debugreg' branch of kvm.git.  
> Given the massive performance improvement, it will be interesting to see 
> how the test results change.
> 
> Marcelo, please queue this for 2.6.32, and I think it's even suitable 
> for -stable.
> 

Here's a run from branch debugreg with thread debugreg storage +
conditionally reload dr6:

user  nice  system   irq  softirq guest   idle  iowait
5.79  0.009.28  0.08 1.00 20.81  58.784.26
total busy: 36.97

Previous run that had avoided calling adjust_vmx_controls twice:

user  nice  system   irq  softirq guest   idle  iowait
5.81  0.009.48  0.081.04  21.32  57.864.41
total busy: 37.73

A relative reduction CPU cycles of 2%

new oprofile:

> samples  %app name symbol name
> 876648   54.1555  kvm-intel.ko vmx_vcpu_run
> 37595 2.3225  qemu-system-x86_64   cpu_physical_memory_rw
> 35623 2.2006  qemu-system-x86_64   phys_page_find_alloc
> 24874 1.5366  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 
> native_write_msr_safe
> 17710 1.0940  libc-2.5.so  memcpy
> 14664 0.9059  kvm.ko   kvm_arch_vcpu_ioctl_run
> 14577 0.9005  qemu-system-x86_64   qemu_get_ram_ptr
> 12528 0.7739  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 
> native_read_msr_safe
> 10979 0.6782  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 
> copy_user_generic_string
> 9979  0.6165  qemu-system-x86_64   virtqueue_get_head
> 9371  0.5789  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 schedule
> 8333  0.5148  qemu-system-x86_64   virtqueue_avail_bytes
> 7899  0.4880  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 fget_light
> 7289  0.4503  qemu-system-x86_64   main_loop_wait
> 7217  0.4458  qemu-system-x86_64   lduw_phys
> 6821  0.4214  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 
> audit_syscall_exit
> 6749  0.4169  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 do_select
> 5919  0.3657  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 
> audit_syscall_entry
> 5466  0.3377  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 kfree
> 4887  0.3019  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 fput
> 4689  0.2897  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 __switch_to
> 4636  0.2864  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 mwait_idle
> 4505  0.2783  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 getnstimeofday
> 4453  0.2751  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 system_call
> 4403  0.2720  kvm.ko   kvm_load_guest_fpu
> 4285  0.2647  kvm.ko   kvm_put_guest_fpu
> 4241  0.2620  libpthread-2.5.sopthread_mutex_lock
> 4172  0.2577  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 
> unroll_tree_refs
> 4100  0.2533  qemu-system-x86_64   kvm_run
> 4044  0.2498  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 __down_read
> 3978  0.2457  qemu-system-x86_64   ldl_phys
> 3669  0.2267  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 do_vfs_ioctl
> 3655  0.2258  
> vmlinux-2.6.31-rc5_debugreg_v2.6.31-rc3-3441-g479fa73-autokern1 __up_read
> 

A diff of this and previous run's oprofile:


> profile1 is [./oprofile.before]
> profile2 is [./oprofile.after]
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit 
> mask of 0x00 (No unit mask) count 1000
> total samples (ts1) for profile1 is 1661542 
> total samples (ts2) for profile2 is 1618760 (includes multiplier of 1.00)
> functions which have a abs(pct2-pct1) < 0.02 are not displayed
> 
>   pct2:   pct1:   
>
>100*100*  pct2 
>
>s1s2   s2/s1  s2/ts1  s1/ts1  -pct1 symbol 
> bin
> - - --- --- --- -- -- 
> ---
>  1559  2747  1.76/1   0.165   0.094  0.071 dput   
> vmlinux
> 34764 35623  1.02/1   2.144   2.092  0.052 phys_page_find_alloc  
> qemu
>  5170  5919  1.14/1   0.356   0.311  0.045 audit_syscall_entry
> vmlinux
>  3593  4172  1.16/1 

Re: [PATCH] KVM: VMX: Check cpl before emulating debug register access

2009-09-01 Thread Marcelo Tosatti
On Tue, Sep 01, 2009 at 12:03:25PM +0300, Avi Kivity wrote:
> Debug registers may only be accessed from cpl 0.  Unfortunately, vmx will
> code to emulate the instruction even though it was issued from guest
> userspace, possibly leading to an unexpected trap later.
> 
> Cc: sta...@kernel.org
> Signed-off-by: Avi Kivity 

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3] VMX: Enhance invalid guest state emulation

2009-09-01 Thread Avi Kivity

On 09/01/2009 04:32 PM, Mohammed Gamal wrote:

I'd appreciate any suggestions you have to alleviate this.
   

I fail to see why you need an internal loop if you can use the external
(__vcpu_run) one.
 

Because it's not just used by VMX. So I don't think it'd be wise to
use it for something that's VMX-specific.
   


The loop is there anyway.  The only question is whether 
vmx_handle_exit() emulates on instruction or many.


Emulating one instruction is slower, but will get interrupt injection 
more accurate (once we have emulated real mode interrupt injection).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: VMX: Conditionally reload debug register 6

2009-09-01 Thread Avi Kivity

On 09/01/2009 04:47 PM, Jan Kiszka wrote:

btw, something else I've considered was not to do any emulation for 'mov
dr' but instead load the debug registers and disable the intercept.  So
far the only issue I've seen is that we lose support for real mode guest
self-debug on intel (pre-unrestricted guest).  What do you think of this?
 

I think you can't have both: optimized dr save/restore on vmentry/exit
and optimized dr access. If you drop on-demand dr register
readout/update, you need to deal with this on every vmentry/exit. My
feeling is that this would be awfully slower, even slower than what we
currently have without your patches.
   


I'll clarify:

In the normal case, we'll have #DB and MOV DR intercepted.

On #DB, update vcpu->dr6.
On MOV DR, disable MOV DR interception, enable dr swap, and let the 
guest execute; on the next exit, reenable MOV DR interception and 
disable dr swap.


It's similar to the fpu code where we don't emulate fpu instructions.  
It doesn't really buy us a lot.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] vmxcap: Add tool to query vmx capabilities

2009-09-01 Thread Avi Kivity
This should really be in the kernel, but there are too many of them.

Signed-off-by: Avi Kivity 
---

v2: try /dev/cpu/0/msr first

 kvm/scripts/vmxcap |  154 
 1 files changed, 154 insertions(+), 0 deletions(-)
 create mode 100755 kvm/scripts/vmxcap

diff --git a/kvm/scripts/vmxcap b/kvm/scripts/vmxcap
new file mode 100755
index 000..50f094b
--- /dev/null
+++ b/kvm/scripts/vmxcap
@@ -0,0 +1,154 @@
+#!/usr/bin/python
+
+MSR_IA32_VMX_BASIC = 0x480
+MSR_IA32_VMX_PINBASED_CTLS = 0x481
+MSR_IA32_VMX_PROCBASED_CTLS = 0x482
+MSR_IA32_VMX_EXIT_CTLS = 0x483
+MSR_IA32_VMX_ENTRY_CTLS = 0x484
+MSR_IA32_VMX_MISC_CTLS = 0x485
+MSR_IA32_VMX_PROCBASED_CTLS2 = 0x48B
+MSR_IA32_VMX_TRUE_PINBASED_CTLS = 0x48D
+MSR_IA32_VMX_TRUE_PROCBASED_CTLS = 0x48E
+MSR_IA32_VMX_TRUE_EXIT_CTLS = 0x48F
+MSR_IA32_VMX_TRUE_ENTRY_CTLS = 0x490
+
+class msr(object):
+def __init__(self):
+try:
+self.f = file('/dev/cpu/0/msr')
+except:
+self.f = file('/dev/msr0')
+def read(self, index, default = None):
+import struct
+self.f.seek(index)
+try:
+return struct.unpack('Q', self.f.read(8))[0]
+except:
+return default
+
+class Control(object):
+def __init__(self, name, bits, cap_msr, true_cap_msr = None):
+self.name = name
+self.bits = bits
+self.cap_msr = cap_msr
+self.true_cap_msr = true_cap_msr
+def read2(self, nr):
+m = msr()
+val = m.read(nr, 0)
+return (val & 0x, val >> 32)
+def show(self):
+print self.name
+mbz, mb1 = self.read2(self.cap_msr)
+tmbz, tmb1 = 0, 0
+if self.true_cap_msr:
+tmbz, tmb1 = self.read2(self.true_cap_msr)
+for bit in sorted(self.bits.keys()):
+zero = not (mbz & (1 << bit))
+one = mb1 & (1 << bit)
+true_zero = not (tmbz & (1 << bit))
+true_one = tmb1 & (1 << bit)
+s= '?'
+if (self.true_cap_msr and true_zero and true_one
+and one and not zero):
+s = 'default'
+elif zero and not one:
+s = 'no'
+elif one and not zero:
+s = 'forced'
+elif one and zero:
+s = 'yes'
+print '  %-40s %s' % (self.bits[bit], s)
+
+controls = [
+Control(
+name = 'pin-based controls',
+bits = {
+0: 'External interrupt exiting',
+3: 'NMI exiting',
+5: 'Virtual NMIs',
+6: 'Activate VMX-preemption timer',
+7: 'Unrestricted guest',
+},
+cap_msr = MSR_IA32_VMX_PINBASED_CTLS,
+true_cap_msr = MSR_IA32_VMX_TRUE_PINBASED_CTLS,
+),
+
+Control(
+name = 'primary processor-based controls',
+bits = {
+2: 'Interrupt window exiting',
+3: 'Use TSC offsetting',
+7: 'HLT exiting',
+9: 'INVLPG exiting',
+10: 'MWAIT exiting',
+11: 'RDPMC exiting',
+12: 'RDTSC exiting',
+15: 'CR3-load exiting',
+16: 'CR3-store exiting',
+19: 'CR8-load exiting',
+20: 'CR8-store exiting',
+21: 'Use TPR shadow',
+22: 'NMI-window exiting',
+23: 'MOV-DR exiting',
+24: 'Unconditional I/O exiting',
+25: 'Use I/O bitmaps',
+27: 'Monitor trap flag',
+28: 'Use MSR bitmaps',
+29: 'MONITOR exiting',
+30: 'PAUSE exiting',
+31: 'Activate secondary control',
+},
+cap_msr = MSR_IA32_VMX_PROCBASED_CTLS,
+true_cap_msr = MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
+),
+
+Control(
+name = 'secondary processor-based controls',
+bits = {
+0: 'Virtualize APIC accesses',
+1: 'Enable EPT',
+2: 'Descriptor-table exiting',
+4: 'Virtualize x2APIC mode',
+5: 'Enable VPID',
+6: 'WBINVD exiting',
+},
+cap_msr = MSR_IA32_VMX_PROCBASED_CTLS2,
+),
+
+Control(
+name = 'VM-Exit controls',
+bits = {
+2: 'Save debug controls',
+9: 'Host address-space size',
+12: 'Load IA32_PERF_GLOBAL_CTRL',
+15: 'Acknowledge interrupt on exit',
+18: 'Save IA32_PAT',
+19: 'Load IA32_PAT',
+20: 'Save IA32_EFER',
+21: 'Load IA32_EFER',
+22: 'Save VMX-preemption timer value',
+},
+cap_msr = MSR_IA32_VMX_EXIT_CTLS,
+true_cap_msr = MSR_IA32_VMX_TRUE_EXIT_CTLS,
+),
+
+Control(
+name = 'VM-Entry controls',
+bits = {
+2: 'Load debug controls',
+9: 'IA-64 mode guest',
+10: 'Entry to SMM',
+11: 'Deactivate dual-monitor treatmen

Re: [PATCH] Add tool to query vmx capabilities

2009-09-01 Thread Avi Kivity

On 09/01/2009 07:07 PM, Jan Kiszka wrote:


For me this is /dev/cpu/0/msr.

Maybe:
 try:
 self.f = file('/dev/msr0')
 except:
 self.f = file('/dev/cpu/0/msr')

   


I have /dev/cpu as well, I think that's the canonical location.  Better 
to support both though.  Hurray for stable interfaces!


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add tool to query vmx capabilities

2009-09-01 Thread Jan Kiszka
Avi Kivity wrote:
> This should really be in the kernel, but there are too many of them.

Nice (and said as my host cpu has so many 'no').

> 
> Signed-off-by: Avi Kivity 
> ---
>  kvm/scripts/vmxcap |  151 
> 
>  1 files changed, 151 insertions(+), 0 deletions(-)
>  create mode 100755 kvm/scripts/vmxcap
> 
> diff --git a/kvm/scripts/vmxcap b/kvm/scripts/vmxcap
> new file mode 100755
> index 000..ff22d8c
> --- /dev/null
> +++ b/kvm/scripts/vmxcap
> @@ -0,0 +1,151 @@
> +#!/usr/bin/python
> +
> +MSR_IA32_VMX_BASIC = 0x480
> +MSR_IA32_VMX_PINBASED_CTLS = 0x481
> +MSR_IA32_VMX_PROCBASED_CTLS = 0x482
> +MSR_IA32_VMX_EXIT_CTLS = 0x483
> +MSR_IA32_VMX_ENTRY_CTLS = 0x484
> +MSR_IA32_VMX_MISC_CTLS = 0x485
> +MSR_IA32_VMX_PROCBASED_CTLS2 = 0x48B
> +MSR_IA32_VMX_TRUE_PINBASED_CTLS = 0x48D
> +MSR_IA32_VMX_TRUE_PROCBASED_CTLS = 0x48E
> +MSR_IA32_VMX_TRUE_EXIT_CTLS = 0x48F
> +MSR_IA32_VMX_TRUE_ENTRY_CTLS = 0x490
> +
> +class msr(object):
> +def __init__(self):
> +self.f = file('/dev/msr0')

For me this is /dev/cpu/0/msr.

Maybe:
try:
self.f = file('/dev/msr0')
except:
self.f = file('/dev/cpu/0/msr')

Jan

> +def read(self, index, default = None):
> +import struct
> +self.f.seek(index)
> +try:
> +return struct.unpack('Q', self.f.read(8))[0]
> +except:
> +return default
> +
> +class Control(object):
> +def __init__(self, name, bits, cap_msr, true_cap_msr = None):
> +self.name = name
> +self.bits = bits
> +self.cap_msr = cap_msr
> +self.true_cap_msr = true_cap_msr
> +def read2(self, nr):
> +m = msr()
> +val = m.read(nr, 0)
> +return (val & 0x, val >> 32)
> +def show(self):
> +print self.name
> +mbz, mb1 = self.read2(self.cap_msr)
> +tmbz, tmb1 = 0, 0
> +if self.true_cap_msr:
> +tmbz, tmb1 = self.read2(self.true_cap_msr)
> +for bit in sorted(self.bits.keys()):
> +zero = not (mbz & (1 << bit))
> +one = mb1 & (1 << bit)
> +true_zero = not (tmbz & (1 << bit))
> +true_one = tmb1 & (1 << bit)
> +s= '?'
> +if (self.true_cap_msr and true_zero and true_one
> +and one and not zero):
> +s = 'default'
> +elif zero and not one:
> +s = 'no'
> +elif one and not zero:
> +s = 'forced'
> +elif one and zero:
> +s = 'yes'
> +print '  %-40s %s' % (self.bits[bit], s)
> +
> +controls = [
> +Control(
> +name = 'pin-based controls',
> +bits = {
> +0: 'External interrupt exiting',
> +3: 'NMI exiting',
> +5: 'Virtual NMIs',
> +6: 'Activate VMX-preemption timer',
> +7: 'Unrestricted guest',
> +},
> +cap_msr = MSR_IA32_VMX_PINBASED_CTLS,
> +true_cap_msr = MSR_IA32_VMX_TRUE_PINBASED_CTLS,
> +),
> +
> +Control(
> +name = 'primary processor-based controls',
> +bits = {
> +2: 'Interrupt window exiting',
> +3: 'Use TSC offsetting',
> +7: 'HLT exiting',
> +9: 'INVLPG exiting',
> +10: 'MWAIT exiting',
> +11: 'RDPMC exiting',
> +12: 'RDTSC exiting',
> +15: 'CR3-load exiting',
> +16: 'CR3-store exiting',
> +19: 'CR8-load exiting',
> +20: 'CR8-store exiting',
> +21: 'Use TPR shadow',
> +22: 'NMI-window exiting',
> +23: 'MOV-DR exiting',
> +24: 'Unconditional I/O exiting',
> +25: 'Use I/O bitmaps',
> +27: 'Monitor trap flag',
> +28: 'Use MSR bitmaps',
> +29: 'MONITOR exiting',
> +30: 'PAUSE exiting',
> +31: 'Activate secondary control',
> +},
> +cap_msr = MSR_IA32_VMX_PROCBASED_CTLS,
> +true_cap_msr = MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
> +),
> +
> +Control(
> +name = 'secondary processor-based controls',
> +bits = {
> +0: 'Virtualize APIC accesses',
> +1: 'Enable EPT',
> +2: 'Descriptor-table exiting',
> +4: 'Virtualize x2APIC mode',
> +5: 'Enable VPID',
> +6: 'WBINVD exiting',
> +},
> +cap_msr = MSR_IA32_VMX_PROCBASED_CTLS2,
> +),
> +
> +Control(
> +name = 'VM-Exit controls',
> +bits = {
> +2: 'Save debug controls',
> +9: 'Host address-space size',
> +12: 'Load IA32_PERF_GLOBAL_CTRL',
> +15: 'Acknowledge interrupt on exit',
> +18: 'Save IA32_PAT',
> +19: 'Load IA32_PAT',
> +20: 'Save IA32_EFER',
> +21: 'Load IA32_EFER',
> + 

Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-01 Thread Stephen Hemminger
On Tue, 1 Sep 2009 14:58:19 +0800
"Xin, Xiaohui"  wrote:

>   [RFC] Virtual Machine Device Queues (VMDq) support on KVM
> 
> Network adapter with VMDq technology presents multiple pairs of tx/rx queues,
> and renders network L2 sorting mechanism based on MAC addresses and VLAN tags
> for each tx/rx queue pair. Here we present a generic framework, in which 
> network
> traffic to/from a tx/rx queue pair can be directed from/to a KVM guest without
> any software copy.
> 
> Actually this framework can apply to traditional network adapters which have
> just one tx/rx queue pair. And applications using the same user/kernel 
> interface
> can utilize this framework to send/receive network traffic directly thru a 
> tx/rx
> queue pair in a network adapter.
> 
> We use virtio-net architecture to illustrate the framework.
> 
> 
> || pop   add_buf||
> |Qemu process|  <-TX   <--  | Guest Kernel   |
> ||  -> -->  ||
> |Virtio-net  | push  get_buf||
> |  (Backend service) |  ->RX   -->  |  Virtio-net|
> ||  <- <--  |driver  |
> || push  get_buf||
> ||  ||
>|
>|
>| AIO (read & write) combined with Direct I/O
>|   (which substitute synced file operations)
> |---|
> | Host kernel  | read: copy-less with directly mapped user  |
> |  |   space to kernel, payload directly DMAed  |
> |  |   into user space  |
> |  | write: copy-less with directly mapped user |
> |  |   space to kernel, payload directly hooked |
> |  |   to a skb |
> |  ||
> |  (a likely   ||
> |   queue pair ||
> |   instance)  ||
> |  |   ||
> | NIC driver <-->  TUN/TAP driver   |
> |---|
>|
>|
>traditional adapter or a tx/rx queue pair
> 
> The basic idea is to utilize the kernel Asynchronous I/O combined with Direct
> I/O to implements copy-less TUN/TAP device. AIO and Direct I/O is not new to
> kernel, we still can see it in SCSI tape driver.
> 
> With traditional file operations, a copying of payload contents from/to the
> kernel DMA address to/from a user buffer is needed. That's what the copying we
> want to save.
> 
> The proposed framework is like this:
> A TUN/TAP device is bound to a traditional NIC adapter or a tx/rx queue pair 
> in
> host side. KVM virto-net Backend service, the user space program submits
> asynchronous read/write I/O requests to the host kernel through TUN/TAP 
> device.
> The requests are corresponding to the vqueue elements include both 
> transmission
> & receive. They can be queued in one AIO request and later, the completion 
> will
> be notified through the underlying packets tx/rx processing of the rx/tx queue
> pair.
> 
> Detailed path:
> 
> To guest Virtio-net driver, packets receive corresponding to asynchronous read
> I/O requests of Backend service.
> 
> 1) Guest Virtio-net driver provides header and payload address through the
> receive vqueue to Virtio-net backend service.
> 
> 2) Virtio-net backend service encapsulates multiple vqueue elements into
> multiple AIO control blocks and composes them into one AIO read request.
> 
> 3) Virtio-net backend service uses io_submit() syscall to pass the request to
> the TUN/TAP device.
> 
> 4) Virtio-net backend service uses io_getevents() syscall to check the
> completion of the request.
> 
> 5) The TUN/TAP driver receives packets from the queue pair of NIC, and 
> prepares
> for Direct I/O.
>A modified NIC driver may render a skb which header is allocated in host
> kernel, but the payload buffer is directly mapped from user space buffer which
> are rendered through the AIO request by the Backend service. get_user_pages()
> may do this. For one AIO read request, the TUN/TAP driver maintains a list for
> the directly mapped buffers, and a NIC driver tries to get the buffers as
> payload buffer to compose the new skbs. Of course, if getting the buffers
> fails, then kernel allocated buffers are used.
> 
> 6) Modern NIC cards now most

Re: [PATCH] x86 emulator: Add pusha and popa instructions

2009-09-01 Thread Marcelo Tosatti
On Tue, Sep 01, 2009 at 03:28:11PM +0200, Mohammed Gamal wrote:
> This adds pusha and popa instructions (opcodes 0x60-0x61), this enables 
> booting
> MINIX with invalid guest state emulation on.

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3] VMX: Enhance invalid guest state emulation

2009-09-01 Thread Marcelo Tosatti
On Tue, Sep 01, 2009 at 03:32:29PM +0200, Mohammed Gamal wrote:
> > +       if (vmx->emulation_required && emulate_invalid_guest_state)
> >                return;
> >
> > And then emulate in the exit handler.
> >
> >> I'd appreciate any suggestions you have to alleviate this.
> >
> > I fail to see why you need an internal loop if you can use the external
> > (__vcpu_run) one.
> 
> Because it's not just used by VMX. So I don't think it'd be wise to
> use it for something that's VMX-specific.

OK, it can be done incrementally. This is already an improvement.
Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Handle emulation failure in userspace

2009-09-01 Thread Marcelo Tosatti
On Tue, Sep 01, 2009 at 03:13:20PM +0200, Mohammed Gamal wrote:
> Since we return to userspace from KVM on invalid state emulation failure, let
> qemu handle it.

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Add tool to query vmx capabilities

2009-09-01 Thread Avi Kivity
This should really be in the kernel, but there are too many of them.

Signed-off-by: Avi Kivity 
---
 kvm/scripts/vmxcap |  151 
 1 files changed, 151 insertions(+), 0 deletions(-)
 create mode 100755 kvm/scripts/vmxcap

diff --git a/kvm/scripts/vmxcap b/kvm/scripts/vmxcap
new file mode 100755
index 000..ff22d8c
--- /dev/null
+++ b/kvm/scripts/vmxcap
@@ -0,0 +1,151 @@
+#!/usr/bin/python
+
+MSR_IA32_VMX_BASIC = 0x480
+MSR_IA32_VMX_PINBASED_CTLS = 0x481
+MSR_IA32_VMX_PROCBASED_CTLS = 0x482
+MSR_IA32_VMX_EXIT_CTLS = 0x483
+MSR_IA32_VMX_ENTRY_CTLS = 0x484
+MSR_IA32_VMX_MISC_CTLS = 0x485
+MSR_IA32_VMX_PROCBASED_CTLS2 = 0x48B
+MSR_IA32_VMX_TRUE_PINBASED_CTLS = 0x48D
+MSR_IA32_VMX_TRUE_PROCBASED_CTLS = 0x48E
+MSR_IA32_VMX_TRUE_EXIT_CTLS = 0x48F
+MSR_IA32_VMX_TRUE_ENTRY_CTLS = 0x490
+
+class msr(object):
+def __init__(self):
+self.f = file('/dev/msr0')
+def read(self, index, default = None):
+import struct
+self.f.seek(index)
+try:
+return struct.unpack('Q', self.f.read(8))[0]
+except:
+return default
+
+class Control(object):
+def __init__(self, name, bits, cap_msr, true_cap_msr = None):
+self.name = name
+self.bits = bits
+self.cap_msr = cap_msr
+self.true_cap_msr = true_cap_msr
+def read2(self, nr):
+m = msr()
+val = m.read(nr, 0)
+return (val & 0x, val >> 32)
+def show(self):
+print self.name
+mbz, mb1 = self.read2(self.cap_msr)
+tmbz, tmb1 = 0, 0
+if self.true_cap_msr:
+tmbz, tmb1 = self.read2(self.true_cap_msr)
+for bit in sorted(self.bits.keys()):
+zero = not (mbz & (1 << bit))
+one = mb1 & (1 << bit)
+true_zero = not (tmbz & (1 << bit))
+true_one = tmb1 & (1 << bit)
+s= '?'
+if (self.true_cap_msr and true_zero and true_one
+and one and not zero):
+s = 'default'
+elif zero and not one:
+s = 'no'
+elif one and not zero:
+s = 'forced'
+elif one and zero:
+s = 'yes'
+print '  %-40s %s' % (self.bits[bit], s)
+
+controls = [
+Control(
+name = 'pin-based controls',
+bits = {
+0: 'External interrupt exiting',
+3: 'NMI exiting',
+5: 'Virtual NMIs',
+6: 'Activate VMX-preemption timer',
+7: 'Unrestricted guest',
+},
+cap_msr = MSR_IA32_VMX_PINBASED_CTLS,
+true_cap_msr = MSR_IA32_VMX_TRUE_PINBASED_CTLS,
+),
+
+Control(
+name = 'primary processor-based controls',
+bits = {
+2: 'Interrupt window exiting',
+3: 'Use TSC offsetting',
+7: 'HLT exiting',
+9: 'INVLPG exiting',
+10: 'MWAIT exiting',
+11: 'RDPMC exiting',
+12: 'RDTSC exiting',
+15: 'CR3-load exiting',
+16: 'CR3-store exiting',
+19: 'CR8-load exiting',
+20: 'CR8-store exiting',
+21: 'Use TPR shadow',
+22: 'NMI-window exiting',
+23: 'MOV-DR exiting',
+24: 'Unconditional I/O exiting',
+25: 'Use I/O bitmaps',
+27: 'Monitor trap flag',
+28: 'Use MSR bitmaps',
+29: 'MONITOR exiting',
+30: 'PAUSE exiting',
+31: 'Activate secondary control',
+},
+cap_msr = MSR_IA32_VMX_PROCBASED_CTLS,
+true_cap_msr = MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
+),
+
+Control(
+name = 'secondary processor-based controls',
+bits = {
+0: 'Virtualize APIC accesses',
+1: 'Enable EPT',
+2: 'Descriptor-table exiting',
+4: 'Virtualize x2APIC mode',
+5: 'Enable VPID',
+6: 'WBINVD exiting',
+},
+cap_msr = MSR_IA32_VMX_PROCBASED_CTLS2,
+),
+
+Control(
+name = 'VM-Exit controls',
+bits = {
+2: 'Save debug controls',
+9: 'Host address-space size',
+12: 'Load IA32_PERF_GLOBAL_CTRL',
+15: 'Acknowledge interrupt on exit',
+18: 'Save IA32_PAT',
+19: 'Load IA32_PAT',
+20: 'Save IA32_EFER',
+21: 'Load IA32_EFER',
+22: 'Save VMX-preemption timer value',
+},
+cap_msr = MSR_IA32_VMX_EXIT_CTLS,
+true_cap_msr = MSR_IA32_VMX_TRUE_EXIT_CTLS,
+),
+
+Control(
+name = 'VM-Entry controls',
+bits = {
+2: 'Load debug controls',
+9: 'IA-64 mode guest',
+10: 'Entry to SMM',
+11: 'Deactivate dual-monitor treatment',
+13: 'Load IA32_PERF_GLOBAL_CTRL',
+14: 'Load IA32_PAT',
+15: 'Load IA3

RE: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-01 Thread Xin, Xiaohui

>It may be possible to make vmdq appear like an sr-iov capable device 
>from userspace.  sr-iov provides the userspace interfaces to allocate 
>interfaces and assign mac addresses.  To make it useful, you would have 
>to handle tx multiplexing in the driver but that would be much easier to 
>consume for kvm

What we have thought is to support multiple net_dev structures 
according to multiple queue pairs of one vmdq adapter and presents
multiple mac address in user space and each one mac can be used 
by a guest. 
What does the tx multiplexing in the driver exactly mean?

Thanks
Xiaohui

-Original Message-
From: Anthony Liguori [mailto:anth...@codemonkey.ws] 
Sent: Tuesday, September 01, 2009 5:57 AM
To: Avi Kivity
Cc: Xin, Xiaohui; m...@redhat.com; net...@vger.kernel.org; 
virtualizat...@lists.linux-foundation.org; kvm@vger.kernel.org; 
linux-ker...@vger.kernel.org; mi...@elte.hu; linux...@kvack.org; 
a...@linux-foundation.org; h...@zytor.com; gregory.hask...@gmail.com
Subject: Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

Avi Kivity wrote:
> On 08/31/2009 02:42 PM, Xin, Xiaohui wrote:
>> Hi, Michael
>> That's a great job. We are now working on support VMDq on KVM, and 
>> since the VMDq hardware presents L2 sorting based on MAC addresses 
>> and VLAN tags, our target is to implement a zero copy solution using 
>> VMDq. We stared from the virtio-net architecture. What we want to 
>> proposal is to use AIO combined with direct I/O:
>> 1) Modify virtio-net Backend service in Qemu to submit aio requests 
>> composed from virtqueue.
>> 2) Modify TUN/TAP device to support aio operations and the user space 
>> buffer directly mapping into the host kernel.
>> 3) Let a TUN/TAP device binds to single rx/tx queue from the NIC.
>> 4) Modify the net_dev and skb structure to permit allocated skb to 
>> use user space directly mapped payload buffer address rather then 
>> kernel allocated.
>>
>> As zero copy is also your goal, we are interested in what's in your 
>> mind, and would like to collaborate with you if possible.
>>
>
> One way to share the effort is to make vmdq queues available as normal 
> kernel interfaces.

It may be possible to make vmdq appear like an sr-iov capable device 
from userspace.  sr-iov provides the userspace interfaces to allocate 
interfaces and assign mac addresses.  To make it useful, you would have 
to handle tx multiplexing in the driver but that would be much easier to 
consume for kvm.

Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-01 Thread Xin, Xiaohui
>I don't think we should do that with the tun/tap driver. By design, tun/tap is 
>a way to interact >with the
>networking stack as if coming from a device. The only way this connects to an 
>external >adapter is through
>a bridge or through IP routing, which means that it does not correspond to a 
>specific NIC.
>I have worked on a driver I called 'macvtap' in lack of a better name, to add 
>a new tap >frontend to
>the 'macvlan' driver. Since macvlan lets you add slaves to a single NIC 
>device, this gives you >a direct
>connection between one or multiple tap devices to an external NIC, which works 
>a lot better >than when
>you have a bridge inbetween. There is also work underway to add a bridging 
>capability to >macvlan, so
>you can communicate directly between guests like you can do with a bridge.
>Michael's vhost_net can plug into the same macvlan infrastructure, so the work 
>is >complementary.

We use TUN/TAP device to implement the prototype, and agree that it's not the 
only
choice here. We'd compare the two if possible.
And what we cares more about is the modification in the kernel like the net_dev 
and 
skb structures' modifications, thanks.

Thanks
Xiaohui

-Original Message-
From: Arnd Bergmann [mailto:a...@arndb.de] 
Sent: Monday, August 31, 2009 11:24 PM
To: Xin, Xiaohui
Cc: m...@redhat.com; net...@vger.kernel.org; 
virtualizat...@lists.linux-foundation.org; kvm@vger.kernel.org; 
linux-ker...@vger.kernel.org; mi...@elte.hu; linux...@kvack.org; 
a...@linux-foundation.org; h...@zytor.com; gregory.hask...@gmail.com
Subject: Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

On Monday 31 August 2009, Xin, Xiaohui wrote:
> 
> Hi, Michael
> That's a great job. We are now working on support VMDq on KVM, and since the 
> VMDq hardware presents L2 sorting
> based on MAC addresses and VLAN tags, our target is to implement a zero copy 
> solution using VMDq.

I'm also interested in helping there, please include me in the discussions.

> We stared
> from the virtio-net architecture. What we want to proposal is to use AIO 
> combined with direct I/O:
> 1) Modify virtio-net Backend service in Qemu to submit aio requests composed 
> from virtqueue.

right, that sounds useful.

> 2) Modify TUN/TAP device to support aio operations and the user space buffer 
> directly mapping into the host kernel.
> 3) Let a TUN/TAP device binds to single rx/tx queue from the NIC.

I don't think we should do that with the tun/tap driver. By design, tun/tap is 
a way to interact with the
networking stack as if coming from a device. The only way this connects to an 
external adapter is through
a bridge or through IP routing, which means that it does not correspond to a 
specific NIC.

I have worked on a driver I called 'macvtap' in lack of a better name, to add a 
new tap frontend to
the 'macvlan' driver. Since macvlan lets you add slaves to a single NIC device, 
this gives you a direct
connection between one or multiple tap devices to an external NIC, which works 
a lot better than when
you have a bridge inbetween. There is also work underway to add a bridging 
capability to macvlan, so
you can communicate directly between guests like you can do with a bridge.

Michael's vhost_net can plug into the same macvlan infrastructure, so the work 
is complementary.

> 4) Modify the net_dev and skb structure to permit allocated skb to use user 
> space directly mapped payload
> buffer address rather then kernel allocated.

yes.

> As zero copy is also your goal, we are interested in what's in your mind, and 
> would like to collaborate with you if possible.
> BTW, we will send our VMDq write-up very soon.

Ok, cool.

Arnd <><
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2848498 ] COMPILATION ERRORS

2009-09-01 Thread SourceForge.net
Bugs item #2848498, was opened at 2009-09-01 18:51
Message generated for change (Comment added) made by pavlinux
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2848498&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: qemu
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pavel Vasilyev (pavlinux)
Assigned to: Nobody/Anonymous (nobody)
Summary: COMPILATION ERRORS

Initial Comment:
CPU:  Dual Core AMD Opteron(tm) Processor 265
KVM:  kvm-88-905-g6025b2d (git)
KERNEL:   2.6.30.5
ARCH: x86_64
Guest OS: Windows XP SP3
qemu command: many


Unused variables, redefinition, etc. 

--

>Comment By: Pavel Vasilyev (pavlinux)
Date: 2009-09-01 18:52

Message:
diff --git a/qemu-kvm.c b/qemu-kvm.c
index d554749..245c4c7 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1815,7 +1815,6 @@ static void *ap_main_loop(void *_env)
 {
 CPUState *env = _env;
 sigset_t signals;
-struct ioperm_data *data = NULL;

 current_env = env;
 env->thread_id = kvm_get_thread_id();
@@ -1824,6 +1823,9 @@ static void *ap_main_loop(void *_env)
 env->kvm_cpu_state.vcpu_ctx = kvm_create_vcpu(env, env->cpu_index);

 #ifdef USE_KVM_DEVICE_ASSIGNMENT
+
+struct ioperm_data *data = NULL;
+
 /* do ioperm for io ports of assigned devices */
 LIST_FOREACH(data, &ioperm_head, entries)
 on_vcpu(env, kvm_arch_do_ioperm, data);
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 2c1730b..2ffc600 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -1060,6 +1060,9 @@ int kvm_arch_init_irq_routing(void);
 int kvm_mmio_read(void *opaque, uint64_t addr, uint8_t * data, int len);
 int kvm_mmio_write(void *opaque, uint64_t addr, uint8_t * data, int
len);

+void kvm_mutex_unlock(void);
+void kvm_mutex_lock(void);
+
 #ifdef USE_KVM_DEVICE_ASSIGNMENT
 struct ioperm_data;

@@ -1072,6 +1075,7 @@ void kvm_arch_do_ioperm(void *_data);
 #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
 #define BITMAP_SIZE(m) (ALIGN(((m)>>TARGET_PAGE_BITS), HOST_LONG_BITS) /
8)

+
 #ifdef CONFIG_KVM
 #include "sys-queue.h"

@@ -1124,10 +1128,6 @@ static inline void kvm_init_vcpu(CPUState *env)
 static inline void kvm_load_tsc(CPUState *env)
 {
 }
-#endif
-
-void kvm_mutex_unlock(void);
-void kvm_mutex_lock(void);

 static inline void qemu_mutex_unlock_iothread(void)
 {
@@ -1140,6 +1140,9 @@ static inline void qemu_mutex_lock_iothread(void)
 if (kvm_enabled())
 kvm_mutex_lock();
 }
+#endif
+
+void kvm_arch_do_ioperm(void *data);

 int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
target_phys_addr_t end_addr);


--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2848498&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2848498 ] COMPILATION ERRORS

2009-09-01 Thread SourceForge.net
Bugs item #2848498, was opened at 2009-09-01 18:51
Message generated for change (Tracker Item Submitted) made by pavlinux
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2848498&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: qemu
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pavel Vasilyev (pavlinux)
Assigned to: Nobody/Anonymous (nobody)
Summary: COMPILATION ERRORS

Initial Comment:
CPU:  Dual Core AMD Opteron(tm) Processor 265
KVM:  kvm-88-905-g6025b2d (git)
KERNEL:   2.6.30.5
ARCH: x86_64
Guest OS: Windows XP SP3
qemu command: many


Unused variables, redefinition, etc. 

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2848498&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: VMX: Conditionally reload debug register 6

2009-09-01 Thread Jan Kiszka
Avi Kivity wrote:
> On 09/01/2009 04:15 PM, Jan Kiszka wrote:
>> Avi Kivity wrote:
>>
>>> Only reload debug register 6 if we're running with the guest's
>>> debug registers.  Saves around 150 cycles from the guest lightweight
>>> exit path.
>>>
>>> dr6 contains a couple of bits that are updated on #DB, so intercept
>>> that unconditionally and update those bits then.
>>>
>>> Signed-off-by: Avi Kivity
>>> ---
>>> v2: trap #DB so we maintain the TF related bits of DR7.
>>>
>>>   arch/x86/kvm/vmx.c |   14 +-
>>>   1 files changed, 9 insertions(+), 5 deletions(-)
>>>
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 05cd554..d4be978 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -540,10 +540,12 @@ static void update_exception_bitmap(struct kvm_vcpu 
>>> *vcpu)
>>> eb = (1u<<  PF_VECTOR) | (1u<<  UD_VECTOR) | (1u<<  MC_VECTOR);
>>> if (!vcpu->fpu_active)
>>> eb |= 1u<<  NM_VECTOR;
>>> +   /*
>>> +* Unconditionally intercept #DB so we can maintain dr6 without
>>> +* reading it every exit.
>>> +*/
>>> +   eb |= 1u<<  DB_VECTOR;
>>>  
>> If this is safe...
>>
>>
>>> if (vcpu->guest_debug&  KVM_GUESTDBG_ENABLE) {
>>> -   if (vcpu->guest_debug&
>>> -   (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
>>> -   eb |= 1u<<  DB_VECTOR;
>>> if (vcpu->guest_debug&  KVM_GUESTDBG_USE_SW_BP)
>>> eb |= 1u<<  BP_VECTOR;
>>> }
>>> @@ -3629,7 +3631,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>>  */
>>> vmcs_writel(HOST_CR0, read_cr0());
>>>
>>> -   set_debugreg(vcpu->arch.dr6, 6);
>>> +   if (vcpu->arch.switch_db_regs)
>>> +   set_debugreg(vcpu->arch.dr6, 6);
>>>
>>> asm(
>>> /* Store host registers */
>>> @@ -3731,7 +3734,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>>   | (1<<  VCPU_EXREG_PDPTR));
>>> vcpu->arch.regs_dirty = 0;
>>>
>>> -   get_debugreg(vcpu->arch.dr6, 6);
>>> +   if (vcpu->arch.switch_db_regs)
>>> +   get_debugreg(vcpu->arch.dr6, 6);
>>>
>>> vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>>> if (vmx->rmode.irq.pending)
>>>  
>> ...I wonder why we cannot drop this save/restore? The guest is not able
>> to access dr6 without causing a trap, thus will never see dr6 as stored
>> in the hardware.
>>
> 
> I think you're right.
> 
> btw, something else I've considered was not to do any emulation for 'mov 
> dr' but instead load the debug registers and disable the intercept.  So 
> far the only issue I've seen is that we lose support for real mode guest 
> self-debug on intel (pre-unrestricted guest).  What do you think of this?

I think you can't have both: optimized dr save/restore on vmentry/exit
and optimized dr access. If you drop on-demand dr register
readout/update, you need to deal with this on every vmentry/exit. My
feeling is that this would be awfully slower, even slower than what we
currently have without your patches.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Avi Kivity

On 09/01/2009 04:31 PM, Jan Kiszka wrote:

The hardware will clear dr7 on exit.
 

Given this fact, I wonder why you want to reload host dr0..3 on
vcpu->arch.switch_db_regs. if (unlikely(test_thread_flag(TIF_DEBUG)))
should suffice then, right?
   


I think you're right.  It would mean unsynced hardware and software 
registers, but only if debugging is disabled, which should be fine.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: VMX: Conditionally reload debug register 6

2009-09-01 Thread Avi Kivity

On 09/01/2009 04:15 PM, Jan Kiszka wrote:

Avi Kivity wrote:
   

Only reload debug register 6 if we're running with the guest's
debug registers.  Saves around 150 cycles from the guest lightweight
exit path.

dr6 contains a couple of bits that are updated on #DB, so intercept
that unconditionally and update those bits then.

Signed-off-by: Avi Kivity
---
v2: trap #DB so we maintain the TF related bits of DR7.

  arch/x86/kvm/vmx.c |   14 +-
  1 files changed, 9 insertions(+), 5 deletions(-)


diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 05cd554..d4be978 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -540,10 +540,12 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
eb = (1u<<  PF_VECTOR) | (1u<<  UD_VECTOR) | (1u<<  MC_VECTOR);
if (!vcpu->fpu_active)
eb |= 1u<<  NM_VECTOR;
+   /*
+* Unconditionally intercept #DB so we can maintain dr6 without
+* reading it every exit.
+*/
+   eb |= 1u<<  DB_VECTOR;
 

If this is safe...

   

if (vcpu->guest_debug&  KVM_GUESTDBG_ENABLE) {
-   if (vcpu->guest_debug&
-   (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
-   eb |= 1u<<  DB_VECTOR;
if (vcpu->guest_debug&  KVM_GUESTDBG_USE_SW_BP)
eb |= 1u<<  BP_VECTOR;
}
@@ -3629,7 +3631,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 */
vmcs_writel(HOST_CR0, read_cr0());

-   set_debugreg(vcpu->arch.dr6, 6);
+   if (vcpu->arch.switch_db_regs)
+   set_debugreg(vcpu->arch.dr6, 6);

asm(
/* Store host registers */
@@ -3731,7 +3734,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
  | (1<<  VCPU_EXREG_PDPTR));
vcpu->arch.regs_dirty = 0;

-   get_debugreg(vcpu->arch.dr6, 6);
+   if (vcpu->arch.switch_db_regs)
+   get_debugreg(vcpu->arch.dr6, 6);

vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
if (vmx->rmode.irq.pending)
 

...I wonder why we cannot drop this save/restore? The guest is not able
to access dr6 without causing a trap, thus will never see dr6 as stored
in the hardware.
   


I think you're right.

btw, something else I've considered was not to do any emulation for 'mov 
dr' but instead load the debug registers and disable the intercept.  So 
far the only issue I've seen is that we lose support for real mode guest 
self-debug on intel (pre-unrestricted guest).  What do you think of this?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3] VMX: Enhance invalid guest state emulation

2009-09-01 Thread Mohammed Gamal
On Tue, Sep 1, 2009 at 3:29 PM, Marcelo Tosatti wrote:
> On Tue, Sep 01, 2009 at 03:08:55PM +0200, Mohammed Gamal wrote:
>> On Tue, Sep 1, 2009 at 2:18 PM, Marcelo Tosatti wrote:
>> > On Tue, Sep 01, 2009 at 02:14:17PM +0200, Mohammed Gamal wrote:
>> >> On Tue, Sep 1, 2009 at 1:48 PM, Marcelo Tosatti 
>> >> wrote:
>> >> > On Tue, Sep 01, 2009 at 12:48:18PM +0200, Mohammed Gamal wrote:
>> >> >> - Change returned handle_invalid_guest_state() to return relevant exit 
>> >> >> codes
>> >> >> - Move triggering the emulation from vmx_vcpu_run() to 
>> >> >> vmx_handle_exit()
>> >> >> - Return to userspace instead of repeatedly trying to emulate 
>> >> >> instructions that have already failed
>> >> >>
>> >> >> Signed-off-by: Mohammed Gamal 
>> >> >
>> >> > Mohammed,
>> >> >
>> >> > The handle_invalid_guest_state loop is potentially problematic. It would
>> >> > be more appropriate to use the __vcpu_run loop.
>> >> >
>> >> > Can't you set vmx->emulation_required depending on the result
>> >> > of one call to emulate_instruction and get rid of the while
>> >> > (!guest_state_valid(vcpu)) loop?
>> >> >
>> >>
>> >> Invalid state emulation is VMX-specfic, while the __vcpu_run loop is
>> >> independent of the virtualization extension (defined in x86.c), no?
>> >> AMD SVM can comforably run hosts in big-real mode and thus it doesn't
>> >> have the notion of a guest going to an invalid state because of mode
>> >> switching, so I don't think it'd be a good idea to move emulation into
>> >> a generic layer. Please correct me if I am wrong
>> >
>> > Right. But all i am asking is to emulate one instruction at a
>> > time in handle_invalid_guest_state, instead of looping until
>> > guest_state_valid(vcpu).
>> >
>> > So you get rid of schedule(), the check for signal_pending, etc.
>>
>> But we'll still need to enter the guest when it's in a valid state, so
>> we need to move that loop somewhere,
>
> Sure, just set vmx->emulation_required = guest_state_valid(vcpu). When
> the state is good, the entry handler will vmentry.
>
>> and now that we still have a loop
>> we'll also still need to do the pending signals and scheduling checks,
>> no?
>
> Point is you can use the __vcpu_run loop.
>
> In the latest patch you do:
>
> +       /* Don't enter VMX if guest state is invalid, let the exit handler
> +          start emulation until we arrive back to a valid state */
> +       if (vmx->emulation_required && emulate_invalid_guest_state)
>                return;
>
> And then emulate in the exit handler.
>
>> I'd appreciate any suggestions you have to alleviate this.
>
> I fail to see why you need an internal loop if you can use the external
> (__vcpu_run) one.

Because it's not just used by VMX. So I don't think it'd be wise to
use it for something that's VMX-specific.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Jan Kiszka
Avi Kivity wrote:
> On 09/01/2009 01:42 PM, Jan Kiszka wrote:
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 891234b..9e3acbd 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -3627,14 +3627,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>
>>> kvm_guest_enter();
>>>
>>> -   get_debugreg(vcpu->arch.host_dr6, 6);
>>> -   get_debugreg(vcpu->arch.host_dr7, 7);
>>> if (unlikely(vcpu->arch.switch_db_regs)) {
>>> -   get_debugreg(vcpu->arch.host_db[0], 0);
>>> -   get_debugreg(vcpu->arch.host_db[1], 1);
>>> -   get_debugreg(vcpu->arch.host_db[2], 2);
>>> -   get_debugreg(vcpu->arch.host_db[3], 3);
>>> -
>>>  
>> I'm fine with this for the common case, but we should switch to the safe
>> pattern on test_thread_flag(TIF_DEBUG), ie. if someone debugs qemu using
>> hardware break/watchpoints.
>>
> 
> IIUC, thread.debugregs should be synced with the hardware registers.  
> The kernel itself only writes to the debug registers, so we're safe 
> doing the same.

Yep, confirmed. There is only an unsync'ed window between debug trap
handling and signal injection, but I cannot imagine then kvm could
intercept this path with a guest entry.

> 
>>
>>> set_debugreg(0, 7);
>>> set_debugreg(vcpu->arch.eff_db[0], 0);
>>> set_debugreg(vcpu->arch.eff_db[1], 1);
>>> @@ -3645,15 +3638,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>> trace_kvm_entry(vcpu->vcpu_id);
>>> kvm_x86_ops->run(vcpu);
>>>
>>> -   if (unlikely(vcpu->arch.switch_db_regs)) {
>>> -   set_debugreg(0, 7);
>>> -   set_debugreg(vcpu->arch.host_db[0], 0);
>>> -   set_debugreg(vcpu->arch.host_db[1], 1);
>>> -   set_debugreg(vcpu->arch.host_db[2], 2);
>>> -   set_debugreg(vcpu->arch.host_db[3], 3);
>>> +   if (unlikely(vcpu->arch.switch_db_regs || test_thread_flag(TIF_DEBUG))) 
>>> {
>>>  
>> IIRC, you must clear dr7 before loading dr0..3 to avoid spurious effects.
>>
> 
> Yeah, I thought of it but forgot to implement it.
> 
>>> +   set_debugreg(current->thread.debugreg0, 0);
>>> +   set_debugreg(current->thread.debugreg1, 1);
>>> +   set_debugreg(current->thread.debugreg2, 2);
>>> +   set_debugreg(current->thread.debugreg3, 3);
>>> +   set_debugreg(current->thread.debugreg6, 6);
>>> +   set_debugreg(current->thread.debugreg7, 7);
>>> }
>>> -   set_debugreg(vcpu->arch.host_dr6, 6);
>>> -   set_debugreg(vcpu->arch.host_dr7, 7);
>>>  
>> Unless I miss something ATM, moving this into the if-clause should cause
>> troubles if the guest leaves dr7 armed behind. But I need to refresh my
>> memory on this.
>>
> 
> The hardware will clear dr7 on exit.

Given this fact, I wonder why you want to reload host dr0..3 on
vcpu->arch.switch_db_regs. if (unlikely(test_thread_flag(TIF_DEBUG)))
should suffice then, right?

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3] VMX: Enhance invalid guest state emulation

2009-09-01 Thread Marcelo Tosatti
On Tue, Sep 01, 2009 at 03:08:55PM +0200, Mohammed Gamal wrote:
> On Tue, Sep 1, 2009 at 2:18 PM, Marcelo Tosatti wrote:
> > On Tue, Sep 01, 2009 at 02:14:17PM +0200, Mohammed Gamal wrote:
> >> On Tue, Sep 1, 2009 at 1:48 PM, Marcelo Tosatti wrote:
> >> > On Tue, Sep 01, 2009 at 12:48:18PM +0200, Mohammed Gamal wrote:
> >> >> - Change returned handle_invalid_guest_state() to return relevant exit 
> >> >> codes
> >> >> - Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
> >> >> - Return to userspace instead of repeatedly trying to emulate 
> >> >> instructions that have already failed
> >> >>
> >> >> Signed-off-by: Mohammed Gamal 
> >> >
> >> > Mohammed,
> >> >
> >> > The handle_invalid_guest_state loop is potentially problematic. It would
> >> > be more appropriate to use the __vcpu_run loop.
> >> >
> >> > Can't you set vmx->emulation_required depending on the result
> >> > of one call to emulate_instruction and get rid of the while
> >> > (!guest_state_valid(vcpu)) loop?
> >> >
> >>
> >> Invalid state emulation is VMX-specfic, while the __vcpu_run loop is
> >> independent of the virtualization extension (defined in x86.c), no?
> >> AMD SVM can comforably run hosts in big-real mode and thus it doesn't
> >> have the notion of a guest going to an invalid state because of mode
> >> switching, so I don't think it'd be a good idea to move emulation into
> >> a generic layer. Please correct me if I am wrong
> >
> > Right. But all i am asking is to emulate one instruction at a
> > time in handle_invalid_guest_state, instead of looping until
> > guest_state_valid(vcpu).
> >
> > So you get rid of schedule(), the check for signal_pending, etc.
> 
> But we'll still need to enter the guest when it's in a valid state, so
> we need to move that loop somewhere, 

Sure, just set vmx->emulation_required = guest_state_valid(vcpu). When
the state is good, the entry handler will vmentry.

> and now that we still have a loop
> we'll also still need to do the pending signals and scheduling checks,
> no?

Point is you can use the __vcpu_run loop.

In the latest patch you do:

+   /* Don't enter VMX if guest state is invalid, let the exit handler
+  start emulation until we arrive back to a valid state */
+   if (vmx->emulation_required && emulate_invalid_guest_state)
return;

And then emulate in the exit handler.

> I'd appreciate any suggestions you have to alleviate this.

I fail to see why you need an internal loop if you can use the external
(__vcpu_run) one.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] x86 emulator: Add pusha and popa instructions

2009-09-01 Thread Mohammed Gamal
This adds pusha and popa instructions (opcodes 0x60-0x61), this enables booting
MINIX with invalid guest state emulation on.

Signed-off-by: Mohammed Gamal 
---
 arch/x86/kvm/emulate.c |   49 +++-
 1 files changed, 48 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index db0820d..7ca39f8 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -139,7 +139,8 @@ static u32 opcode_table[256] = {
DstReg | Stack, DstReg | Stack, DstReg | Stack, DstReg | Stack,
DstReg | Stack, DstReg | Stack, DstReg | Stack, DstReg | Stack,
/* 0x60 - 0x67 */
-   0, 0, 0, DstReg | SrcMem32 | ModRM | Mov /* movsxd (x86/64) */ ,
+   ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
+   0, DstReg | SrcMem32 | ModRM | Mov /* movsxd (x86/64) */ ,
0, 0, 0, 0,
/* 0x68 - 0x6F */
SrcImm | Mov | Stack, 0, SrcImmByte | Mov | Stack, 0,
@@ -1225,6 +1226,44 @@ static int emulate_pop_sreg(struct x86_emulate_ctxt 
*ctxt,
return rc;
 }
 
+static void emulate_pusha(struct x86_emulate_ctxt *ctxt)
+{
+   struct decode_cache *c = &ctxt->decode;
+   unsigned long old_esp = c->regs[VCPU_REGS_RSP];
+   int reg = VCPU_REGS_RAX;
+
+   while (reg <= VCPU_REGS_RDI) {
+   (reg == VCPU_REGS_RSP) ? 
+   (c->src.val = old_esp) : (c->src.val = c->regs[reg]);
+
+   emulate_push(ctxt);
+   ++reg;
+   }
+}
+
+static int emulate_popa(struct x86_emulate_ctxt *ctxt,
+   struct x86_emulate_ops *ops)
+{
+   struct decode_cache *c = &ctxt->decode;
+   unsigned long old_esp = c->regs[VCPU_REGS_RSP];
+   int rc = 0;
+   int reg = VCPU_REGS_RDI;
+
+   while (reg >= VCPU_REGS_RAX) {
+   if (reg == VCPU_REGS_RSP) {
+   register_address_increment(c, &c->regs[VCPU_REGS_RSP],
+   c->op_bytes);
+   --reg;
+   }
+
+   rc = emulate_pop(ctxt, ops, &c->regs[reg], c->op_bytes);
+   if (rc != 0)
+   break;
+   --reg;
+   }
+   return rc;
+}
+
 static inline int emulate_grp1a(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops)
 {
@@ -1816,6 +1855,14 @@ special_insn:
if (rc != 0)
goto done;
break;
+   case 0x60:  /* pusha */
+   emulate_pusha(ctxt);
+   break;
+   case 0x61:  /* popa */
+   rc = emulate_popa(ctxt, ops);
+   if (rc != 0)
+   goto done;
+   break; 
case 0x63:  /* movsxd */
if (ctxt->mode != X86EMUL_MODE_PROT64)
goto cannot_emulate;
-- 
1.6.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: VMX: Conditionally reload debug register 6

2009-09-01 Thread Jan Kiszka
Avi Kivity wrote:
> Only reload debug register 6 if we're running with the guest's
> debug registers.  Saves around 150 cycles from the guest lightweight
> exit path.
> 
> dr6 contains a couple of bits that are updated on #DB, so intercept
> that unconditionally and update those bits then.
> 
> Signed-off-by: Avi Kivity 
> ---
> v2: trap #DB so we maintain the TF related bits of DR7.
> 
>  arch/x86/kvm/vmx.c |   14 +-
>  1 files changed, 9 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 05cd554..d4be978 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -540,10 +540,12 @@ static void update_exception_bitmap(struct kvm_vcpu 
> *vcpu)
>   eb = (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR);
>   if (!vcpu->fpu_active)
>   eb |= 1u << NM_VECTOR;
> + /*
> +  * Unconditionally intercept #DB so we can maintain dr6 without
> +  * reading it every exit.
> +  */
> + eb |= 1u << DB_VECTOR;

If this is safe...

>   if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
> - if (vcpu->guest_debug &
> - (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
> - eb |= 1u << DB_VECTOR;
>   if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
>   eb |= 1u << BP_VECTOR;
>   }
> @@ -3629,7 +3631,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>*/
>   vmcs_writel(HOST_CR0, read_cr0());
>  
> - set_debugreg(vcpu->arch.dr6, 6);
> + if (vcpu->arch.switch_db_regs)
> + set_debugreg(vcpu->arch.dr6, 6);
>  
>   asm(
>   /* Store host registers */
> @@ -3731,7 +3734,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> | (1 << VCPU_EXREG_PDPTR));
>   vcpu->arch.regs_dirty = 0;
>  
> - get_debugreg(vcpu->arch.dr6, 6);
> + if (vcpu->arch.switch_db_regs)
> + get_debugreg(vcpu->arch.dr6, 6);
>  
>   vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>   if (vmx->rmode.irq.pending)

...I wonder why we cannot drop this save/restore? The guest is not able
to access dr6 without causing a trap, thus will never see dr6 as stored
in the hardware.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Handle emulation failure in userspace

2009-09-01 Thread Mohammed Gamal
Since we return to userspace from KVM on invalid state emulation failure, let
qemu handle it.

Signed-off-by: Mohammed Gamal 
---
 qemu-kvm.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index b59e403..090a3ae 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1029,6 +1029,14 @@ int kvm_run(kvm_vcpu_context_t vcpu, void *env)
 r = kvm_s390_handle_reset(kvm, vcpu, run);
 break;
 #endif
+   case KVM_EXIT_INTERNAL_ERROR:
+   fprintf(stderr, "KVM internal error. Suberror: %d\n",
+   run->internal.suberror);
+   kvm_show_regs(vcpu);
+   if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION)
+   fprintf(stderr, "emulation failure, check dmesg for details\n");
+   abort();
+   break;
 default:
 if (kvm_arch_run(vcpu)) {
 fprintf(stderr, "unhandled vm exit: 0x%x\n", run->exit_reason);
-- 
1.6.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3] VMX: Enhance invalid guest state emulation

2009-09-01 Thread Mohammed Gamal
On Tue, Sep 1, 2009 at 2:18 PM, Marcelo Tosatti wrote:
> On Tue, Sep 01, 2009 at 02:14:17PM +0200, Mohammed Gamal wrote:
>> On Tue, Sep 1, 2009 at 1:48 PM, Marcelo Tosatti wrote:
>> > On Tue, Sep 01, 2009 at 12:48:18PM +0200, Mohammed Gamal wrote:
>> >> - Change returned handle_invalid_guest_state() to return relevant exit 
>> >> codes
>> >> - Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
>> >> - Return to userspace instead of repeatedly trying to emulate 
>> >> instructions that have already failed
>> >>
>> >> Signed-off-by: Mohammed Gamal 
>> >
>> > Mohammed,
>> >
>> > The handle_invalid_guest_state loop is potentially problematic. It would
>> > be more appropriate to use the __vcpu_run loop.
>> >
>> > Can't you set vmx->emulation_required depending on the result
>> > of one call to emulate_instruction and get rid of the while
>> > (!guest_state_valid(vcpu)) loop?
>> >
>>
>> Invalid state emulation is VMX-specfic, while the __vcpu_run loop is
>> independent of the virtualization extension (defined in x86.c), no?
>> AMD SVM can comforably run hosts in big-real mode and thus it doesn't
>> have the notion of a guest going to an invalid state because of mode
>> switching, so I don't think it'd be a good idea to move emulation into
>> a generic layer. Please correct me if I am wrong
>
> Right. But all i am asking is to emulate one instruction at a
> time in handle_invalid_guest_state, instead of looping until
> guest_state_valid(vcpu).
>
> So you get rid of schedule(), the check for signal_pending, etc.

But we'll still need to enter the guest when it's in a valid state, so
we need to move that loop somewhere, and now that we still have a loop
we'll also still need to do the pending signals and scheduling checks,
no?

I'd appreciate any suggestions you have to alleviate this.
>
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] KVM: VMX: Conditionally reload debug register 6

2009-09-01 Thread Avi Kivity
Only reload debug register 6 if we're running with the guest's
debug registers.  Saves around 150 cycles from the guest lightweight
exit path.

dr6 contains a couple of bits that are updated on #DB, so intercept
that unconditionally and update those bits then.

Signed-off-by: Avi Kivity 
---
v2: trap #DB so we maintain the TF related bits of DR7.

 arch/x86/kvm/vmx.c |   14 +-
 1 files changed, 9 insertions(+), 5 deletions(-)


diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 05cd554..d4be978 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -540,10 +540,12 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
eb = (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR);
if (!vcpu->fpu_active)
eb |= 1u << NM_VECTOR;
+   /*
+* Unconditionally intercept #DB so we can maintain dr6 without
+* reading it every exit.
+*/
+   eb |= 1u << DB_VECTOR;
if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
-   if (vcpu->guest_debug &
-   (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
-   eb |= 1u << DB_VECTOR;
if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
eb |= 1u << BP_VECTOR;
}
@@ -3629,7 +3631,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 */
vmcs_writel(HOST_CR0, read_cr0());
 
-   set_debugreg(vcpu->arch.dr6, 6);
+   if (vcpu->arch.switch_db_regs)
+   set_debugreg(vcpu->arch.dr6, 6);
 
asm(
/* Store host registers */
@@ -3731,7 +3734,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
  | (1 << VCPU_EXREG_PDPTR));
vcpu->arch.regs_dirty = 0;
 
-   get_debugreg(vcpu->arch.dr6, 6);
+   if (vcpu->arch.switch_db_regs)
+   get_debugreg(vcpu->arch.dr6, 6);
 
vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
if (vmx->rmode.irq.pending)
-- 
1.6.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RESEND] x86 emulator: Add pusha and popa instructions

2009-09-01 Thread Mohammed Gamal
On Tue, Sep 1, 2009 at 2:57 PM, Marcelo Tosatti wrote:
> On Tue, Sep 01, 2009 at 03:26:30AM +0200, Mohammed Gamal wrote:
>> This adds pusha and popa instructions (opcodes 0x60-0x61), this enables 
>> booting
>> MINIX with invalid guest state emulation on.
>>
>> Signed-off-by: Mohammed Gamal 
>> ---
>>  arch/x86/kvm/emulate.c |   52 
>> +++-
>>  1 files changed, 51 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index db0820d..9be2e6e 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -139,7 +139,8 @@ static u32 opcode_table[256] = {
>>       DstReg | Stack, DstReg | Stack, DstReg | Stack, DstReg | Stack,
>>       DstReg | Stack, DstReg | Stack, DstReg | Stack, DstReg | Stack,
>>       /* 0x60 - 0x67 */
>> -     0, 0, 0, DstReg | SrcMem32 | ModRM | Mov /* movsxd (x86/64) */ ,
>> +     ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
>> +     0, DstReg | SrcMem32 | ModRM | Mov /* movsxd (x86/64) */ ,
>>       0, 0, 0, 0,
>>       /* 0x68 - 0x6F */
>>       SrcImm | Mov | Stack, 0, SrcImmByte | Mov | Stack, 0,
>> @@ -1225,6 +1226,47 @@ static int emulate_pop_sreg(struct x86_emulate_ctxt 
>> *ctxt,
>>       return rc;
>>  }
>>
>> +static void emulate_pusha(struct x86_emulate_ctxt *ctxt)
>> +{
>> +     struct decode_cache *c = &ctxt->decode;
>> +     unsigned long old_esp = c->regs[VCPU_REGS_RSP];
>> +     int reg = VCPU_REGS_RAX;
>> +
>> +     while (reg <= VCPU_REGS_RDI) {
>> +             (reg == VCPU_REGS_RSP) ?
>> +             (c->src.val = old_esp) : (c->src.val = c->regs[reg]);
>> +
>> +             emulate_push(ctxt);
>> +             ++reg;
>> +     }
>> +}
>> +
>> +static int emulate_popa(struct x86_emulate_ctxt *ctxt,
>> +                     struct x86_emulate_ops *ops)
>> +{
>> +     struct decode_cache *c = &ctxt->decode;
>> +     unsigned long old_esp = c->regs[VCPU_REGS_RSP];
>> +     int rc = 0;
>> +     int reg = VCPU_REGS_RDI;
>> +
>> +     while (reg >= VCPU_REGS_RAX) {
>> +             if (reg == VCPU_REGS_RSP) {
>> +                     register_address_increment(c, &c->regs[VCPU_REGS_RSP],
>> +                                                     c->op_bytes);
>> +                     --reg;
>> +             }
>> +
>> +             rc = emulate_pop(ctxt, ops, &c->regs[reg], c->op_bytes);
>> +             if (rc != 0) {
>> +                     c->regs[VCPU_REGS_RSP] = old_esp;
>
> Why is it necessary to restore the local VCPU_REGS_RSP copy on failure?

Right. Perhaps this is a little too paranoid since we'll fail anyway
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RESEND] x86 emulator: Add pusha and popa instructions

2009-09-01 Thread Marcelo Tosatti
On Tue, Sep 01, 2009 at 03:26:30AM +0200, Mohammed Gamal wrote:
> This adds pusha and popa instructions (opcodes 0x60-0x61), this enables 
> booting
> MINIX with invalid guest state emulation on.
> 
> Signed-off-by: Mohammed Gamal 
> ---
>  arch/x86/kvm/emulate.c |   52 
> +++-
>  1 files changed, 51 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index db0820d..9be2e6e 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -139,7 +139,8 @@ static u32 opcode_table[256] = {
>   DstReg | Stack, DstReg | Stack, DstReg | Stack, DstReg | Stack,
>   DstReg | Stack, DstReg | Stack, DstReg | Stack, DstReg | Stack,
>   /* 0x60 - 0x67 */
> - 0, 0, 0, DstReg | SrcMem32 | ModRM | Mov /* movsxd (x86/64) */ ,
> + ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
> + 0, DstReg | SrcMem32 | ModRM | Mov /* movsxd (x86/64) */ ,
>   0, 0, 0, 0,
>   /* 0x68 - 0x6F */
>   SrcImm | Mov | Stack, 0, SrcImmByte | Mov | Stack, 0,
> @@ -1225,6 +1226,47 @@ static int emulate_pop_sreg(struct x86_emulate_ctxt 
> *ctxt,
>   return rc;
>  }
>  
> +static void emulate_pusha(struct x86_emulate_ctxt *ctxt)
> +{
> + struct decode_cache *c = &ctxt->decode;
> + unsigned long old_esp = c->regs[VCPU_REGS_RSP];
> + int reg = VCPU_REGS_RAX;
> +
> + while (reg <= VCPU_REGS_RDI) {
> + (reg == VCPU_REGS_RSP) ? 
> + (c->src.val = old_esp) : (c->src.val = c->regs[reg]);
> +
> + emulate_push(ctxt);
> + ++reg;
> + }
> +}
> +
> +static int emulate_popa(struct x86_emulate_ctxt *ctxt,
> + struct x86_emulate_ops *ops)
> +{
> + struct decode_cache *c = &ctxt->decode;
> + unsigned long old_esp = c->regs[VCPU_REGS_RSP];
> + int rc = 0;
> + int reg = VCPU_REGS_RDI;
> +
> + while (reg >= VCPU_REGS_RAX) {
> + if (reg == VCPU_REGS_RSP) {
> + register_address_increment(c, &c->regs[VCPU_REGS_RSP],
> + c->op_bytes);
> + --reg;
> + }
> +
> + rc = emulate_pop(ctxt, ops, &c->regs[reg], c->op_bytes);
> + if (rc != 0) {
> + c->regs[VCPU_REGS_RSP] = old_esp;

Why is it necessary to restore the local VCPU_REGS_RSP copy on failure?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Handle emulation failure in userspace

2009-09-01 Thread Mohammed Gamal
On Tue, Sep 1, 2009 at 2:31 PM, Marcelo Tosatti wrote:
> On Fri, Aug 28, 2009 at 04:48:53PM +0200, Mohammed Gamal wrote:
>> Since we return to userspace from KVM on invalid state emulation failure, let
>> qemu handle it.
>>
>> Signed-off-by: Mohammed Gamal 
>> ---
>>  qemu-kvm.c |    8 
>>  1 files changed, 8 insertions(+), 0 deletions(-)
>>
>> diff --git a/qemu-kvm.c b/qemu-kvm.c
>> index b59e403..a1648e0 100644
>> --- a/qemu-kvm.c
>> +++ b/qemu-kvm.c
>> @@ -1029,6 +1029,14 @@ int kvm_run(kvm_vcpu_context_t vcpu, void *env)
>>              r = kvm_s390_handle_reset(kvm, vcpu, run);
>>              break;
>>  #endif
>> +     case KVM_EXIT_INTERNAL_ERROR:
>> +         kvm_show_regs(vcpu);
>> +         fprintf(stderr, "\nKVM internal error. Suberror: %d\n",
>> +                 run->internal.suberror);
>> +         if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION)
>> +             fprintf(stderr, "emulation failure, check dmesg for 
>> details\n");
>> +         abort();
>> +         break;
>>          default:
>>              if (kvm_arch_run(vcpu)) {
>>                  fprintf(stderr, "unhandled vm exit: 0x%x\n", 
>> run->exit_reason);
>> --
>> 1.6.0.4
>
> The common practice is to print msg first and then kvm_show_regs?
True. I just thought the message would be more visible this way. Will resend

> Applied the kvm.h update, thanks.
>
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: VMX: Conditionally reload debug register 6

2009-09-01 Thread Avi Kivity

On 09/01/2009 03:32 PM, Jan Kiszka wrote:

I'm worried about vm-exits that may take precedence over the #db trap.
If we skip to save/restore dr6 for them, the value that the interception
handler sees later on will be bogus. Or is this architecturally impossible?
   


I think it's architectually impossible.  Debug traps take effect after 
the instruction executed, so if we made it to that point, nothing else 
can cause the vmexit.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: VMX: Conditionally reload debug register 6

2009-09-01 Thread Jan Kiszka
Avi Kivity wrote:
> On 09/01/2009 02:43 PM, Jan Kiszka wrote:
>> @@ -3731,7 +3732,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>> | (1<<  VCPU_EXREG_PDPTR));
>>> vcpu->arch.regs_dirty = 0;
>>>
>>> -   get_debugreg(vcpu->arch.dr6, 6);
>>> +   if (vcpu->arch.switch_db_regs)
>>> +   get_debugreg(vcpu->arch.dr6, 6);
>>>
>>> vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>>> if (vmx->rmode.irq.pending)
>>>  
>> That reduces the emulation quality as vcpu->arch.switch_db_regs is only
>> set if some breakpoint is active while dr6 has its use also when that is
>> not the case).
>>
> 
> True - there's the TF reason reporting bits.
> 
> How about this then:
> 
> - if !switch_db_regs, trap #DB
> - on #DB trap, copy DR6.BS and DR6.BT to vcpu->arch.dr6, and reinject 
> the #DB
> 
> ?

I'm worried about vm-exits that may take precedence over the #db trap.
If we skip to save/restore dr6 for them, the value that the interception
handler sees later on will be bogus. Or is this architecturally impossible?

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Handle emulation failure in userspace

2009-09-01 Thread Marcelo Tosatti
On Fri, Aug 28, 2009 at 04:48:53PM +0200, Mohammed Gamal wrote:
> Since we return to userspace from KVM on invalid state emulation failure, let
> qemu handle it.
> 
> Signed-off-by: Mohammed Gamal 
> ---
>  qemu-kvm.c |8 
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/qemu-kvm.c b/qemu-kvm.c
> index b59e403..a1648e0 100644
> --- a/qemu-kvm.c
> +++ b/qemu-kvm.c
> @@ -1029,6 +1029,14 @@ int kvm_run(kvm_vcpu_context_t vcpu, void *env)
>  r = kvm_s390_handle_reset(kvm, vcpu, run);
>  break;
>  #endif
> + case KVM_EXIT_INTERNAL_ERROR:
> + kvm_show_regs(vcpu);
> + fprintf(stderr, "\nKVM internal error. Suberror: %d\n",
> + run->internal.suberror);
> + if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION)
> + fprintf(stderr, "emulation failure, check dmesg for details\n");
> + abort();
> + break;
>  default:
>  if (kvm_arch_run(vcpu)) {
>  fprintf(stderr, "unhandled vm exit: 0x%x\n", 
> run->exit_reason);
> -- 
> 1.6.0.4

The common practice is to print msg first and then kvm_show_regs?
Applied the kvm.h update, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Avi Kivity

On 09/01/2009 03:02 PM, Avi Kivity wrote:

Then we need to resolve it right from the start, not in some distant
future. We already have everything we need available, we could just try
to refactor it to a nicer interface (something that encapsulates both
compile and runtime tests).



I'll look at doing something.  It shouldn't be difficult.



btw, it looks like kgdb is already broken wrt prtace hardware breakpoints.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3] VMX: Enhance invalid guest state emulation

2009-09-01 Thread Marcelo Tosatti
On Tue, Sep 01, 2009 at 02:14:17PM +0200, Mohammed Gamal wrote:
> On Tue, Sep 1, 2009 at 1:48 PM, Marcelo Tosatti wrote:
> > On Tue, Sep 01, 2009 at 12:48:18PM +0200, Mohammed Gamal wrote:
> >> - Change returned handle_invalid_guest_state() to return relevant exit 
> >> codes
> >> - Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
> >> - Return to userspace instead of repeatedly trying to emulate instructions 
> >> that have already failed
> >>
> >> Signed-off-by: Mohammed Gamal 
> >
> > Mohammed,
> >
> > The handle_invalid_guest_state loop is potentially problematic. It would
> > be more appropriate to use the __vcpu_run loop.
> >
> > Can't you set vmx->emulation_required depending on the result
> > of one call to emulate_instruction and get rid of the while
> > (!guest_state_valid(vcpu)) loop?
> >
> 
> Invalid state emulation is VMX-specfic, while the __vcpu_run loop is
> independent of the virtualization extension (defined in x86.c), no?
> AMD SVM can comforably run hosts in big-real mode and thus it doesn't
> have the notion of a guest going to an invalid state because of mode
> switching, so I don't think it'd be a good idea to move emulation into
> a generic layer. Please correct me if I am wrong

Right. But all i am asking is to emulate one instruction at a
time in handle_invalid_guest_state, instead of looping until
guest_state_valid(vcpu).

So you get rid of schedule(), the check for signal_pending, etc.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3] VMX: Enhance invalid guest state emulation

2009-09-01 Thread Mohammed Gamal
On Tue, Sep 1, 2009 at 1:48 PM, Marcelo Tosatti wrote:
> On Tue, Sep 01, 2009 at 12:48:18PM +0200, Mohammed Gamal wrote:
>> - Change returned handle_invalid_guest_state() to return relevant exit codes
>> - Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
>> - Return to userspace instead of repeatedly trying to emulate instructions 
>> that have already failed
>>
>> Signed-off-by: Mohammed Gamal 
>
> Mohammed,
>
> The handle_invalid_guest_state loop is potentially problematic. It would
> be more appropriate to use the __vcpu_run loop.
>
> Can't you set vmx->emulation_required depending on the result
> of one call to emulate_instruction and get rid of the while
> (!guest_state_valid(vcpu)) loop?
>

Invalid state emulation is VMX-specfic, while the __vcpu_run loop is
independent of the virtualization extension (defined in x86.c), no?
AMD SVM can comforably run hosts in big-real mode and thus it doesn't
have the notion of a guest going to an invalid state because of mode
switching, so I don't think it'd be a good idea to move emulation into
a generic layer. Please correct me if I am wrong
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Avi Kivity

On 09/01/2009 03:01 PM, Jan Kiszka wrote:

Avi Kivity wrote:
   

On 09/01/2009 02:45 PM, Jan Kiszka wrote:
 

Avi Kivity wrote:

   

On 09/01/2009 02:33 PM, Jan Kiszka wrote:

 

For the time being, falling back to conservative save/restore if kgdb is
configured in and kgdb_connected != 0 should allow us to apply this
optimization otherwise.


   

I prefer a real fix instead of spaghettiing the code this way (and in
the meanwhile I'm fine with kgdb hardware breakpoints not working while
kvm is running).

 

Then file a patch that makes CONFIG_KVM depend on !CONFIG_KGDB (or vice
versa). Silent breakages are evil.

   

Distros typically enable both.
 

Then we need to resolve it right from the start, not in some distant
future. We already have everything we need available, we could just try
to refactor it to a nicer interface (something that encapsulates both
compile and runtime tests).
   


I'll look at doing something.  It shouldn't be difficult.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Jan Kiszka
Avi Kivity wrote:
> On 09/01/2009 02:45 PM, Jan Kiszka wrote:
>> Avi Kivity wrote:
>>
>>> On 09/01/2009 02:33 PM, Jan Kiszka wrote:
>>>  
 For the time being, falling back to conservative save/restore if kgdb is
 configured in and kgdb_connected != 0 should allow us to apply this
 optimization otherwise.


>>> I prefer a real fix instead of spaghettiing the code this way (and in
>>> the meanwhile I'm fine with kgdb hardware breakpoints not working while
>>> kvm is running).
>>>  
>> Then file a patch that makes CONFIG_KVM depend on !CONFIG_KGDB (or vice
>> versa). Silent breakages are evil.
>>
> 
> Distros typically enable both.

Then we need to resolve it right from the start, not in some distant
future. We already have everything we need available, we could just try
to refactor it to a nicer interface (something that encapsulates both
compile and runtime tests).

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Avi Kivity

On 09/01/2009 02:45 PM, Jan Kiszka wrote:

Avi Kivity wrote:
   

On 09/01/2009 02:33 PM, Jan Kiszka wrote:
 

For the time being, falling back to conservative save/restore if kgdb is
configured in and kgdb_connected != 0 should allow us to apply this
optimization otherwise.

   

I prefer a real fix instead of spaghettiing the code this way (and in
the meanwhile I'm fine with kgdb hardware breakpoints not working while
kvm is running).
 

Then file a patch that makes CONFIG_KVM depend on !CONFIG_KGDB (or vice
versa). Silent breakages are evil.
   


Distros typically enable both.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: VMX: Conditionally reload debug register 6

2009-09-01 Thread Avi Kivity

On 09/01/2009 02:43 PM, Jan Kiszka wrote:

@@ -3731,7 +3732,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)

| (1<<  VCPU_EXREG_PDPTR));
vcpu->arch.regs_dirty = 0;

-   get_debugreg(vcpu->arch.dr6, 6);
+   if (vcpu->arch.switch_db_regs)
+   get_debugreg(vcpu->arch.dr6, 6);

vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
if (vmx->rmode.irq.pending)
 

That reduces the emulation quality as vcpu->arch.switch_db_regs is only
set if some breakpoint is active while dr6 has its use also when that is
not the case).
   


True - there's the TF reason reporting bits.

How about this then:

- if !switch_db_regs, trap #DB
- on #DB trap, copy DR6.BS and DR6.BT to vcpu->arch.dr6, and reinject 
the #DB


?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3] VMX: Enhance invalid guest state emulation

2009-09-01 Thread Marcelo Tosatti
On Tue, Sep 01, 2009 at 12:48:18PM +0200, Mohammed Gamal wrote:
> - Change returned handle_invalid_guest_state() to return relevant exit codes
> - Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
> - Return to userspace instead of repeatedly trying to emulate instructions 
> that have already failed
> 
> Signed-off-by: Mohammed Gamal 

Mohammed,

The handle_invalid_guest_state loop is potentially problematic. It would
be more appropriate to use the __vcpu_run loop.

Can't you set vmx->emulation_required depending on the result
of one call to emulate_instruction and get rid of the while
(!guest_state_valid(vcpu)) loop?

> ---
>  arch/x86/kvm/vmx.c |   44 
>  1 files changed, 20 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 78101dd..6265098 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -107,7 +107,6 @@ struct vcpu_vmx {
>   } rmode;
>   int vpid;
>   bool emulation_required;
> - enum emulation_result invalid_state_emulation_result;
>  
>   /* Support for vnmi-less CPUs */
>   int soft_vnmi_blocked;
> @@ -3318,35 +3317,37 @@ static int handle_nmi_window(struct kvm_vcpu *vcpu)
>   return 1;
>  }
>  
> -static void handle_invalid_guest_state(struct kvm_vcpu *vcpu)
> +static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>  {
>   struct vcpu_vmx *vmx = to_vmx(vcpu);
>   enum emulation_result err = EMULATE_DONE;
> -
> - local_irq_enable();
> - preempt_enable();
> + int ret = 1;
>  
>   while (!guest_state_valid(vcpu)) {
>   err = emulate_instruction(vcpu, 0, 0, 0);
>  
> - if (err == EMULATE_DO_MMIO)
> - break;
> + if (err == EMULATE_DO_MMIO) {
> + ret = 0;
> + goto out;
> + }
>  
>   if (err != EMULATE_DONE) {
>   kvm_report_emulation_failure(vcpu, "emulation failure");
> - break;
> + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> + vcpu->run->internal.suberror = 
> KVM_INTERNAL_ERROR_EMULATION;
> + ret = 0;
> + goto out;
>   }
>  
>   if (signal_pending(current))
> - break;
> + goto out;
>   if (need_resched())
>   schedule();
>   }
>  
> - preempt_disable();
> - local_irq_disable();
> -
> - vmx->invalid_state_emulation_result = err;
> + vmx->emulation_required = 0;
> +out:
> + return ret;
>  }
>  
>  /*
> @@ -3402,13 +3403,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>  
>   trace_kvm_exit(exit_reason, kvm_rip_read(vcpu));
>  
> - /* If we need to emulate an MMIO from handle_invalid_guest_state
> -  * we just return 0 */
> - if (vmx->emulation_required && emulate_invalid_guest_state) {
> - if (guest_state_valid(vcpu))
> - vmx->emulation_required = 0;
> - return vmx->invalid_state_emulation_result != EMULATE_DO_MMIO;
> - }
> + /* If guest state is invalid, start emulating */
> + if (vmx->emulation_required && emulate_invalid_guest_state)
> + return handle_invalid_guest_state(vcpu);
>  
>   /* Access CR3 don't cause VMExit in paging mode, so we need
>* to sync with guest real CR3. */
> @@ -3603,11 +3600,10 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>   if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
>   vmx->entry_time = ktime_get();
>  
> - /* Handle invalid guest state instead of entering VMX */
> - if (vmx->emulation_required && emulate_invalid_guest_state) {
> - handle_invalid_guest_state(vcpu);
> + /* Don't enter VMX if guest state is invalid, let the exit handler
> +start emulation until we arrive back to a valid state */
> + if (vmx->emulation_required && emulate_invalid_guest_state)
>   return;
> - }
>  
>   if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
>   vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
> -- 
> 1.6.0.4
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Jan Kiszka
Avi Kivity wrote:
> On 09/01/2009 02:33 PM, Jan Kiszka wrote:
>> For the time being, falling back to conservative save/restore if kgdb is
>> configured in and kgdb_connected != 0 should allow us to apply this
>> optimization otherwise.
>>
> 
> I prefer a real fix instead of spaghettiing the code this way (and in 
> the meanwhile I'm fine with kgdb hardware breakpoints not working while 
> kvm is running).

Then file a patch that makes CONFIG_KVM depend on !CONFIG_KGDB (or vice
versa). Silent breakages are evil.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: VMX: Conditionally reload debug register 6

2009-09-01 Thread Jan Kiszka
Avi Kivity wrote:
> Only reload debug register 6 if we're running with the guest's
> debug registers.  Saves around 150 cycles from the guest lightweight
> exit path.
> 
> Signed-off-by: Avi Kivity 
> ---
>  arch/x86/kvm/vmx.c |6 --
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 05cd554..70b0c54 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3629,7 +3629,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>*/
>   vmcs_writel(HOST_CR0, read_cr0());
>  
> - set_debugreg(vcpu->arch.dr6, 6);
> + if (vcpu->arch.switch_db_regs)
> + set_debugreg(vcpu->arch.dr6, 6);
>  
>   asm(
>   /* Store host registers */
> @@ -3731,7 +3732,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> | (1 << VCPU_EXREG_PDPTR));
>   vcpu->arch.regs_dirty = 0;
>  
> - get_debugreg(vcpu->arch.dr6, 6);
> + if (vcpu->arch.switch_db_regs)
> + get_debugreg(vcpu->arch.dr6, 6);
>  
>   vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>   if (vmx->rmode.irq.pending)

That reduces the emulation quality as vcpu->arch.switch_db_regs is only
set if some breakpoint is active while dr6 has its use also when that is
not the case).

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Avi Kivity

On 09/01/2009 02:33 PM, Jan Kiszka wrote:

For the time being, falling back to conservative save/restore if kgdb is
configured in and kgdb_connected != 0 should allow us to apply this
optimization otherwise.
   


I prefer a real fix instead of spaghettiing the code this way (and in 
the meanwhile I'm fine with kgdb hardware breakpoints not working while 
kvm is running).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: VMX: Conditionally reload debug register 6

2009-09-01 Thread Avi Kivity
Only reload debug register 6 if we're running with the guest's
debug registers.  Saves around 150 cycles from the guest lightweight
exit path.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 05cd554..70b0c54 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3629,7 +3629,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 */
vmcs_writel(HOST_CR0, read_cr0());
 
-   set_debugreg(vcpu->arch.dr6, 6);
+   if (vcpu->arch.switch_db_regs)
+   set_debugreg(vcpu->arch.dr6, 6);
 
asm(
/* Store host registers */
@@ -3731,7 +3732,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
  | (1 << VCPU_EXREG_PDPTR));
vcpu->arch.regs_dirty = 0;
 
-   get_debugreg(vcpu->arch.dr6, 6);
+   if (vcpu->arch.switch_db_regs)
+   get_debugreg(vcpu->arch.dr6, 6);
 
vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
if (vmx->rmode.irq.pending)
-- 
1.6.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Avi Kivity

On 09/01/2009 02:32 PM, Marcelo Tosatti wrote:



Nope, kgdb writes directly to the debug registers.

I vaguely recall someone trying to push a debug register management
framework. Did it hit mainline in the meantime? I do not find any trace
on quick glance, at least not in kgdb.
 

A simple kgdb_enabled sort of flag, in addition to TIF_DEBUG would do
it?
   


kgdb is per-cpu whereas user debugging is per-task, so we'd need a 
per-cpu debug register cache and flag.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Avi Kivity

On 09/01/2009 02:22 PM, Marcelo Tosatti wrote:

On Tue, Sep 01, 2009 at 12:44:08PM +0300, Avi Kivity wrote:
   

Instead of saving the debug registers from the processor to a kvm data
structure, rely in the debug registers stored in the thread structure.
This allows us not to save dr6 and dr7.

Reduces lightweight vmexit cost by 350 cycles, or 11 percent.
 

Is this kgdb safe?
   


I don't think so - kgdb seems to access the debug registers directly.  I 
think we can live with it at the moment, and later add an API to 
arbitrate debug register usage among the various users (kgdb seems to be 
unsafe wrt userspace debug as well).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Jan Kiszka
Jan Kiszka wrote:
> Marcelo Tosatti wrote:
>> On Tue, Sep 01, 2009 at 12:44:08PM +0300, Avi Kivity wrote:
>>> Instead of saving the debug registers from the processor to a kvm data
>>> structure, rely in the debug registers stored in the thread structure.
>>> This allows us not to save dr6 and dr7.
>>>
>>> Reduces lightweight vmexit cost by 350 cycles, or 11 percent.
>> Is this kgdb safe?
> 
> Nope, kgdb writes directly to the debug registers.
> 
> I vaguely recall someone trying to push a debug register management
> framework. Did it hit mainline in the meantime? I do not find any trace
> on quick glance, at least not in kgdb.

For the time being, falling back to conservative save/restore if kgdb is
configured in and kgdb_connected != 0 should allow us to apply this
optimization otherwise.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Marcelo Tosatti
On Tue, Sep 01, 2009 at 01:28:46PM +0200, Jan Kiszka wrote:
> Marcelo Tosatti wrote:
> > On Tue, Sep 01, 2009 at 12:44:08PM +0300, Avi Kivity wrote:
> >> Instead of saving the debug registers from the processor to a kvm data
> >> structure, rely in the debug registers stored in the thread structure.
> >> This allows us not to save dr6 and dr7.
> >>
> >> Reduces lightweight vmexit cost by 350 cycles, or 11 percent.
> > 
> > Is this kgdb safe?
> 
> Nope, kgdb writes directly to the debug registers.
> 
> I vaguely recall someone trying to push a debug register management
> framework. Did it hit mainline in the meantime? I do not find any trace
> on quick glance, at least not in kgdb.

A simple kgdb_enabled sort of flag, in addition to TIF_DEBUG would do
it?

> Jan
> 
> > 
> >> Signed-off-by: Avi Kivity 
> >> ---
> >>  arch/x86/include/asm/kvm_host.h |3 ---
> >>  arch/x86/kvm/x86.c  |   22 +++---
> >>  2 files changed, 7 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/kvm_host.h 
> >> b/arch/x86/include/asm/kvm_host.h
> >> index 6046e6f..45226f0 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -362,9 +362,6 @@ struct kvm_vcpu_arch {
> >>u32 pat;
> >>  
> >>int switch_db_regs;
> >> -  unsigned long host_db[KVM_NR_DB_REGS];
> >> -  unsigned long host_dr6;
> >> -  unsigned long host_dr7;
> >>unsigned long db[KVM_NR_DB_REGS];
> >>unsigned long dr6;
> >>unsigned long dr7;
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index 891234b..9e3acbd 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -3627,14 +3627,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >>  
> >>kvm_guest_enter();
> >>  
> >> -  get_debugreg(vcpu->arch.host_dr6, 6);
> >> -  get_debugreg(vcpu->arch.host_dr7, 7);
> >>if (unlikely(vcpu->arch.switch_db_regs)) {
> >> -  get_debugreg(vcpu->arch.host_db[0], 0);
> >> -  get_debugreg(vcpu->arch.host_db[1], 1);
> >> -  get_debugreg(vcpu->arch.host_db[2], 2);
> >> -  get_debugreg(vcpu->arch.host_db[3], 3);
> >> -
> >>set_debugreg(0, 7);
> >>set_debugreg(vcpu->arch.eff_db[0], 0);
> >>set_debugreg(vcpu->arch.eff_db[1], 1);
> >> @@ -3645,15 +3638,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >>trace_kvm_entry(vcpu->vcpu_id);
> >>kvm_x86_ops->run(vcpu);
> >>  
> >> -  if (unlikely(vcpu->arch.switch_db_regs)) {
> >> -  set_debugreg(0, 7);
> >> -  set_debugreg(vcpu->arch.host_db[0], 0);
> >> -  set_debugreg(vcpu->arch.host_db[1], 1);
> >> -  set_debugreg(vcpu->arch.host_db[2], 2);
> >> -  set_debugreg(vcpu->arch.host_db[3], 3);
> >> +  if (unlikely(vcpu->arch.switch_db_regs || test_thread_flag(TIF_DEBUG))) 
> >> {
> >> +  set_debugreg(current->thread.debugreg0, 0);
> >> +  set_debugreg(current->thread.debugreg1, 1);
> >> +  set_debugreg(current->thread.debugreg2, 2);
> >> +  set_debugreg(current->thread.debugreg3, 3);
> >> +  set_debugreg(current->thread.debugreg6, 6);
> >> +  set_debugreg(current->thread.debugreg7, 7);
> >>}
> >> -  set_debugreg(vcpu->arch.host_dr6, 6);
> >> -  set_debugreg(vcpu->arch.host_dr7, 7);
> >>  
> >>set_bit(KVM_REQ_KICK, &vcpu->requests);
> >>local_irq_enable();
> >> -- 
> >> 1.6.4.1
> 
> -- 
> Siemens AG, Corporate Technology, CT SE 2
> Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Jan Kiszka
Marcelo Tosatti wrote:
> On Tue, Sep 01, 2009 at 12:44:08PM +0300, Avi Kivity wrote:
>> Instead of saving the debug registers from the processor to a kvm data
>> structure, rely in the debug registers stored in the thread structure.
>> This allows us not to save dr6 and dr7.
>>
>> Reduces lightweight vmexit cost by 350 cycles, or 11 percent.
> 
> Is this kgdb safe?

Nope, kgdb writes directly to the debug registers.

I vaguely recall someone trying to push a debug register management
framework. Did it hit mainline in the meantime? I do not find any trace
on quick glance, at least not in kgdb.

Jan

> 
>> Signed-off-by: Avi Kivity 
>> ---
>>  arch/x86/include/asm/kvm_host.h |3 ---
>>  arch/x86/kvm/x86.c  |   22 +++---
>>  2 files changed, 7 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h 
>> b/arch/x86/include/asm/kvm_host.h
>> index 6046e6f..45226f0 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -362,9 +362,6 @@ struct kvm_vcpu_arch {
>>  u32 pat;
>>  
>>  int switch_db_regs;
>> -unsigned long host_db[KVM_NR_DB_REGS];
>> -unsigned long host_dr6;
>> -unsigned long host_dr7;
>>  unsigned long db[KVM_NR_DB_REGS];
>>  unsigned long dr6;
>>  unsigned long dr7;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 891234b..9e3acbd 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3627,14 +3627,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>  
>>  kvm_guest_enter();
>>  
>> -get_debugreg(vcpu->arch.host_dr6, 6);
>> -get_debugreg(vcpu->arch.host_dr7, 7);
>>  if (unlikely(vcpu->arch.switch_db_regs)) {
>> -get_debugreg(vcpu->arch.host_db[0], 0);
>> -get_debugreg(vcpu->arch.host_db[1], 1);
>> -get_debugreg(vcpu->arch.host_db[2], 2);
>> -get_debugreg(vcpu->arch.host_db[3], 3);
>> -
>>  set_debugreg(0, 7);
>>  set_debugreg(vcpu->arch.eff_db[0], 0);
>>  set_debugreg(vcpu->arch.eff_db[1], 1);
>> @@ -3645,15 +3638,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>  trace_kvm_entry(vcpu->vcpu_id);
>>  kvm_x86_ops->run(vcpu);
>>  
>> -if (unlikely(vcpu->arch.switch_db_regs)) {
>> -set_debugreg(0, 7);
>> -set_debugreg(vcpu->arch.host_db[0], 0);
>> -set_debugreg(vcpu->arch.host_db[1], 1);
>> -set_debugreg(vcpu->arch.host_db[2], 2);
>> -set_debugreg(vcpu->arch.host_db[3], 3);
>> +if (unlikely(vcpu->arch.switch_db_regs || test_thread_flag(TIF_DEBUG))) 
>> {
>> +set_debugreg(current->thread.debugreg0, 0);
>> +set_debugreg(current->thread.debugreg1, 1);
>> +set_debugreg(current->thread.debugreg2, 2);
>> +set_debugreg(current->thread.debugreg3, 3);
>> +set_debugreg(current->thread.debugreg6, 6);
>> +set_debugreg(current->thread.debugreg7, 7);
>>  }
>> -set_debugreg(vcpu->arch.host_dr6, 6);
>> -set_debugreg(vcpu->arch.host_dr7, 7);
>>  
>>  set_bit(KVM_REQ_KICK, &vcpu->requests);
>>  local_irq_enable();
>> -- 
>> 1.6.4.1

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] add sysctl for kvm wallclock sync

2009-09-01 Thread Glauber Costa
This patch introduces a new sysctl called kvm_sync_wallclock.

It controls the behaviour of the worker that updates guest wallclock time.
The worker will fire in periods specified by its value, if it is greater than 
zero,
and not fire at all otherwise.

Signed-off-by: Glauber Costa 
---
 arch/x86/include/asm/kvm_para.h |6 ++
 arch/x86/kernel/kvmclock.c  |   17 -
 kernel/sysctl.c |   13 +
 3 files changed, 35 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index b8a3305..3a3f38f 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -47,8 +47,14 @@ struct kvm_mmu_op_release_pt {
 
 #ifdef __KERNEL__
 #include 
+#include 
 
 extern void kvmclock_init(void);
+extern unsigned int kvm_wall_update_interval;
+extern int kvm_sync_wall_handler(struct ctl_table *table, int write,
+  struct file *filp, void __user *buffer,
+  size_t *lenp, loff_t *ppos);
+
 
 
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index fc409e9..3a4e1bd 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -27,7 +27,7 @@
 #define KVM_SCALE 22
 
 static int kvmclock = 1;
-static unsigned int kvm_wall_update_interval = 5;
+unsigned int kvm_wall_update_interval = 5;
 
 static int parse_no_kvmclock(char *arg)
 {
@@ -91,6 +91,21 @@ static __init int init_updates(void)
  */
 late_initcall(init_updates);
 
+int kvm_sync_wall_handler(struct ctl_table *table, int write,
+ struct file *filp, void __user *buffer,
+ size_t *lenp, loff_t *ppos)
+{
+   int ret  = proc_dointvec_minmax(table, write, filp, buffer, lenp, ppos);
+
+   if (ret || !write)
+   return ret;
+
+   cancel_delayed_work_sync(&kvm_sync_wall_work);
+
+   schedule_next_update();
+   return 0;
+}
+
 /*
  * The wallclock is the time of day when we booted. Since then, some time may
  * have elapsed since the hypervisor wrote the data. So we try to account for
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 98e0232..b787c81 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -989,6 +990,18 @@ static struct ctl_table kern_table[] = {
.proc_handler   = &proc_dointvec,
},
 #endif
+#ifdef CONFIG_KVM_CLOCK
+   {
+   .ctl_name   = CTL_UNNUMBERED,
+   .procname   = "kvm_sync_wallclock",
+   .data   = &kvm_wall_update_interval,
+   .maxlen = sizeof(kvm_wall_update_interval),
+   .mode   = 0644,
+   .proc_handler   = &kvm_sync_wall_handler,
+   .strategy   = &sysctl_intvec,
+   .extra1 = &zero,
+   },
+#endif
 
 /*
  * NOTE: do not add new entries to this table unless you have read
-- 
1.6.2.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] keep guest wallclock in sync with host clock

2009-09-01 Thread Glauber Costa
KVM clock is great to avoid drifting in guest VMs running ontop of kvm.
However, the current mechanism will not propagate changes in wallclock value
upwards. This effectively means that in a large pool of VMs that need accurate 
timing,
all of them has to run NTP, instead of just the host doing it.

Since the host updates information in the shared memory area upon msr writes,
this patch introduces a worker that writes to that msr, and calls 
do_settimeofday
at fixed intervals, with second resolution. A interval of 0 determines that we
are not interested in this behaviour. A later patch will make this optional at
runtime

Signed-off-by: Glauber Costa 
---
 arch/x86/kernel/kvmclock.c |   62 +--
 1 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index e5efcdc..fc409e9 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -27,6 +27,7 @@
 #define KVM_SCALE 22
 
 static int kvmclock = 1;
+static unsigned int kvm_wall_update_interval = 5;
 
 static int parse_no_kvmclock(char *arg)
 {
@@ -39,24 +40,67 @@ early_param("no-kvmclock", parse_no_kvmclock);
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
 static struct pvclock_wall_clock wall_clock;
 
-/*
- * The wallclock is the time of day when we booted. Since then, some time may
- * have elapsed since the hypervisor wrote the data. So we try to account for
- * that with system time
- */
-static unsigned long kvm_get_wallclock(void)
+static void kvm_get_wall_ts(struct timespec *ts)
 {
-   struct pvclock_vcpu_time_info *vcpu_time;
-   struct timespec ts;
int low, high;
+   struct pvclock_vcpu_time_info *vcpu_time;
 
low = (int)__pa_symbol(&wall_clock);
high = ((u64)__pa_symbol(&wall_clock) >> 32);
native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
 
vcpu_time = &get_cpu_var(hv_clock);
-   pvclock_read_wallclock(&wall_clock, vcpu_time, &ts);
+   pvclock_read_wallclock(&wall_clock, vcpu_time, ts);
put_cpu_var(hv_clock);
+}
+
+static void kvm_sync_wall_clock(struct work_struct *work);
+static DECLARE_DELAYED_WORK(kvm_sync_wall_work, kvm_sync_wall_clock);
+
+static void schedule_next_update(void)
+{
+   struct timespec next;
+
+   if (kvm_wall_update_interval == 0)
+   return;
+
+   next.tv_sec = kvm_wall_update_interval;
+   next.tv_nsec = 0;
+
+   schedule_delayed_work(&kvm_sync_wall_work, timespec_to_jiffies(&next));
+}
+
+static void kvm_sync_wall_clock(struct work_struct *work)
+{
+   struct timespec now;
+
+   kvm_get_wall_ts(&now);
+
+   do_settimeofday(&now);
+   schedule_next_update();
+}
+
+static __init int init_updates(void)
+{
+   schedule_next_update();
+   return 0;
+}
+/*
+ * It has to be run after workqueues are initialized, since we call
+ * schedule_delayed_work. Other than that, we have no specific requirements
+ */
+late_initcall(init_updates);
+
+/*
+ * The wallclock is the time of day when we booted. Since then, some time may
+ * have elapsed since the hypervisor wrote the data. So we try to account for
+ * that with system time
+ */
+static unsigned long kvm_get_wallclock(void)
+{
+   struct timespec ts;
+
+   kvm_get_wall_ts(&ts);
 
return ts.tv_sec;
 }
-- 
1.6.2.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] Automatically grab wallclock time updates from hypervisor

2009-09-01 Thread Glauber Costa
Hi folks,

In this proposed patch, I am introducing a worker fired by kvmclock that updates
guest wallclock periodically to reflect changes in the host's wallclock. With 
this
patch, a large pool of VMs will no longer have to run NTP in all of its guests.

The worker does that at a configurable interval, with a minimum granularity of 1
second. So, although not exactly cheap, the msr write needed to get an updated
wallclock value won't pose a heavy burden on the system.

It is also possible to disable it completely if this behaviour is undesired for
a specific scenario.

diffstat follows:

 arch/x86/include/asm/kvm_para.h |6 +++
 arch/x86/kernel/kvmclock.c  |   77 ++-
 kernel/sysctl.c |   13 +++
 3 files changed, 87 insertions(+), 9 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Marcelo Tosatti
On Tue, Sep 01, 2009 at 12:44:08PM +0300, Avi Kivity wrote:
> Instead of saving the debug registers from the processor to a kvm data
> structure, rely in the debug registers stored in the thread structure.
> This allows us not to save dr6 and dr7.
> 
> Reduces lightweight vmexit cost by 350 cycles, or 11 percent.

Is this kgdb safe?

> Signed-off-by: Avi Kivity 
> ---
>  arch/x86/include/asm/kvm_host.h |3 ---
>  arch/x86/kvm/x86.c  |   22 +++---
>  2 files changed, 7 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6046e6f..45226f0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -362,9 +362,6 @@ struct kvm_vcpu_arch {
>   u32 pat;
>  
>   int switch_db_regs;
> - unsigned long host_db[KVM_NR_DB_REGS];
> - unsigned long host_dr6;
> - unsigned long host_dr7;
>   unsigned long db[KVM_NR_DB_REGS];
>   unsigned long dr6;
>   unsigned long dr7;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 891234b..9e3acbd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3627,14 +3627,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  
>   kvm_guest_enter();
>  
> - get_debugreg(vcpu->arch.host_dr6, 6);
> - get_debugreg(vcpu->arch.host_dr7, 7);
>   if (unlikely(vcpu->arch.switch_db_regs)) {
> - get_debugreg(vcpu->arch.host_db[0], 0);
> - get_debugreg(vcpu->arch.host_db[1], 1);
> - get_debugreg(vcpu->arch.host_db[2], 2);
> - get_debugreg(vcpu->arch.host_db[3], 3);
> -
>   set_debugreg(0, 7);
>   set_debugreg(vcpu->arch.eff_db[0], 0);
>   set_debugreg(vcpu->arch.eff_db[1], 1);
> @@ -3645,15 +3638,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   trace_kvm_entry(vcpu->vcpu_id);
>   kvm_x86_ops->run(vcpu);
>  
> - if (unlikely(vcpu->arch.switch_db_regs)) {
> - set_debugreg(0, 7);
> - set_debugreg(vcpu->arch.host_db[0], 0);
> - set_debugreg(vcpu->arch.host_db[1], 1);
> - set_debugreg(vcpu->arch.host_db[2], 2);
> - set_debugreg(vcpu->arch.host_db[3], 3);
> + if (unlikely(vcpu->arch.switch_db_regs || test_thread_flag(TIF_DEBUG))) 
> {
> + set_debugreg(current->thread.debugreg0, 0);
> + set_debugreg(current->thread.debugreg1, 1);
> + set_debugreg(current->thread.debugreg2, 2);
> + set_debugreg(current->thread.debugreg3, 3);
> + set_debugreg(current->thread.debugreg6, 6);
> + set_debugreg(current->thread.debugreg7, 7);
>   }
> - set_debugreg(vcpu->arch.host_dr6, 6);
> - set_debugreg(vcpu->arch.host_dr7, 7);
>  
>   set_bit(KVM_REQ_KICK, &vcpu->requests);
>   local_irq_enable();
> -- 
> 1.6.4.1
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Avi Kivity

On 09/01/2009 02:16 PM, Avi Kivity wrote:



  set_debugreg(0, 7);
  set_debugreg(vcpu->arch.eff_db[0], 0);
  set_debugreg(vcpu->arch.eff_db[1], 1);
@@ -3645,15 +3638,14 @@ static int vcpu_enter_guest(struct kvm_vcpu 
*vcpu)

  trace_kvm_entry(vcpu->vcpu_id);
  kvm_x86_ops->run(vcpu);

-if (unlikely(vcpu->arch.switch_db_regs)) {
-set_debugreg(0, 7);
-set_debugreg(vcpu->arch.host_db[0], 0);
-set_debugreg(vcpu->arch.host_db[1], 1);
-set_debugreg(vcpu->arch.host_db[2], 2);
-set_debugreg(vcpu->arch.host_db[3], 3);
+if (unlikely(vcpu->arch.switch_db_regs || 
test_thread_flag(TIF_DEBUG))) {
IIRC, you must clear dr7 before loading dr0..3 to avoid spurious 
effects.


Yeah, I thought of it but forgot to implement it.


Actually on the entry path it is done correctly, and on the exit path 
dr7 is cleared, so this patch is correct.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Avi Kivity

On 09/01/2009 01:42 PM, Jan Kiszka wrote:



diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 891234b..9e3acbd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3627,14 +3627,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)

kvm_guest_enter();

-   get_debugreg(vcpu->arch.host_dr6, 6);
-   get_debugreg(vcpu->arch.host_dr7, 7);
if (unlikely(vcpu->arch.switch_db_regs)) {
-   get_debugreg(vcpu->arch.host_db[0], 0);
-   get_debugreg(vcpu->arch.host_db[1], 1);
-   get_debugreg(vcpu->arch.host_db[2], 2);
-   get_debugreg(vcpu->arch.host_db[3], 3);
-
 

I'm fine with this for the common case, but we should switch to the safe
pattern on test_thread_flag(TIF_DEBUG), ie. if someone debugs qemu using
hardware break/watchpoints.
   


IIUC, thread.debugregs should be synced with the hardware registers.  
The kernel itself only writes to the debug registers, so we're safe 
doing the same.


   

set_debugreg(0, 7);
set_debugreg(vcpu->arch.eff_db[0], 0);
set_debugreg(vcpu->arch.eff_db[1], 1);
@@ -3645,15 +3638,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
trace_kvm_entry(vcpu->vcpu_id);
kvm_x86_ops->run(vcpu);

-   if (unlikely(vcpu->arch.switch_db_regs)) {
-   set_debugreg(0, 7);
-   set_debugreg(vcpu->arch.host_db[0], 0);
-   set_debugreg(vcpu->arch.host_db[1], 1);
-   set_debugreg(vcpu->arch.host_db[2], 2);
-   set_debugreg(vcpu->arch.host_db[3], 3);
+   if (unlikely(vcpu->arch.switch_db_regs || test_thread_flag(TIF_DEBUG))) 
{
 

IIRC, you must clear dr7 before loading dr0..3 to avoid spurious effects.
   


Yeah, I thought of it but forgot to implement it.


+   set_debugreg(current->thread.debugreg0, 0);
+   set_debugreg(current->thread.debugreg1, 1);
+   set_debugreg(current->thread.debugreg2, 2);
+   set_debugreg(current->thread.debugreg3, 3);
+   set_debugreg(current->thread.debugreg6, 6);
+   set_debugreg(current->thread.debugreg7, 7);
}
-   set_debugreg(vcpu->arch.host_dr6, 6);
-   set_debugreg(vcpu->arch.host_dr7, 7);
 

Unless I miss something ATM, moving this into the if-clause should cause
troubles if the guest leaves dr7 armed behind. But I need to refresh my
memory on this.
   


The hardware will clear dr7 on exit.

Updated patch to follow.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Jan Kiszka
Jan Kiszka wrote:
> Avi Kivity wrote:
>> Instead of saving the debug registers from the processor to a kvm data
>> structure, rely in the debug registers stored in the thread structure.
>> This allows us not to save dr6 and dr7.
>>
>> Reduces lightweight vmexit cost by 350 cycles, or 11 percent.
>>
>> Signed-off-by: Avi Kivity 
>> ---
>>  arch/x86/include/asm/kvm_host.h |3 ---
>>  arch/x86/kvm/x86.c  |   22 +++---
>>  2 files changed, 7 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h 
>> b/arch/x86/include/asm/kvm_host.h
>> index 6046e6f..45226f0 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -362,9 +362,6 @@ struct kvm_vcpu_arch {
>>  u32 pat;
>>  
>>  int switch_db_regs;
>> -unsigned long host_db[KVM_NR_DB_REGS];
>> -unsigned long host_dr6;
>> -unsigned long host_dr7;
>>  unsigned long db[KVM_NR_DB_REGS];
>>  unsigned long dr6;
>>  unsigned long dr7;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 891234b..9e3acbd 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3627,14 +3627,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>  
>>  kvm_guest_enter();
>>  
>> -get_debugreg(vcpu->arch.host_dr6, 6);
>> -get_debugreg(vcpu->arch.host_dr7, 7);
>>  if (unlikely(vcpu->arch.switch_db_regs)) {
>> -get_debugreg(vcpu->arch.host_db[0], 0);
>> -get_debugreg(vcpu->arch.host_db[1], 1);
>> -get_debugreg(vcpu->arch.host_db[2], 2);
>> -get_debugreg(vcpu->arch.host_db[3], 3);
>> -
> 
> I'm fine with this for the common case, but we should switch to the safe
> pattern on test_thread_flag(TIF_DEBUG), ie. if someone debugs qemu using
> hardware break/watchpoints.
> 
>>  set_debugreg(0, 7);
>>  set_debugreg(vcpu->arch.eff_db[0], 0);
>>  set_debugreg(vcpu->arch.eff_db[1], 1);
>> @@ -3645,15 +3638,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>  trace_kvm_entry(vcpu->vcpu_id);
>>  kvm_x86_ops->run(vcpu);
>>  
>> -if (unlikely(vcpu->arch.switch_db_regs)) {
>> -set_debugreg(0, 7);
>> -set_debugreg(vcpu->arch.host_db[0], 0);
>> -set_debugreg(vcpu->arch.host_db[1], 1);
>> -set_debugreg(vcpu->arch.host_db[2], 2);
>> -set_debugreg(vcpu->arch.host_db[3], 3);
>> +if (unlikely(vcpu->arch.switch_db_regs || test_thread_flag(TIF_DEBUG))) 
>> {
> 
> IIRC, you must clear dr7 before loading dr0..3 to avoid spurious effects.
> 
>> +set_debugreg(current->thread.debugreg0, 0);
>> +set_debugreg(current->thread.debugreg1, 1);
>> +set_debugreg(current->thread.debugreg2, 2);
>> +set_debugreg(current->thread.debugreg3, 3);
>> +set_debugreg(current->thread.debugreg6, 6);
>> +set_debugreg(current->thread.debugreg7, 7);
>>  }
>> -set_debugreg(vcpu->arch.host_dr6, 6);
>> -set_debugreg(vcpu->arch.host_dr7, 7);
> 
> Unless I miss something ATM, moving this into the if-clause should cause
> troubles if the guest leaves dr7 armed behind. But I need to refresh my
> memory on this.

Looks like VMX loads 0x400 unconditionally into dr7 on vmexit, thus all
hw debugging is initially disabled, I must have missed this while
reworking the dr switch code. So we can safely load dr0..3 without
clearing dr7 first, and we can skip restoring of dr6 and dr7 unless the
current process (the hypervisor) is debugged itself.

> 
>>  
>>  set_bit(KVM_REQ_KICK, &vcpu->requests);
>>  local_irq_enable();
> 
> Jan
> 

Recent Intel CPUs seem to provide control over dr7/debugctl msr
saving/restoring (bit 2 in vmexit and vmentry control). Allready thought
about if this may further help to reduce the common case (ie. no host
and guest drX usage)?

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv3] VMX: Enhance invalid guest state emulation

2009-09-01 Thread Mohammed Gamal
- Change returned handle_invalid_guest_state() to return relevant exit codes
- Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
- Return to userspace instead of repeatedly trying to emulate instructions that 
have already failed

Signed-off-by: Mohammed Gamal 
---
 arch/x86/kvm/vmx.c |   44 
 1 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 78101dd..6265098 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -107,7 +107,6 @@ struct vcpu_vmx {
} rmode;
int vpid;
bool emulation_required;
-   enum emulation_result invalid_state_emulation_result;
 
/* Support for vnmi-less CPUs */
int soft_vnmi_blocked;
@@ -3318,35 +3317,37 @@ static int handle_nmi_window(struct kvm_vcpu *vcpu)
return 1;
 }
 
-static void handle_invalid_guest_state(struct kvm_vcpu *vcpu)
+static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
enum emulation_result err = EMULATE_DONE;
-
-   local_irq_enable();
-   preempt_enable();
+   int ret = 1;
 
while (!guest_state_valid(vcpu)) {
err = emulate_instruction(vcpu, 0, 0, 0);
 
-   if (err == EMULATE_DO_MMIO)
-   break;
+   if (err == EMULATE_DO_MMIO) {
+   ret = 0;
+   goto out;
+   }
 
if (err != EMULATE_DONE) {
kvm_report_emulation_failure(vcpu, "emulation failure");
-   break;
+   vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+   vcpu->run->internal.suberror = 
KVM_INTERNAL_ERROR_EMULATION;
+   ret = 0;
+   goto out;
}
 
if (signal_pending(current))
-   break;
+   goto out;
if (need_resched())
schedule();
}
 
-   preempt_disable();
-   local_irq_disable();
-
-   vmx->invalid_state_emulation_result = err;
+   vmx->emulation_required = 0;
+out:
+   return ret;
 }
 
 /*
@@ -3402,13 +3403,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 
trace_kvm_exit(exit_reason, kvm_rip_read(vcpu));
 
-   /* If we need to emulate an MMIO from handle_invalid_guest_state
-* we just return 0 */
-   if (vmx->emulation_required && emulate_invalid_guest_state) {
-   if (guest_state_valid(vcpu))
-   vmx->emulation_required = 0;
-   return vmx->invalid_state_emulation_result != EMULATE_DO_MMIO;
-   }
+   /* If guest state is invalid, start emulating */
+   if (vmx->emulation_required && emulate_invalid_guest_state)
+   return handle_invalid_guest_state(vcpu);
 
/* Access CR3 don't cause VMExit in paging mode, so we need
 * to sync with guest real CR3. */
@@ -3603,11 +3600,10 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
vmx->entry_time = ktime_get();
 
-   /* Handle invalid guest state instead of entering VMX */
-   if (vmx->emulation_required && emulate_invalid_guest_state) {
-   handle_invalid_guest_state(vcpu);
+   /* Don't enter VMX if guest state is invalid, let the exit handler
+  start emulation until we arrive back to a valid state */
+   if (vmx->emulation_required && emulate_invalid_guest_state)
return;
-   }
 
if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
-- 
1.6.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Jan Kiszka
Avi Kivity wrote:
> Instead of saving the debug registers from the processor to a kvm data
> structure, rely in the debug registers stored in the thread structure.
> This allows us not to save dr6 and dr7.
> 
> Reduces lightweight vmexit cost by 350 cycles, or 11 percent.
> 
> Signed-off-by: Avi Kivity 
> ---
>  arch/x86/include/asm/kvm_host.h |3 ---
>  arch/x86/kvm/x86.c  |   22 +++---
>  2 files changed, 7 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6046e6f..45226f0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -362,9 +362,6 @@ struct kvm_vcpu_arch {
>   u32 pat;
>  
>   int switch_db_regs;
> - unsigned long host_db[KVM_NR_DB_REGS];
> - unsigned long host_dr6;
> - unsigned long host_dr7;
>   unsigned long db[KVM_NR_DB_REGS];
>   unsigned long dr6;
>   unsigned long dr7;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 891234b..9e3acbd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3627,14 +3627,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  
>   kvm_guest_enter();
>  
> - get_debugreg(vcpu->arch.host_dr6, 6);
> - get_debugreg(vcpu->arch.host_dr7, 7);
>   if (unlikely(vcpu->arch.switch_db_regs)) {
> - get_debugreg(vcpu->arch.host_db[0], 0);
> - get_debugreg(vcpu->arch.host_db[1], 1);
> - get_debugreg(vcpu->arch.host_db[2], 2);
> - get_debugreg(vcpu->arch.host_db[3], 3);
> -

I'm fine with this for the common case, but we should switch to the safe
pattern on test_thread_flag(TIF_DEBUG), ie. if someone debugs qemu using
hardware break/watchpoints.

>   set_debugreg(0, 7);
>   set_debugreg(vcpu->arch.eff_db[0], 0);
>   set_debugreg(vcpu->arch.eff_db[1], 1);
> @@ -3645,15 +3638,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   trace_kvm_entry(vcpu->vcpu_id);
>   kvm_x86_ops->run(vcpu);
>  
> - if (unlikely(vcpu->arch.switch_db_regs)) {
> - set_debugreg(0, 7);
> - set_debugreg(vcpu->arch.host_db[0], 0);
> - set_debugreg(vcpu->arch.host_db[1], 1);
> - set_debugreg(vcpu->arch.host_db[2], 2);
> - set_debugreg(vcpu->arch.host_db[3], 3);
> + if (unlikely(vcpu->arch.switch_db_regs || test_thread_flag(TIF_DEBUG))) 
> {

IIRC, you must clear dr7 before loading dr0..3 to avoid spurious effects.

> + set_debugreg(current->thread.debugreg0, 0);
> + set_debugreg(current->thread.debugreg1, 1);
> + set_debugreg(current->thread.debugreg2, 2);
> + set_debugreg(current->thread.debugreg3, 3);
> + set_debugreg(current->thread.debugreg6, 6);
> + set_debugreg(current->thread.debugreg7, 7);
>   }
> - set_debugreg(vcpu->arch.host_dr6, 6);
> - set_debugreg(vcpu->arch.host_dr7, 7);

Unless I miss something ATM, moving this into the if-clause should cause
troubles if the guest leaves dr7 armed behind. But I need to refresh my
memory on this.

>  
>   set_bit(KVM_REQ_KICK, &vcpu->requests);
>   local_irq_enable();

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Avi Kivity

On 09/01/2009 12:44 PM, Avi Kivity wrote:

Instead of saving the debug registers from the processor to a kvm data
structure, rely in the debug registers stored in the thread structure.
This allows us not to save dr6 and dr7.

Reduces lightweight vmexit cost by 350 cycles, or 11 percent.
   


Andrew, this is now available as the 'debugreg' branch of kvm.git.  
Given the massive performance improvement, it will be interesting to see 
how the test results change.


Marcelo, please queue this for 2.6.32, and I think it's even suitable 
for -stable.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: Use thread debug register storage instead of kvm specific data

2009-09-01 Thread Avi Kivity
Instead of saving the debug registers from the processor to a kvm data
structure, rely in the debug registers stored in the thread structure.
This allows us not to save dr6 and dr7.

Reduces lightweight vmexit cost by 350 cycles, or 11 percent.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |3 ---
 arch/x86/kvm/x86.c  |   22 +++---
 2 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6046e6f..45226f0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -362,9 +362,6 @@ struct kvm_vcpu_arch {
u32 pat;
 
int switch_db_regs;
-   unsigned long host_db[KVM_NR_DB_REGS];
-   unsigned long host_dr6;
-   unsigned long host_dr7;
unsigned long db[KVM_NR_DB_REGS];
unsigned long dr6;
unsigned long dr7;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 891234b..9e3acbd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3627,14 +3627,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
kvm_guest_enter();
 
-   get_debugreg(vcpu->arch.host_dr6, 6);
-   get_debugreg(vcpu->arch.host_dr7, 7);
if (unlikely(vcpu->arch.switch_db_regs)) {
-   get_debugreg(vcpu->arch.host_db[0], 0);
-   get_debugreg(vcpu->arch.host_db[1], 1);
-   get_debugreg(vcpu->arch.host_db[2], 2);
-   get_debugreg(vcpu->arch.host_db[3], 3);
-
set_debugreg(0, 7);
set_debugreg(vcpu->arch.eff_db[0], 0);
set_debugreg(vcpu->arch.eff_db[1], 1);
@@ -3645,15 +3638,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
trace_kvm_entry(vcpu->vcpu_id);
kvm_x86_ops->run(vcpu);
 
-   if (unlikely(vcpu->arch.switch_db_regs)) {
-   set_debugreg(0, 7);
-   set_debugreg(vcpu->arch.host_db[0], 0);
-   set_debugreg(vcpu->arch.host_db[1], 1);
-   set_debugreg(vcpu->arch.host_db[2], 2);
-   set_debugreg(vcpu->arch.host_db[3], 3);
+   if (unlikely(vcpu->arch.switch_db_regs || test_thread_flag(TIF_DEBUG))) 
{
+   set_debugreg(current->thread.debugreg0, 0);
+   set_debugreg(current->thread.debugreg1, 1);
+   set_debugreg(current->thread.debugreg2, 2);
+   set_debugreg(current->thread.debugreg3, 3);
+   set_debugreg(current->thread.debugreg6, 6);
+   set_debugreg(current->thread.debugreg7, 7);
}
-   set_debugreg(vcpu->arch.host_dr6, 6);
-   set_debugreg(vcpu->arch.host_dr7, 7);
 
set_bit(KVM_REQ_KICK, &vcpu->requests);
local_irq_enable();
-- 
1.6.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: VMX: Check cpl before emulating debug register access

2009-09-01 Thread Avi Kivity
Debug registers may only be accessed from cpl 0.  Unfortunately, vmx will
code to emulate the instruction even though it was issued from guest
userspace, possibly leading to an unexpected trap later.

Cc: sta...@kernel.org
Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/vmx.c  |2 ++
 arch/x86/kvm/x86.c  |   13 +
 3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a3f637f..6046e6f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -621,6 +621,7 @@ void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned 
nr);
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long cr2,
   u32 error_code);
+bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
 
 int kvm_pic_set_irq(void *opaque, int irq, int level);
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 78101dd..05cd554 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2931,6 +2931,8 @@ static int handle_dr(struct kvm_vcpu *vcpu)
unsigned long val;
int dr, reg;
 
+   if (!kvm_require_cpl(vcpu, 0))
+   return 1;
dr = vmcs_readl(GUEST_DR7);
if (dr & DR7_GD) {
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8b3a169..891234b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -223,6 +223,19 @@ void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned 
nr, u32 error_code)
 EXPORT_SYMBOL_GPL(kvm_queue_exception_e);
 
 /*
+ * Checks if cpl <= required_cpl; if true, return true.  Otherwise queue
+ * a #GP and return false.
+ */
+bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl)
+{
+   if (kvm_x86_ops->get_cpl(vcpu) <= required_cpl)
+   return true;
+   kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
+   return false;
+}
+EXPORT_SYMBOL_GPL(kvm_require_cpl);
+
+/*
  * Load the pae pdptrs.  Return true is they are all valid.
  */
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
-- 
1.6.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM-AUTOTEST PATCH 2/2] Add KSM test

2009-09-01 Thread Lukáš Doktor
I'm sorry but thunderbird apparently crippled the path. Resending as the 
attachment.
diff --git a/client/tests/kvm/kvm.py b/client/tests/kvm/kvm.py
index 4930e80..b9839df 100644
--- a/client/tests/kvm/kvm.py
+++ b/client/tests/kvm/kvm.py
@@ -53,6 +53,8 @@ class kvm(test.test):
 "yum_update":   test_routine("kvm_tests", "run_yum_update"),
 "autotest": test_routine("kvm_tests", "run_autotest"),
 "kvm_install":  test_routine("kvm_install", "run_kvm_install"),
+"ksm":
+test_routine("kvm_tests", "run_ksm"),
 "linux_s3": test_routine("kvm_tests", "run_linux_s3"),
 "stress_boot":  test_routine("kvm_tests", "run_stress_boot"),
 "timedrift":test_routine("kvm_tests", "run_timedrift"),
diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
index a83ef9b..f4a41b9 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -100,6 +100,23 @@ variants:
 test_name = disktest
 test_control_file = disktest.control
 
+- ksm:
+# Don't preprocess any vms as we need to change it's params
+vms = ''
+image_snapshot = yes
+kill_vm_gracefully = no
+type = ksm
+variants:
+- ratio_3:
+ksm_ratio = 3
+- ratio_10:
+ksm_ratio = 10
+variants:
+- serial 
+ksm_test_size = "serial"
+- paralel
+ksm_test_size = "paralel"
+
 - linux_s3: install setup
 type = linux_s3
 
diff --git a/client/tests/kvm/kvm_tests.py b/client/tests/kvm/kvm_tests.py
index b100269..ada4c6b 100644
--- a/client/tests/kvm/kvm_tests.py
+++ b/client/tests/kvm/kvm_tests.py
@@ -462,6 +462,554 @@ def run_yum_update(test, params, env):
 
 session.close()
 
+def run_ksm(test, params, env):
+"""
+Test how KSM (Kernel Shared Memory) act with more than physical memory is
+used. In second part is also tested, how KVM can handle the situation,
+when the host runs out of memory (expected is to pause the guest system,
+wait until some process returns the memory and bring the guest back to life)
+
+@param test: kvm test object.
+@param params: Dictionary with test parameters.
+@param env: Dictionary with the test wnvironment.
+"""
+# We are going to create the main VM so we use kvm_preprocess functions
+# FIXME: not a nice thing
+import kvm_preprocessing
+import random
+import socket
+import select
+import math
+
+class allocator_com:
+"""
+This class is used for communication with the allocator
+"""
+def __init__(self, vm, _port, _host='127.0.0.1'):
+self.vm = vm
+self.PORT = _port
+self.HOST = _host
+self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+self.isConnect = False
+
+def __str__(self):
+return self.vm + ":" + self.HOST + ":" + str(self.PORT)
+
+def connect(self):
+print self
+logging.debug("ALLOC: connect to %s", self.vm)
+try:
+self.socket.connect((self.HOST, self.PORT))
+except:
+raise error.TestFail("ALLOC: Could not establish the "\
+ "communication with %s" % (self.vm))
+self.isConnect = True
+
+def isConnected(self):
+return self.isConnect;
+
+def readsize(self):
+read,write,error = select.select([self.socket.fileno()],[],[],0.5)
+size = 0
+if (self.socket.fileno() in read):
+data = self.socket.recv(1);
+size = "";
+while data[0] != ':':
+size = size + data[0]
+data = self.socket.recv(1)
+return int(size)
+
+def _recv(self):
+msg = ""
+read, write, error = select.select([self.socket.fileno()],\
+   [], [], 0.5)
+if (self.socket.fileno() in read):
+size = self.readsize()
+msg = self.socket.recv(size)
+if (len(msg) < size):
+raise error.TestFail("ALLOC: Could not recive the message")
+
+logging.debug("ALLOC: output '%s' from %s" % (msg, self.vm))
+return msg
+
+def recv(self, wait=1, loops=20):
+out = ""
+log = ""
+while not out.startswith("PASS") and not out.startswith("FAIL"):
+logging.debug("Sleep(%d)" % (wait))
+time.sleep(wait)
+log += out
+out = self._recv()
+
+if loops == 0:
+logging.error(repr(ou

Re: [KVM-AUTOTEST PATCH 1/2] Add KSM test

2009-09-01 Thread Lukáš Doktor
I'm sorry but thunderbird apparently crippled the path. Resending as the 
attachment.
diff --git a/client/tests/kvm/allocator.c b/client/tests/kvm/allocator.c
new file mode 100644
index 000..89e8ce4
--- /dev/null
+++ b/client/tests/kvm/allocator.c
@@ -0,0 +1,571 @@
+/*
+ * KSM test program.
+ * Copyright(C) 2009 Redhat
+ * Jason Wang (jasow...@redhat.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+//socket linux
+#include 
+#include 
+#include 
+#include 
+//TODO: socket windows
+
+
+
+#define PS (4096)
+long PAGE_SIZE = PS;
+long intInPage = PS/sizeof(int);
+#define MAP_FLAGS ( MAP_ANON | MAP_SHARED )
+#define PROT_FLAGS ( PROT_WRITE )
+#define FILE_MODE ( O_RDWR | O_CREAT )
+#define LOG_FILE "/var/log/vksmd"
+#define FIFO_FILE "/tmp/vksmd"
+#define MODE 0666
+#define FILE_BASE "/tmp/ksm_file"
+#define MAX_SIZESIZE 6
+#define MAX_COMMANDSIZE 50
+#define BLOCK_COUNT 8
+
+int log_fd = -1;
+int base_fd = -1;
+int checkvalue = 0;
+
+
+//Socket
+struct sockaddr_in sockName;
+struct sockaddr_in clientInfo;
+int mainSocket,clientSocket;
+int port;
+
+socklen_t addrlen;
+
+
+
+
+const uint32_t random_mask = UINT32_MAX>>1;
+uint32_t random_x = 0;
+const uint32_t random_a = 1103515245;
+const uint32_t random_m = 2^32;
+const uint32_t random_c = 12345;
+
+int statickey = 0;
+int dynamickey = 0;
+
+typedef enum _COMMANDS
+{
+  wrongcommad,
+  ninit,
+  nrandom,
+  nexit,
+  nsrandom,
+  nsrverify,
+  nfillzero,
+  nfillvalue,
+  ndfill,
+  nverify
+} COMMANDS;
+
+void sigpipe (int param)
+{
+  fprintf(stderr,"write error\n");
+  //exit(-1); //uncomment end if network connetion is down
+}
+
+int writefull(int socket,char * data,int size){
+  int sz = 0;
+  while (sz < size)
+sz += write(socket, data+sz, size-sz);
+  return sz;
+}
+
+
+int write_message(int s,char * message){
+  size_t len = strlen(message);
+  char buf[10];
+  sprintf(buf,"%d:",(unsigned int)len);
+  size_t size = strlen(buf);
+
+  struct timeval tv;
+  fd_set writeset;
+  fd_set errorset;
+  FD_ZERO(&writeset);
+  FD_ZERO(&errorset);
+  FD_SET(clientSocket, &writeset);
+  FD_SET(clientSocket, &errorset);
+  tv.tv_sec = 0;
+  tv.tv_usec = 100;
+  int max = s+1;
+  tv.tv_sec = 10;
+  tv.tv_usec = 0;
+  int ret = select(max, NULL, &writeset, NULL, &tv);
+  if (ret == -1)
+  {
+return -1;
+  }
+  if (ret == 0)
+  {
+return -1;
+  }
+  if (FD_ISSET(s, &writeset))
+  {
+if (writefull(s, buf, size) != size){
+  return -1;
+}
+if (writefull(s, message, len) != len){
+  return -1;
+}
+  }
+  return 0;
+}
+
+void log_info(char *str)
+{
+  if (write_message(clientSocket, str) != 0){
+fprintf(stderr,"write error\n");
+  }
+}
+
+/* fill pages with zero */
+void zero_pages(void **page_array,int npages)
+{
+  int n = 0;
+  for(n=0;n 0);
+
+  close(clientSocket);
+  close(mainSocket);
+
+  if (prev_fn==SIG_IGN) signal (SIGTERM,SIG_IGN);
+
+  return 0;
+}
+
+