Re: [PATCH 8/9] coalesce mmio regions with an explicit call

2008-09-25 Thread Avi Kivity
Glauber Costa wrote:
>
> Any ideas about what's up for the other hypervisors that may (we hope) be 
> integrated
> in the future? Xen?
>   

Xen should benefit even more (much more).  IIRC Windows wouldn't boot
since it was spending all its time context switching when the splash
screen with its KITT bar was displayed, so they hacked something for
vga, but nothing generic.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM Migration fails

2008-09-25 Thread jd
Hi Chris, 
 
Thanks for the response.

It errors out a bit after the command is given, so I guess it must have 
transferred some pages/bits.

What kind of information do you want me to collect?
Is there a way to turn on logging or debugging for the kvm process?

I will also try the tap option.

thanks.
/Jd

--- On Wed, 9/24/08, Chris Lalancette <[EMAIL PROTECTED]> wrote:

> From: Chris Lalancette <[EMAIL PROTECTED]>
> Subject: Re: KVM Migration fails
> To: [EMAIL PROTECTED]
> Cc: "KVM List" 
> Date: Wednesday, September 24, 2008, 10:56 PM
> jd wrote:
> > Hi I have a setup using shared nfs disks. When migration is attempted, it
> > fails... any ideas on how to debug this..?
> > 
> > /Jd
> > 
> > Details ===
> > 
> > migration: write failed (Connection reset by peer)^M
> > Migration failed! ret=0 error=9
> 
> Not a lot of detail here, but when does it fail during the process?
> Immediately?  After transferring some of the image?  Right at the end?
> Connection reset by peer sort of sounds like there is a firewall or something in
> the way, although if it transfers some data before crashing, then it's probably
> something else.  The only other thing I can think of based on your command-line
> below is something to do with your -net options; I've only successfully migrated
> with -net tap before, I've never tried -net user.
> 
> Chris Lalancette
> 
> > 
> > Source : KVM-73, Cent OS 5.2, 64 bit.
> > 
> > qemu-system-x86_64 -net nic,vlan=0,macaddr=00:16:3e:16:f4:f0 -net user,vlan=0
> > -hda /mnt/nfs/vmdisks/XPSP2-KVM.disk.xm -boot c -m 1024 -no-acpi  -vnc :22
> > -name XPSP2-KVM -smp 2 -monitor
> > unix:/var/run/kvm/monitors/XPSP2-KVM,server,nowait -pidfile
> > /var/run/kvm/pids/XPSP2-KVM -daemonize
> > 
> > Dest   : KVM-70, Fedora 8, 64bit
> > 
> > qemu-system-x86_64 -net nic,vlan=0,macaddr=00:16:3e:16:f4:f0 -net user,vlan=0
> > -hda /mnt/nfs/vmdisks/XPSP2-KVM.disk.xm -boot c -m 1024 -no-acpi  -vnc :23
> > -incoming tcp://0:8002 -name XPSP2-KVM -smp 2 -monitor
> > unix:/var/run/kvm/monitors/XPSP2-KVM,server,nowait -pidfile
> > /var/run/kvm/pids/XPSP2-KVM -daemonize


  


Re: KVM Migration fails

2008-09-25 Thread jd
The error code and the messages seem a bit different here.

/Jd


--- On Wed, 9/24/08, Yang, Sheng <[EMAIL PROTECTED]> wrote:

> From: Yang, Sheng <[EMAIL PROTECTED]>
> Subject: Re: KVM Migration fails
> To: kvm@vger.kernel.org, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Wednesday, September 24, 2008, 11:23 PM
> On Thursday 25 September 2008 12:22:42 jd wrote:
> > Hi
> >   I have a setup using shared nfs disks. When migration is attempted, it
> > fails... any ideas on how to debug this..?
> 
> It's a regression bug recently. 
> 
> Please refer to 
> 
> https://sourceforge.net/tracker/index.php?func=detail&aid=2106661&group_id=180599&atid=893831
> 
> I think a git bisect can also help. 
> --
> regards
> Yang, Sheng
> >
> > /Jd
> >
> > Details
> > ===
> >
> > migration: write failed (Connection reset by peer)^M
> > Migration failed! ret=0 error=9
> >
> > Source : KVM-73, Cent OS 5.2, 64 bit.
> >
> > qemu-system-x86_64 -net nic,vlan=0,macaddr=00:16:3e:16:f4:f0 -net
> > user,vlan=0 -hda /mnt/nfs/vmdisks/XPSP2-KVM.disk.xm -boot c -m 1024
> > -no-acpi  -vnc :22 -name XPSP2-KVM -smp 2 -monitor
> > unix:/var/run/kvm/monitors/XPSP2-KVM,server,nowait -pidfile
> > /var/run/kvm/pids/XPSP2-KVM -daemonize
> >
> > Dest   : KVM-70, Fedora 8, 64bit
> >
> > qemu-system-x86_64 -net nic,vlan=0,macaddr=00:16:3e:16:f4:f0 -net
> > user,vlan=0 -hda /mnt/nfs/vmdisks/XPSP2-KVM.disk.xm -boot c -m 1024
> > -no-acpi  -vnc :23 -incoming tcp://0:8002 -name XPSP2-KVM -smp 2 -monitor
> > unix:/var/run/kvm/monitors/XPSP2-KVM,server,nowait -pidfile
> > /var/run/kvm/pids/XPSP2-KVM -daemonize


  


Re: KVM Migration fails

2008-09-25 Thread Yang, Sheng
On Thursday 25 September 2008 15:23:22 jd wrote:
> The error code and the messages seem a bit different here.
>

Um... At least I believe it's a regression, and our migration test has never 
succeeded since then. So I think it's worth looking into. If you are lucky 
enough, you will have found two bugs. :)

--
regards
Yang, Sheng

> /Jd
>
> --- On Wed, 9/24/08, Yang, Sheng <[EMAIL PROTECTED]> wrote:
> > From: Yang, Sheng <[EMAIL PROTECTED]>
> > Subject: Re: KVM Migration fails
> > To: kvm@vger.kernel.org, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > Date: Wednesday, September 24, 2008, 11:23 PM
> >
> > On Thursday 25 September 2008 12:22:42 jd wrote:
> > > Hi
> > > >   I have a setup using shared nfs disks. When migration is attempted, it
> > > > fails... any ideas on how to debug this..?
> >
> > It's a regression bug recently.
> >
> > Please refer to
> >
> > > https://sourceforge.net/tracker/index.php?func=detail&aid=2106661&group_id=180599&atid=893831
> >
> > I think a git bisect can also help.
> > --
> > regards
> > Yang, Sheng
> >
> > > /Jd
> > >
> > > Details
> > > ===
> > >
> > > migration: write failed (Connection reset by peer)^M
> > > Migration failed! ret=0 error=9
> > >
> > > Source : KVM-73, Cent OS 5.2, 64 bit.
> > >
> > > > qemu-system-x86_64 -net nic,vlan=0,macaddr=00:16:3e:16:f4:f0 -net
> > > > user,vlan=0 -hda /mnt/nfs/vmdisks/XPSP2-KVM.disk.xm -boot c -m 1024
> > > > -no-acpi  -vnc :22 -name XPSP2-KVM -smp 2 -monitor
> > > > unix:/var/run/kvm/monitors/XPSP2-KVM,server,nowait -pidfile
> > > > /var/run/kvm/pids/XPSP2-KVM -daemonize
> > > >
> > > > Dest   : KVM-70, Fedora 8, 64bit
> > > >
> > > > qemu-system-x86_64 -net nic,vlan=0,macaddr=00:16:3e:16:f4:f0 -net
> > > > user,vlan=0 -hda /mnt/nfs/vmdisks/XPSP2-KVM.disk.xm -boot c -m 1024
> > > > -no-acpi  -vnc :23 -incoming tcp://0:8002 -name XPSP2-KVM -smp 2 -monitor
> > > > unix:/var/run/kvm/monitors/XPSP2-KVM,server,nowait -pidfile
> > > > /var/run/kvm/pids/XPSP2-KVM -daemonize
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > > To unsubscribe from this list: send the line
> >
> > "unsubscribe kvm" in
> >
> > > the body of a message to [EMAIL PROTECTED]
> > > More majordomo info at
> >
> > http://vger.kernel.org/majordomo-info.html
> >
> >
> > --
> > To unsubscribe from this list: send the line
> > "unsubscribe kvm" in
> > the body of a message to [EMAIL PROTECTED]
> > More majordomo info at
> > http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: KVM Migration fails

2008-09-25 Thread Uri Lublin

jd wrote:

Hi
  I have a setup using shared nfs disks. When migration is attempted, it fails... any ideas on how to debug this..? 
  
/Jd


Details
===

migration: write failed (Connection reset by peer)^M
Migration failed! ret=0 error=9

Source : KVM-73, Cent OS 5.2, 64 bit.

qemu-system-x86_64 -net nic,vlan=0,macaddr=00:16:3e:16:f4:f0 -net user,vlan=0 
-hda /mnt/nfs/vmdisks/XPSP2-KVM.disk.xm -boot c -m 1024 -no-acpi  -vnc :22 
-name XPSP2-KVM -smp 2 -monitor 
unix:/var/run/kvm/monitors/XPSP2-KVM,server,nowait -pidfile  
/var/run/kvm/pids/XPSP2-KVM -daemonize



Dest   : KVM-70, Fedora 8, 64bit

qemu-system-x86_64 -net nic,vlan=0,macaddr=00:16:3e:16:f4:f0 -net user,vlan=0 -hda /mnt/nfs/vmdisks/XPSP2-KVM.disk.xm -boot c -m 1024 -no-acpi  -vnc :23 -incoming tcp://0:8002 -name XPSP2-KVM -smp 2 -monitor unix:/var/run/kvm/monitors/XPSP2-KVM,server,nowait -pidfile  /var/run/kvm/pids/XPSP2-KVM -daemonize 




Hi,

The error on the source is "write failed ...". That often means the 
destination has exited due to an error on its side. Do you see any error message 
on the destination?


Uri.
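
To illustrate the point above, here is a minimal sketch of how a sender sees a failed write once the receiving side has gone away. It is not taken from the migration code: errno_after_peer_exit() is a hypothetical helper, and it uses a local AF_UNIX socket pair rather than the TCP connection used for migration. On a local pair the error is typically EPIPE; over TCP the sender more commonly sees ECONNRESET, as in the log above.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a connected socket pair, close the
 * "destination" end, then try to send one byte from the "source" end.
 * Returns the errno seen by the sender, 0 if the send unexpectedly
 * succeeded, or -1 if the socket pair could not be created.
 */
int errno_after_peer_exit(void)
{
	int sv[2];
	char byte = 0;
	ssize_t n;
	int err;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	close(sv[1]);	/* the "destination" exits */

	/* MSG_NOSIGNAL: report the failure through errno, not SIGPIPE */
	n = send(sv[0], &byte, 1, MSG_NOSIGNAL);
	err = (n < 0) ? errno : 0;

	close(sv[0]);
	return err;
}
```

The sender only learns that the peer is gone; the reason for the exit has to be read from the destination's own output.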


Re: VPN connection from Windows 2000 guest to remote server.

2008-09-25 Thread Avi Kivity
Colin Alie wrote:
> Good morning,
>
> Is it possible to establish a PPTP connection from a guest running Windows 
> 2000 Professional SP4 to a remote machine running Windows Server 2003 using 
> user-mode networking?
>
> It appears to me that the problem is that GRE protocol packets are not being 
> successfully transmitted.  I can create a PPTP connection to the VPN server 
> from the host machine and, using iptraf on the external interface, I see GRE 
> packets being transmitted.  However, when I initiate the connection from the 
> guest machine, using iptraf, I see the initial connection using TCP to port 
> 1723 on the server but no GRE protocol packets are detected.
>
>   

GRE is not supported by user mode networking.  Try bridging or VDE.
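
As a rough sketch of why the GRE packets never show up: assuming the user-mode stack relays only TCP and UDP (plus limited ICMP handling), anything with another IP protocol number has no path out of the guest. The function name and the PROTO_GRE constant below are illustrative, not part of qemu.

```c
#include <netinet/in.h>
#include <stdbool.h>

#define PROTO_GRE 47	/* IP protocol number of GRE, used by PPTP data */

/*
 * Assumption for this sketch: the user-mode network stack relays only
 * the protocols listed here and drops everything else.
 */
bool usernet_can_carry(int ip_protocol)
{
	switch (ip_protocol) {
	case IPPROTO_TCP:	/* e.g. the PPTP control connection, port 1723 */
	case IPPROTO_UDP:
	case IPPROTO_ICMP:
		return true;
	default:
		return false;
	}
}
```

Under that assumption the TCP control connection to port 1723 gets through while the GRE data packets do not, which matches the iptraf observation.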

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 10/11] VMX: work around lacking VNMI support

2008-09-25 Thread Jan Kiszka
Jan Kiszka wrote:
...
> Index: b/arch/x86/kvm/vmx.c
> ===
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -90,6 +90,11 @@ struct vcpu_vmx {
>   } rmode;
>   int vpid;
>   bool emulation_required;
> +
> + /* Support for vnmi-less CPUs */
> + int soft_vnmi_blocked;
> + ktime_t entry_time;
> + s64 vnmi_blocked_time;

I meanwhile realized that these states (except entry_time) and probably
also arch.nmi_pending/injected are things that should be considered when
the vcpu state is saved and restored, right? What is the right interface
for this? An extension of kvm_sregs?

BTW, via which channel is GUEST_INTERRUPTIBILITY_INFO from the vmcs
saved/restored? I'm currently not seeing any related, CPU-specific code.
For NMI code, the virtual blocking bit would be relevant (if the CPU
supports it, of course), but I guess the other bits are also important
enough to let them survive.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


Re: KVM: PIC: enhance IPI avoidance

2008-09-25 Thread Avi Kivity

Marcelo Tosatti wrote:
> True. Any other potential problems you could think of?

No, so applied the patch.  Thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] VT-d: Fix iommu map page for mmio pages

2008-09-25 Thread Avi Kivity

Han, Weidong wrote:
> Don't need to map mmio pages for iommu. When finding mmio pages in
> kvm_iommu_map_pages(), don't map them, and don't return an error, because
> it's not an error. If an error (such as -EINVAL) is returned, device
> assignment will fail.

I don't understand.  Why don't we need to map mmio pages?  We certainly 
don't want them emulated.



@@ -36,14 +36,13 @@ int kvm_iommu_map_pages(struct kvm *kvm,
 {
gfn_t gfn = base_gfn;
pfn_t pfn;
-   int i, r;
+   int i, r = 0;
struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
 
 	/* check if iommu exists and in use */

if (!domain)
return 0;
 
-	r = -EINVAL;

for (i = 0; i < npages; i++) {
/* check if already mapped */
pfn = (pfn_t)intel_iommu_iova_to_pfn(domain,
@@ -60,13 +59,14 @@ int kvm_iommu_map_pages(struct kvm *kvm,
 DMA_PTE_READ |
 DMA_PTE_WRITE);
if (r) {
-   printk(KERN_DEBUG "kvm_iommu_map_pages:"
+   printk(KERN_ERR "kvm_iommu_map_pages:"
   "iommu failed to map pfn=%lx\n",
pfn);
goto unmap_pages;
}
} else {
-   printk(KERN_DEBUG "kvm_iommu_map_page:"
-  "invalid pfn=%lx\n", pfn);
+   printk(KERN_DEBUG "kvm_iommu_map_pages:"
+  "invalid pfn=%lx, iommu needn't map "
+  "MMIO pages!\n", pfn);
goto unmap_pages;
}


If a slot has a mix of mmio and non-mmio pages, you will unmap the 
non-mmio pages, yet return no error.
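
The concern can be shown with a small user-space sketch. Everything here is a hypothetical stand-in (struct page_desc and both functions are invented for illustration, not kernel code): the first function mimics the patched control flow, where hitting an MMIO page jumps to the unmap path with r still 0; the second shows one possible alternative that simply skips MMIO pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a guest page; not a kernel type. */
struct page_desc {
	bool is_mmio;	/* true for an MMIO page */
	bool mapped;	/* true once the page has been mapped */
};

/*
 * Mimics the patched flow: the first MMIO page triggers the unmap
 * path, which undoes all earlier mappings, yet 0 is returned.
 */
int map_slot_bail_on_mmio(struct page_desc *pages, size_t npages)
{
	size_t i, j;
	int r = 0;

	for (i = 0; i < npages; i++) {
		if (pages[i].is_mmio)
			goto unmap_pages;
		pages[i].mapped = true;
	}
	return 0;

unmap_pages:
	for (j = 0; j < i; j++)
		pages[j].mapped = false;
	return r;	/* still 0: the caller sees success */
}

/* One possible alternative: skip MMIO pages and keep going. */
int map_slot_skip_mmio(struct page_desc *pages, size_t npages)
{
	size_t i;

	for (i = 0; i < npages; i++) {
		if (pages[i].is_mmio)
			continue;
		pages[i].mapped = true;
	}
	return 0;
}
```

With a slot laid out as RAM, MMIO, RAM, the first function reports success with nothing mapped, while the second leaves both RAM pages mapped. Whether skipping is actually the right policy is the open question in this thread.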


--
error compiling committee.c: too many arguments to function



Re: [PATCH 10/11] VMX: work around lacking VNMI support

2008-09-25 Thread Avi Kivity

Jan Kiszka wrote:
> Jan Kiszka wrote:
> ...
> > Index: b/arch/x86/kvm/vmx.c
> > ===
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -90,6 +90,11 @@ struct vcpu_vmx {
> >   } rmode;
> >   int vpid;
> >   bool emulation_required;
> > +
> > + /* Support for vnmi-less CPUs */
> > + int soft_vnmi_blocked;
> > + ktime_t entry_time;
> > + s64 vnmi_blocked_time;
>
> I meanwhile realized that these states (except entry_time) and probably
> also arch.nmi_pending/injected are things that should be considered when
> the vcpu state is saved and restored, right? What is the right interface
> for this? An extension of kvm_sregs?

kvm_sregs can't be extended because that would break the ABI, so we have 
to add a new ioctl.


I have some patches that allow ioctls to be extended, so if that's 
accepted, we can avoid the new ioctl.
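
A minimal sketch of one reason a structure passed through an ioctl cannot simply grow: on Linux, request numbers built with _IOR() and friends encode sizeof() of the argument, so a larger structure yields a different request number, and old binaries still pass a buffer of the old size. The structures, the new field, and the DEMO_* names below are hypothetical; only _IOR() and _IOC_SIZE() are real macros.

```c
#include <linux/ioctl.h>
#include <stdint.h>

/* Hypothetical state block as an old binary knows it. */
struct demo_state_v1 {
	uint64_t regs[4];
};

/* The same block after a hypothetical field is appended. */
struct demo_state_v2 {
	uint64_t regs[4];
	uint32_t nmi_pending;	/* hypothetical new field */
	uint32_t pad;
};

#define DEMO_IOC_MAGIC		0xAE	/* illustrative magic number */
#define DEMO_GET_STATE_V1	_IOR(DEMO_IOC_MAGIC, 0x83, struct demo_state_v1)
#define DEMO_GET_STATE_V2	_IOR(DEMO_IOC_MAGIC, 0x83, struct demo_state_v2)

/* Size of the argument as encoded in an ioctl request number. */
unsigned long demo_ioctl_size(unsigned long request)
{
	return _IOC_SIZE(request);
}
```

Here DEMO_GET_STATE_V1 and DEMO_GET_STATE_V2 share magic and sequence number but differ in the encoded size (32 versus 40 bytes), so they are distinct requests as far as the kernel is concerned.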



> BTW, via which channel is GUEST_INTERRUPTIBILITY_INFO from the vmcs
> saved/restored? I'm currently not seeing any related, CPU-specific code.

Looks like it's missing.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 8/9] coalesce mmio regions with an explicit call

2008-09-25 Thread Glauber Costa
On Thu, Sep 25, 2008 at 10:08:53AM +0300, Avi Kivity wrote:
> Glauber Costa wrote:
> >
> > Any ideas about what's up for the other hypervisors that may (we hope) be 
> > integrated
> > in the future? Xen?
> >   
> 
> Xen should benefit even more (much more).  IIRC Windows wouldn't boot
> since it was spending all its time context switching when the splash
> screen with its KITT bar was displayed, so they hacked something for
> vga, but nothing generic.

That's the point. It's not just wordplay between qemu and kvm: if we
introduce generic hooks that kvm happens to fill but qemu does not, other
hypervisors may (we hope) fill them in the future.



Re: [PATCH 0/4] Allow enabling kvm_trace on external module

2008-09-25 Thread Avi Kivity

Eduardo Habkost wrote:
> From: Eduardo Habkost <[EMAIL PROTECTED]>
> Date: Wed, 24 Sep 2008 14:11:42 -0300
> Subject: Always generate config.kbuild
>
> When implementing --with-kvm-trace, I supposed make would never enter
> the 'kernel' directory when compiling with --with-patched-kernel. I was
> wrong and broke --with-patched-kernel.
>
> Change configure to always generate config.kbuild on the kernel
> directory. Otherwise make will explode on 'make header-sync', that runs
> even when --with-patched-kernel was used.


Applied, thanks.

--
error compiling committee.c: too many arguments to function



[PATCH 01/39] KVM: VMX: Clean up magic number 0x66 in init_rmode_tss

2008-09-25 Thread Avi Kivity
From: Sheng Yang <[EMAIL PROTECTED]>

Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ddb49e3..229e2d0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1732,7 +1732,8 @@ static int init_rmode_tss(struct kvm *kvm)
if (r < 0)
goto out;
data = TSS_BASE_SIZE + TSS_REDIRECTION_SIZE;
-   r = kvm_write_guest_page(kvm, fn++, &data, 0x66, sizeof(u16));
+   r = kvm_write_guest_page(kvm, fn++, &data,
+   TSS_IOPB_BASE_OFFSET, sizeof(u16));
if (r < 0)
goto out;
r = kvm_clear_guest_page(kvm, fn++, 0, PAGE_SIZE);
-- 
1.6.0.1
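
For context, 0x66 is the offset of the I/O map base field in the 32-bit task-state segment, which is presumably what TSS_IOPB_BASE_OFFSET names. A minimal sketch that checks the offset at compile time; struct tss32 and its field names are illustrative and are not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * 32-bit TSS layout as documented in the Intel SDM.  16-bit selector
 * slots are widened to 32 bits to cover their reserved upper halves.
 */
struct tss32 {
	uint32_t prev_task_link;
	uint32_t esp0, ss0, esp1, ss1, esp2, ss2;
	uint32_t cr3, eip, eflags;
	uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
	uint32_t es, cs, ss, ds, fs, gs;
	uint32_t ldt_selector;
	uint16_t trap;		/* debug trap flag in bit 0 */
	uint16_t io_map_base;	/* offset of the I/O permission bitmap */
};

_Static_assert(offsetof(struct tss32, io_map_base) == 0x66,
	       "I/O map base must sit at offset 0x66");

/* Byte offset at which the 16-bit I/O map base is stored. */
size_t tss32_iopb_offset(void)
{
	return offsetof(struct tss32, io_map_base);
}
```

The value written at that offset in the patch, TSS_BASE_SIZE + TSS_REDIRECTION_SIZE, is a 16-bit quantity, which matches the sizeof(u16) length argument.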



[PATCH 00/39] KVM Updates for 2.6.28 merge window (part 2 of 3)

2008-09-25 Thread Avi Kivity
Here is the second batch of the KVM updates for the 2.6.28 merge window.

Linux 2.6.28 KVM will introduce support for pci device assignment and will
improve overall emulation accuracy.

Amit Shah (3):
  KVM: Device assignment: Check for privileges before assigning irq
  KVM: SVM: Fix typo
  KVM: Use kvm_set_irq to inject interrupts

Avi Kivity (20):
  KVM: VMX: Use interrupt queue for !irqchip_in_kernel
  KVM: Simplify exception entries by using __ASM_SIZE and _ASM_PTR
  KVM: Handle spurious acks for PIT interrupts
  KVM: VMX: Change cs reset state to be a data segment
  KVM: VMX: Change segment dpl at reset to 3
  KVM: Load real mode segments correctly
  KVM: x86 emulator: remove bad ByteOp specifier from NEG descriptor
  KVM: MMU: Move SHADOW_PT_INDEX to mmu.c
  KVM: MMU: Unify direct map 4K and large page paths
  KVM: MMU: Infer shadow root level in direct_map()
  KVM: MMU: Add generic shadow walker
  KVM: MMU: Convert direct maps to use the generic shadow walker
  KVM: MMU: Convert the paging mode shadow walk to use the generic
walker
  KVM: Allocate guest memory as MAP_PRIVATE, not MAP_SHARED
  KVM: Don't call get_user_pages(.force = 1)
  KVM: MMU: Account for npt/ept/realmode page faults
  KVM: MMU: Add locking around kvm_mmu_slot_remove_write_access()
  KVM: MMU: Flush tlbs after clearing write permission when accessing
dirty log
  KVM: MMU: Fix setting the accessed bit on non-speculative sptes
  KVM: SVM: No need to unprotect memory during event injection when
using npt

Ben-Ami Yassour (1):
  KVM: remove unused field from the assigned dev struct

Christian Borntraeger (2):
  KVM: s390: Make facility bits future-proof
  KVM: s390: change help text of guest Kconfig

Harvey Harrison (1):
  KVM: make irq ack notifier functions static

Joerg Roedel (1):
  KVM: add MC5_MISC msr read support

Marcelo Tosatti (2):
  KVM: set debug registers after "schedulable" section
  KVM: fix i8259 reset irq acking

Mohammed Gamal (5):
  KVM: VMX: Add Guest State Validity Checks
  KVM: VMX: Add module parameter and emulation flag.
  KVM: VMX: Add invalid guest state handler
  KVM: VMX: Modify mode switching and vmentry functions
  KVM: x86 emulator: Add mov r, imm instructions (opcodes 0xb0-0xbf)

Sheng Yang (1):
  KVM: VMX: Clean up magic number 0x66 in init_rmode_tss

Xiantao Zhang (2):
  KVM: ia64: add a dummy irq ack notification
  KVM: ia64: Enable virtio driver for ia64 in Kconfig

roel kluin (1):
  KVM: x86 emulator: remove duplicate SrcImm

 arch/ia64/kvm/Kconfig  |2 +
 arch/ia64/kvm/irq.h|   32 ++
 arch/s390/Kconfig  |7 +-
 arch/s390/kvm/priv.c   |4 +-
 arch/x86/kvm/i8254.c   |   10 +-
 arch/x86/kvm/i8259.c   |   16 ++-
 arch/x86/kvm/mmu.c |  145 +
 arch/x86/kvm/paging_tmpl.h |  161 +++-
 arch/x86/kvm/svm.c |4 +-
 arch/x86/kvm/vmx.c |  260 ++--
 arch/x86/kvm/x86.c |   49 ++---
 arch/x86/kvm/x86_emulate.c |   19 ++-
 include/asm-x86/kvm_host.h |   14 +--
 virt/kvm/ioapic.c  |2 +-
 virt/kvm/kvm_main.c|2 +-
 15 files changed, 544 insertions(+), 183 deletions(-)
 create mode 100644 arch/ia64/kvm/irq.h



[PATCH 03/39] KVM: set debug registers after "schedulable" section

2008-09-25 Thread Avi Kivity
From: Marcelo Tosatti <[EMAIL PROTECTED]>

The vcpu thread can be preempted after the guest_debug_pre() callback,
resulting in invalid debug registers on the new vcpu.

Move it inside the non-preemptable section.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86.c |9 -
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f1b0223..4a03375 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3113,10 +3113,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
down_read(&vcpu->kvm->slots_lock);
vapic_enter(vcpu);
 
-preempted:
-   if (vcpu->guest_debug.enabled)
-   kvm_x86_ops->guest_debug_pre(vcpu);
-
 again:
if (vcpu->requests)
if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests))
@@ -3170,6 +3166,9 @@ again:
goto out;
}
 
+   if (vcpu->guest_debug.enabled)
+   kvm_x86_ops->guest_debug_pre(vcpu);
+
vcpu->guest_mode = 1;
/*
 * Make sure that guest_mode assignment won't happen after
@@ -3244,7 +3243,7 @@ out:
if (r > 0) {
kvm_resched(vcpu);
down_read(&vcpu->kvm->slots_lock);
-   goto preempted;
+   goto again;
}
 
post_kvm_run_save(vcpu, kvm_run);
-- 
1.6.0.1



[PATCH 04/39] KVM: VMX: Use interrupt queue for !irqchip_in_kernel

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |   11 +--
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 229e2d0..81db7d4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2173,7 +2173,7 @@ static void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
clear_bit(bit_index, &vcpu->arch.irq_pending[word_index]);
if (!vcpu->arch.irq_pending[word_index])
clear_bit(word_index, &vcpu->arch.irq_summary);
-   vmx_inject_irq(vcpu, irq);
+   kvm_queue_interrupt(vcpu, irq);
 }
 
 
@@ -2187,13 +2187,12 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & 3) == 0);
 
if (vcpu->arch.interrupt_window_open &&
-   vcpu->arch.irq_summary &&
-   !(vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & INTR_INFO_VALID_MASK))
-   /*
-* If interrupts enabled, and not blocked by sti or mov ss. Good.
-*/
+   vcpu->arch.irq_summary && !vcpu->arch.interrupt.pending)
kvm_do_inject_irq(vcpu);
 
+   if (vcpu->arch.interrupt_window_open && vcpu->arch.interrupt.pending)
+   vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
+
cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
if (!vcpu->arch.interrupt_window_open &&
(vcpu->arch.irq_summary || kvm_run->request_interrupt_window))
-- 
1.6.0.1



[PATCH 06/39] KVM: fix i8259 reset irq acking

2008-09-25 Thread Avi Kivity
From: Marcelo Tosatti <[EMAIL PROTECTED]>

The irq ack during pic reset has three problems:

- Ignores slave/master PIC, using gsi 0-8 for both.
- Generates an ACK even if the APIC is in control.
- Depends upon IMR being clear, which is broken if the irq was masked
at the time it was generated.

The last one causes the BIOS to hang after the first reboot of
Windows installation, since PIT interrupts stop.

[avi: fix check whether pic interrupts are seen by cpu]

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/i8259.c |   16 +++-
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index de70499..71e3eee 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -195,13 +195,19 @@ int kvm_pic_read_irq(struct kvm *kvm)
 
 void kvm_pic_reset(struct kvm_kpic_state *s)
 {
-   int irq;
+   int irq, irqbase;
struct kvm *kvm = s->pics_state->irq_request_opaque;
+   struct kvm_vcpu *vcpu0 = kvm->vcpus[0];
 
-   for (irq = 0; irq < PIC_NUM_PINS; irq++) {
-   if (!(s->imr & (1 << irq)) && (s->irr & (1 << irq) ||
-   s->isr & (1 << irq)))
-   kvm_notify_acked_irq(kvm, irq);
+   if (s == &s->pics_state->pics[0])
+   irqbase = 0;
+   else
+   irqbase = 8;
+
+   for (irq = 0; irq < PIC_NUM_PINS/2; irq++) {
+   if (vcpu0 && kvm_apic_accept_pic_intr(vcpu0))
+   if (s->irr & (1 << irq) || s->isr & (1 << irq))
+   kvm_notify_acked_irq(kvm, irq+irqbase);
}
s->last_irr = 0;
s->irr = 0;
-- 
1.6.0.1
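
The pin-to-gsi mapping used by the patch above can be sketched in isolation. This is a simplified user-space illustration with hypothetical function names: it leaves out the APIC check and returns a bitmask instead of calling kvm_notify_acked_irq().

```c
#include <stdint.h>

#define PINS_PER_PIC 8	/* each 8259 has eight input pins */

/* Global irq number for a pin: master pins are 0-7, slave pins 8-15. */
int pic_pin_to_gsi(int is_slave, int pin)
{
	return (is_slave ? 8 : 0) + pin;
}

/*
 * Bitmask of gsis to acknowledge on reset: every pin that is requested
 * (irr) or in service (isr), regardless of the mask register.
 */
uint16_t pic_reset_ack_mask(int is_slave, uint8_t irr, uint8_t isr)
{
	uint16_t acks = 0;
	int pin;

	for (pin = 0; pin < PINS_PER_PIC; pin++)
		if ((irr | isr) & (1u << pin))
			acks |= (uint16_t)(1u << pic_pin_to_gsi(is_slave, pin));
	return acks;
}
```

For the slave PIC with pin 0 requested and pin 1 in service, the mask covers gsi 8 and gsi 9, where the old loop would have acknowledged gsi 0 and gsi 1.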



[PATCH 07/39] KVM: Handle spurious acks for PIT interrupts

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

Spurious acks can be generated, for example if the PIC is being reset.
Handle those acks gracefully rather than flooding the log with warnings.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/i8254.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 7d04dd3..c842060 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -228,7 +228,7 @@ void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
 irq_ack_notifier);
spin_lock(&ps->inject_lock);
if (atomic_dec_return(&ps->pit_timer.pending) < 0)
-   WARN_ON(1);
+   atomic_inc(&ps->pit_timer.pending);
ps->irq_ack = 1;
spin_unlock(&ps->inject_lock);
 }
-- 
1.6.0.1
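
The decrement-then-undo pattern from the patch above, sketched with C11 atomics in user space. ack_pending() is a hypothetical name, and the sketch omits the spinlock that the kernel code holds around the update.

```c
#include <stdatomic.h>

/*
 * Consume one pending tick on an ack.  If the counter would go
 * negative, the ack was spurious, so the decrement is undone and the
 * counter stays at zero.  Returns the counter value after the ack.
 */
int ack_pending(atomic_int *pending)
{
	/* atomic_fetch_sub returns the old value; the new one is old - 1 */
	if (atomic_fetch_sub(pending, 1) - 1 < 0)
		atomic_fetch_add(pending, 1);
	return atomic_load(pending);
}
```

A spurious ack on an empty counter leaves it at 0, while a normal ack with two pending ticks leaves 1.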



[PATCH 05/39] KVM: Simplify exception entries by using __ASM_SIZE and _ASM_PTR

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/asm-x86/kvm_host.h |   13 ++---
 1 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 1161af1..982b6b2 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -734,15 +734,6 @@ enum {
TASK_SWITCH_GATE = 3,
 };
 
-
-#ifdef CONFIG_64BIT
-# define KVM_EX_ENTRY ".quad"
-# define KVM_EX_PUSH "pushq"
-#else
-# define KVM_EX_ENTRY ".long"
-# define KVM_EX_PUSH "pushl"
-#endif
-
 /*
  * Hardware virtualization extension instructions may fault if a
  * reboot turns off virtualization while processes are running.
@@ -754,11 +745,11 @@ asmlinkage void kvm_handle_fault_on_reboot(void);
"666: " insn "\n\t" \
".pushsection .fixup, \"ax\" \n" \
"667: \n\t" \
-   KVM_EX_PUSH " $666b \n\t" \
+   __ASM_SIZE(push) " $666b \n\t"\
"jmp kvm_handle_fault_on_reboot \n\t" \
".popsection \n\t" \
".pushsection __ex_table, \"a\" \n\t" \
-   KVM_EX_ENTRY " 666b, 667b \n\t" \
+   _ASM_PTR " 666b, 667b \n\t" \
".popsection"
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
-- 
1.6.0.1



[PATCH 08/39] KVM: Device assignment: Check for privileges before assigning irq

2008-09-25 Thread Avi Kivity
From: Amit Shah <[EMAIL PROTECTED]>

Even though we don't share irqs at the moment, we should ensure
regular user processes don't try to allocate system resources.

We check for capability to access IO devices (CAP_SYS_RAWIO) before
we request_irq on behalf of the guest.

Noticed by Avi.

Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4a03375..fffdf4f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -191,6 +191,11 @@ static int kvm_vm_ioctl_assign_irq(struct kvm *kvm,
  kvm_assigned_dev_interrupt_work_handler);
 
if (irqchip_in_kernel(kvm)) {
+   if (!capable(CAP_SYS_RAWIO)) {
+   r = -EPERM;
+   goto out;
+   }
+
if (assigned_irq->host_irq)
match->host_irq = assigned_irq->host_irq;
else
-- 
1.6.0.1



[PATCH 09/39] KVM: VMX: Add Guest State Validity Checks

2008-09-25 Thread Avi Kivity
From: Mohammed Gamal <[EMAIL PROTECTED]>

This patch adds functions to check whether guest state is VMX compliant.

Signed-off-by: Mohammed Gamal <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |  180 
 1 files changed, 180 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 81db7d4..e889b76 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1721,6 +1721,186 @@ static void vmx_set_gdt(struct kvm_vcpu *vcpu, struct descriptor_table *dt)
vmcs_writel(GUEST_GDTR_BASE, dt->base);
 }
 
+static bool rmode_segment_valid(struct kvm_vcpu *vcpu, int seg)
+{
+   struct kvm_segment var;
+   u32 ar;
+
+   vmx_get_segment(vcpu, &var, seg);
+   ar = vmx_segment_access_rights(&var);
+
+   if (var.base != (var.selector << 4))
+   return false;
+   if (var.limit != 0x)
+   return false;
+   if (ar != 0xf3)
+   return false;
+
+   return true;
+}
+
+static bool code_segment_valid(struct kvm_vcpu *vcpu)
+{
+   struct kvm_segment cs;
+   unsigned int cs_rpl;
+
+   vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
+   cs_rpl = cs.selector & SELECTOR_RPL_MASK;
+
+   if (~cs.type & (AR_TYPE_CODE_MASK|AR_TYPE_ACCESSES_MASK))
+   return false;
+   if (!cs.s)
+   return false;
+   if (!(~cs.type & (AR_TYPE_CODE_MASK|AR_TYPE_WRITEABLE_MASK))) {
+   if (cs.dpl > cs_rpl)
+   return false;
+   } else if (cs.type & AR_TYPE_CODE_MASK) {
+   if (cs.dpl != cs_rpl)
+   return false;
+   }
+   if (!cs.present)
+   return false;
+
+   /* TODO: Add Reserved field check, this'll require a new member in the kvm_segment_field structure */
+   return true;
+}
+
+static bool stack_segment_valid(struct kvm_vcpu *vcpu)
+{
+   struct kvm_segment ss;
+   unsigned int ss_rpl;
+
+   vmx_get_segment(vcpu, &ss, VCPU_SREG_SS);
+   ss_rpl = ss.selector & SELECTOR_RPL_MASK;
+
+   if ((ss.type != 3) && (ss.type != 7))
+   return false;
+   if (!ss.s)
+   return false;
+   if (ss.dpl != ss_rpl) /* DPL != RPL */
+   return false;
+   if (!ss.present)
+   return false;
+
+   return true;
+}
+
+static bool data_segment_valid(struct kvm_vcpu *vcpu, int seg)
+{
+   struct kvm_segment var;
+   unsigned int rpl;
+
+   vmx_get_segment(vcpu, &var, seg);
+   rpl = var.selector & SELECTOR_RPL_MASK;
+
+   if (!var.s)
+   return false;
+   if (!var.present)
+   return false;
+   if (~var.type & (AR_TYPE_CODE_MASK|AR_TYPE_WRITEABLE_MASK)) {
+   if (var.dpl < rpl) /* DPL < RPL */
+   return false;
+   }
+
+   /* TODO: Add other members to kvm_segment_field to allow checking for other access
+* rights flags
+*/
+   return true;
+}
+
+static bool tr_valid(struct kvm_vcpu *vcpu)
+{
+   struct kvm_segment tr;
+
+   vmx_get_segment(vcpu, &tr, VCPU_SREG_TR);
+
+   if (tr.selector & SELECTOR_TI_MASK) /* TI = 1 */
+   return false;
+   if ((tr.type != 3) && (tr.type != 11)) /* TODO: Check if guest is in IA32e mode */
+   return false;
+   if (!tr.present)
+   return false;
+
+   return true;
+}
+
+static bool ldtr_valid(struct kvm_vcpu *vcpu)
+{
+   struct kvm_segment ldtr;
+
+   vmx_get_segment(vcpu, &ldtr, VCPU_SREG_LDTR);
+
+   if (ldtr.selector & SELECTOR_TI_MASK)   /* TI = 1 */
+   return false;
+   if (ldtr.type != 2)
+   return false;
+   if (!ldtr.present)
+   return false;
+
+   return true;
+}
+
+static bool cs_ss_rpl_check(struct kvm_vcpu *vcpu)
+{
+   struct kvm_segment cs, ss;
+
+   vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
+   vmx_get_segment(vcpu, &ss, VCPU_SREG_SS);
+
+   return ((cs.selector & SELECTOR_RPL_MASK) ==
+(ss.selector & SELECTOR_RPL_MASK));
+}
+
+/*
+ * Check if guest state is valid. Returns true if valid, false if
+ * not.
+ * We assume that registers are always usable
+ */
+static bool guest_state_valid(struct kvm_vcpu *vcpu)
+{
+   /* real mode guest state checks */
+   if (!(vcpu->arch.cr0 & X86_CR0_PE)) {
+   if (!rmode_segment_valid(vcpu, VCPU_SREG_CS))
+   return false;
+   if (!rmode_segment_valid(vcpu, VCPU_SREG_SS))
+   return false;
+   if (!rmode_segment_valid(vcpu, VCPU_SREG_DS))
+   return false;
+   if (!rmode_segment_valid(vcpu, VCPU_SREG_ES))
+   return false;
+   if (!rmode_segment_valid(vcpu, VCPU_SREG_FS))
+   return false;
+   if (!rmode_segment_valid(vcpu,

[PATCH 10/39] KVM: VMX: Add module parameter and emulation flag.

2008-09-25 Thread Avi Kivity
From: Mohammed Gamal <[EMAIL PROTECTED]>

The patch adds the module parameter required to enable emulating invalid
guest state, as well as the emulation_required flag used to drive
emulation whenever needed.

Signed-off-by: Mohammed Gamal <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e889b76..7c5f611 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -49,6 +49,9 @@ module_param(flexpriority_enabled, bool, 0);
 static int enable_ept = 1;
 module_param(enable_ept, bool, 0);
 
+static int emulate_invalid_guest_state = 0;
+module_param(emulate_invalid_guest_state, bool, 0);
+
 struct vmcs {
u32 revision_id;
u32 abort;
@@ -86,6 +89,7 @@ struct vcpu_vmx {
} irq;
} rmode;
int vpid;
+   bool emulation_required;
 };
 
 static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
-- 
1.6.0.1



[PATCH 11/39] KVM: VMX: Add invalid guest state handler

2008-09-25 Thread Avi Kivity
From: Mohammed Gamal <[EMAIL PROTECTED]>

This adds the invalid guest state handler function which invokes the x86
emulator until getting the guest to a VMX-friendly state.

[avi: leave atomic context if scheduling]
[guillaume: return to atomic context correctly]

Signed-off-by: Laurent Vivier <[EMAIL PROTECTED]>
Signed-off-by: Guillaume Thouvenin <[EMAIL PROTECTED]>
Signed-off-by: Mohammed Gamal <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |   37 +
 1 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7c5f611..eae1f2c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2892,6 +2892,43 @@ static int handle_nmi_window(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
return 1;
 }
 
+static void handle_invalid_guest_state(struct kvm_vcpu *vcpu,
+   struct kvm_run *kvm_run)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+   int err;
+
+   preempt_enable();
+   local_irq_enable();
+
+   while (!guest_state_valid(vcpu)) {
+   err = emulate_instruction(vcpu, kvm_run, 0, 0, 0);
+
+   switch (err) {
+   case EMULATE_DONE:
+   break;
+   case EMULATE_DO_MMIO:
+   kvm_report_emulation_failure(vcpu, "mmio");
+   /* TODO: Handle MMIO */
+   return;
+   default:
+   kvm_report_emulation_failure(vcpu, "emulation failure");
+   return;
+   }
+
+   if (signal_pending(current))
+   break;
+   if (need_resched())
+   schedule();
+   }
+
+   local_irq_disable();
+   preempt_disable();
+
+   /* Guest state should be valid now, no more emulation should be needed */
+   vmx->emulation_required = 0;
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
-- 
1.6.0.1
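
A note on the control flow in the handler above: it keeps emulating one instruction at a time until the state check passes, bails out early on an emulation failure, and clears emulation_required only on the normal path. The following is a minimal userspace sketch of that loop shape, not the kernel code. Every toy_* name is hypothetical, and the "state becomes valid after N instructions" model is an assumption made purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a vcpu: counts the instructions that must be
 * emulated before the state becomes "valid". A negative count models an
 * instruction the emulator cannot handle. */
struct toy_vcpu {
    int invalid_steps;
    bool emulation_required;
};

enum toy_emulate_result { TOY_EMULATE_DONE, TOY_EMULATE_FAIL };

static bool toy_guest_state_valid(const struct toy_vcpu *v)
{
    return v->invalid_steps == 0;
}

static enum toy_emulate_result toy_emulate_instruction(struct toy_vcpu *v)
{
    if (v->invalid_steps < 0)
        return TOY_EMULATE_FAIL;
    v->invalid_steps--;
    return TOY_EMULATE_DONE;
}

/* Returns the number of instructions emulated, or -1 if emulation failed.
 * As in the patch, the flag is left set when we return early on failure. */
static int toy_handle_invalid_guest_state(struct toy_vcpu *v)
{
    int emulated = 0;

    while (!toy_guest_state_valid(v)) {
        if (toy_emulate_instruction(v) != TOY_EMULATE_DONE)
            return -1;
        emulated++;
    }

    /* State is valid now, so no more emulation is needed. */
    v->emulation_required = false;
    return emulated;
}
```

The sketch leaves out the preemption and interrupt toggling, the signal check and the reschedule point, all of which only make sense inside the kernel.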



[PATCH 14/39] KVM: Use kvm_set_irq to inject interrupts

2008-09-25 Thread Avi Kivity
From: Amit Shah <[EMAIL PROTECTED]>

... instead of using the pic and ioapic variants

Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/i8254.c |6 ++
 arch/x86/kvm/x86.c   |8 +---
 2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index c842060..fdaa0f0 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -596,10 +596,8 @@ void kvm_free_pit(struct kvm *kvm)
 static void __inject_pit_timer_intr(struct kvm *kvm)
 {
mutex_lock(&kvm->lock);
-   kvm_ioapic_set_irq(kvm->arch.vioapic, 0, 1);
-   kvm_ioapic_set_irq(kvm->arch.vioapic, 0, 0);
-   kvm_pic_set_irq(pic_irqchip(kvm), 0, 1);
-   kvm_pic_set_irq(pic_irqchip(kvm), 0, 0);
+   kvm_set_irq(kvm, 0, 1);
+   kvm_set_irq(kvm, 0, 0);
mutex_unlock(&kvm->lock);
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fffdf4f..5b3c882 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1956,13 +1956,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
goto out;
if (irqchip_in_kernel(kvm)) {
mutex_lock(&kvm->lock);
-   if (irq_event.irq < 16)
-   kvm_pic_set_irq(pic_irqchip(kvm),
-   irq_event.irq,
-   irq_event.level);
-   kvm_ioapic_set_irq(kvm->arch.vioapic,
-   irq_event.irq,
-   irq_event.level);
+   kvm_set_irq(kvm, irq_event.irq, irq_event.level);
mutex_unlock(&kvm->lock);
r = 0;
}
-- 
1.6.0.1
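
For readers who do not have kvm_set_irq() in front of them: its body is not part of this patch, so the following is only a sketch of the dispatch the removed x86.c lines performed by hand (PIC for irq < 16, IOAPIC always), folded into one helper. All toy_* names are hypothetical, and the 24-line IOAPIC size is an assumption for the sketch.

```c
#define TOY_PIC_LINES    16  /* the removed code tested irq < 16 */
#define TOY_IOAPIC_LINES 24  /* assumption for this sketch */

/* Hypothetical model of the two interrupt controllers: just the last
 * level driven onto each input line. */
struct toy_irqchips {
    int pic_level[TOY_PIC_LINES];
    int ioapic_level[TOY_IOAPIC_LINES];
};

static void toy_pic_set_irq(struct toy_irqchips *c, int irq, int level)
{
    c->pic_level[irq] = level;
}

static void toy_ioapic_set_irq(struct toy_irqchips *c, int irq, int level)
{
    c->ioapic_level[irq] = level;
}

/* Single entry point: callers no longer need to know which chips exist. */
static void toy_set_irq(struct toy_irqchips *c, int irq, int level)
{
    if (irq < 0 || irq >= TOY_IOAPIC_LINES)
        return;                      /* out of range, ignore */
    if (irq < TOY_PIC_LINES)
        toy_pic_set_irq(c, irq, level);
    toy_ioapic_set_irq(c, irq, level);
}
```

With a helper of this shape, the PIT pulse in i8254.c becomes two calls (level 1, then level 0) instead of four.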



[PATCH 12/39] KVM: VMX: Modify mode switching and vmentry functions

2008-09-25 Thread Avi Kivity
From: Mohammed Gamal <[EMAIL PROTECTED]>

This patch modifies the mode switching and vmentry functions in order to
drive invalid guest state emulation.

Signed-off-by: Mohammed Gamal <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index eae1f2c..9840f37 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1298,7 +1298,9 @@ static void fix_pmode_dataseg(int seg, struct kvm_save_segment *save)
 static void enter_pmode(struct kvm_vcpu *vcpu)
 {
unsigned long flags;
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+   vmx->emulation_required = 1;
vcpu->arch.rmode.active = 0;
 
vmcs_writel(GUEST_TR_BASE, vcpu->arch.rmode.tr.base);
@@ -1315,6 +1317,9 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
 
update_exception_bitmap(vcpu);
 
+   if (emulate_invalid_guest_state)
+   return;
+
fix_pmode_dataseg(VCPU_SREG_ES, &vcpu->arch.rmode.es);
fix_pmode_dataseg(VCPU_SREG_DS, &vcpu->arch.rmode.ds);
fix_pmode_dataseg(VCPU_SREG_GS, &vcpu->arch.rmode.gs);
@@ -1355,7 +1360,9 @@ static void fix_rmode_seg(int seg, struct kvm_save_segment *save)
 static void enter_rmode(struct kvm_vcpu *vcpu)
 {
unsigned long flags;
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+   vmx->emulation_required = 1;
vcpu->arch.rmode.active = 1;
 
vcpu->arch.rmode.tr.base = vmcs_readl(GUEST_TR_BASE);
@@ -1377,6 +1384,9 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
vmcs_writel(GUEST_CR4, vmcs_readl(GUEST_CR4) | X86_CR4_VME);
update_exception_bitmap(vcpu);
 
+   if (emulate_invalid_guest_state)
+   goto continue_rmode;
+
vmcs_write16(GUEST_SS_SELECTOR, vmcs_readl(GUEST_SS_BASE) >> 4);
vmcs_write32(GUEST_SS_LIMIT, 0x);
vmcs_write32(GUEST_SS_AR_BYTES, 0xf3);
@@ -1392,6 +1402,7 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
fix_rmode_seg(VCPU_SREG_GS, &vcpu->arch.rmode.gs);
fix_rmode_seg(VCPU_SREG_FS, &vcpu->arch.rmode.fs);
 
+continue_rmode:
kvm_mmu_reset_context(vcpu);
init_rmode(vcpu->kvm);
 }
@@ -2317,6 +2328,9 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
ret = 0;
 
+   /* HACK: Don't enable emulation on guest boot/reset */
+   vmx->emulation_required = 0;
+
 out:
up_read(&vcpu->kvm->slots_lock);
return ret;
@@ -3190,6 +3204,12 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 intr_info;
 
+   /* Handle invalid guest state instead of entering VMX */
+   if (vmx->emulation_required && emulate_invalid_guest_state) {
+   handle_invalid_guest_state(vcpu, kvm_run);
+   return;
+   }
+
if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
-- 
1.6.0.1



[PATCH 15/39] KVM: make irq ack notifier functions static

2008-09-25 Thread Avi Kivity
From: Harvey Harrison <[EMAIL PROTECTED]>

sparse says:

arch/x86/kvm/x86.c:107:32: warning: symbol 'kvm_find_assigned_dev' was not declared. Should it be static?
arch/x86/kvm/i8254.c:225:6: warning: symbol 'kvm_pit_ack_irq' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/i8254.c |2 +-
 arch/x86/kvm/x86.c   |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index fdaa0f0..4cb4430 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -222,7 +222,7 @@ int pit_has_pending_timer(struct kvm_vcpu *vcpu)
return 0;
 }
 
-void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
+static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
 {
struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
 irq_ack_notifier);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5b3c882..22edd95 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -104,7 +104,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
 };
 
-struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
+static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
  int assigned_dev_id)
 {
struct list_head *ptr;
-- 
1.6.0.1



[PATCH 16/39] KVM: ia64: add a dummy irq ack notification

2008-09-25 Thread Avi Kivity
From: Xiantao Zhang <[EMAIL PROTECTED]>

Before enabling notify_acked_irq for ia64, leave the related APIs as
no-ops for now.

Signed-off-by: Xiantao Zhang <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/ia64/kvm/irq.h |   32 
 virt/kvm/ioapic.c   |2 +-
 2 files changed, 33 insertions(+), 1 deletions(-)
 create mode 100644 arch/ia64/kvm/irq.h

diff --git a/arch/ia64/kvm/irq.h b/arch/ia64/kvm/irq.h
new file mode 100644
index 000..f2e6545
--- /dev/null
+++ b/arch/ia64/kvm/irq.h
@@ -0,0 +1,32 @@
+/*
+ * irq.h: In-kernel interrupt controller related definitions
+ * Copyright (c) 2008, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Authors:
+ *   Xiantao Zhang <[EMAIL PROTECTED]>
+ *
+ */
+
+#ifndef __IRQ_H
+#define __IRQ_H
+
+struct kvm;
+
+static inline void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi)
+{
+}
+
+#endif
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 515cd7c..53772bb 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -386,7 +386,7 @@ static void ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
break;
 #ifdef CONFIG_IA64
case IOAPIC_REG_EOI:
-   kvm_ioapic_update_eoi(ioapic->kvm, data);
+   kvm_ioapic_update_eoi(ioapic->kvm, data, IOAPIC_LEVEL_TRIG);
break;
 #endif
 
-- 
1.6.0.1



[PATCH 17/39] KVM: VMX: Change cs reset state to be a data segment

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

Real mode cs is a data segment, not a code segment.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9840f37..6aa305a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2239,6 +2239,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
fx_init(&vmx->vcpu);
 
+   seg_setup(VCPU_SREG_CS);
/*
 * GUEST_CS_BASE should really be 0x, but VT vm86 mode
 * insists on having GUEST_CS_BASE == GUEST_CS_SELECTOR << 4.  Sigh.
@@ -2250,8 +2251,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmcs_write16(GUEST_CS_SELECTOR, vmx->vcpu.arch.sipi_vector << 8);
vmcs_writel(GUEST_CS_BASE, vmx->vcpu.arch.sipi_vector << 12);
}
-   vmcs_write32(GUEST_CS_LIMIT, 0x);
-   vmcs_write32(GUEST_CS_AR_BYTES, 0x9b);
 
seg_setup(VCPU_SREG_DS);
seg_setup(VCPU_SREG_ES);
-- 
1.6.0.1



[PATCH 18/39] KVM: VMX: Change segment dpl at reset to 3

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

This is more emulation friendly, if not 100% correct.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6aa305a..71e57ae 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1991,7 +1991,7 @@ static void seg_setup(int seg)
vmcs_write16(sf->selector, 0);
vmcs_writel(sf->base, 0);
vmcs_write32(sf->limit, 0x);
-   vmcs_write32(sf->ar_bytes, 0x93);
+   vmcs_write32(sf->ar_bytes, 0xf3);
 }
 
 static int alloc_apic_access_page(struct kvm *kvm)
-- 
1.6.0.1
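
To make the 0x93 to 0xf3 change concrete: in the low byte of the segment access rights, bits 0-3 are the type, bit 4 is S, bits 5-6 are the DPL and bit 7 is the present bit. The small decoder below is a sketch with hypothetical names (toy_ar_fields, toy_decode_ar). It shows that the two values differ only in DPL.

```c
#include <stdint.h>

/* Hypothetical holder for the fields packed into the low AR byte. */
struct toy_ar_fields {
    unsigned type;     /* bits 0-3: segment type */
    unsigned s;        /* bit 4: 1 = code/data, 0 = system */
    unsigned dpl;      /* bits 5-6: descriptor privilege level */
    unsigned present;  /* bit 7: segment present */
};

static struct toy_ar_fields toy_decode_ar(uint32_t ar)
{
    struct toy_ar_fields f;

    f.type    = ar & 0xf;
    f.s       = (ar >> 4) & 1;
    f.dpl     = (ar >> 5) & 3;
    f.present = (ar >> 7) & 1;
    return f;
}
```

Decoding 0x93 gives type 3, S 1, DPL 0, present 1. Decoding 0xf3 gives the same fields with DPL 3, which matches the ar != 0xf3 test in rmode_segment_valid() from patch 09.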



[PATCH 19/39] KVM: Load real mode segments correctly

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

Real mode segments do not reference the GDT or LDT; they simply compute
base = selector * 16.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 22edd95..bfc7c33 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3588,11 +3588,33 @@ static int load_segment_descriptor_to_kvm_desct(struct kvm_vcpu *vcpu,
return 0;
 }
 
+int kvm_load_realmode_segment(struct kvm_vcpu *vcpu, u16 selector, int seg)
+{
+   struct kvm_segment segvar = {
+   .base = selector << 4,
+   .limit = 0x,
+   .selector = selector,
+   .type = 3,
+   .present = 1,
+   .dpl = 3,
+   .db = 0,
+   .s = 1,
+   .l = 0,
+   .g = 0,
+   .avl = 0,
+   .unusable = 0,
+   };
+   kvm_x86_ops->set_segment(vcpu, &segvar, seg);
+   return 0;
+}
+
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
int type_bits, int seg)
 {
struct kvm_segment kvm_seg;
 
+   if (!(vcpu->arch.cr0 & X86_CR0_PE))
+   return kvm_load_realmode_segment(vcpu, selector, seg);
if (load_segment_descriptor_to_kvm_desct(vcpu, selector, &kvm_seg))
return 1;
kvm_seg.type |= type_bits;
-- 
1.6.0.1
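
The base = selector * 16 rule from the commit message can be tried out in userspace. This is a sketch only: the toy_* names are hypothetical, and the 0xffff limit is an assumption (the usual 64 KiB real-mode limit), since the value is cut off in the archived diff.

```c
#include <stdint.h>

/* Hypothetical cut-down segment record, loosely following the fields
 * initialised in the patch. */
struct toy_segment {
    uint32_t base;
    uint32_t limit;
    uint16_t selector;
    uint8_t type;
    uint8_t present;
    uint8_t dpl;
    uint8_t s;
};

/* Real mode: no descriptor table lookup, the base comes straight from
 * the selector. */
static struct toy_segment toy_load_realmode_segment(uint16_t selector)
{
    struct toy_segment seg = {
        .base = (uint32_t)selector << 4,  /* selector * 16 */
        .limit = 0xffff,                  /* assumption: 64 KiB limit */
        .selector = selector,
        .type = 3,
        .present = 1,
        .dpl = 3,
        .s = 1,
    };
    return seg;
}

/* Linear address of segment:offset in real mode. */
static uint32_t toy_linear_address(const struct toy_segment *seg,
                                   uint16_t offset)
{
    return seg->base + offset;
}
```

For example, selector 0xf000 gives base 0xf0000, so f000:fff0 maps to linear address 0xffff0.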



[PATCH 21/39] KVM: x86 emulator: remove bad ByteOp specifier from NEG descriptor

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86_emulate.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 5d6c144..ae30435 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -270,7 +270,7 @@ static u16 group_table[] = {
0, 0, 0, 0,
[Group3*8] =
DstMem | SrcImm | ModRM, 0,
-   DstMem | SrcNone | ModRM, ByteOp | DstMem | SrcNone | ModRM,
+   DstMem | SrcNone | ModRM, DstMem | SrcNone | ModRM,
0, 0, 0, 0,
[Group4*8] =
ByteOp | DstMem | SrcNone | ModRM, ByteOp | DstMem | SrcNone | ModRM,
-- 
1.6.0.1



[PATCH 20/39] KVM: x86 emulator: remove duplicate SrcImm

2008-09-25 Thread Avi Kivity
From: roel kluin <[EMAIL PROTECTED]>

Signed-off-by: Roel Kluin <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86_emulate.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index d5da7f1..5d6c144 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -269,7 +269,7 @@ static u16 group_table[] = {
ByteOp | DstMem | SrcNone | ModRM, ByteOp | DstMem | SrcNone | ModRM,
0, 0, 0, 0,
[Group3*8] =
-   DstMem | SrcImm | ModRM | SrcImm, 0,
+   DstMem | SrcImm | ModRM, 0,
DstMem | SrcNone | ModRM, ByteOp | DstMem | SrcNone | ModRM,
0, 0, 0, 0,
[Group4*8] =
-- 
1.6.0.1



[PATCH 22/39] KVM: MMU: Move SHADOW_PT_INDEX to mmu.c

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

It is not specific to the paging mode, so can be made global (and reusable).

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/mmu.c |2 ++
 arch/x86/kvm/paging_tmpl.h |3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 171bcea..51d4cd7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -135,6 +135,8 @@ module_param(dbg, bool, 0644);
 #define ACC_USER_MASK    PT_USER_MASK
 #define ACC_ALL  (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
 
+#define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
+
 struct kvm_rmap_desc {
u64 *shadow_ptes[RMAP_EXT];
struct kvm_rmap_desc *more;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 4a814bf..ebb26a0 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -29,7 +29,6 @@
#define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
#define PT_DIR_BASE_ADDR_MASK PT64_DIR_BASE_ADDR_MASK
#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
-   #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
#define PT_LEVEL_MASK(level) PT64_LEVEL_MASK(level)
#define PT_LEVEL_BITS PT64_LEVEL_BITS
#ifdef CONFIG_X86_64
@@ -46,7 +45,6 @@
#define PT_BASE_ADDR_MASK PT32_BASE_ADDR_MASK
#define PT_DIR_BASE_ADDR_MASK PT32_DIR_BASE_ADDR_MASK
#define PT_INDEX(addr, level) PT32_INDEX(addr, level)
-   #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
#define PT_LEVEL_MASK(level) PT32_LEVEL_MASK(level)
#define PT_LEVEL_BITS PT32_LEVEL_BITS
#define PT_MAX_FULL_LEVELS 2
@@ -504,7 +502,6 @@ static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
 #undef FNAME
 #undef PT_BASE_ADDR_MASK
 #undef PT_INDEX
-#undef SHADOW_PT_INDEX
 #undef PT_LEVEL_MASK
 #undef PT_DIR_BASE_ADDR_MASK
 #undef PT_LEVEL_BITS
-- 
1.6.0.1
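
Since SHADOW_PT_INDEX expands to PT64_INDEX in both paging modes, a short sketch of what a 64-bit page table index computes may help. The PT64_INDEX body is not shown in this patch, so the function below assumes the conventional layout (4 KiB pages, 9 index bits per level). The toy_* names are hypothetical.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT      12  /* assumption: 4 KiB pages */
#define TOY_PT64_LEVEL_BITS 9   /* assumption: 512 entries per table */

/* Index into the table at 'level' (1 = page table, 2 = directory, ...)
 * for the given address. */
static unsigned toy_pt64_index(uint64_t addr, int level)
{
    unsigned shift = TOY_PAGE_SHIFT + (level - 1) * TOY_PT64_LEVEL_BITS;

    return (unsigned)((addr >> shift) & ((1u << TOY_PT64_LEVEL_BITS) - 1));
}
```

Nothing in the computation depends on whether the guest uses 32-bit or 64-bit page tables, which is why one definition in mmu.c is enough.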



[PATCH 25/39] KVM: MMU: Infer shadow root level in direct_map()

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

In all cases the shadow root level is available in mmu.shadow_root_level,
so there is no need to pass it as a parameter.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/mmu.c |9 -
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3ee856f..72f739a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1227,11 +1227,11 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
 }
 
 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-  int largepage, gfn_t gfn, pfn_t pfn,
-  int level)
+   int largepage, gfn_t gfn, pfn_t pfn)
 {
hpa_t table_addr = vcpu->arch.mmu.root_hpa;
int pt_write = 0;
+   int level = vcpu->arch.mmu.shadow_root_level;
 
for (; ; level--) {
u32 index = PT64_INDEX(v, level);
@@ -1299,8 +1299,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
kvm_mmu_free_some_pages(vcpu);
-   r = __direct_map(vcpu, v, write, largepage, gfn, pfn,
-PT32E_ROOT_LEVEL);
+   r = __direct_map(vcpu, v, write, largepage, gfn, pfn);
spin_unlock(&vcpu->kvm->mmu_lock);
 
 
@@ -1455,7 +1454,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
goto out_unlock;
kvm_mmu_free_some_pages(vcpu);
r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK,
-largepage, gfn, pfn, kvm_x86_ops->get_tdp_level());
+largepage, gfn, pfn);
spin_unlock(&vcpu->kvm->mmu_lock);
 
return r;
-- 
1.6.0.1



[PATCH 26/39] KVM: MMU: Add generic shadow walker

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

We currently walk the shadow page tables in two places: direct map (for
real mode and two dimensional paging) and paging mode shadow.  Since we
anticipate requiring a third walk (for invlpg), it makes sense to have
a generic facility for shadow walk.

This patch adds such a shadow walker, which walks the page tables and calls
a method for every spte encountered.  The method can examine the spte,
modify it, or even instantiate it.  The walk can be aborted by returning
nonzero from the method.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/mmu.c |   34 ++
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 72f739a..8b95cf7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -142,6 +142,11 @@ struct kvm_rmap_desc {
struct kvm_rmap_desc *more;
 };
 
+struct kvm_shadow_walk {
+   int (*entry)(struct kvm_shadow_walk *walk, struct kvm_vcpu *vcpu,
+gva_t addr, u64 *spte, int level);
+};
+
 static struct kmem_cache *pte_chain_cache;
 static struct kmem_cache *rmap_desc_cache;
 static struct kmem_cache *mmu_page_header_cache;
@@ -935,6 +940,35 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
return sp;
 }
 
+static int walk_shadow(struct kvm_shadow_walk *walker,
+  struct kvm_vcpu *vcpu, gva_t addr)
+{
+   hpa_t shadow_addr;
+   int level;
+   int r;
+   u64 *sptep;
+   unsigned index;
+
+   shadow_addr = vcpu->arch.mmu.root_hpa;
+   level = vcpu->arch.mmu.shadow_root_level;
+   if (level == PT32E_ROOT_LEVEL) {
+   shadow_addr = vcpu->arch.mmu.pae_root[(addr >> 30) & 3];
+   shadow_addr &= PT64_BASE_ADDR_MASK;
+   --level;
+   }
+
+   while (level >= PT_PAGE_TABLE_LEVEL) {
+   index = SHADOW_PT_INDEX(addr, level);
+   sptep = ((u64 *)__va(shadow_addr)) + index;
+   r = walker->entry(walker, vcpu, addr, sptep, level);
+   if (r)
+   return r;
+   shadow_addr = *sptep & PT64_BASE_ADDR_MASK;
+   --level;
+   }
+   return 0;
+}
+
 static void kvm_mmu_page_unlink_children(struct kvm *kvm,
 struct kvm_mmu_page *sp)
 {
-- 
1.6.0.1
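
The walker pattern above (a struct holding one callback, embedded in a larger per-use struct and recovered with container_of) can be sketched outside the kernel. The version below is an illustration under stated assumptions, not the KVM code: tables are plain arrays of pointers, each level uses 2 index bits, all toy_* names are hypothetical, and the NULL check before descending is an addition for safety that the patch does not have.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BITS    2                  /* assumption: 4 entries per table */
#define TOY_ENTRIES (1u << TOY_BITS)

#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic walker: one callback, invoked once per level. */
struct toy_walk {
    int (*entry)(struct toy_walk *walk, uint64_t addr, void **slot, int level);
};

/* Walks from root_level down to level 1; a nonzero return from the
 * callback aborts the walk and is passed back to the caller. */
static int toy_walk_tables(struct toy_walk *walker, void **root,
                           int root_level, uint64_t addr)
{
    void **table = root;
    int level;

    for (level = root_level; level >= 1; --level) {
        unsigned index = (addr >> (TOY_BITS * (level - 1))) & (TOY_ENTRIES - 1);
        void **slot = &table[index];
        int r = walker->entry(walker, addr, slot, level);

        if (r)
            return r;
        if (level > 1 && !*slot)
            return -1;             /* nothing to descend into */
        table = (void **)*slot;
    }
    return 0;
}

/* One concrete use: count visited levels and stop at a chosen level. */
struct toy_count_walk {
    struct toy_walk walker;
    int visited;
    int stop_level;
};

static int toy_count_entry(struct toy_walk *w, uint64_t addr,
                           void **slot, int level)
{
    struct toy_count_walk *cw =
        toy_container_of(w, struct toy_count_walk, walker);

    (void)addr;
    (void)slot;
    cw->visited++;
    return level == cw->stop_level;
}
```

Each caller keeps its private state in the outer struct, so the generic loop never needs to know about it. The shadow_walker struct in patch 28 uses the same arrangement.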



[PATCH 24/39] KVM: ia64: Enable virtio driver for ia64 in Kconfig

2008-09-25 Thread Avi Kivity
From: Xiantao Zhang <[EMAIL PROTECTED]>

kvm/ia64 uses the virtio drivers to optimize its I/O subsystem.

Signed-off-by: Xiantao Zhang <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/ia64/kvm/Kconfig |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
index 7914e48..8e99fed 100644
--- a/arch/ia64/kvm/Kconfig
+++ b/arch/ia64/kvm/Kconfig
@@ -46,4 +46,6 @@ config KVM_INTEL
 config KVM_TRACE
bool
 
+source drivers/virtio/Kconfig
+
 endif # VIRTUALIZATION
-- 
1.6.0.1



[PATCH 28/39] KVM: MMU: Convert the paging mode shadow walk to use the generic walker

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/paging_tmpl.h |  158 
 1 files changed, 86 insertions(+), 72 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index ebb26a0..b7064e1 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -25,6 +25,7 @@
 #if PTTYPE == 64
#define pt_element_t u64
#define guest_walker guest_walker64
+   #define shadow_walker shadow_walker64
#define FNAME(name) paging##64_##name
#define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
#define PT_DIR_BASE_ADDR_MASK PT64_DIR_BASE_ADDR_MASK
@@ -41,6 +42,7 @@
 #elif PTTYPE == 32
#define pt_element_t u32
#define guest_walker guest_walker32
+   #define shadow_walker shadow_walker32
#define FNAME(name) paging##32_##name
#define PT_BASE_ADDR_MASK PT32_BASE_ADDR_MASK
#define PT_DIR_BASE_ADDR_MASK PT32_DIR_BASE_ADDR_MASK
@@ -71,6 +73,17 @@ struct guest_walker {
u32 error_code;
 };
 
+struct shadow_walker {
+   struct kvm_shadow_walk walker;
+   struct guest_walker *guest_walker;
+   int user_fault;
+   int write_fault;
+   int largepage;
+   int *ptwrite;
+   pfn_t pfn;
+   u64 *sptep;
+};
+
 static gfn_t gpte_to_gfn(pt_element_t gpte)
 {
return (gpte & PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
@@ -272,86 +285,86 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 /*
  * Fetch a shadow pte for a specific level in the paging hierarchy.
  */
-static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
-struct guest_walker *walker,
-int user_fault, int write_fault, int largepage,
-int *ptwrite, pfn_t pfn)
+static int FNAME(shadow_walk_entry)(struct kvm_shadow_walk *_sw,
+   struct kvm_vcpu *vcpu, gva_t addr,
+   u64 *sptep, int level)
 {
-   hpa_t shadow_addr;
-   int level;
-   u64 *shadow_ent;
-   unsigned access = walker->pt_access;
-
-   if (!is_present_pte(walker->ptes[walker->level - 1]))
-   return NULL;
-
-   shadow_addr = vcpu->arch.mmu.root_hpa;
-   level = vcpu->arch.mmu.shadow_root_level;
-   if (level == PT32E_ROOT_LEVEL) {
-   shadow_addr = vcpu->arch.mmu.pae_root[(addr >> 30) & 3];
-   shadow_addr &= PT64_BASE_ADDR_MASK;
-   --level;
+   struct shadow_walker *sw =
+   container_of(_sw, struct shadow_walker, walker);
+   struct guest_walker *gw = sw->guest_walker;
+   unsigned access = gw->pt_access;
+   struct kvm_mmu_page *shadow_page;
+   u64 spte;
+   int metaphysical;
+   gfn_t table_gfn;
+   int r;
+   pt_element_t curr_pte;
+
+   if (level == PT_PAGE_TABLE_LEVEL
+   || (sw->largepage && level == PT_DIRECTORY_LEVEL)) {
+   mmu_set_spte(vcpu, sptep, access, gw->pte_access & access,
+sw->user_fault, sw->write_fault,
+gw->ptes[gw->level-1] & PT_DIRTY_MASK,
+sw->ptwrite, sw->largepage, gw->gfn, sw->pfn,
+false);
+   sw->sptep = sptep;
+   return 1;
}
 
-   for (; ; level--) {
-   u32 index = SHADOW_PT_INDEX(addr, level);
-   struct kvm_mmu_page *shadow_page;
-   u64 shadow_pte;
-   int metaphysical;
-   gfn_t table_gfn;
-
-   shadow_ent = ((u64 *)__va(shadow_addr)) + index;
-   if (level == PT_PAGE_TABLE_LEVEL)
-   break;
-
-   if (largepage && level == PT_DIRECTORY_LEVEL)
-   break;
-
-   if (is_shadow_present_pte(*shadow_ent)
-   && !is_large_pte(*shadow_ent)) {
-   shadow_addr = *shadow_ent & PT64_BASE_ADDR_MASK;
-   continue;
-   }
+   if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
+   return 0;
 
-   if (is_large_pte(*shadow_ent))
-   rmap_remove(vcpu->kvm, shadow_ent);
-
-   if (level - 1 == PT_PAGE_TABLE_LEVEL
-   && walker->level == PT_DIRECTORY_LEVEL) {
-   metaphysical = 1;
-   if (!is_dirty_pte(walker->ptes[level - 1]))
-   access &= ~ACC_WRITE_MASK;
-   table_gfn = gpte_to_gfn(walker->ptes[level - 1]);
-   } else {
-   metaphysical = 0;
-   table_gfn = walker->table_gfn[level - 2];
-   }
-   shadow_page = kvm_mmu_get_page(vcpu, table_gfn, addr, level-1,
-  metaphysical, access,
-   

[PATCH 29/39] KVM: Allocate guest memory as MAP_PRIVATE, not MAP_SHARED

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

There is no reason to share internal memory slots with fork()ed instances.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bfc7c33..675d010 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4296,7 +4296,7 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
userspace_addr = do_mmap(NULL, 0,
 npages * PAGE_SIZE,
 PROT_READ | PROT_WRITE,
-MAP_SHARED | MAP_ANONYMOUS,
+MAP_PRIVATE | MAP_ANONYMOUS,
 0);
up_write(&current->mm->mmap_sem);
 
-- 
1.6.0.1
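
The difference the commit message relies on is visible from userspace: an anonymous MAP_SHARED mapping stays shared with a fork()ed child, while a MAP_PRIVATE one becomes copy-on-write. The program fragment below is a Linux/POSIX sketch with a hypothetical helper name and an assumed 4096-byte mapping length. It uses the userspace mmap() call, not the in-kernel do_mmap() from the patch.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Maps one anonymous region with the given sharing flag (MAP_SHARED or
 * MAP_PRIVATE), lets a forked child write 42 into it, and returns what
 * the parent sees afterwards. Returns -1 on any error. */
static int toy_value_after_child_write(int map_flags)
{
    const size_t len = 4096;   /* assumption: at most one page is needed */
    int status;
    int value;
    pid_t pid;
    int *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  map_flags | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    *p = 0;

    pid = fork();
    if (pid < 0) {
        munmap(p, len);
        return -1;
    }
    if (pid == 0) {
        *p = 42;               /* child's write */
        _exit(0);
    }

    if (waitpid(pid, &status, 0) < 0) {
        munmap(p, len);
        return -1;
    }
    value = *p;                /* parent's view after the child exited */
    munmap(p, len);
    return value;
}
```

With MAP_SHARED the parent reads back the child's 42. With MAP_PRIVATE the child's write lands in its own copy and the parent still reads 0.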



[PATCH 32/39] KVM: MMU: Account for npt/ept/realmode page faults

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

Now that two-dimensional paging is becoming common, account for tdp page
faults.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/mmu.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a1ca4ff..a24da8f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1283,6 +1283,7 @@ static int direct_map_entry(struct kvm_shadow_walk *_walk,
mmu_set_spte(vcpu, sptep, ACC_ALL, ACC_ALL,
 0, walk->write, 1, &walk->pt_write,
 walk->largepage, gfn, walk->pfn, false);
+   ++vcpu->stat.pf_fixed;
return 1;
}
 
-- 
1.6.0.1



[PATCH 30/39] KVM: Don't call get_user_pages(.force = 1)

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

This is esoteric and only needed to break COW on MAP_SHARED mappings.  Since
KVM no longer does these sorts of mappings, breaking COW on them is no longer
necessary.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 virt/kvm/kvm_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0309571..de3b029 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -734,7 +734,7 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
return page_to_pfn(bad_page);
}
 
-   npages = get_user_pages(current, current->mm, addr, 1, 1, 1, page,
+   npages = get_user_pages(current, current->mm, addr, 1, 1, 0, page,
NULL);
 
if (unlikely(npages != 1)) {
-- 
1.6.0.1



[PATCH 33/39] KVM: MMU: Add locking around kvm_mmu_slot_remove_write_access()

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

It was generally safe due to slots_lock being held for write, but it wasn't
very nice.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/mmu.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a24da8f..5052acd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2097,6 +2097,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 {
struct kvm_mmu_page *sp;
 
+   spin_lock(&kvm->mmu_lock);
list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
int i;
u64 *pt;
@@ -2110,6 +2111,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
if (pt[i] & PT_WRITABLE_MASK)
pt[i] &= ~PT_WRITABLE_MASK;
}
+   spin_unlock(&kvm->mmu_lock);
 }
 
 void kvm_mmu_zap_all(struct kvm *kvm)
-- 
1.6.0.1



[PATCH 31/39] KVM: x86 emulator: Add mov r, imm instructions (opcodes 0xb0-0xbf)

2008-09-25 Thread Avi Kivity
From: Mohammed Gamal <[EMAIL PROTECTED]>

The emulator supported only one instance of the mov r, imm instruction
(opcode 0xb8). This adds the rest of these instructions.

Signed-off-by: Mohammed Gamal <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86_emulate.c |   15 +++
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index ae30435..66e0bd6 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -154,9 +154,16 @@ static u16 opcode_table[256] = {
0, 0, ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String,
ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String,
ByteOp | ImplicitOps | String, ImplicitOps | String,
-   /* 0xB0 - 0xBF */
-   0, 0, 0, 0, 0, 0, 0, 0,
-   DstReg | SrcImm | Mov, 0, 0, 0, 0, 0, 0, 0,
+   /* 0xB0 - 0xB7 */
+   ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov,
+   ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov,
+   ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov,
+   ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov,
+   /* 0xB8 - 0xBF */
+   DstReg | SrcImm | Mov, DstReg | SrcImm | Mov,
+   DstReg | SrcImm | Mov, DstReg | SrcImm | Mov,
+   DstReg | SrcImm | Mov, DstReg | SrcImm | Mov,
+   DstReg | SrcImm | Mov, DstReg | SrcImm | Mov,
/* 0xC0 - 0xC7 */
ByteOp | DstMem | SrcImm | ModRM, DstMem | SrcImmByte | ModRM,
0, ImplicitOps | Stack, 0, 0,
@@ -1660,7 +1667,7 @@ special_insn:
case 0xae ... 0xaf: /* scas */
DPRINTF("Urk! I don't handle SCAS.\n");
goto cannot_emulate;
-   case 0xb8: /* mov r, imm */
+   case 0xb0 ... 0xbf: /* mov r, imm */
goto mov;
case 0xc0 ... 0xc1:
emulate_grp2(ctxt);
-- 
1.6.0.1
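
For readers unfamiliar with this opcode range: 0xb0-0xb7 move an 8-bit immediate into a byte register and 0xb8-0xbf move a word-sized immediate, with the low three opcode bits selecting the register. The sketch below illustrates that split; struct mov_imm_decode and decode_mov_r_imm() are hypothetical names, not emulator code, and REX prefixes are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decode result, for illustration only. */
struct mov_imm_decode {
	bool valid;    /* opcode lies in 0xb0..0xbf */
	bool byte_op;  /* 8-bit form (0xb0..0xb7), cf. ByteOp in the table */
	unsigned reg;  /* register number taken from the low three bits */
};

struct mov_imm_decode decode_mov_r_imm(uint8_t opcode)
{
	struct mov_imm_decode d = { false, false, 0 };

	if (opcode < 0xb0 || opcode > 0xbf)
		return d;

	d.valid = true;
	d.byte_op = opcode < 0xb8;
	d.reg = opcode & 7;
	return d;
}
```

This mirrors why the opcode table gains eight ByteOp entries followed by eight entries without ByteOp.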



[PATCH 34/39] KVM: MMU: Flush tlbs after clearing write permission when accessing dirty log

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

Otherwise, the cpu may allow writes to the tracked pages, and we lose
some display bits or fail to migrate correctly.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/mmu.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5052acd..853a288 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2111,6 +2111,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
if (pt[i] & PT_WRITABLE_MASK)
pt[i] &= ~PT_WRITABLE_MASK;
}
+   kvm_flush_remote_tlbs(kvm);
spin_unlock(&kvm->mmu_lock);
 }
 
-- 
1.6.0.1



[PATCH 27/39] KVM: MMU: Convert direct maps to use the generic shadow walker

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/mmu.c |   93 ++-
 1 files changed, 55 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8b95cf7..a1ca4ff 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1260,49 +1260,66 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
 {
 }
 
-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-   int largepage, gfn_t gfn, pfn_t pfn)
-{
-   hpa_t table_addr = vcpu->arch.mmu.root_hpa;
-   int pt_write = 0;
-   int level = vcpu->arch.mmu.shadow_root_level;
-
-   for (; ; level--) {
-   u32 index = PT64_INDEX(v, level);
-   u64 *table;
+struct direct_shadow_walk {
+   struct kvm_shadow_walk walker;
+   pfn_t pfn;
+   int write;
+   int largepage;
+   int pt_write;
+};
 
-   ASSERT(VALID_PAGE(table_addr));
-   table = __va(table_addr);
+static int direct_map_entry(struct kvm_shadow_walk *_walk,
+   struct kvm_vcpu *vcpu,
+   gva_t addr, u64 *sptep, int level)
+{
+   struct direct_shadow_walk *walk =
+   container_of(_walk, struct direct_shadow_walk, walker);
+   struct kvm_mmu_page *sp;
+   gfn_t pseudo_gfn;
+   gfn_t gfn = addr >> PAGE_SHIFT;
+
+   if (level == PT_PAGE_TABLE_LEVEL
+   || (walk->largepage && level == PT_DIRECTORY_LEVEL)) {
+   mmu_set_spte(vcpu, sptep, ACC_ALL, ACC_ALL,
+0, walk->write, 1, &walk->pt_write,
+walk->largepage, gfn, walk->pfn, false);
+   return 1;
+   }
 
-   if (level == 1 || (largepage && level == 2)) {
-   mmu_set_spte(vcpu, &table[index], ACC_ALL, ACC_ALL,
-0, write, 1, &pt_write, largepage,
-gfn, pfn, false);
-   return pt_write;
+   if (*sptep == shadow_trap_nonpresent_pte) {
+   pseudo_gfn = (addr & PT64_DIR_BASE_ADDR_MASK) >> PAGE_SHIFT;
+   sp = kvm_mmu_get_page(vcpu, pseudo_gfn, addr, level - 1,
+ 1, ACC_ALL, sptep);
+   if (!sp) {
+   pgprintk("nonpaging_map: ENOMEM\n");
+   kvm_release_pfn_clean(walk->pfn);
+   return -ENOMEM;
}
 
-   if (table[index] == shadow_trap_nonpresent_pte) {
-   struct kvm_mmu_page *new_table;
-   gfn_t pseudo_gfn;
-
-   pseudo_gfn = (v & PT64_DIR_BASE_ADDR_MASK)
-   >> PAGE_SHIFT;
-   new_table = kvm_mmu_get_page(vcpu, pseudo_gfn,
-v, level - 1,
-1, ACC_ALL, &table[index]);
-   if (!new_table) {
-   pgprintk("nonpaging_map: ENOMEM\n");
-   kvm_release_pfn_clean(pfn);
-   return -ENOMEM;
-   }
-
-   set_shadow_pte(&table[index],
-  __pa(new_table->spt)
-  | PT_PRESENT_MASK | PT_WRITABLE_MASK
-  | shadow_user_mask | shadow_x_mask);
-   }
-   table_addr = table[index] & PT64_BASE_ADDR_MASK;
+   set_shadow_pte(sptep,
+  __pa(sp->spt)
+  | PT_PRESENT_MASK | PT_WRITABLE_MASK
+  | shadow_user_mask | shadow_x_mask);
}
+   return 0;
+}
+
+static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
+   int largepage, gfn_t gfn, pfn_t pfn)
+{
+   int r;
+   struct direct_shadow_walk walker = {
+   .walker = { .entry = direct_map_entry, },
+   .pfn = pfn,
+   .largepage = largepage,
+   .write = write,
+   .pt_write = 0,
+   };
+
+   r = walk_shadow(&walker.walker, vcpu, (gva_t)gfn << PAGE_SHIFT);
+   if (r < 0)
+   return r;
+   return walker.pt_write;
 }
 
 static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
-- 
1.6.0.1
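
The conversion relies on a common C pattern: a generic walker struct holding a callback is embedded in a larger, caller-specific struct, and the callback recovers its private state with container_of(). The sketch below shows the pattern in isolation. It is not the KVM code: struct walk, walk_levels() and direct_leaf_level() are hypothetical stand-ins, and the convention that the callback returns nonzero to stop the walk is an assumption modelled on direct_map_entry() above.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic part: knows only how to descend and call the callback. */
struct walk {
	/* Return 0 to continue, 1 when done, negative on error. */
	int (*entry)(struct walk *w, unsigned long addr, int level);
};

int walk_levels(struct walk *w, unsigned long addr, int top_level)
{
	int level, r;

	for (level = top_level; level >= 1; level--) {
		r = w->entry(w, addr, level);
		if (r)
			return r;
	}
	return 0;
}

/* Caller-specific part: private state wrapped around the generic walker. */
struct direct_walk {
	struct walk walker;
	int largepage;
	int leaf_level;  /* level at which the leaf entry was installed */
};

static int direct_entry(struct walk *_w, unsigned long addr, int level)
{
	struct direct_walk *w = container_of(_w, struct direct_walk, walker);

	(void)addr;
	if (level == 1 || (w->largepage && level == 2)) {
		w->leaf_level = level;
		return 1;
	}
	return 0;
}

/* Walk from level 4 and report where the leaf would be installed. */
int direct_leaf_level(unsigned long addr, int largepage)
{
	struct direct_walk w = {
		.walker = { .entry = direct_entry, },
		.largepage = largepage,
		.leaf_level = 0,
	};
	int r = walk_levels(&w.walker, addr, 4);

	return r < 0 ? r : w.leaf_level;
}
```

The generic loop never sees the private fields, which is what lets one walker serve both the direct-map and the guest-paging callers.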



[PATCH 13/39] KVM: SVM: Fix typo

2008-09-25 Thread Avi Kivity
From: Amit Shah <[EMAIL PROTECTED]>

Fix typo in as-yet unused macro definition.

Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/svm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 179c2e0..be86c09 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -44,7 +44,7 @@ MODULE_LICENSE("GPL");
 
 #define SVM_FEATURE_NPT  (1 << 0)
 #define SVM_FEATURE_LBRV (1 << 1)
-#define SVM_DEATURE_SVML (1 << 2)
+#define SVM_FEATURE_SVML (1 << 2)
 
 #define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
 
-- 
1.6.0.1



[PATCH 35/39] KVM: MMU: Fix setting the accessed bit on non-speculative sptes

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

The accessed bit was accidentally turned on in a random flag word rather
than in the spte itself, which was lucky, since it used the non-EPT-compatible
PT_ACCESSED_MASK.

Fix by turning the bit on in the spte and changing it to use the portable
accessed mask.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/mmu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 853a288..866d713 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1192,7 +1192,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 */
spte = shadow_base_present_pte | shadow_dirty_mask;
if (!speculative)
-   pte_access |= PT_ACCESSED_MASK;
+   spte |= shadow_accessed_mask;
if (!dirty)
pte_access &= ~ACC_WRITE_MASK;
if (pte_access & ACC_EXEC_MASK)
-- 
1.6.0.1



[PATCH 23/39] KVM: MMU: Unify direct map 4K and large page paths

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

The two paths are equivalent except for one argument, which is already
available.  Merge the two codepaths.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/mmu.c |   11 +++
 1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 51d4cd7..3ee856f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1240,15 +1240,10 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
ASSERT(VALID_PAGE(table_addr));
table = __va(table_addr);
 
-   if (level == 1) {
+   if (level == 1 || (largepage && level == 2)) {
mmu_set_spte(vcpu, &table[index], ACC_ALL, ACC_ALL,
-0, write, 1, &pt_write, 0, gfn, pfn, false);
-   return pt_write;
-   }
-
-   if (largepage && level == 2) {
-   mmu_set_spte(vcpu, &table[index], ACC_ALL, ACC_ALL,
-0, write, 1, &pt_write, 1, gfn, pfn, false);
+0, write, 1, &pt_write, largepage,
+gfn, pfn, false);
return pt_write;
}
 
-- 
1.6.0.1



[PATCH 36/39] KVM: SVM: No need to unprotect memory during event injection when using npt

2008-09-25 Thread Avi Kivity
From: Avi Kivity <[EMAIL PROTECTED]>

No memory is protected anyway.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/svm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index be86c09..6022888 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1021,7 +1021,7 @@ static int pf_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
if (npt_enabled)
svm_flush_tlb(&svm->vcpu);
 
-   if (event_injection)
+   if (!npt_enabled && event_injection)
kvm_mmu_unprotect_page_virt(&svm->vcpu, fault_address);
return kvm_mmu_page_fault(&svm->vcpu, fault_address, error_code);
 }
-- 
1.6.0.1



[PATCH 37/39] KVM: add MC5_MISC msr read support

2008-09-25 Thread Avi Kivity
From: Joerg Roedel <[EMAIL PROTECTED]>

Currently KVM implements MC0-MC4_MISC read support. When booting Linux this
results in KVM warnings in the kernel log when the guest tries to read
MC5_MISC. Fix these warnings with this patch.

Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 675d010..e3b8966 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -991,6 +991,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
case MSR_IA32_MC0_MISC+8:
case MSR_IA32_MC0_MISC+12:
case MSR_IA32_MC0_MISC+16:
+   case MSR_IA32_MC0_MISC+20:
case MSR_IA32_UCODE_REV:
case MSR_IA32_EBL_CR_POWERON:
case MSR_IA32_DEBUGCTLMSR:
-- 
1.6.0.1
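
The "+20" follows from the machine-check MSR layout: each bank occupies four consecutive MSRs (CTL, STATUS, ADDR, MISC), so MCi_MISC sits at MC0_MISC + 4*i. A small sketch of that arithmetic follows; mc_misc_bank() is a hypothetical helper and not part of KVM, while 0x403 is the architectural MC0_MISC address.

```c
#include <stdint.h>

#define MSR_IA32_MC0_MISC 0x00000403u

/*
 * Hypothetical helper: map an MSR number to the machine-check bank
 * whose MISC register it is, given nbanks supported banks.
 * Returns -1 if the MSR is not an MCi_MISC register of such a bank.
 */
int mc_misc_bank(uint32_t msr, unsigned int nbanks)
{
	uint32_t off;

	if (msr < MSR_IA32_MC0_MISC)
		return -1;

	off = msr - MSR_IA32_MC0_MISC;
	if (off % 4 != 0)
		return -1;  /* CTL, STATUS or ADDR of some bank */
	if (off / 4 >= nbanks)
		return -1;
	return (int)(off / 4);
}
```

With five banks handled, a read of MC0_MISC+20 falls through to the warning path; with six it is recognised as bank 5.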



[PATCH 02/39] KVM: remove unused field from the assigned dev struct

2008-09-25 Thread Avi Kivity
From: Ben-Ami Yassour <[EMAIL PROTECTED]>

Remove unused field: struct kvm_assigned_pci_dev assigned_dev
from struct: struct kvm_assigned_dev_kernel

Signed-off-by: Ben-Ami Yassour <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/asm-x86/kvm_host.h |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index fb7d7b7..1161af1 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -341,7 +341,6 @@ struct kvm_assigned_dev_kernel {
struct kvm_irq_ack_notifier ack_notifier;
struct work_struct interrupt_work;
struct list_head list;
-   struct kvm_assigned_pci_dev assigned_dev;
int assigned_dev_id;
int host_busnr;
int host_devfn;
-- 
1.6.0.1



[PATCH 38/39] KVM: s390: Make facility bits future-proof

2008-09-25 Thread Avi Kivity
From: Christian Borntraeger <[EMAIL PROTECTED]>

Heiko Carstens pointed out that it's safer to activate working facilities
instead of disabling problematic facilities. The new code uses the host
facility bits and masks them with known good ones.

Signed-off-by: Christian Borntraeger <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/s390/kvm/priv.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index d1faf5c..cce40ff 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -157,8 +157,8 @@ static int handle_stfl(struct kvm_vcpu *vcpu)
int rc;
 
vcpu->stat.instruction_stfl++;
-   facility_list &= ~(1UL<<24); /* no stfle */
-   facility_list &= ~(1UL<<23); /* no large pages */
+   /* only pass the facility bits, which we can handle */
+   facility_list &= 0xfe00fff3;
 
rc = copy_to_guest(vcpu, offsetof(struct _lowcore, stfl_fac_list),
   &facility_list, sizeof(facility_list));
-- 
1.6.0.1
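
The difference between the two approaches is easiest to see side by side. The following sketch is illustrative only: the function names are made up, and a 32-bit word is assumed for the facility list. It shows that the allow-list mask still clears the two bits the old code cleared (23 and 24), and additionally drops any other bit not known to be good.

```c
#include <stdint.h>

#define KNOWN_GOOD_FACILITIES 0xfe00fff3u  /* mask taken from the patch */

/* Old approach: start from the host bits, remove known-bad ones. */
uint32_t filter_deny(uint32_t host)
{
	host &= ~(1u << 24);  /* no stfle */
	host &= ~(1u << 23);  /* no large pages */
	return host;
}

/* New approach: pass only bits that are known to work. */
uint32_t filter_allow(uint32_t host)
{
	return host & KNOWN_GOOD_FACILITIES;
}
```

A facility bit introduced by future hardware passes through filter_deny() untouched but is hidden by filter_allow(), which is the future-proofing the commit message refers to.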



[PATCH 39/39] KVM: s390: change help text of guest Kconfig

2008-09-25 Thread Avi Kivity
From: Christian Borntraeger <[EMAIL PROTECTED]>

The current help text for CONFIG_S390_GUEST is not very helpful.
Let's add more text.

Signed-off-by: Christian Borntraeger <[EMAIL PROTECTED]>
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 arch/s390/Kconfig |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 8d41908..c9bfed9 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -564,13 +564,16 @@ config ZFCPDUMP
  Refer to  for more details on 
this.
 
 config S390_GUEST
-bool "s390 guest support (EXPERIMENTAL)"
+bool "s390 guest support for KVM (EXPERIMENTAL)"
depends on 64BIT && EXPERIMENTAL
select VIRTIO
select VIRTIO_RING
select VIRTIO_CONSOLE
help
- Select this option if you want to run the kernel under s390 linux
+ Select this option if you want to run the kernel as a guest under
+ the KVM hypervisor. This will add detection for KVM as well  as a
+ virtio transport. If KVM is detected, the virtio console will be
+ the default console.
 endmenu
 
 source "net/Kconfig"
-- 
1.6.0.1



RE: [PATCH] VT-d: Fix iommu map page for mmio pages

2008-09-25 Thread Han, Weidong
Avi Kivity wrote:
> Han, Weidong wrote:
>> Don't need to map mmio pages for iommu. When find mmio pages in
>> kvm_iommu_map_pages(), don't map them, and shouldn't return error
>> due to it's not an error. If return error (such as -EINVAL), device
>> assignment will fail.
>> 
>> 
> 
> 
> I don't understand.  Why don't we need to map mmio pages?  We
> certainly don't want them emulated.

mmio pages need not be mapped in the VT-d page table, which only
translates DMA addresses. Amit's userspace patch registers a memslot for the
mmio regions of assigned devices; it doesn't emulate them.

> 
>> @@ -36,14 +36,13 @@ int kvm_iommu_map_pages(struct kvm *kvm,  {
>>  gfn_t gfn = base_gfn;
>>  pfn_t pfn;
>> -int i, r;
>> +int i, r = 0;
>>  struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
>> 
>>  /* check if iommu exists and in use */
>>  if (!domain)
>>  return 0;
>> 
>> -r = -EINVAL;
>>  for (i = 0; i < npages; i++) {
>>  /* check if already mapped */
>>  pfn = (pfn_t)intel_iommu_iova_to_pfn(domain,
>> @@ -60,13 +59,14 @@ int kvm_iommu_map_pages(struct kvm *kvm,

>>   DMA_PTE_READ | DMA_PTE_WRITE);
>>  if (r) {
>> -printk(KERN_DEBUG "kvm_iommu_map_pages:"
>> +printk(KERN_ERR "kvm_iommu_map_pages:"
>> "iommu failed to map pfn=%lx\n",
>> pfn);
>>  goto unmap_pages;
>>  }
>>  } else {
>> -printk(KERN_DEBUG "kvm_iommu_map_page:"
>> -   "invalid pfn=%lx\n", pfn);
>> +printk(KERN_DEBUG "kvm_iommu_map_pages:"
>> +   "invalid pfn=%lx, iommu needn't map "
>> +   "MMIO pages!\n", pfn);
>>  goto unmap_pages;
>>  }
> 
> If a slot has a mix of mmio and non-mmio pages, you will unmap the
> non-mmio pages, yet return no error.
> 

I didn't consider this mixed case. In the mixed case we shouldn't goto
unmap_pages; we should actually remove the else {} block. That way non-mmio
pages get mapped while mmio pages do not. I will resend the patch.

Randy (Weidong)

> --
> error compiling committee.c: too many arguments to function



Re: [PATCH] VT-d: Fix iommu map page for mmio pages

2008-09-25 Thread Avi Kivity

Han, Weidong wrote:
> Avi Kivity wrote:
>> Han, Weidong wrote:
>>> Don't need to map mmio pages for iommu. When find mmio pages in
>>> kvm_iommu_map_pages(), don't map them, and shouldn't return error
>>> due to it's not an error. If return error (such as -EINVAL), device
>>> assignment will fail.
>>
>> I don't understand.  Why don't we need to map mmio pages?  We
>> certainly don't want them emulated.
>
> mmio pages need not be mapped in the VT-d page table, which only
> translates DMA addresses.

Right, I forgot the iommu is only for dma, not cpu accesses.

I suppose one could DMA into an mmio page.  Is there a reason not to map?


--
error compiling committee.c: too many arguments to function



RE: [PATCH] VT-d: Fix iommu map page for mmio pages

2008-09-25 Thread Han, Weidong
Avi Kivity wrote:
> Han, Weidong wrote:
>> Avi Kivity wrote:
>> 
>>> Han, Weidong wrote:
>>> 
>>>> Don't need to map mmio pages for iommu. When find mmio pages in
>>>> kvm_iommu_map_pages(), don't map them, and shouldn't return error
>>>> due to it's not an error. If return error (such as -EINVAL),
>>>> device assignment will fail.
>>>>
>>> I don't understand.  Why don't we need to map mmio pages?  We
>>> certainly don't want them emulated.
>>> 
>> 
>> mmio pages need not be mapped in the VT-d page table, which only
>> translates DMA addresses.
> 
> Right, I forgot the iommu is only for dma, not cpu accesses.
> 
> I suppose one could DMA into an mmio page.  Is there a reason not to
> map? 

Is it possible to DMA into an mmio page? If yes, we also need to map mmio
pages, and the is_mmio_pfn() check is not necessary here.

Randy (Weidong)


Re: [PATCH 10/11] VMX: work around lacking VNMI support

2008-09-25 Thread Jan Kiszka
Avi Kivity wrote:
> Jan Kiszka wrote:
>> Jan Kiszka wrote:
>> ..
>>  
>>> Index: b/arch/x86/kvm/vmx.c
>>> ===
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -90,6 +90,11 @@ struct vcpu_vmx {
>>>  } rmode;
>>>  int vpid;
>>>  bool emulation_required;
>>> +
>>> +/* Support for vnmi-less CPUs */
>>> +int soft_vnmi_blocked;
>>> +ktime_t entry_time;
>>> +s64 vnmi_blocked_time;
>>> 
>>
>> I meanwhile realized that these states (except entry_time) and probably
>> also arch.nmi_pending/injected are things that should be considered when
>> the vcpu state is saved and restored, right? What is the right interface
>> for this? An extension of kvm_sregs?
>>
>>   
> 
> kvm_sregs can't be extended because that would break the ABI, so we have
> to add a new ioctl.
> 
> I have some patches that allow ioctls to be extended, so if that's
> accepted, we can avoid the new ioctl.

OK.

> 
>> BTW, via which channel is GUEST_INTERRUPTIBILITY_INFO from the vmcs
>> saved/restored? I'm currently not seeing any related, CPU-specific code.
>>   
> 
> Looks like it's missing.

As a workaround (or safety bag), is it imaginable to delay or deny VCPU
snapshots at not yet fully restorable points (like
GUEST_INTERRUPTIBILITY_INFO != 0)? Or stick-your-head-into-the-sand for now?

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


Re: [PATCH] VT-d: Fix iommu map page for mmio pages

2008-09-25 Thread Avi Kivity

Han, Weidong wrote:
> Is it possible to DMA into an mmio page?

I don't see why not.

> If yes, we also need to map mmio
> pages, and is_mmio_pfn() check is not necessary here.

So we get simpler code as well.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] VT-d: Fix iommu map page for mmio pages

2008-09-25 Thread Anthony Liguori

Avi Kivity wrote:
> Han, Weidong wrote:
>> Is it possible to DMA into an mmio page?
>
> I don't see why not.

Yeah, it is.  I mentioned this a long time ago.  We definitely need to
map mmio pages into the VT-d mapping.

Regards,

Anthony Liguori

>> If yes, we also need to map mmio
>> pages, and is_mmio_pfn() check is not necessary here.
>
> So we get simpler code as well.





RE: [PATCH] VT-d: Fix iommu map page for mmio pages

2008-09-25 Thread Han, Weidong
Avi and Anthony,

I will resend the patch soon. Thanks.

Randy (Weidong)

Anthony Liguori wrote:
> Avi Kivity wrote:
>> Han, Weidong wrote:
>>> Is it possible to DMA into an mmio page?
>> 
>> I don't see why not.
> 
> Yeah, it is.  I mentioned this a long time ago.  We definitely need to
> map mmio pages into the VT-d mapping.
> 
> Regards,
> 
> Anthony Liguori
> 
>>> If yes, we also need to map mmio
>>> pages, and is_mmio_pfn() check is not necessary here.
>>> 
>> 
>> So we get simpler code as well.



[PATCH] [RESEND] VT-d: Fix iommu map page for mmio pages

2008-09-25 Thread Han, Weidong
From 61028d958dc7c57ee02de32ea89b025dccb9650d Mon Sep 17 00:00:00 2001
From: Weidong Han <[EMAIL PROTECTED]>
Date: Thu, 25 Sep 2008 23:32:02 +0800
Subject: [PATCH] Map mmio pages into VT-d page table

An assigned device could DMA to mmio pages, so mmio pages also need to be
mapped into the VT-d page table.

Signed-off-by: Weidong Han <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vtd.c |   29 +++--
 include/asm-x86/kvm_host.h |2 --
 virt/kvm/kvm_main.c|2 +-
 3 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
index 667bf3f..a770874 100644
--- a/arch/x86/kvm/vtd.c
+++ b/arch/x86/kvm/vtd.c
@@ -36,37 +36,30 @@ int kvm_iommu_map_pages(struct kvm *kvm,
 {
gfn_t gfn = base_gfn;
pfn_t pfn;
-   int i, r;
+   int i, r = 0;
struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
 
/* check if iommu exists and in use */
if (!domain)
return 0;
 
-   r = -EINVAL;
for (i = 0; i < npages; i++) {
/* check if already mapped */
pfn = (pfn_t)intel_iommu_iova_to_pfn(domain,
 gfn_to_gpa(gfn));
-   if (pfn && !is_mmio_pfn(pfn))
+   if (pfn)
continue;
 
pfn = gfn_to_pfn(kvm, gfn);
-   if (!is_mmio_pfn(pfn)) {
-   r = intel_iommu_page_mapping(domain,
-gfn_to_gpa(gfn),
-pfn_to_hpa(pfn),
-PAGE_SIZE,
-DMA_PTE_READ |
-DMA_PTE_WRITE);
-   if (r) {
-   printk(KERN_DEBUG "kvm_iommu_map_pages:"
-  "iommu failed to map pfn=%lx\n", pfn);
-   goto unmap_pages;
-   }
-   } else {
-   printk(KERN_DEBUG "kvm_iommu_map_page:"
-  "invalid pfn=%lx\n", pfn);
+   r = intel_iommu_page_mapping(domain,
+gfn_to_gpa(gfn),
+pfn_to_hpa(pfn),
+PAGE_SIZE,
+DMA_PTE_READ |
+DMA_PTE_WRITE);
+   if (r) {
+   printk(KERN_ERR "kvm_iommu_map_pages:"
+  "iommu failed to map pfn=%lx\n", pfn);
goto unmap_pages;
}
gfn++;
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index d1175b8..357dd20 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -495,8 +495,6 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
  gpa_t addr, unsigned long *ret);
 
-int is_mmio_pfn(pfn_t pfn);
-
 extern bool tdp_enabled;
 
 enum emulation_result {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6cf0427..98cd916 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -76,7 +76,7 @@ static inline int valid_vcpu(int n)
return likely(n >= 0 && n < KVM_MAX_VCPUS);
 }
 
-inline int is_mmio_pfn(pfn_t pfn)
+static inline int is_mmio_pfn(pfn_t pfn)
 {
if (pfn_valid(pfn))
return PageReserved(pfn_to_page(pfn));
-- 
1.5.1




Re: [kvm] Re: [PATCH 0/5] bios: >4G updates

2008-09-25 Thread Alex Williamson
On Wed, 2008-09-24 at 10:17 -0600, Alex Williamson wrote:
> On Wed, 2008-09-24 at 14:07 +0300, Avi Kivity wrote:
> > 
> > The patches all look good, however renaming and reformatting will lead 
> > to merge headaches later on.  We haven't been good at working with bochs 
> > bios upstream.
> > 
> > Can you peek in bochs upstream and see if it's worth merging?  If not, 
> > I'll just merge these patches.
> 
> I'll take a look.  It seemed like they added support for putting the
> ACPI processor objects in an SSDT last I checked, but the AML for their
> processors is fairly trivial.  I'll see if there's anything else
> worthwhile.  Thanks,

I guess the SSDT support was prior to the last merge, and appropriately
dropped.  The most interesting new feature is a boot menu to allow the
user to override the boot device.  That seems pretty useful.  Other
things like better printing of the devices and support for PIIX4 could
come in handy too.  So yeah, it looks worth merging.  I can drop my
first two patches and rework the others so we don't cause unnecessary
merge problems.

Alex

-- 
Alex Williamson HP Open Source & Linux Org.



Re: KVM for Sparc?

2008-09-25 Thread Blue Swirl
On 9/24/08, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Blue Swirl" <[EMAIL PROTECTED]>
>
> Date: Wed, 24 Sep 2008 21:06:21 +0300
>
>
>  > Now I found the relevant part in the manuals. The extra sun4v bit is
>  > not taken into account from user mode, so we can't catch privileged to
>  > hyperprivileged mode traps easily.
>
>
> That's right, the top bit is ignored in user mode.

The hypervisor uses traps 0x80, 0x83, 0x84, 0x85, and 0xff. Looking at
how these alias to low number traps: first four are unused or used for
resets (SIR, RED, XIR), so they are not in the fast path. 0xff aliases
to 0x7f, which is part of Fill 7 otherwin trap. Maybe that is not
performance critical? The Fill 7 trap entry could be shortened with
off-table jumps.

I'm thinking that we could disassemble the calling instruction on
entry to the lower traps and detect what was the true cause of the
trap.
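
To make the aliasing concrete, here is a small sketch. It assumes, as described above, that user mode simply drops the top bit of the 8-bit trap number; user_visible_trap() is a made-up name for illustration.

```c
/*
 * Hypothetical helper: the trap number actually taken when the top
 * bit of the requested number is ignored, as in user mode.
 */
unsigned int user_visible_trap(unsigned int trap)
{
	return trap & 0x7f;
}
```

Under that assumption the five hypervisor traps land on 0x00, 0x03, 0x04, 0x05 and 0x7f respectively, which is the set of low-numbered handlers that would have to inspect the calling instruction.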


Re: [PATCH 7/9] Add VMRUN handler v3

2008-09-25 Thread Alexander Graf





On 19.09.2008 at 17:59, Joerg Roedel <[EMAIL PROTECTED]> wrote:


On Wed, Sep 17, 2008 at 03:41:24PM +0200, Alexander Graf wrote:

This patch implements VMRUN. VMRUN enters a virtual CPU and runs that
in the same context as the normal guest CPU would run.
So basically it is implemented the same way a normal CPU would do it.


We also prepare all intercepts that get OR'ed with the original
intercepts, as we do not allow a level 2 guest to be intercepted less
than the first level guest.
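
As an illustration of the OR-merge described above, here is a minimal
sketch that is independent of the kernel code. It assumes a set bit
means "intercept"; merge_intercepts() and its parameters are
hypothetical names, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Merge two permission bitmaps word by word. Assuming a set bit means
 * "intercept", OR-ing keeps every intercept requested by either level,
 * so the level 2 guest is never intercepted less than the level 1 guest.
 */
static void merge_intercepts(uint32_t *merged, const uint32_t *host,
                             const uint32_t *nested, size_t words)
{
    size_t i;

    for (i = 0; i < words; i++)
        merged[i] = host[i] | nested[i];
}
```

The patch does the same per-word OR over the MSR permission map, with
the word count derived from PAGE_SIZE and MSRPM_ALLOC_ORDER.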

v2 implements the following improvements:

- fixes the CPL check
- does not allocate iopm when not used
- remembers the host's IF in the HIF bit in the hflags

v3:

- make use of the new permission checking
- add support for V_INTR_MASKING_MASK

Signed-off-by: Alexander Graf <[EMAIL PROTECTED]>
---
arch/x86/kvm/kvm_svm.h |9 ++
arch/x86/kvm/svm.c |  198 +++-
include/asm-x86/kvm_host.h |2 +
3 files changed, 207 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h
index 76ad107..2afe0ce 100644
--- a/arch/x86/kvm/kvm_svm.h
+++ b/arch/x86/kvm/kvm_svm.h
@@ -43,6 +43,15 @@ struct vcpu_svm {
   u32 *msrpm;

   u64 nested_hsave;
+u64 nested_vmcb;
+
+/* These are the merged vectors */
+u32 *nested_msrpm;
+u32 *nested_iopm;
+
+/* gpa pointers to the real vectors */
+u64 nested_vmcb_msrpm;
+u64 nested_vmcb_iopm;
};

#endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0aa22e5..3601e75 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -51,6 +51,9 @@ MODULE_LICENSE("GPL");
/* Turn on to get debugging output*/
/* #define NESTED_DEBUG */

+/* Not needed until device passthrough */
+/* #define NESTED_KVM_MERGE_IOPM */
+
#ifdef NESTED_DEBUG
#define nsvm_printk(fmt, args...) printk(KERN_INFO fmt, ## args)
#else
@@ -76,6 +79,11 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
   return container_of(vcpu, struct vcpu_svm, vcpu);
}

+static inline bool is_nested(struct vcpu_svm *svm)
+{
+return svm->nested_vmcb;
+}
+
static unsigned long iopm_base;

struct kvm_ldttss_desc {
@@ -614,6 +622,7 @@ static void init_vmcb(struct vcpu_svm *svm)
   force_new_asid(&svm->vcpu);

   svm->nested_hsave = 0;
+svm->nested_vmcb = 0;
   svm->vcpu.arch.hflags = HF_GIF_MASK;
}

@@ -639,6 +648,10 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
   struct vcpu_svm *svm;
   struct page *page;
   struct page *msrpm_pages;
+struct page *nested_msrpm_pages;
+#ifdef NESTED_KVM_MERGE_IOPM
+struct page *nested_iopm_pages;
+#endif
   int err;

   svm = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
@@ -661,9 +674,25 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
   msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
   if (!msrpm_pages)
   goto uninit;
+
+nested_msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
+if (!nested_msrpm_pages)
+goto uninit;
+
+#ifdef NESTED_KVM_MERGE_IOPM
+nested_iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER);
+if (!nested_iopm_pages)
+goto uninit;
+#endif
+
   svm->msrpm = page_address(msrpm_pages);
   svm_vcpu_init_msrpm(svm->msrpm);

+svm->nested_msrpm = page_address(nested_msrpm_pages);
+#ifdef NESTED_KVM_MERGE_IOPM
+svm->nested_iopm = page_address(nested_iopm_pages);
+#endif
+
   svm->vmcb = page_address(page);
   clear_page(svm->vmcb);
   svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
@@ -693,6 +722,10 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)

   __free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT));
   __free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
+__free_pages(virt_to_page(svm->nested_msrpm), MSRPM_ALLOC_ORDER);
+#ifdef NESTED_KVM_MERGE_IOPM
+__free_pages(virt_to_page(svm->nested_iopm), IOPM_ALLOC_ORDER);
+#endif
   kvm_vcpu_uninit(vcpu);
   kmem_cache_free(kvm_vcpu_cache, svm);
}
@@ -1230,6 +1263,138 @@ static int nested_svm_do(struct vcpu_svm *svm,
   return retval;
}

+
+static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1,
+  void *arg2, void *opaque)
+{
+int i;
+u32 *nested_msrpm = (u32*)arg1;
+for (i=0; i< PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER) / 4; i++)
+svm->nested_msrpm[i] = svm->msrpm[i] | nested_msrpm[i];
+svm->vmcb->control.msrpm_base_pa = __pa(svm->nested_msrpm);
+
+return 0;
+}
+
+#ifdef NESTED_KVM_MERGE_IOPM
+static int nested_svm_vmrun_iopm(struct vcpu_svm *svm, void *arg1,
+ void *arg2, void *opaque)
+{
+int i;
+u32 *nested_iopm = (u32*)arg1;
+u32 *iopm = (u32*)__va(iopm_base);
+for (i=0; i< PAGE_SIZE * (1 << IOPM_ALLOC_ORDER) / 4; i++)
+svm->nested_iopm[i] = iopm[i] | nested_iopm[i];
+svm->vmcb->control.iopm_base_pa = __pa(svm->nested_iopm);
+
+return 0;
+}
+#endif
+
+static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1,
+void *arg2, void *o

Re: [PATCH 7/9] Add VMRUN handler v3

2008-09-25 Thread Joerg Roedel
On Thu, Sep 25, 2008 at 07:32:55PM +0200, Alexander Graf wrote:
> >This is a big security hole. With this we give the guest access to its
> >own VMCB. The guest can take over or crash the whole host machine by
> >rewriting its VMCB. We should be more selective what we save in the
> >hsave area.
> 
> Oh, right. I didn't even think of a case where the nested guest would
> have access to the hsave of itself. Since the hsave can never be used
> twice on one vcpu, we could just allocate our own memory for the hsave
> in the vcpu context and leave the nested hsave empty.

I think we could also gain performance by only saving the important
parts of the VMCB and not the whole page.

Joerg

-- 
   |   AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System|  Register Court Dresden: HRA 4896
 Research  |  General Partner authorized to represent:
 Center| AMD Saxony LLC (Wilmington, Delaware, US)
   | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy



Re: [PATCH] Add USB sys file-system support (v6)

2008-09-25 Thread Anthony Liguori

TJ wrote:

This patch adds support for host USB devices discovered via:

/sys/bus/usb/devices/* and opened from /dev/bus/usb/*/*
/dev/bus/usb/devices and opened from /dev/bus/usb/*/*

in addition to the existing discovery via:

/proc/bus/usb/devices and opened from /proc/bus/usb/*/*
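
One way to choose between the three discovery locations is to probe
them in order and use the first one that is readable. The sketch below
is only an illustration of that idea: first_readable() is a
hypothetical helper, and the probing order is an assumption, since the
part of the patch that selects the path is not shown here.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Return the index of the first path that can be read, or -1 if none
 * can. A caller might pass, in its preferred order (an assumption):
 *   "/sys/bus/usb/devices", "/dev/bus/usb/devices",
 *   "/proc/bus/usb/devices"
 */
static int first_readable(const char *const *paths, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (access(paths[i], R_OK) == 0)
            return (int)i;
    return -1;
}
```

The returned index could then be mapped to one of the USB_FS_*
constants defined in the patch.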

Signed-off-by: TJ <[EMAIL PROTECTED]>
---
--- a/usb-linux.c   2008-09-17 22:39:38.0 +0100
+++ b/usb-linux.c   2008-09-23 02:28:48.0 +0100
@@ -7,6 +7,10 @@
  *  Support for host device auto connect & disconnect
  *  Major rewrite to support fully async operation
  *
+ * Copyright 2008 TJ <[EMAIL PROTECTED]>
+ *  Added flexible support for /dev/bus/usb /sys/bus/usb/devices in addition
+ *  to the legacy /proc/bus/usb USB device discovery and handling
+ *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
  * in the Software without restriction, including without limitation the rights
@@ -72,9 +76,20 @@
 #define dprintf(...)
 #endif
 
-#define USBDEVFS_PATH "/proc/bus/usb"

+#define USBPROCBUS_PATH "/proc/bus/usb"
 #define PRODUCT_NAME_SZ 32
 #define MAX_ENDPOINTS 16
+#define USBDEVBUS_PATH "/dev/bus/usb"
+#define USBSYSBUS_PATH "/sys/bus/usb"
+
+static char *usb_host_device_path;
+
+#define USB_FS_NONE 0
+#define USB_FS_PROC 1
+#define USB_FS_DEV 2
+#define USB_FS_SYS 3
+
+static int usb_fs_type = 0;
 
 /* endpoint association data */

 struct endp_data {
@@ -890,13 +905,18 @@
 
 printf("husb: open device %d.%d\n", bus_num, addr);
 
-snprintf(buf, sizeof(buf), USBDEVFS_PATH "/%03d/%03d",

+   if (!usb_host_device_path) {
+   perror("husb: USB Host Device Path not set");
+   goto fail;
+   }
+snprintf(buf, sizeof(buf), "%s/%03d/%03d", usb_host_device_path,
  bus_num, addr);
  


You have tabs here.


 fd = open(buf, O_RDWR | O_NONBLOCK);
 if (fd < 0) {
 perror(buf);
 goto fail;
 }
+dprintf("husb: opened %s\n", buf);
 
 /* read the device description */

 dev->descr_len = read(fd, dev->descr, sizeof(dev->descr));
@@ -1038,23 +1058,29 @@
 return q - buf;
 }
 
-static int usb_host_scan(void *opaque, USBScanFunc *func)

+/*
+ Use /proc/bus/usb/devices or /dev/bus/usb/devices file to determine
+ host's USB devices. This is legacy support since many distributions
+ are moving to /sys/bus/usb
+*/
+static int usb_host_scan_dev(void *opaque, USBScanFunc *func)
 {
-FILE *f;
+FILE *f = 0;
 char line[1024];
 char buf[1024];
 int bus_num, addr, speed, device_count, class_id, product_id, vendor_id;
-int ret;
 char product_name[512];
+int ret = 0;
 
-f = fopen(USBDEVFS_PATH "/devices", "r");

+snprintf(line, sizeof(line), "%s/devices", usb_host_device_path);
+f = fopen(line, "r");
 if (!f) {
-term_printf("husb: could not open %s\n", USBDEVFS_PATH "/devices");
-return 0;
+   perror("husb: cannot open devices file");
+   goto the_end;
 }
  


And here and almost everywhere.


+
 device_count = 0;
 bus_num = addr = speed = class_id = product_id = vendor_id = 0;
-ret = 0;
 for(;;) {
 if (fgets(line, sizeof(line), f) == NULL)
 break;
@@ -1106,12 +1132,191 @@
 fail: ;
 }
 if (device_count && (vendor_id || product_id)) {
-/* Add the last device.  */
-ret = func(opaque, bus_num, addr, class_id, vendor_id,
-   product_id, product_name, speed);
+   /* Add the last device.  */
+   ret = func(opaque, bus_num, addr, class_id, vendor_id,
+   product_id, product_name, speed);
+}
+ the_end:
+if (f) fclose(f);
+return ret;
+}
+
+/*
+ Use /sys/bus/usb/devices/ directory to determine host's USB devices.
+
+ This code is taken from Robert Schiele's original patches posted to the
+ Novell bug-tracker https://bugzilla.novell.com/show_bug.cgi?id=241950
+*/
+static int usb_host_scan_sys(void *opaque, USBScanFunc *func)
+{
+FILE *f;
+DIR *dir = 0;
+char line[1024];
+int bus_num, addr, speed, class_id, product_id, vendor_id;
+int ret = 0;
+char product_name[512];
+struct dirent* de;
+
+dir = opendir(USBSYSBUS_PATH "/devices");
+if (!dir) {
+   perror("husb: cannot open devices directory");
+   goto the_end;
+}
+
+while ((de = readdir(dir))) {
+   if (de->d_name[0] != '.' && ! strchr(de->d_name, ':')) {
+   char filename[PATH_MAX];
+   char* tmpstr = de->d_name;
+   if (!strncmp(de->d_name, "usb", 3))
+   tmpstr += 3;
  


This is indented wrong.


+
+   bus_num = atoi(tmpstr);
+   snprintf(filename, PATH_MAX, USBSYSBUS_PATH "/devices/%s/devnum", de->d_name);
+   f = fopen(filename, "r");
+   if (!f) {
+   term_printf("Could not open %s\n", fi

[PATCH 0/4] bios: >4G updates (take2)

2008-09-25 Thread Alex Williamson

This set removes the code churn so we don't make it extra difficult to
sync with bochs.

[1/4] Cleanup the previous kvm patch to add above 4G MTRRs
[2/4] Add SMBIOS info for memory above 4G
[3/4] Fix SMBIOS type 19 & 20 range end
[4/4] Switch default MTRR type to WB and only specify an MTRR for MMIO

Thanks,

Alex


-- 
Alex Williamson HP Open Source & Linux Org.



[PATCH 1/4] kvm: bios: cleanup/consolidate above 4G memory parsing

2008-09-25 Thread Alex Williamson
kvm: bios: cleanup/consolidate above 4G memory parsing

Signed-off-by: Alex Williamson <[EMAIL PROTECTED]>
---

 bios/rombios32.c |   17 +
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/bios/rombios32.c b/bios/rombios32.c
index c57e967..07c858c 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -416,7 +416,7 @@ uint32_t cpuid_signature;
 uint32_t cpuid_features;
 uint32_t cpuid_ext_features;
 unsigned long ram_size;
-uint64_t above4g_ram_size;
+uint64_t ram_end;
 uint8_t bios_uuid[16];
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
@@ -531,9 +531,9 @@ void setup_mtrr(void)
 wrmsr_smp(MTRRphysMask_MSR(i), (~vmask & 0xfff000ull) | 0x800);
 vbase += vmask + 1;
 }
-for (vbase = 1ull << 32; i < vcnt && vbase < above4g_ram_size; ++i) {
+for (vbase = 1ull << 32; i < vcnt && vbase < ram_end; ++i) {
 vmask = (1ull << 40) - 1;
-while (vbase + vmask + 1 > above4g_ram_size)
+while (vbase + vmask + 1 > ram_end)
 vmask >>= 1;
 wrmsr_smp(MTRRphysBase_MSR(i), vbase | 6);
 wrmsr_smp(MTRRphysMask_MSR(i), (~vmask & 0xfff000ull) | 0x800);
@@ -551,17 +551,18 @@ void ram_probe(void)
 ram_size = (cmos_readb(0x17) | (cmos_readb(0x18) << 8)) * 1024;
 
   if (cmos_readb(0x5b) | cmos_readb(0x5c) | cmos_readb(0x5d))
-above4g_ram_size = ((uint64_t)cmos_readb(0x5b) << 16) |
-((uint64_t)cmos_readb(0x5c) << 24) | ((uint64_t)cmos_readb(0x5d) << 32);
+ram_end = (((uint64_t)cmos_readb(0x5b) << 16) |
+   ((uint64_t)cmos_readb(0x5c) << 24) |
+   ((uint64_t)cmos_readb(0x5d) << 32)) + (1ull << 32);
+  else
+ram_end = ram_size;
 
-  if (above4g_ram_size)
-above4g_ram_size += 1ull << 32;
+  BX_INFO("end of ram=%ldMB\n", ram_end >> 20);
 
 #ifdef BX_USE_EBDA_TABLES
 ebda_cur_addr = ((*(uint16_t *)(0x40e)) << 4) + 0x380;
 #endif
 BX_INFO("ram_size=0x%08lx\n", ram_size);
-BX_INFO("top of ram %ldMB\n", above4g_ram_size >> 20);
   setup_mtrr();
 }
 




[PATCH 2/4] kvm: bios: update SMBIOS table to report memory above 4G

2008-09-25 Thread Alex Williamson
kvm: bios: update SMBIOS table to report memory above 4G

Signed-off-by: Alex Williamson <[EMAIL PROTECTED]>
---

 bios/rombios32.c |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/bios/rombios32.c b/bios/rombios32.c
index 07c858c..be4c25f 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -2022,7 +2022,8 @@ void smbios_init(void)
 {
 unsigned cpu_num, nr_structs = 0, max_struct_size = 0;
 char *start, *p, *q;
-int memsize = ram_size / (1024 * 1024);
+int memsize = (ram_end == ram_size) ? ram_size / (1024 * 1024) :
+  (ram_end - (1ull << 32) + ram_size) / (1024 * 1024);
 
 #ifdef BX_USE_EBDA_TABLES
 ebda_cur_addr = align(ebda_cur_addr, 16);
@@ -2049,8 +2050,8 @@ void smbios_init(void)
 add_struct(smbios_type_4_init(p, cpu_num));
 add_struct(smbios_type_16_init(p, memsize));
 add_struct(smbios_type_17_init(p, memsize));
-add_struct(smbios_type_19_init(p, memsize));
-add_struct(smbios_type_20_init(p, memsize));
+add_struct(smbios_type_19_init(p, ram_end / (1024 * 1024)));
+add_struct(smbios_type_20_init(p, ram_end / (1024 * 1024)));
 add_struct(smbios_type_32_init(p));
 add_struct(smbios_type_127_init(p));
 




[PATCH 3/4] kvm: bios: fix SMBIOS end address range reporting

2008-09-25 Thread Alex Williamson
kvm: bios: fix SMBIOS end address range reporting

The -1 seems to be in the wrong place here.
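
A small worked comparison of the two expressions, assuming the
ending_address field is counted in kilobytes (which the multiplication
of a megabyte count by 1024 suggests). The function names are
hypothetical and only wrap the old and new expressions from the diff.

```c
#include <stdint.h>

/* Old expression: drops a whole megabyte, then converts to KB. */
static uint32_t ending_address_old(uint32_t memory_size_mb)
{
    return (memory_size_mb - 1) * 1024;
}

/* New expression: converts to KB, then steps back to the last KB. */
static uint32_t ending_address_new(uint32_t memory_size_mb)
{
    return (memory_size_mb * 1024) - 1;
}
```

For 512 MB the old form gives 523264, which is the start of the last
megabyte, while the new form gives 524287, the last kilobyte of the
range.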

Signed-off-by: Alex Williamson <[EMAIL PROTECTED]>
---

 bios/rombios32.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/bios/rombios32.c b/bios/rombios32.c
index be4c25f..f8edf18 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -1950,7 +1950,7 @@ smbios_type_19_init(void *start, uint32_t memory_size_mb)
 p->header.handle = 0x1300;
 
 p->starting_address = 0;
-p->ending_address = (memory_size_mb-1) * 1024;
+p->ending_address = (memory_size_mb * 1024) - 1;
 p->memory_array_handle = 0x1000;
 p->partition_width = 1;
 
@@ -1971,7 +1971,7 @@ smbios_type_20_init(void *start, uint32_t memory_size_mb)
 p->header.handle = 0x1400;
 
 p->starting_address = 0;
-p->ending_address = (memory_size_mb-1)*1024;
+p->ending_address = (memory_size_mb * 1024) - 1;
 p->memory_device_handle = 0x1100;
 p->memory_array_mapped_address_handle = 0x1300;
 p->partition_row_position = 1;




[PATCH 4/4] kvm: bios: switch MTRRs to cover only the PCI range and default to WB

2008-09-25 Thread Alex Williamson
kvm: bios: switch MTRRs to cover only the PCI range and default to WB

This matches how some bare metal machines report MTRRs and avoids
the problem of running out of MTRRs to cover all of RAM.
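
To see how quickly the old scheme uses up MTRRs, the sketch below
counts the registers the removed loop would need for a given range. It
mirrors the loop in the diff (largest power-of-two block first);
count_mtrrs() is a hypothetical name, and the number of variable MTRRs
actually available comes from MTRRcap, so no limit is assumed here.

```c
#include <stdint.h>

/*
 * Count the variable-range MTRRs needed to cover [base, end) with
 * power-of-two sized blocks, largest first, as the removed loop did.
 */
static unsigned count_mtrrs(uint64_t base, uint64_t end)
{
    unsigned n = 0;

    while (base < end) {
        uint64_t mask = (1ull << 40) - 1;

        /* Shrink the block until it no longer runs past the end. */
        while (base + mask + 1 > end)
            mask >>= 1;
        base += mask + 1;
        n++;
    }
    return n;
}
```

A 3.5GB low range needs three registers (2GB + 1GB + 512MB), but a
less convenient size such as 0xDFF00000 bytes already needs eleven,
one per set bit in the size.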

Signed-off-by: Alex Williamson <[EMAIL PROTECTED]>
---

 bios/rombios32.c |   24 
 1 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/bios/rombios32.c b/bios/rombios32.c
index f8edf18..592abf9 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -494,7 +494,6 @@ void setup_mtrr(void)
 uint8_t valb[8];
 uint64_t val;
 } u;
-uint64_t vbase, vmask;
 
 mtrr_cap = rdmsr(MSR_MTRRcap);
 vcnt = mtrr_cap & 0xff;
@@ -521,25 +520,10 @@ void setup_mtrr(void)
 wrmsr_smp(MSR_MTRRfix4K_E8000, 0);
 wrmsr_smp(MSR_MTRRfix4K_F, 0);
 wrmsr_smp(MSR_MTRRfix4K_F8000, 0);
-vbase = 0;
---vcnt; /* leave one mtrr for VRAM */
-for (i = 0; i < vcnt && vbase < ram_size; ++i) {
-vmask = (1ull << 40) - 1;
-while (vbase + vmask + 1 > ram_size)
-vmask >>= 1;
-wrmsr_smp(MTRRphysBase_MSR(i), vbase | 6);
-wrmsr_smp(MTRRphysMask_MSR(i), (~vmask & 0xfff000ull) | 0x800);
-vbase += vmask + 1;
-}
-for (vbase = 1ull << 32; i < vcnt && vbase < ram_end; ++i) {
-vmask = (1ull << 40) - 1;
-while (vbase + vmask + 1 > ram_end)
-vmask >>= 1;
-wrmsr_smp(MTRRphysBase_MSR(i), vbase | 6);
-wrmsr_smp(MTRRphysMask_MSR(i), (~vmask & 0xfff000ull) | 0x800);
-vbase += vmask + 1;
-}
-wrmsr_smp(MSR_MTRRdefType, 0xc00);
+/* Mark 3.5-4GB as UC, anything not specified defaults to WB */
+wrmsr_smp(MTRRphysBase_MSR(0), 0xe000ull | 0);
+wrmsr_smp(MTRRphysMask_MSR(0), ~(0x2000ull - 1) | 0x800);
+wrmsr_smp(MSR_MTRRdefType, 0xc06);
 }
 
 void ram_probe(void)




KVM: MMU: fix largepage shadow accounting with oos

2008-09-25 Thread Marcelo Tosatti

There's no need to increase the largepage shadow count when syncing
since there's no count decrement on unsync, only on destruction.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -661,8 +661,6 @@ static void rmap_write_protect(struct kv
 
if (write_protected)
kvm_flush_remote_tlbs(kvm);
-
-   account_shadowed(kvm, gfn);
 }
 
 static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
@@ -1130,8 +1128,10 @@ static struct kvm_mmu_page *kvm_mmu_get_
sp->gfn = gfn;
sp->role = role;
hlist_add_head(&sp->hash_link, bucket);
-   if (!metaphysical)
+   if (!metaphysical) {
rmap_write_protect(vcpu->kvm, gfn);
+   account_shadowed(vcpu->kvm, gfn);
+   }
if (shadow_trap_nonpresent_pte != shadow_notrap_nonpresent_pte)
vcpu->arch.mmu.prefetch_page(vcpu, sp);
else


Re: [PATCH 4/9] Implement GIF, clgi and stgi v3

2008-09-25 Thread Joerg Roedel
I had another possible idea for performance improvement here. Since we
only inject normal interrupts and exceptions (and not NMI and such) we
can patch clgi to cli and stgi to sti to save these two intercepts in
the guest's VMRUN path.
Any objections/problems with this?

On Wed, Sep 17, 2008 at 03:41:21PM +0200, Alexander Graf wrote:
> This patch implements the GIF flag and the clgi and stgi instructions that
> set this flag. Only if the flag is set (default), interrupts can be received
> by the CPU.
> 
> To keep the information about that somewhere, this patch adds a new hidden
> flags vector that is used to store information that does not go into the
> vmcb, but is SVM specific.
> 
> v2 moves the hflags to x86 generic code
> v3 makes use of the new permission helper
> 
> Signed-off-by: Alexander Graf <[EMAIL PROTECTED]>
> ---
>  arch/x86/kvm/svm.c |   42 +++---
>  include/asm-x86/kvm_host.h |3 +++
>  2 files changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index c72e728..469ecc5 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -612,6 +612,8 @@ static void init_vmcb(struct vcpu_svm *svm)
>   save->cr4 = 0;
>   }
>   force_new_asid(&svm->vcpu);
> +
> + svm->vcpu.arch.hflags = HF_GIF_MASK;
>  }
>  
>  static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
> @@ -1227,6 +1229,36 @@ static int nested_svm_do(struct vcpu_svm *svm,
>   return retval;
>  }
>  
> +static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
> +{
> + if (nested_svm_check_permissions(svm))
> + return 1;
> +
> + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
> + skip_emulated_instruction(&svm->vcpu);
> +
> + svm->vcpu.arch.hflags |= HF_GIF_MASK;
> +
> + return 1;
> +}
> +
> +static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
> +{
> + if (nested_svm_check_permissions(svm))
> + return 1;
> +
> + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
> + skip_emulated_instruction(&svm->vcpu);
> +
> + svm->vcpu.arch.hflags &= ~HF_GIF_MASK;
> +
> + /* After a CLGI no interrupts should come */
> + svm_clear_vintr(svm);
> + svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
> +
> + return 1;
> +}
> +
>  static int invalid_op_interception(struct vcpu_svm *svm,
>  struct kvm_run *kvm_run)
>  {
> @@ -1521,8 +1553,8 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
>   [SVM_EXIT_VMMCALL]  = vmmcall_interception,
>   [SVM_EXIT_VMLOAD]   = invalid_op_interception,
>   [SVM_EXIT_VMSAVE]   = invalid_op_interception,
> - [SVM_EXIT_STGI] = invalid_op_interception,
> - [SVM_EXIT_CLGI] = invalid_op_interception,
> + [SVM_EXIT_STGI] = stgi_interception,
> + [SVM_EXIT_CLGI] = clgi_interception,
>   [SVM_EXIT_SKINIT]   = invalid_op_interception,
>   [SVM_EXIT_WBINVD]   = emulate_on_interception,
>   [SVM_EXIT_MONITOR]  = invalid_op_interception,
> @@ -1669,6 +1701,9 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
>   if (!kvm_cpu_has_interrupt(vcpu))
>   goto out;
>  
> + if (!(svm->vcpu.arch.hflags & HF_GIF_MASK))
> + goto out;
> +
>   if (!(vmcb->save.rflags & X86_EFLAGS_IF) ||
>   (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
>   (vmcb->control.event_inj & SVM_EVTINJ_VALID)) {
> @@ -1720,7 +1755,8 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
>  
>   svm->vcpu.arch.interrupt_window_open =
>   (!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) &&
> -  (svm->vmcb->save.rflags & X86_EFLAGS_IF));
> +  (svm->vmcb->save.rflags & X86_EFLAGS_IF) &&
> +  (svm->vcpu.arch.hflags & HF_GIF_MASK));
>  
>   if (svm->vcpu.arch.interrupt_window_open && svm->vcpu.arch.irq_summary)
>   /*
> diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
> index 982b6b2..3e25004 100644
> --- a/include/asm-x86/kvm_host.h
> +++ b/include/asm-x86/kvm_host.h
> @@ -245,6 +245,7 @@ struct kvm_vcpu_arch {
>   unsigned long cr3;
>   unsigned long cr4;
>   unsigned long cr8;
> + u32 hflags;
>   u64 pdptrs[4]; /* pae */
>   u64 shadow_efer;
>   u64 apic_base;
> @@ -734,6 +735,8 @@ enum {
>   TASK_SWITCH_GATE = 3,
>  };
>  
> +#define HF_GIF_MASK  (1 << 0)
> +
>  /*
>   * Hardware virtualization extension instructions may fault if a
>   * reboot turns off virtualization while processes are running.
> -- 
> 1.5.6

Re: [PATCH 4/9] Implement GIF, clgi and stgi v3

2008-09-25 Thread Alexander Graf


On 25.09.2008, at 20:47, Joerg Roedel wrote:

> I had another possible idea for performance improvement here. Since we
> only inject normal interrupts and exceptions (and not NMI and such) we
> can patch clgi to cli and stgi to sti to save these two intercepts in
> the guest's VMRUN path.
> Any objections/problems with this?

How do we know if we're allowed to inject interrupts with V_INTR set?
Usually IF is on and GIF is off when entering the VM in KVM, so we
allow interrupts to arrive even when IF is clear in the guest...






Re: [PATCH 7/9] Add VMRUN handler v3

2008-09-25 Thread Alexander Graf


On 25.09.2008, at 19:37, Joerg Roedel wrote:

> On Thu, Sep 25, 2008 at 07:32:55PM +0200, Alexander Graf wrote:
>>> This is a big security hole. With this we give the guest access to its
>>> own VMCB. The guest can take over or crash the whole host machine by
>>> rewriting its VMCB. We should be more selective what we save in the
>>> hsave area.
>>
>> Oh, right. I didn't even think of a case where the nested guest would
>> have access to the hsave of itself. Since the hsave can never be used
>> twice on one vcpu, we could just allocate our own memory for the hsave
>> in the vcpu context and leave the nested hsave empty.
>
> I think we could also gain performance by only saving the important
> parts of the VMCB and not the whole page.

Is copying one page really that expensive? Is there any accelerated
function available for that that copies it with SSE or so? :-)

Alex

> Joerg
>
> -- 
>   |   AMD Saxony Limited Liability Company & Co. KG
> Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
> System|  Register Court Dresden: HRA 4896
> Research  |  General Partner authorized to represent:
> Center| AMD Saxony LLC (Wilmington, Delaware, US)
>   | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy






Re: [PATCH] VT-d: Fix iommu map page for mmio pages

2008-09-25 Thread Muli Ben-Yehuda
On Thu, Sep 25, 2008 at 05:45:30PM +0300, Avi Kivity wrote:
> Han, Weidong wrote:
>> Is it possible DMA into an mmio page? 
>
> I don't see why not.

Two reasons. First it makes no sense. MMIO pages don't have RAM
backing them, they have another device's register window. So the
effect of DMA'ing into an MMIO page would be for one device to DMA
into the register window of another device, which sounds to me insane.

Second, and more importantly, I've seen systems where doing the above
caused a nice, immediate, reboot. So I think that unless someone comes
with a valid scenario where we need to support it or something breaks,
we'd better err on the side of caution and not map pages that should
not be DMA targets.

Cheers,
Muli
-- 
The First Workshop on I/O Virtualization (WIOV '08)
Dec 2008, San Diego, CA, http://www.usenix.org/wiov08/
  xxx
SYSTOR 2009---The Israeli Experimental Systems Conference
http://www.haifa.il.ibm.com/conferences/systor2009/


Re: [PATCH 7/9] Add VMRUN handler v3

2008-09-25 Thread [EMAIL PROTECTED]
On Thu, Sep 25, 2008 at 10:00:17PM +0200, Alexander Graf wrote:
> 
> On 25.09.2008, at 19:37, Joerg Roedel wrote:
> 
> >On Thu, Sep 25, 2008 at 07:32:55PM +0200, Alexander Graf wrote:
> >>>This is a big security hole. With this we give the guest access to  
> >>>its
> >>>own VMCB. The guest can take over or crash the whole host machine by
> >>>rewriting its VMCB. We should be more selective what we save in the
> >>>hsave area.
> >>
> >>Oh, right. I didn't even think of a case where the nested guest would
> >>have access to the hsave of itself. Since the hsave can never be used
> >>twice on one vcpu, we could just allocate our own memory for the  
> >>hsave
> >>in the vcpu context and leave the nested hsave empty.
> >
> >I think we could also gain performance by only saving the important
> >parts of the VMCB and not the whole page.
> 
> Is copying one page really that expensive? Is there any accelerated  
> function available for that that copies it with SSE or so? :-)

Copying data in memory is always expensive because the accesses may miss
in the caches and data must be fetched from memory. As far as I know
this can be around 150 cycles per cache line.
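
As a rough worked estimate of what that means for a whole page, here
is a minimal sketch. The 4096-byte page and 64-byte cache line are
assumptions, the 150 cycles is the figure quoted above, and the helper
name is hypothetical. It counts only the case where every line of the
source page misses.

```c
/*
 * Worst-case cycle estimate for reading one page when every cache
 * line misses: number of lines times the cost of one miss.
 */
static unsigned long page_copy_miss_cycles(unsigned long page_size,
                                           unsigned long line_size,
                                           unsigned long cycles_per_miss)
{
    return (page_size / line_size) * cycles_per_miss;
}
```

With those assumed values, page_copy_miss_cycles(4096, 64, 150) comes
to 9600 cycles for 64 cache lines, before counting any misses on the
destination.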

Joerg

> >-- 
> >  |   AMD Saxony Limited Liability Company & Co. KG
> >Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
> >System|  Register Court Dresden: HRA 4896
> >Research  |  General Partner authorized to represent:
> >Center| AMD Saxony LLC (Wilmington, Delaware, US)
> >  | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe,  
> >Thomas McCoy
> >


Re: [PATCH 4/9] Implement GIF, clgi and stgi v3

2008-09-25 Thread Joerg Roedel
On Thu, Sep 25, 2008 at 09:55:27PM +0200, Alexander Graf wrote:
> 
> On 25.09.2008, at 20:47, Joerg Roedel wrote:
> 
> >I had another possible idea for performance improvement here. Since we
> >only inject normal interrupts and exceptions (and not NMI and such) we
> >can patch clgi to cli and stgi to sti to save these two intercepts in
> >the guests vmrun path.
> >Any objections/problems with this?
> 
> How do we know if we're allowed to inject interrupts with V_INTR set?  
> Usually IF is on and GIF is off when entering the VM in KVM, so we  
> allow interrupts to arrive even when IF is clear in the guest...

Hmm yes, this is a problem. So this optimization will not work. We need
other ways to optimize :)

Joerg
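
The reason the patching idea breaks down can be shown with a toy model.
It assumes only the architectural rule that a maskable interrupt is
taken when both GIF and RFLAGS.IF are set; the struct and function
names are made up for illustration and are not KVM code. After a real
clgi the IF flag is still 1, while after a patched-in cli it is 0, so
the "GIF off, IF on" state mentioned above can no longer be expressed:

```c
#include <stdbool.h>

/* Toy model of the two independent interrupt gates (hypothetical). */
struct intr_state {
    bool gif;        /* global interrupt flag */
    bool rflags_if;  /* RFLAGS.IF */
};

/* clgi clears GIF and leaves IF alone. */
static struct intr_state do_clgi(struct intr_state s)
{
    s.gif = false;
    return s;
}

/* cli clears IF and leaves GIF alone. */
static struct intr_state do_cli(struct intr_state s)
{
    s.rflags_if = false;
    return s;
}

/* A maskable interrupt needs both gates open. */
static bool maskable_intr_deliverable(struct intr_state s)
{
    return s.gif && s.rflags_if;
}
```

Both instructions block delivery, but they leave different flag values
behind, and anything that later inspects IF sees the difference.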


Re: [PATCH] VT-d: Fix iommu map page for mmio pages

2008-09-25 Thread Anthony Liguori

Muli Ben-Yehuda wrote:
> On Thu, Sep 25, 2008 at 05:45:30PM +0300, Avi Kivity wrote:
>> Han, Weidong wrote:
>>> Is it possible to DMA into an MMIO page?
>>
>> I don't see why not.
>
> Two reasons. First it makes no sense. MMIO pages don't have RAM
> backing them, they have another device's register window. So the
> effect of DMA'ing into an MMIO page would be for one device to DMA
> into the register window of another device, which sounds to me insane.

MMIO isn't just a register window.  It may be an on-device buffer.  For
instance, all packets are stored in a buffer on the ne2k that's mapped
via mmio.  It would seem entirely reasonable to me to program an IDE
driver to DMA directly into the device's packet buffer.

> Second, and more importantly, I've seen systems where doing the above
> caused a nice, immediate, reboot. So I think that unless someone comes
> with a valid scenario where we need to support it or something breaks,
> we'd better err on the side of caution and not map pages that should
> not be DMA targets.

Xen maps the MMIO pages into the VT-d table.  The system you were using
could have just been busted.  I think the burden is to prove that this
is illegal (via the architecture specification).

Regards,

Anthony Liguori

> Cheers,
> Muli
  




Re: [PATCH] unalias rework

2008-09-25 Thread Marcelo Tosatti

Hi Izik,

On Thu, Sep 04, 2008 at 05:13:20PM +0300, izik eidus wrote:
>
> + struct kvm_memory_slot *alias_slot = &kvm->memslots[i];
> +
> + if (alias_slot->base_gfn == slot->base_gfn)
> + return 1;
> + }
> + return 0;
> +}
> +
> +static void update_alias_slots(struct kvm *kvm, struct kvm_memory_slot *free)
> +{
> + int i;
> +
> + if (is_aliased_slot(kvm, free))
> + return;
> +
> + for (i = KVM_MEMORY_SLOTS; i < KVM_MEMORY_SLOTS + KVM_ALIAS_SLOTS;
> +  ++i) {
> + struct kvm_memory_slot *alias_memslot = &kvm->memslots[i];
> + unsigned long size = free->npages << PAGE_SHIFT;
> +
> + if (alias_memslot->userspace_addr >= free->userspace_addr &&
> + alias_memslot->userspace_addr < free->userspace_addr +
> + size) {
> + alias_memslot->flags = free->flags;
> + if (free->dirty_bitmap) {
> + unsigned long offset =
> + alias_memslot->userspace_addr -
> + free->userspace_addr;
> + unsigned dirty_offset;
> + unsigned long bitmap_addr;
> +
> + offset = offset >> PAGE_SHIFT;
> + dirty_offset = ALIGN(offset, BITS_PER_LONG) / 8;
> + bitmap_addr = (unsigned long) 
> free->dirty_bitmap;
> + bitmap_addr += dirty_offset;
> + alias_memslot->dirty_bitmap = (unsigned long *) 
> bitmap_addr;
> + } else
> + alias_memslot->dirty_bitmap = NULL;
> + }
> + }
> +}
> +
>  /*
>   * Free any memory in @free but not in @dont.
>   */
> -static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
> +static void kvm_free_physmem_slot(struct kvm *kvm,
> +   struct kvm_memory_slot *free,
> struct kvm_memory_slot *dont)
>  {
>   if (!dont || free->rmap != dont->rmap)
> @@ -385,10 +433,16 @@ static void kvm_free_physmem_slot(struct 
> kvm_memory_slot *free,
>   if (!dont || free->lpage_info != dont->lpage_info)
>   vfree(free->lpage_info);
>  
> - free->npages = 0;
>   free->dirty_bitmap = NULL;
>   free->rmap = NULL;
>   free->lpage_info = NULL;
> +
> +#ifdef CONFIG_X86
> + update_alias_slots(kvm, free);
> + if (dont)
> + update_alias_slots(kvm, dont);
> +#endif
> + free->npages = 0;

Why is this needed? I don't understand when you would free a memslot
without first freeing any aliases that map it.



Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests

2008-09-25 Thread Yang, Sheng
On Tuesday 23 September 2008 22:54:53 Amit Shah wrote:
> +static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
> +                                             int len)
> +{
> +   uint32_t val = 0;
> +   int fd, r;
> +
> +   if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
> +   address == 0x3c || address == 0x3d) {
> +   val = pci_default_read_config(d, address, len);
> +   DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> +         (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val,
> +         len);
> +   return val;
> +   }
> +
> +   /* vga specific, remove later */
> +   if (address == 0xFC)
> +   goto do_log;
> +
> +   fd = ((AssignedDevice *)d)->real_device.config_fd;
> +   r = lseek(fd, address, SEEK_SET);
> +   if (r < 0) {
> +   fprintf(stderr, "%s: bad seek, errno = %d\n",
> +   __func__, errno);
> +   return val;
> +   }

This method of reading configuration space has a small problem: the vendor ID
and device ID are read directly from configuration space rather than from the
"vendor" and "device" files in sysfs. That causes trouble with devices whose
configuration space is inconsistent with the "vendor" and "device" files, e.g.
because of a fixup applied by the host PCI subsystem in the kernel.

Maybe this can be deferred to a follow-up patch, but we should address
the issue... Maybe we can use libpci? More fields than vendor and
device have this problem, such as "irq".

--
regards
Yang, Sheng
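
A sketch of the sysfs approach, for illustration only: it reads one
numeric attribute such as "vendor", "device" or "irq" from a device
directory under /sys/bus/pci/devices/. The helper name is hypothetical
and not part of the patch.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read one numeric sysfs attribute into *out.
 * "vendor" and "device" hold hex text such as "0x8086\n"; "irq" holds
 * decimal text. Base 0 lets strtoul() accept both forms.
 * Returns 0 on success, -1 on any error.
 */
static int read_sysfs_attr(const char *path, unsigned long *out)
{
    char buf[32];
    char *end;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    errno = 0;
    *out = strtoul(buf, &end, 0);
    if (errno || end == buf)
        return -1;
    return 0;
}
```

A caller would pass something like
"/sys/bus/pci/devices/0000:00:19.0/vendor", where the device address is
only an example, and use the result in place of the value read from the
config file descriptor.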


[PATCH 1/4] Separate update irq to a single function

2008-09-25 Thread Sheng Yang

Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86.c |   78 
 1 files changed, 42 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b9d15f7..43fb049 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -205,6 +205,42 @@ static void kvm_free_all_assigned_devices(struct kvm *kvm)
}
 }
 
+static int assigned_device_update_irq(struct kvm *kvm,
+   struct kvm_assigned_dev_kernel *assigned_dev,
+   struct kvm_assigned_irq *assigned_irq)
+{
+   if (assigned_dev->irq_requested) {
+   assigned_dev->guest_irq = assigned_irq->guest_irq;
+   assigned_dev->ack_notifier.gsi = assigned_irq->guest_irq;
+   return 0;
+   }
+   if (irqchip_in_kernel(kvm)) {
+   if (!capable(CAP_SYS_RAWIO))
+   return -EPERM;
+
+   if (assigned_irq->host_irq)
+   assigned_dev->host_irq = assigned_irq->host_irq;
+   else
+   assigned_dev->host_irq = assigned_dev->dev->irq;
+   assigned_dev->guest_irq = assigned_irq->guest_irq;
+   assigned_dev->ack_notifier.gsi = assigned_irq->guest_irq;
+   assigned_dev->ack_notifier.irq_acked = kvm_assigned_dev_ack_irq;
+   kvm_register_irq_ack_notifier(kvm, &assigned_dev->ack_notifier);
+
+   /* Even though this is PCI, we don't want to use shared
+* interrupts. Sharing host devices with guest-assigned devices
+* on the same interrupt line is not a happy situation: there
+* are going to be long delays in accepting, acking, etc.
+*/
+   if (request_irq(assigned_dev->host_irq, kvm_assigned_dev_intr,
+   0, "kvm_assigned_device", (void *)assigned_dev))
+   return -EIO;
+   }
+   assigned_dev->irq_requested = true;
+
+   return 0;
+}
+
 static int kvm_vm_ioctl_assign_irq(struct kvm *kvm,
   struct kvm_assigned_irq
   *assigned_irq)
@@ -221,44 +257,14 @@ static int kvm_vm_ioctl_assign_irq(struct kvm *kvm,
return -EINVAL;
}
 
-   if (match->irq_requested) {
-   match->guest_irq = assigned_irq->guest_irq;
-   match->ack_notifier.gsi = assigned_irq->guest_irq;
-   mutex_unlock(&kvm->lock);
-   return 0;
-   }
-
-   INIT_WORK(&match->interrupt_work,
- kvm_assigned_dev_interrupt_work_handler);
-
-   if (irqchip_in_kernel(kvm)) {
-   if (!capable(CAP_SYS_RAWIO)) {
-   r = -EPERM;
-   goto out_release;
-   }
-
-   if (assigned_irq->host_irq)
-   match->host_irq = assigned_irq->host_irq;
-   else
-   match->host_irq = match->dev->irq;
-   match->guest_irq = assigned_irq->guest_irq;
-   match->ack_notifier.gsi = assigned_irq->guest_irq;
-   match->ack_notifier.irq_acked = kvm_assigned_dev_ack_irq;
-   kvm_register_irq_ack_notifier(kvm, &match->ack_notifier);
+   if (!match->irq_requested)
+   INIT_WORK(&match->interrupt_work,
+   kvm_assigned_dev_interrupt_work_handler);
 
-   /* Even though this is PCI, we don't want to use shared
-* interrupts. Sharing host devices with guest-assigned devices
-* on the same interrupt line is not a happy situation: there
-* are going to be long delays in accepting, acking, etc.
-*/
-   if (request_irq(match->host_irq, kvm_assigned_dev_intr, 0,
-   "kvm_assigned_device", (void *)match)) {
-   r = -EIO;
-   goto out_release;
-   }
-   }
+   r = assigned_device_update_irq(kvm, match, assigned_irq);
+   if (r)
+   goto out_release;
 
-   match->irq_requested = true;
mutex_unlock(&kvm->lock);
return r;
 out_release:
-- 
1.5.4.5



[PATCH 0/4] Enable MSI support for KVM VT-d

2008-09-25 Thread Sheng Yang
Hi, Avi

This patchset enables MSI support for KVM VT-d.

These are only the kernel-space patches. The third patch would also go to
x86 upstream.

The userspace code would look like this:

assigned_irq_data.guest_msi_addr = *(uint32_t *)(d->msi_cap + 4);
assigned_irq_data.guest_msi_data = *(uint16_t *)(d->msi_cap + 8);
assigned_irq_data.flags |= KVM_DEV_IRQ_ASSIGN_ENABLE_MSI;
r = kvm_assign_irq(kvm_context, &assigned_irq_data);

I've tested the patchset with some userspace hacks, and it works well.

Thanks!
--
regards
Yang, Sheng
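
The casts in the snippet above assume a little-endian host and a 32-bit
MSI capability, where the message address sits at offset 4 and the
message data at offset 8 (with a 64-bit capability the data moves to
offset 12). A byte-wise variant that avoids the pointer casts could look
like this; the struct and helper names are hypothetical:

```c
#include <stdint.h>

/* Address/data pair of a 32-bit MSI capability (hypothetical type). */
struct msi_msg32 {
    uint32_t addr;
    uint16_t data;
};

/* PCI config space is little-endian; assemble values byte by byte. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* msi_cap points at the first byte of the MSI capability. */
static struct msi_msg32 read_msi_cap32(const uint8_t *msi_cap)
{
    struct msi_msg32 m;

    m.addr = le32(msi_cap + 4);  /* message address */
    m.data = le16(msi_cap + 8);  /* message data */
    return m;
}
```

The two fields map directly onto guest_msi_addr and guest_msi_data in
the snippet above.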


[PATCH 2/4] KVM: x86: Replace irq_requested with guest_intr_type

2008-09-25 Thread Sheng Yang

Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86.c   |8 
 include/linux/kvm_host.h |3 ++-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 43fb049..4836323 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -170,7 +170,7 @@ static void kvm_free_assigned_device(struct kvm *kvm,
 struct kvm_assigned_dev_kernel
 *assigned_dev)
 {
-   if (irqchip_in_kernel(kvm) && assigned_dev->irq_requested)
+   if (irqchip_in_kernel(kvm) && assigned_dev->guest_intr_type)
free_irq(assigned_dev->host_irq, (void *)assigned_dev);
 
kvm_unregister_irq_ack_notifier(kvm, &assigned_dev->ack_notifier);
@@ -209,7 +209,7 @@ static int assigned_device_update_irq(struct kvm *kvm,
struct kvm_assigned_dev_kernel *assigned_dev,
struct kvm_assigned_irq *assigned_irq)
 {
-   if (assigned_dev->irq_requested) {
+   if (assigned_dev->guest_intr_type == KVM_ASSIGNED_DEV_INTR) {
assigned_dev->guest_irq = assigned_irq->guest_irq;
assigned_dev->ack_notifier.gsi = assigned_irq->guest_irq;
return 0;
@@ -236,7 +236,7 @@ static int assigned_device_update_irq(struct kvm *kvm,
0, "kvm_assigned_device", (void *)assigned_dev))
return -EIO;
}
-   assigned_dev->irq_requested = true;
+   assigned_dev->guest_intr_type = KVM_ASSIGNED_DEV_INTR;
 
return 0;
 }
@@ -257,7 +257,7 @@ static int kvm_vm_ioctl_assign_irq(struct kvm *kvm,
return -EINVAL;
}
 
-   if (!match->irq_requested)
+   if (!match->guest_intr_type)
INIT_WORK(&match->interrupt_work,
kvm_assigned_dev_interrupt_work_handler);
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6252802..e24280b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -301,7 +301,8 @@ struct kvm_assigned_dev_kernel {
int host_devfn;
int host_irq;
int guest_irq;
-   int irq_requested;
+#define KVM_ASSIGNED_DEV_INTR  1
+   int guest_intr_type;
struct pci_dev *dev;
struct kvm *kvm;
 };
-- 
1.5.4.5



[PATCH 3/4] x86: Add MSI delivery mode mask

2008-09-25 Thread Sheng Yang

Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
---
 include/asm-x86/msidef.h |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/msidef.h b/include/asm-x86/msidef.h
index 296f29c..fdeebbb 100644
--- a/include/asm-x86/msidef.h
+++ b/include/asm-x86/msidef.h
@@ -15,8 +15,11 @@
 MSI_DATA_VECTOR_MASK)
 
 #define MSI_DATA_DELIVERY_MODE_SHIFT   8
+#define MSI_DATA_DELIVERY_MODE_MASK    0x700
 #define  MSI_DATA_DELIVERY_FIXED   (0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI  (1 << MSI_DATA_DELIVERY_MODE_SHIFT)
+#define  MSI_DATA_DELIVERY_FIXED_VAL   0
+#define  MSI_DATA_DELIVERY_LOWPRI_VAL  1
 
 #define MSI_DATA_LEVEL_SHIFT   14
 #define MSI_DATA_LEVEL_DEASSERT        (0 << MSI_DATA_LEVEL_SHIFT)
-- 
1.5.4.5
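
For reference, a small sketch of how the new mask combines with the
existing shift. The mask covers bits 8-10 of the MSI data word, so the
masked value has to be shifted down before it is stored in an 8-bit
variable or compared with the _VAL constants; masking alone leaves a
multiple of 0x100, which truncates to 0 in a u8. The constants are
copied from the patch; the function name is hypothetical:

```c
#include <stdint.h>

#define MSI_DATA_DELIVERY_MODE_SHIFT  8
#define MSI_DATA_DELIVERY_MODE_MASK   0x700
#define MSI_DATA_DELIVERY_FIXED_VAL   0
#define MSI_DATA_DELIVERY_LOWPRI_VAL  1

/* Extract the delivery mode field as a small integer (0..7). */
static uint8_t msi_delivery_mode(uint16_t msi_data)
{
    return (uint8_t)((msi_data & MSI_DATA_DELIVERY_MODE_MASK) >>
                     MSI_DATA_DELIVERY_MODE_SHIFT);
}
```

The returned value can be used directly in a switch over the
MSI_DATA_DELIVERY_*_VAL constants.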



[PATCH 4/4] KVM: x86: Enable MSI for assigned device

2008-09-25 Thread Sheng Yang
As well as export ioapic_get_delivery_bitmask().

Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86.c   |  111 +++--
 include/linux/kvm.h  |4 ++
 include/linux/kvm_host.h |3 +
 virt/kvm/ioapic.c|3 +-
 virt/kvm/ioapic.h|2 +
 5 files changed, 117 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4836323..66d96e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MAX_IO_MSRS 256
 #define CR0_RESERVED_BITS  \
@@ -120,6 +121,50 @@ static struct kvm_assigned_dev_kernel 
*kvm_find_assigned_dev(struct list_head *h
return NULL;
 }
 
+static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel
+   *assigned_dev)
+{
+   u8 dest_id = assigned_dev->guest_msi_addr & MSI_ADDR_DEST_ID_MASK;
+   u8 vector = assigned_dev->guest_msi_data & MSI_DATA_VECTOR_MASK;
+   u8 dest_mode = assigned_dev->guest_msi_addr &
+   (1 << MSI_ADDR_DEST_MODE_SHIFT);
+   u8 trig_mode = assigned_dev->guest_msi_data &
+   (1 << MSI_DATA_TRIGGER_SHIFT);
+   u8 delivery_mode = assigned_dev->guest_msi_data &
+   MSI_DATA_DELIVERY_MODE_MASK;
+   u32 deliver_bitmask;
+   int vcpu_id;
+   struct kvm_vcpu *vcpu;
+   struct kvm_ioapic *ioapic = ioapic_irqchip(assigned_dev->kvm);
+
+   BUG_ON(!ioapic);
+
+   deliver_bitmask = ioapic_get_delivery_bitmask(ioapic,
+ dest_id, dest_mode);
+   switch (delivery_mode) {
+   case MSI_DATA_DELIVERY_LOWPRI_VAL:
+   vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm, vector,
+   deliver_bitmask);
+   if (vcpu != NULL)
+   kvm_apic_set_irq(vcpu, vector, trig_mode);
+   else
+   printk(KERN_INFO "Null lowest priority vcpu!\n");
+   break;
+   case MSI_DATA_DELIVERY_FIXED_VAL:
+   for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
+   if (!(deliver_bitmask & (1 << vcpu_id)))
+   continue;
+   deliver_bitmask &= ~(1 << vcpu_id);
+   vcpu = ioapic->kvm->vcpus[vcpu_id];
+   if (vcpu)
+   kvm_apic_set_irq(vcpu, vector, trig_mode);
+   }
+   break;
+   default:
+   printk(KERN_INFO "Unsupported MSI delivery mode\n");
+   }
+}
+
 static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
 {
struct kvm_assigned_dev_kernel *assigned_dev;
@@ -132,8 +177,12 @@ static void kvm_assigned_dev_interrupt_work_handler(struct 
work_struct *work)
 * finer-grained lock, update this
 */
mutex_lock(&assigned_dev->kvm->lock);
-   kvm_set_irq(assigned_dev->kvm,
-   assigned_dev->guest_irq, 1);
+   if (assigned_dev->guest_intr_type == KVM_ASSIGNED_DEV_INTR)
+   kvm_set_irq(assigned_dev->kvm, assigned_dev->guest_irq, 1);
+   else if (assigned_dev->guest_intr_type == KVM_ASSIGNED_DEV_MSI) {
+   assigned_device_msi_dispatch(assigned_dev);
+   enable_irq(assigned_dev->host_irq);
+   }
mutex_unlock(&assigned_dev->kvm->lock);
kvm_put_kvm(assigned_dev->kvm);
 }
@@ -172,6 +221,8 @@ static void kvm_free_assigned_device(struct kvm *kvm,
 {
if (irqchip_in_kernel(kvm) && assigned_dev->guest_intr_type)
free_irq(assigned_dev->host_irq, (void *)assigned_dev);
+   if (assigned_dev->guest_intr_type == KVM_ASSIGNED_DEV_MSI)
+   pci_disable_msi(assigned_dev->dev);
 
kvm_unregister_irq_ack_notifier(kvm, &assigned_dev->ack_notifier);
 
@@ -215,6 +266,11 @@ static int assigned_device_update_irq(struct kvm *kvm,
return 0;
}
if (irqchip_in_kernel(kvm)) {
+   if (assigned_dev->guest_intr_type == KVM_ASSIGNED_DEV_MSI) {
+   free_irq(assigned_dev->host_irq, (void *)kvm);
+   pci_disable_msi(assigned_dev->dev);
+   }
+
if (!capable(CAP_SYS_RAWIO))
return -EPERM;
 
@@ -241,6 +297,40 @@ static int assigned_device_update_irq(struct kvm *kvm,
return 0;
 }
 
+static int assigned_device_update_msi(struct kvm *kvm,
+   struct kvm_assigned_dev_kernel *assigned_dev,
+   struct kvm_assigned_irq *assigned_irq)
+{
+   int r;
+
+   if (assigned_dev->guest_intr_type == KVM_ASSIGNED_DEV_MSI)
+   return 0;
+
+   if (irqchip_in_kernel(kvm)) {
+   if (assigned