KVM-Autotest: server problem

2009-05-18 Thread Michael Goldish
Hi Lucas,

Since I consider you our Autotest reference I direct the following question to 
you.

Currently our Autotest servers run tests in client mode using the same control 
file on all hosts. We want to move on to dispatching tests from the server, 
using a server control file, so that each host runs several test execution 
pipelines. As far as I know this should be straightforward using at.run_test(), 
subcommand() and parallel(), as explained in 
http://autotest.kernel.org/wiki/ServerControlHowto. Theoretically there should 
be no problem running several pipelines on each host because several 
independent copies of the Autotest client can be installed in unique temporary 
directories on each host (using set_install_in_tmpdir()).

Though we managed to run two tests in parallel on a single host, the server 
seems to have trouble parsing the results. Depending on which test tags we 
specify, the server displays either none or only some of the results, but 
never all of them. There also seems to be a difference between what the server 
displays during execution and what it displays after execution has completed. 
In one of the configurations we tried, the server displayed the test results 
while they were still running, but as soon as they completed, those results 
disappeared and the only results left visible were those of the Autotest 
client installation.

Is this a known issue, or is it more likely that I made a mistake somewhere? Is 
there a known fix or workaround? Could this functionality (running tests in 
parallel on the same host, from the server) be unsupported?

I haven't provided any code because I've temporarily lost contact with the 
server I was experimenting on. If you find it useful I'll provide some code as 
soon as I regain access.

Note: this message is unrelated to the one I posted yesterday to the KVM list. 
Yesterday's message refers to the possibility of running tests in parallel 
using a client control file, not a server one.

Thanks,
Michael


[ kvm-Bugs-2793225 ] Windows XP crashes on first boot after install on AMD64

2009-05-18 Thread SourceForge.net
Bugs item #2793225, was opened at 2009-05-18 10:04
Message generated for change (Tracker Item Submitted) made by alain_knaff
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2793225&group_id=180599

Category: amd
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Alain Knaff (alain_knaff)
Assigned to: Nobody/Anonymous (nobody)
Summary: Windows XP crashes on first boot after install on AMD64

Initial Comment:
In kvm-84, Windows XP (32-bit) crashes on first boot in a new VM after install.
The install itself runs fine, but after the first boot the guest repeatedly 
crashes: it displays some BSOD, reboots immediately (too fast to take notes of 
what the BSOD says), crashes again, and so on.

Even adding -no-acpi or -no-kvm didn't help.
It used to work on older versions (i.e. before I "up"graded to Kubuntu 9.04).

Using qemu instead of kvm produces a slightly different result: here, instead 
of a crash, it freezes, showing a very faint version of the Windows logo.


> kvm -h
QEMU PC emulator version 0.9.1 (kvm-84), Copyright (c) 2003-2008 Fabrice Bellard
usage: qemu [options] [disk_image]  
...

> cat /proc/cpuinfo 
processor   : 0 
vendor_id   : AuthenticAMD
cpu family  : 15  
model   : 67  
model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
...

> uname -a
Linux hitchhiker 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:58:03 UTC 2009 
x86_64 GNU/Linux

> cat /etc/issue
Ubuntu 9.04 \n \l

> file /bin/ls
/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically 
linked (uses shared libs), for GNU/Linux 2.6.8, stripped


--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2793225&group_id=180599


Re: [PATCH] kvm: discard masking checking when initial MSI-X entries

2009-05-18 Thread Avi Kivity

Sheng Yang wrote:

This is meant to go with commit adcf3594f9580bdf9b5e71f271b6088b185e017e;
otherwise QEMU only counts the MSI-X entries but won't fill them in...
  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: Network hangs after some time

2009-05-18 Thread Nikola Ciprich
Hi,
what kvm module/userspace versions are you using?
n.

On Sun, May 17, 2009 at 11:32:44PM -0300, Wagner Sartori Junior wrote:
> Hello,
> 
> I need some help. The network in my guests hangs (I can't ping the guest
> from my house) after some time (anywhere from 10 minutes to a few hours);
> it stops in one guest first and then the others stop one after another.
> The host runs Ubuntu Jaunty and the network is configured using a
> bridge. If I log in to the guest (via VNC) and ping anything outside
> the host, the network starts answering my pings again.
> 
> I have only tried the default driver and virtio for the NIC, and the
> problem persists with both.
> 
> Any help is welcome.
> 
> Thanks,
> 
> Wagner Sartori Junior


[PATCH] Drop interrupt shadow when single stepping should be done only on VMX.

2009-05-18 Thread Gleb Natapov
The problem exists only on VMX. Also, the current code skips this step when
there is a pending exception; the patch fixes that too.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/vmx.c |8 
 arch/x86/kvm/x86.c |3 ---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fe2ce2b..ec378c1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3388,6 +3388,14 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
 
+   /* When single-stepping over STI and MOV SS, we must clear the
+* corresponding interruptibility bits in the guest state. Otherwise
+* vmentry fails as it then expects bit 14 (BS) in pending debug
+* exceptions being set, but that's not correct for the guest debugging
+* case. */
+   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+   vmx_set_interrupt_shadow(vcpu, 0);
+
/*
 * Loading guest fpu may have cleared host cr0.ts
 */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a08d96f..82585ae 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3153,9 +3153,6 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu)
 
 static void inject_pending_irq(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
-   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
-   kvm_x86_ops->set_interrupt_shadow(vcpu, 0);
-
/* try to reinject previous events if any */
if (vcpu->arch.nmi_injected) {
kvm_x86_ops->set_nmi(vcpu);
-- 
1.6.2.1



[PATCH] [APIC] Optimize searching for highest IRR

2009-05-18 Thread Gleb Natapov
Most of the time the IRR is empty, so instead of scanning the whole IRR on
each VM entry, keep a variable that tells us whether the IRR is non-empty. The
IRR then has to be scanned twice on each IRQ delivery, but that is much rarer
than a VM entry.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/lapic.c |   19 ---
 arch/x86/kvm/lapic.h |1 +
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ae99d83..2c25b07 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -165,24 +165,36 @@ static int find_highest_vector(void *bitmap)
 
 static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
 {
+   apic->irr_pending = true;
return apic_test_and_set_vector(vec, apic->regs + APIC_IRR);
 }
 
-static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
+static inline int apic_search_irr(struct kvm_lapic *apic)
 {
-   apic_clear_vector(vec, apic->regs + APIC_IRR);
+   return find_highest_vector(apic->regs + APIC_IRR);
 }
 
 static inline int apic_find_highest_irr(struct kvm_lapic *apic)
 {
int result;
 
-   result = find_highest_vector(apic->regs + APIC_IRR);
+   if (!apic->irr_pending)
+   return -1;
+
+   result = apic_search_irr(apic);
ASSERT(result == -1 || result >= 16);
 
return result;
 }
 
+static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
+{
+   apic->irr_pending = false;
+   apic_clear_vector(vec, apic->regs + APIC_IRR);
+   if (apic_search_irr(apic) != -1)
+   apic->irr_pending = true;
+}
+
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 {
struct kvm_lapic *apic = vcpu->arch.apic;
@@ -842,6 +854,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
apic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
apic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
}
+   apic->irr_pending = false;
update_divide_count(apic);
atomic_set(&apic->lapic_timer.pending, 0);
if (vcpu->vcpu_id == 0)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index a587f83..3f3ecc6 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -12,6 +12,7 @@ struct kvm_lapic {
struct kvm_timer lapic_timer;
u32 divide_count;
struct kvm_vcpu *vcpu;
+   bool irr_pending;
struct page *regs_page;
void *regs;
gpa_t vapic_addr;
-- 
1.6.2.1



[PATCH] If interrupt injection is not possible do not scan IRR.

2009-05-18 Thread Gleb Natapov

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/i8259.c |5 +
 arch/x86/kvm/irq.c   |1 -
 arch/x86/kvm/x86.c   |9 +
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 1ccb50c..d32ceac 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -218,6 +218,11 @@ int kvm_pic_read_irq(struct kvm *kvm)
struct kvm_pic *s = pic_irqchip(kvm);
 
pic_lock(s);
+   if (!s->output) {
+   pic_unlock(s);
+   return -1;
+   }
+   s->output = 0;
irq = pic_get_irq(&s->pics[0]);
if (irq >= 0) {
pic_intack(&s->pics[0], irq);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 96dfbb6..e93405a 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -78,7 +78,6 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
if (vector == -1) {
if (kvm_apic_accept_pic_intr(v)) {
s = pic_irqchip(v->kvm);
-   s->output = 0;  /* PIC */
vector = kvm_pic_read_irq(v->kvm);
}
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 44e87a5..a08d96f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3174,10 +3174,11 @@ static void inject_pending_irq(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run)
vcpu->arch.nmi_injected = true;
kvm_x86_ops->set_nmi(vcpu);
}
-   } else if (kvm_cpu_has_interrupt(vcpu)) {
-   if (kvm_x86_ops->interrupt_allowed(vcpu)) {
-   kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
-   false);
+   } else if (kvm_x86_ops->interrupt_allowed(vcpu)) {
+   int vec = kvm_cpu_get_interrupt(vcpu);
+   if (vec != -1) {
+   printk(KERN_ERR"inject %d rip=%x\n", vec, 
kvm_rip_read(vcpu));
+   kvm_queue_interrupt(vcpu, vec, false);
kvm_x86_ops->set_irq(vcpu);
}
}
-- 
1.6.2.1



Re: [PATCH] If interrupt injection is not possible do not scan IRR.

2009-05-18 Thread Avi Kivity

Gleb Natapov wrote:

@@ -3174,10 +3174,11 @@ static void inject_pending_irq(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run)
vcpu->arch.nmi_injected = true;
kvm_x86_ops->set_nmi(vcpu);
}
-   } else if (kvm_cpu_has_interrupt(vcpu)) {
-   if (kvm_x86_ops->interrupt_allowed(vcpu)) {
-   kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
-   false);
+   } else if (kvm_x86_ops->interrupt_allowed(vcpu)) {
+   int vec = kvm_cpu_get_interrupt(vcpu);
+   if (vec != -1) {
+   printk(KERN_ERR"inject %d rip=%x\n", vec, 
kvm_rip_read(vcpu));
+   kvm_queue_interrupt(vcpu, vec, false);
kvm_x86_ops->set_irq(vcpu);
}   }
  


Not sure if this is a win.  Usually interrupts are allowed, so we'll 
have to do both calculations.  On vmx ->interrupt_allowed() reads the 
vmcs, which is slow.


--
error compiling committee.c: too many arguments to function



[PATCH v2] Allow to override sync source

2009-05-18 Thread Jan Kiszka
Avi Kivity wrote:
> Jan Kiszka wrote:
>> In order to allow sync'ing the kmod dir against arbitrary kernel trees,
>> extend the sync script to accept alternative paths and adjust the
>> Makefile accordingly.
>>
>> @@ -17,6 +17,7 @@ ORIGMODDIR = $(patsubst %/build,%/kernel,$(KERNELDIR))
>>  
>>  rpmrelease = devel
>>  
>> +KVM_VERSION = kvm-devel
>>  LINUX = ./linux-2.6
>>   
> 
> You're overriding ./configure here.  What was the motivation?

I missed that this is provided via configure - dropped again.

> 
>>  
>> -version = 'kvm-devel'
>> -if len(sys.argv) >= 2:
>> -version = sys.argv[1]
>> +parser = OptionParser(usage='usage: %prog [-v version][-l linuxkernel]')
>> +parser.add_option('-v', action='store', type='string', dest='version')
>> +parser.add_option('-l', action='store', type='string', dest='linux')
>>   
> 
> Please add help, and spaces around '='.

OK.

> 
>> +(options, args) = parser.parse_args()
>>  
>> +version = 'kvm-devel'
>>  linux = 'linux-2.6'
>>  
>> +if options.version:
>> +version = options.version
>> +if options.linux:
>> +linux = options.linux
>> +
>>   
> 
> Can replace this with set_defaults().
> 

Nice. Done below.

-->

In order to allow syncing the kmod dir against arbitrary kernel trees,
extend the sync script to accept alternative paths and adjust the
Makefile accordingly.

Changes in v2:
 - drop KVM_VERSION override
 - make use of set_defaults
 - option help texts

Signed-off-by: Jan Kiszka 
---

 Makefile |2 +-
 sync |   16 +++-
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index a8e8e0b..8bd5b9b 100644
--- a/Makefile
+++ b/Makefile
@@ -38,7 +38,7 @@ include $(MAKEFILE_PRE)
 .PHONY: sync
 
 sync:
-   ./sync $(KVM_VERSION)
+   ./sync -v $(KVM_VERSION) -l $(LINUX)
 
 install:
mkdir -p $(DESTDIR)/$(INSTALLDIR)
diff --git a/sync b/sync
index 4a89296..f3f4d6a 100755
--- a/sync
+++ b/sync
@@ -1,6 +1,7 @@
 #!/usr/bin/python
 
 import sys, os, glob, os.path, shutil, re
+from optparse import OptionParser
 
 glob = glob.glob
 
@@ -8,11 +9,16 @@ def cmd(c):
 if os.system(c) != 0:
 raise Exception('command execution failed: ' + c)
 
-version = 'kvm-devel'
-if len(sys.argv) >= 2:
-version = sys.argv[1]
-
-linux = 'linux-2.6'
+parser = OptionParser(usage = 'usage: %prog [-v VERSION][-l LINUX]')
+parser.add_option('-v', action = 'store', type = 'string', dest = 'version', \
+  help = 'kvm-kmod release version', default = 'kvm-devel')
+parser.add_option('-l', action = 'store', type = 'string', dest = 'linux', \
+  help = 'Linux kernel tree to sync from', \
+  default = 'linux-2.6')
+parser.set_defaults()
+(options, args) = parser.parse_args()
+version = options.version
+linux = options.linux
 
 _re_cache = {}
 


Re: [PATCH 06/17] Use statfs to determine size of huge pages

2009-05-18 Thread Joerg Roedel
On Sun, May 17, 2009 at 08:00:40PM +0300, Avi Kivity wrote:
> Anthony Liguori wrote:
>> From: Joerg Roedel 
>>
>> The current method of finding out the size of huge pages does not work
>> reliably anymore. Current Linux supports more than one huge page size
>> but /proc/meminfo only show one of the supported sizes.
>> To find out the real page size used can be found by calling statfs. This
>> patch changes qemu to use statfs instead of parsing /proc/meminfo.
>>   
>
> Since we don't support 1GBpages in stable-0.10, this is unneeded.

This patch is needed to run current KVM on a hugetlbfs mount backed with
1GB pages, so I think it should go in. It is an improvement over the
/proc/meminfo parsing anyway, and is not strictly related to kvm kernel
support for 1GB pages.
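
For reference, the statfs approach boils down to something like the sketch
below (illustrative only, not the actual qemu change; "/hugepages" is just an
example mount point):

    #include <stdio.h>
    #include <sys/vfs.h>

    /* statfs() on a hugetlbfs mount reports that mount's huge page size
     * in f_bsize, so mounts with different page sizes are handled
     * correctly, unlike a single value parsed from /proc/meminfo. */
    static long hugepage_size(const char *hugetlbfs_path)
    {
        struct statfs fs;

        if (statfs(hugetlbfs_path, &fs) != 0) {
            perror("statfs");
            return -1;
        }
        return fs.f_bsize;  /* huge page size in bytes for this mount */
    }

    int main(void)
    {
        printf("huge page size: %ld bytes\n", hugepage_size("/hugepages"));
        return 0;
    }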

Joerg

-- 
   | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System| 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis München
   | Registergericht München, HRB Nr. 43632



Re: [PATCH] If interrupt injection is not possible do not scan IRR.

2009-05-18 Thread Gleb Natapov
On Mon, May 18, 2009 at 12:00:40PM +0300, Avi Kivity wrote:
> Gleb Natapov wrote:
>> @@ -3174,10 +3174,11 @@ static void inject_pending_irq(struct kvm_vcpu 
>> *vcpu, struct kvm_run *kvm_run)
>>  vcpu->arch.nmi_injected = true;
>>  kvm_x86_ops->set_nmi(vcpu);
>>  }
>> -} else if (kvm_cpu_has_interrupt(vcpu)) {
>> -if (kvm_x86_ops->interrupt_allowed(vcpu)) {
>> -kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
>> -false);
>> +} else if (kvm_x86_ops->interrupt_allowed(vcpu)) {
>> +int vec = kvm_cpu_get_interrupt(vcpu);
>> +if (vec != -1) {
>> +printk(KERN_ERR"inject %d rip=%x\n", vec, 
>> kvm_rip_read(vcpu));
>> +kvm_queue_interrupt(vcpu, vec, false);
>>  kvm_x86_ops->set_irq(vcpu);
>>  }   }
>>   
>
> Not sure if this is a win.  Usually interrupts are allowed, so we'll  
> have to do both calculations.  On vmx ->interrupt_allowed() reads the  
> vmcs, which is slow.
>
That depends on how slow a vmcs read is. When an interrupt is available, the
current code also scans the IRR twice: once in kvm_cpu_has_interrupt() and
again in kvm_cpu_get_interrupt().

If we want to optimize vmcs accesses (and for nested virtualization we may
want to), we can read the common fields on exit and use an in-memory copy.

--
Gleb.


[PATCH 03/43] KVM: Split IOAPIC structure

2009-05-18 Thread Avi Kivity
From: Sheng Yang 

Prepare to reuse ioapic_redir_entry for MSI.

Signed-off-by: Sheng Yang 
Signed-off-by: Avi Kivity 
---
 include/linux/kvm_types.h |   17 +
 virt/kvm/ioapic.c |6 +++---
 virt/kvm/ioapic.h |   17 +
 3 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 2b8318c..b84aca3 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -40,4 +40,21 @@ typedef unsigned long  hfn_t;
 
 typedef hfn_t pfn_t;
 
+union kvm_ioapic_redirect_entry {
+   u64 bits;
+   struct {
+   u8 vector;
+   u8 delivery_mode:3;
+   u8 dest_mode:1;
+   u8 delivery_status:1;
+   u8 polarity:1;
+   u8 remote_irr:1;
+   u8 trig_mode:1;
+   u8 mask:1;
+   u8 reserve:7;
+   u8 reserved[4];
+   u8 dest_id;
+   } fields;
+};
+
 #endif /* __KVM_TYPES_H__ */
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index c3b99de..8128013 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -85,7 +85,7 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic 
*ioapic,
 
 static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
 {
-   union ioapic_redir_entry *pent;
+   union kvm_ioapic_redirect_entry *pent;
int injected = -1;
 
pent = &ioapic->redirtbl[idx];
@@ -284,7 +284,7 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, 
int level)
 {
u32 old_irr = ioapic->irr;
u32 mask = 1 << irq;
-   union ioapic_redir_entry entry;
+   union kvm_ioapic_redirect_entry entry;
int ret = 1;
 
if (irq >= 0 && irq < IOAPIC_NUM_PINS) {
@@ -305,7 +305,7 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, 
int level)
 static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int pin,
int trigger_mode)
 {
-   union ioapic_redir_entry *ent;
+   union kvm_ioapic_redirect_entry *ent;
 
ent = &ioapic->redirtbl[pin];
 
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index a34bd5e..008ec87 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -40,22 +40,7 @@ struct kvm_ioapic {
u32 id;
u32 irr;
u32 pad;
-   union ioapic_redir_entry {
-   u64 bits;
-   struct {
-   u8 vector;
-   u8 delivery_mode:3;
-   u8 dest_mode:1;
-   u8 delivery_status:1;
-   u8 polarity:1;
-   u8 remote_irr:1;
-   u8 trig_mode:1;
-   u8 mask:1;
-   u8 reserve:7;
-   u8 reserved[4];
-   u8 dest_id;
-   } fields;
-   } redirtbl[IOAPIC_NUM_PINS];
+   union kvm_ioapic_redirect_entry redirtbl[IOAPIC_NUM_PINS];
struct kvm_io_device dev;
struct kvm *kvm;
void (*ack_notifier)(void *opaque, int irq);
-- 
1.6.0.6



[PATCH 01/43] KVM: VMX: Don't use highmem pages for the msr and pio bitmaps

2009-05-18 Thread Avi Kivity
Highmem pages are a pain, and saving three lowmem pages on i386 isn't worth
the extra code.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |   59 ++--
 1 files changed, 25 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bb48133..b20c9e4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -111,9 +111,9 @@ static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
 static DEFINE_PER_CPU(struct list_head, vcpus_on_cpu);
 
-static struct page *vmx_io_bitmap_a;
-static struct page *vmx_io_bitmap_b;
-static struct page *vmx_msr_bitmap;
+static unsigned long *vmx_io_bitmap_a;
+static unsigned long *vmx_io_bitmap_b;
+static unsigned long *vmx_msr_bitmap;
 
 static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
 static DEFINE_SPINLOCK(vmx_vpid_lock);
@@ -2082,9 +2082,9 @@ static void allocate_vpid(struct vcpu_vmx *vmx)
spin_unlock(&vmx_vpid_lock);
 }
 
-static void vmx_disable_intercept_for_msr(struct page *msr_bitmap, u32 msr)
+static void vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 msr)
 {
-   void *va;
+   int f = sizeof(unsigned long);
 
if (!cpu_has_vmx_msr_bitmap())
return;
@@ -2094,16 +2094,14 @@ static void vmx_disable_intercept_for_msr(struct page 
*msr_bitmap, u32 msr)
 * have the write-low and read-high bitmap offsets the wrong way round.
 * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff.
 */
-   va = kmap(msr_bitmap);
if (msr <= 0x1fff) {
-   __clear_bit(msr, va + 0x000); /* read-low */
-   __clear_bit(msr, va + 0x800); /* write-low */
+   __clear_bit(msr, msr_bitmap + 0x000 / f); /* read-low */
+   __clear_bit(msr, msr_bitmap + 0x800 / f); /* write-low */
} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
msr &= 0x1fff;
-   __clear_bit(msr, va + 0x400); /* read-high */
-   __clear_bit(msr, va + 0xc00); /* write-high */
+   __clear_bit(msr, msr_bitmap + 0x400 / f); /* read-high */
+   __clear_bit(msr, msr_bitmap + 0xc00 / f); /* write-high */
}
-   kunmap(msr_bitmap);
 }
 
 /*
@@ -2121,11 +2119,11 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
u32 exec_control;
 
/* I/O */
-   vmcs_write64(IO_BITMAP_A, page_to_phys(vmx_io_bitmap_a));
-   vmcs_write64(IO_BITMAP_B, page_to_phys(vmx_io_bitmap_b));
+   vmcs_write64(IO_BITMAP_A, __pa(vmx_io_bitmap_a));
+   vmcs_write64(IO_BITMAP_B, __pa(vmx_io_bitmap_b));
 
if (cpu_has_vmx_msr_bitmap())
-   vmcs_write64(MSR_BITMAP, page_to_phys(vmx_msr_bitmap));
+   vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap));
 
vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */
 
@@ -3695,20 +3693,19 @@ static struct kvm_x86_ops vmx_x86_ops = {
 
 static int __init vmx_init(void)
 {
-   void *va;
int r;
 
-   vmx_io_bitmap_a = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
+   vmx_io_bitmap_a = (unsigned long *)__get_free_page(GFP_KERNEL);
if (!vmx_io_bitmap_a)
return -ENOMEM;
 
-   vmx_io_bitmap_b = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
+   vmx_io_bitmap_b = (unsigned long *)__get_free_page(GFP_KERNEL);
if (!vmx_io_bitmap_b) {
r = -ENOMEM;
goto out;
}
 
-   vmx_msr_bitmap = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
+   vmx_msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
if (!vmx_msr_bitmap) {
r = -ENOMEM;
goto out1;
@@ -3718,18 +3715,12 @@ static int __init vmx_init(void)
 * Allow direct access to the PC debug port (it is often used for I/O
 * delays, but the vmexits simply slow things down).
 */
-   va = kmap(vmx_io_bitmap_a);
-   memset(va, 0xff, PAGE_SIZE);
-   clear_bit(0x80, va);
-   kunmap(vmx_io_bitmap_a);
+   memset(vmx_io_bitmap_a, 0xff, PAGE_SIZE);
+   clear_bit(0x80, vmx_io_bitmap_a);
 
-   va = kmap(vmx_io_bitmap_b);
-   memset(va, 0xff, PAGE_SIZE);
-   kunmap(vmx_io_bitmap_b);
+   memset(vmx_io_bitmap_b, 0xff, PAGE_SIZE);
 
-   va = kmap(vmx_msr_bitmap);
-   memset(va, 0xff, PAGE_SIZE);
-   kunmap(vmx_msr_bitmap);
+   memset(vmx_msr_bitmap, 0xff, PAGE_SIZE);
 
set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
 
@@ -3762,19 +3753,19 @@ static int __init vmx_init(void)
return 0;
 
 out2:
-   __free_page(vmx_msr_bitmap);
+   free_page((unsigned long)vmx_msr_bitmap);
 out1:
-   __free_page(vmx_io_bitmap_b);
+   free_page((unsigned long)vmx_io_bitmap_b);
 out:
-   __free_page(vmx_io_bitmap_a);
+   free_page((unsigned long)vmx_io_bitmap_a);
return r;
 }
 
 static void __exit vmx_exit(void)
 {
-   __free_page(vmx_msr_bitmap);

[PATCH 04/43] KVM: Unify the delivery of IOAPIC and MSI interrupts

2009-05-18 Thread Avi Kivity
From: Sheng Yang 

Signed-off-by: Sheng Yang 
Signed-off-by: Avi Kivity 
---
 include/linux/kvm_host.h |3 +
 virt/kvm/ioapic.c|   91 ---
 virt/kvm/irq_comm.c  |   95 --
 3 files changed, 95 insertions(+), 94 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 894a56e..1a2f98f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -352,6 +352,9 @@ void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int 
irq,
  struct kvm_irq_mask_notifier *kimn);
 void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask);
 
+void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
+  union kvm_ioapic_redirect_entry *entry,
+  unsigned long *deliver_bitmask);
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 8128013..883fd0d 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -203,79 +203,56 @@ u32 kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic 
*ioapic, u8 dest,
 
 static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 {
-   u8 dest = ioapic->redirtbl[irq].fields.dest_id;
-   u8 dest_mode = ioapic->redirtbl[irq].fields.dest_mode;
-   u8 delivery_mode = ioapic->redirtbl[irq].fields.delivery_mode;
-   u8 vector = ioapic->redirtbl[irq].fields.vector;
-   u8 trig_mode = ioapic->redirtbl[irq].fields.trig_mode;
-   u32 deliver_bitmask;
+   union kvm_ioapic_redirect_entry entry = ioapic->redirtbl[irq];
+   unsigned long deliver_bitmask;
struct kvm_vcpu *vcpu;
int vcpu_id, r = -1;
 
ioapic_debug("dest=%x dest_mode=%x delivery_mode=%x "
 "vector=%x trig_mode=%x\n",
-dest, dest_mode, delivery_mode, vector, trig_mode);
+entry.fields.dest, entry.fields.dest_mode,
+entry.fields.delivery_mode, entry.fields.vector,
+entry.fields.trig_mode);
 
-   deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic, dest,
- dest_mode);
+   kvm_get_intr_delivery_bitmask(ioapic, &entry, &deliver_bitmask);
if (!deliver_bitmask) {
ioapic_debug("no target on destination\n");
return 0;
}
 
-   switch (delivery_mode) {
-   case IOAPIC_LOWEST_PRIORITY:
-   vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm, vector,
-   deliver_bitmask);
+   /* Always delivery PIT interrupt to vcpu 0 */
 #ifdef CONFIG_X86
-   if (irq == 0)
-   vcpu = ioapic->kvm->vcpus[0];
+   if (irq == 0)
+   deliver_bitmask = 1;
 #endif
-   if (vcpu != NULL)
-   r = ioapic_inj_irq(ioapic, vcpu, vector,
-  trig_mode, delivery_mode);
-   else
-   ioapic_debug("null lowest prio vcpu: "
-"mask=%x vector=%x delivery_mode=%x\n",
-deliver_bitmask, vector, 
IOAPIC_LOWEST_PRIORITY);
-   break;
-   case IOAPIC_FIXED:
-#ifdef CONFIG_X86
-   if (irq == 0)
-   deliver_bitmask = 1;
-#endif
-   for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
-   if (!(deliver_bitmask & (1 << vcpu_id)))
-   continue;
-   deliver_bitmask &= ~(1 << vcpu_id);
-   vcpu = ioapic->kvm->vcpus[vcpu_id];
-   if (vcpu) {
+
+   for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
+   if (!(deliver_bitmask & (1 << vcpu_id)))
+   continue;
+   deliver_bitmask &= ~(1 << vcpu_id);
+   vcpu = ioapic->kvm->vcpus[vcpu_id];
+   if (vcpu) {
+   if (entry.fields.delivery_mode ==
+   IOAPIC_LOWEST_PRIORITY ||
+   entry.fields.delivery_mode == IOAPIC_FIXED) {
if (r < 0)
r = 0;
-   r += ioapic_inj_irq(ioapic, vcpu, vector,
-  trig_mode, delivery_mode);
-   }
-   }
-   break;
-   case IOAPIC_NMI:
-   for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
-   if (!(deliver_bitmask & (1 << vcpu_id)))
-   continue;
-   deliver_bitmask &= ~(1 << vcpu_id);
-

[PATCH 02/43] KVM: VMX: Don't intercept MSR_KERNEL_GS_BASE

2009-05-18 Thread Avi Kivity
Windows 2008 accesses this MSR often on context switch intensive workloads;
since we run in guest context with the guest MSR value loaded (so swapgs can
work correctly), we can simply disable interception of rdmsr/wrmsr for this
MSR.

A complication occurs since in legacy mode, we run with the host MSR value
loaded. In this case we enable interception.  This means we need two MSR
bitmaps, one for legacy mode and one for long mode.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |   57 +++
 1 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b20c9e4..b5eae7a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -113,7 +113,8 @@ static DEFINE_PER_CPU(struct list_head, vcpus_on_cpu);
 
 static unsigned long *vmx_io_bitmap_a;
 static unsigned long *vmx_io_bitmap_b;
-static unsigned long *vmx_msr_bitmap;
+static unsigned long *vmx_msr_bitmap_legacy;
+static unsigned long *vmx_msr_bitmap_longmode;
 
 static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
 static DEFINE_SPINLOCK(vmx_vpid_lock);
@@ -812,6 +813,7 @@ static void move_msr_up(struct vcpu_vmx *vmx, int from, int 
to)
 static void setup_msrs(struct vcpu_vmx *vmx)
 {
int save_nmsrs;
+   unsigned long *msr_bitmap;
 
vmx_load_host_state(vmx);
save_nmsrs = 0;
@@ -847,6 +849,15 @@ static void setup_msrs(struct vcpu_vmx *vmx)
__find_msr_index(vmx, MSR_KERNEL_GS_BASE);
 #endif
vmx->msr_offset_efer = __find_msr_index(vmx, MSR_EFER);
+
+   if (cpu_has_vmx_msr_bitmap()) {
+   if (is_long_mode(&vmx->vcpu))
+   msr_bitmap = vmx_msr_bitmap_longmode;
+   else
+   msr_bitmap = vmx_msr_bitmap_legacy;
+
+   vmcs_write64(MSR_BITMAP, __pa(msr_bitmap));
+   }
 }
 
 /*
@@ -2082,7 +2093,7 @@ static void allocate_vpid(struct vcpu_vmx *vmx)
spin_unlock(&vmx_vpid_lock);
 }
 
-static void vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 msr)
+static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, u32 msr)
 {
int f = sizeof(unsigned long);
 
@@ -2104,6 +2115,13 @@ static void vmx_disable_intercept_for_msr(unsigned long 
*msr_bitmap, u32 msr)
}
 }
 
+static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only)
+{
+   if (!longmode_only)
+   __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, msr);
+   __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, msr);
+}
+
 /*
  * Sets up the vmcs for emulated real mode.
  */
@@ -2123,7 +2141,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
vmcs_write64(IO_BITMAP_B, __pa(vmx_io_bitmap_b));
 
if (cpu_has_vmx_msr_bitmap())
-   vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap));
+   vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy));
 
vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */
 
@@ -3705,12 +3723,18 @@ static int __init vmx_init(void)
goto out;
}
 
-   vmx_msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
-   if (!vmx_msr_bitmap) {
+   vmx_msr_bitmap_legacy = (unsigned long *)__get_free_page(GFP_KERNEL);
+   if (!vmx_msr_bitmap_legacy) {
r = -ENOMEM;
goto out1;
}
 
+   vmx_msr_bitmap_longmode = (unsigned long *)__get_free_page(GFP_KERNEL);
+   if (!vmx_msr_bitmap_longmode) {
+   r = -ENOMEM;
+   goto out2;
+   }
+
/*
 * Allow direct access to the PC debug port (it is often used for I/O
 * delays, but the vmexits simply slow things down).
@@ -3720,19 +3744,21 @@ static int __init vmx_init(void)
 
memset(vmx_io_bitmap_b, 0xff, PAGE_SIZE);
 
-   memset(vmx_msr_bitmap, 0xff, PAGE_SIZE);
+   memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE);
+   memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE);
 
set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
 
r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), THIS_MODULE);
if (r)
-   goto out2;
+   goto out3;
 
-   vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_FS_BASE);
-   vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_GS_BASE);
-   vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_CS);
-   vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_ESP);
-   vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_EIP);
+   vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
+   vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
+   vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
+   vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false);
+   vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
+   vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
 
if (vm_need_ept()) {

[PATCH 05/43] KVM: Change API of kvm_ioapic_get_delivery_bitmask

2009-05-18 Thread Avi Kivity
From: Sheng Yang 

In order to use it with bit ops.

Signed-off-by: Sheng Yang 
Signed-off-by: Avi Kivity 
---
 virt/kvm/ioapic.c   |   17 -
 virt/kvm/ioapic.h   |4 ++--
 virt/kvm/irq_comm.c |5 +++--
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 883fd0d..3b53712 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -161,22 +161,22 @@ static void ioapic_inj_nmi(struct kvm_vcpu *vcpu)
kvm_vcpu_kick(vcpu);
 }
 
-u32 kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
-   u8 dest_mode)
+void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
+u8 dest_mode, unsigned long *mask)
 {
-   u32 mask = 0;
int i;
struct kvm *kvm = ioapic->kvm;
struct kvm_vcpu *vcpu;
 
ioapic_debug("dest %d dest_mode %d\n", dest, dest_mode);
 
+   *mask = 0;
if (dest_mode == 0) {   /* Physical mode. */
if (dest == 0xFF) { /* Broadcast. */
for (i = 0; i < KVM_MAX_VCPUS; ++i)
if (kvm->vcpus[i] && kvm->vcpus[i]->arch.apic)
-   mask |= 1 << i;
-   return mask;
+   *mask |= 1 << i;
+   return;
}
for (i = 0; i < KVM_MAX_VCPUS; ++i) {
vcpu = kvm->vcpus[i];
@@ -184,7 +184,7 @@ u32 kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic 
*ioapic, u8 dest,
continue;
if (kvm_apic_match_physical_addr(vcpu->arch.apic, 
dest)) {
if (vcpu->arch.apic)
-   mask = 1 << i;
+   *mask = 1 << i;
break;
}
}
@@ -195,10 +195,9 @@ u32 kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic 
*ioapic, u8 dest,
continue;
if (vcpu->arch.apic &&
kvm_apic_match_logical_addr(vcpu->arch.apic, dest))
-   mask |= 1 << vcpu->vcpu_id;
+   *mask |= 1 << vcpu->vcpu_id;
}
-   ioapic_debug("mask %x\n", mask);
-   return mask;
+   ioapic_debug("mask %x\n", *mask);
 }
 
 static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index 008ec87..f395798 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -70,7 +70,7 @@ void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int 
trigger_mode);
 int kvm_ioapic_init(struct kvm *kvm);
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level);
 void kvm_ioapic_reset(struct kvm_ioapic *ioapic);
-u32 kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
-   u8 dest_mode);
+void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
+u8 dest_mode, unsigned long *mask);
 
 #endif
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index aec7a0d..e8ff89c 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -49,8 +49,9 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
 {
struct kvm_vcpu *vcpu;
 
-   *deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic,
-   entry->fields.dest_id, entry->fields.dest_mode);
+   kvm_ioapic_get_delivery_bitmask(ioapic, entry->fields.dest_id,
+   entry->fields.dest_mode,
+   deliver_bitmask);
switch (entry->fields.delivery_mode) {
case IOAPIC_LOWEST_PRIORITY:
vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm,
-- 
1.6.0.6



[PATCH 07/43] KVM: bit ops for deliver_bitmap

2009-05-18 Thread Avi Kivity
From: Sheng Yang 

It's also convenient for when we extend the number of vcpus KVM supports in the future.

Signed-off-by: Sheng Yang 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/lapic.c |7 ---
 virt/kvm/ioapic.c|   24 +---
 virt/kvm/irq_comm.c  |   17 +
 3 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6aa8d20..afc59b2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -483,9 +483,10 @@ static void apic_send_ipi(struct kvm_lapic *apic)
 
struct kvm_vcpu *target;
struct kvm_vcpu *vcpu;
-   unsigned long lpr_map = 0;
+   DECLARE_BITMAP(lpr_map, KVM_MAX_VCPUS);
int i;
 
+   bitmap_zero(lpr_map, KVM_MAX_VCPUS);
apic_debug("icr_high 0x%x, icr_low 0x%x, "
   "short_hand 0x%x, dest 0x%x, trig_mode 0x%x, level 0x%x, "
   "dest_mode 0x%x, delivery_mode 0x%x, vector 0x%x\n",
@@ -500,7 +501,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
if (vcpu->arch.apic &&
apic_match_dest(vcpu, apic, short_hand, dest, dest_mode)) {
if (delivery_mode == APIC_DM_LOWEST)
-   set_bit(vcpu->vcpu_id, &lpr_map);
+   __set_bit(vcpu->vcpu_id, lpr_map);
else
__apic_accept_irq(vcpu->arch.apic, 
delivery_mode,
  vector, level, trig_mode);
@@ -508,7 +509,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
}
 
if (delivery_mode == APIC_DM_LOWEST) {
-   target = kvm_get_lowest_prio_vcpu(vcpu->kvm, vector, &lpr_map);
+   target = kvm_get_lowest_prio_vcpu(vcpu->kvm, vector, lpr_map);
if (target != NULL)
__apic_accept_irq(target->arch.apic, delivery_mode,
  vector, level, trig_mode);
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 3b53712..7c2cb2b 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -203,7 +203,7 @@ void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic 
*ioapic, u8 dest,
 static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 {
union kvm_ioapic_redirect_entry entry = ioapic->redirtbl[irq];
-   unsigned long deliver_bitmask;
+   DECLARE_BITMAP(deliver_bitmask, KVM_MAX_VCPUS);
struct kvm_vcpu *vcpu;
int vcpu_id, r = -1;
 
@@ -213,22 +213,24 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int 
irq)
 entry.fields.delivery_mode, entry.fields.vector,
 entry.fields.trig_mode);
 
-   kvm_get_intr_delivery_bitmask(ioapic, &entry, &deliver_bitmask);
-   if (!deliver_bitmask) {
-   ioapic_debug("no target on destination\n");
-   return 0;
-   }
+   bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS);
 
/* Always delivery PIT interrupt to vcpu 0 */
 #ifdef CONFIG_X86
if (irq == 0)
-   deliver_bitmask = 1;
+   __set_bit(0, deliver_bitmask);
+   else
 #endif
+   kvm_get_intr_delivery_bitmask(ioapic, &entry, deliver_bitmask);
+
+   if (find_first_bit(deliver_bitmask, KVM_MAX_VCPUS) >= KVM_MAX_VCPUS) {
+   ioapic_debug("no target on destination\n");
+   return 0;
+   }
 
-   for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
-   if (!(deliver_bitmask & (1 << vcpu_id)))
-   continue;
-   deliver_bitmask &= ~(1 << vcpu_id);
+   while ((vcpu_id = find_first_bit(deliver_bitmask, KVM_MAX_VCPUS))
+   < KVM_MAX_VCPUS) {
+   __clear_bit(vcpu_id, deliver_bitmask);
vcpu = ioapic->kvm->vcpus[vcpu_id];
if (vcpu) {
if (entry.fields.delivery_mode ==
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index d4421cd..d165e05 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -56,7 +56,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
case IOAPIC_LOWEST_PRIORITY:
vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm,
entry->fields.vector, deliver_bitmask);
-   *deliver_bitmask = 1 << vcpu->vcpu_id;
+   __set_bit(vcpu->vcpu_id, deliver_bitmask);
break;
case IOAPIC_FIXED:
case IOAPIC_NMI:
@@ -76,10 +76,12 @@ static int kvm_set_msi(struct kvm_kernel_irq_routing_entry 
*e,
struct kvm_vcpu *vcpu;
struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
union kvm_ioapic_redirect_entry entry;
-   unsigned long deliver_bitmask;
+   DECLARE_BITMAP(deliver_bitmask, KVM_MAX_VCPUS);
 
BUG_ON(!ioapic);
 
+   bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS);
+
entry.bits = 0;
entry.fields.dest_id = (e->msi.ad

[PATCH 09/43] KVM: Add MSI-X interrupt injection logic

2009-05-18 Thread Avi Kivity
From: Sheng Yang 

For MSI-X we have to handle more than one interrupt with one handler. Avi
suggested using a flag to indicate a pending interrupt, so here it is.

Signed-off-by: Sheng Yang 
Signed-off-by: Avi Kivity 
---
 include/linux/kvm_host.h |1 +
 virt/kvm/kvm_main.c  |   66 +-
 2 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 432edc2..3832243 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -319,6 +319,7 @@ struct kvm_irq_ack_notifier {
void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
 };
 
+#define KVM_ASSIGNED_MSIX_PENDING  0x1
 struct kvm_guest_msix_entry {
u32 vector;
u16 entry;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fd33def..7c20e06 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -95,25 +95,69 @@ static struct kvm_assigned_dev_kernel 
*kvm_find_assigned_dev(struct list_head *h
return NULL;
 }
 
+static int find_index_from_host_irq(struct kvm_assigned_dev_kernel
+   *assigned_dev, int irq)
+{
+   int i, index;
+   struct msix_entry *host_msix_entries;
+
+   host_msix_entries = assigned_dev->host_msix_entries;
+
+   index = -1;
+   for (i = 0; i < assigned_dev->entries_nr; i++)
+   if (irq == host_msix_entries[i].vector) {
+   index = i;
+   break;
+   }
+   if (index < 0) {
+   printk(KERN_WARNING "Fail to find correlated MSI-X entry!\n");
+   return 0;
+   }
+
+   return index;
+}
+
 static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
 {
struct kvm_assigned_dev_kernel *assigned_dev;
+   struct kvm *kvm;
+   int irq, i;
 
assigned_dev = container_of(work, struct kvm_assigned_dev_kernel,
interrupt_work);
+   kvm = assigned_dev->kvm;
 
/* This is taken to safely inject irq inside the guest. When
 * the interrupt injection (or the ioapic code) uses a
 * finer-grained lock, update this
 */
-   mutex_lock(&assigned_dev->kvm->lock);
-   kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
-   assigned_dev->guest_irq, 1);
-
-   if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_GUEST_MSI) {
-   enable_irq(assigned_dev->host_irq);
-   assigned_dev->host_irq_disabled = false;
+   mutex_lock(&kvm->lock);
+   if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_MSIX) {
+   struct kvm_guest_msix_entry *guest_entries =
+   assigned_dev->guest_msix_entries;
+   for (i = 0; i < assigned_dev->entries_nr; i++) {
+   if (!(guest_entries[i].flags &
+   KVM_ASSIGNED_MSIX_PENDING))
+   continue;
+   guest_entries[i].flags &= ~KVM_ASSIGNED_MSIX_PENDING;
+   kvm_set_irq(assigned_dev->kvm,
+   assigned_dev->irq_source_id,
+   guest_entries[i].vector, 1);
+   irq = assigned_dev->host_msix_entries[i].vector;
+   if (irq != 0)
+   enable_irq(irq);
+   assigned_dev->host_irq_disabled = false;
+   }
+   } else {
+   kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
+   assigned_dev->guest_irq, 1);
+   if (assigned_dev->irq_requested_type &
+   KVM_ASSIGNED_DEV_GUEST_MSI) {
+   enable_irq(assigned_dev->host_irq);
+   assigned_dev->host_irq_disabled = false;
+   }
}
+
mutex_unlock(&assigned_dev->kvm->lock);
 }
 
@@ -122,6 +166,14 @@ static irqreturn_t kvm_assigned_dev_intr(int irq, void 
*dev_id)
struct kvm_assigned_dev_kernel *assigned_dev =
(struct kvm_assigned_dev_kernel *) dev_id;
 
+   if (assigned_dev->irq_requested_type == KVM_ASSIGNED_DEV_MSIX) {
+   int index = find_index_from_host_irq(assigned_dev, irq);
+   if (index < 0)
+   return IRQ_HANDLED;
+   assigned_dev->guest_msix_entries[index].flags |=
+   KVM_ASSIGNED_MSIX_PENDING;
+   }
+
schedule_work(&assigned_dev->interrupt_work);
 
disable_irq_nosync(irq);
-- 
1.6.0.6



[PATCH 14/43] KVM: PIT: remove unused scheduled variable

2009-05-18 Thread Avi Kivity
From: Marcelo Tosatti 

Unused.

Signed-off-by: Marcelo Tosatti 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/i8254.c |1 -
 arch/x86/kvm/i8254.h |1 -
 2 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index c13bb92..3eddae6 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -208,7 +208,6 @@ static int __pit_timer_fn(struct kvm_kpit_state *ps)
wake_up_interruptible(&vcpu0->wq);
 
hrtimer_add_expires_ns(&pt->timer, pt->period);
-   pt->scheduled = hrtimer_get_expires_ns(&pt->timer);
if (pt->period)
ps->channels[0].count_load_time = ktime_get();
 
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 6acbe4b..521accf 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -7,7 +7,6 @@ struct kvm_kpit_timer {
struct hrtimer timer;
int irq;
s64 period; /* unit: ns */
-   s64 scheduled;
atomic_t pending;
bool reinject;
 };
-- 
1.6.0.6



[PATCH 11/43] KVM: x86: silence preempt warning on kvm_write_guest_time

2009-05-18 Thread Avi Kivity
From: Matt T. Yourst 

This issue just appeared in kvm-84 when running on 2.6.28.7 (x86-64)
with PREEMPT enabled.

We're getting syslog warnings like this many (but not all) times qemu
tells KVM to run the VCPU:

BUG: using smp_processor_id() in preemptible [] code:
qemu-system-x86/28938
caller is kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm]
Pid: 28938, comm: qemu-system-x86 2.6.28.7-mtyrel-64bit
Call Trace:
debug_smp_processor_id+0xf7/0x100
kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm]
? __wake_up+0x4e/0x70
? wake_futex+0x27/0x40
kvm_vcpu_ioctl+0x2e9/0x5a0 [kvm]
enqueue_hrtimer+0x8a/0x110
_spin_unlock_irqrestore+0x27/0x50
vfs_ioctl+0x31/0xa0
do_vfs_ioctl+0x74/0x480
sys_futex+0xb4/0x140
sys_ioctl+0x99/0xa0
system_call_fastpath+0x16/0x1b

As it turns out, the call trace is messed up due to gcc's inlining, but
I isolated the problem anyway: kvm_write_guest_time() is being used in a
non-thread-safe manner on preemptable kernels.

Basically kvm_write_guest_time()'s body needs to be surrounded by
preempt_disable() and preempt_enable(), since the kernel won't let us
query any per-CPU data (indirectly using smp_processor_id()) without
preemption disabled. The attached patch fixes this issue by disabling
preemption inside kvm_write_guest_time().

[marcelo: surround only __get_cpu_var calls since the warning
is harmless]
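
Distilled, the rule being applied is just the following (an illustrative
kernel-style sketch, not the patch itself; example_tsc_khz stands in for the
per-CPU cpu_tsc_khz variable the real code reads):

    #include <linux/percpu.h>
    #include <linux/preempt.h>

    /* Example per-CPU variable, standing in for cpu_tsc_khz in x86.c. */
    static DEFINE_PER_CPU(unsigned long, example_tsc_khz);

    /* Per-CPU data must not be read while the task can migrate between
     * CPUs, so pin it to the current CPU for the duration of the access;
     * otherwise PREEMPT kernels warn about smp_processor_id() use. */
    static unsigned long read_this_cpu_tsc_khz(void)
    {
        unsigned long khz;

        preempt_disable();
        khz = __get_cpu_var(example_tsc_khz);
        preempt_enable();

        return khz;
    }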

Signed-off-by: Marcelo Tosatti 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 49079a4..cac6e63 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -630,10 +630,12 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
if ((!vcpu->time_page))
return;
 
+   preempt_disable();
if (unlikely(vcpu->hv_clock_tsc_khz != __get_cpu_var(cpu_tsc_khz))) {
kvm_set_time_scale(__get_cpu_var(cpu_tsc_khz), &vcpu->hv_clock);
vcpu->hv_clock_tsc_khz = __get_cpu_var(cpu_tsc_khz);
}
+   preempt_enable();
 
/* Keep irq disabled to prevent changes to the clock */
local_irq_save(flags);
-- 
1.6.0.6



[PATCH 08/43] KVM: Ioctls for init MSI-X entry

2009-05-18 Thread Avi Kivity
From: Sheng Yang 

Introduce two ioctls: KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY.

These two ioctls are used by userspace to specify the guest device's MSI-X
entry count and to correlate MSI-X entries with GSIs during the
initialization stage.

MSI-X should be fully initialized before it is enabled.

Changing the MSI-X entry number is not supported for now.

Signed-off-by: Sheng Yang 
Signed-off-by: Avi Kivity 
---
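
For illustration, userspace drives these two ioctls roughly as sketched below
(a hypothetical example, not part of this patch; vm_fd, dev_id and gsi_base
stand in for state the caller already has, the consecutive-GSI layout is an
assumption, and the headers must come from a kernel carrying this patch):

    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Declare how many MSI-X entries the assigned device has, then map
     * each MSI-X table entry to a guest GSI. Both ioctls act on the VM fd. */
    static int setup_msix(int vm_fd, __u32 dev_id, __u16 nvectors, __u32 gsi_base)
    {
        struct kvm_assigned_msix_nr nr = {
            .assigned_dev_id = dev_id,
            .entry_nr        = nvectors,
        };
        __u16 i;

        if (ioctl(vm_fd, KVM_ASSIGN_SET_MSIX_NR, &nr) < 0) {
            perror("KVM_ASSIGN_SET_MSIX_NR");
            return -1;
        }

        for (i = 0; i < nvectors; i++) {
            struct kvm_assigned_msix_entry e = {
                .assigned_dev_id = dev_id,
                .gsi             = gsi_base + i, /* assumes consecutive GSIs were routed */
                .entry           = i,            /* index into the MSI-X table */
            };
            if (ioctl(vm_fd, KVM_ASSIGN_SET_MSIX_ENTRY, &e) < 0) {
                perror("KVM_ASSIGN_SET_MSIX_ENTRY");
                return -1;
            }
        }
        return 0;
    }
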
 include/linux/kvm.h  |   18 
 include/linux/kvm_host.h |   10 
 virt/kvm/kvm_main.c  |  104 ++
 3 files changed, 132 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 8cc1379..78cdee8 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -487,6 +487,10 @@ struct kvm_irq_routing {
 #define KVM_REINJECT_CONTROL  _IO(KVMIO, 0x71)
 #define KVM_DEASSIGN_PCI_DEVICE _IOW(KVMIO, 0x72, \
 struct kvm_assigned_pci_dev)
+#define KVM_ASSIGN_SET_MSIX_NR \
+   _IOW(KVMIO, 0x73, struct kvm_assigned_msix_nr)
+#define KVM_ASSIGN_SET_MSIX_ENTRY \
+   _IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry)
 
 /*
  * ioctls for vcpu fds
@@ -607,4 +611,18 @@ struct kvm_assigned_irq {
 #define KVM_DEV_IRQ_ASSIGN_MSI_ACTION  KVM_DEV_IRQ_ASSIGN_ENABLE_MSI
 #define KVM_DEV_IRQ_ASSIGN_ENABLE_MSI  (1 << 0)
 
+struct kvm_assigned_msix_nr {
+   __u32 assigned_dev_id;
+   __u16 entry_nr;
+   __u16 padding;
+};
+
+#define KVM_MAX_MSIX_PER_DEV   512
+struct kvm_assigned_msix_entry {
+   __u32 assigned_dev_id;
+   __u32 gsi;
+   __u16 entry; /* The index of entry in the MSI-X table */
+   __u16 padding[3];
+};
+
 #endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1a2f98f..432edc2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -319,6 +319,12 @@ struct kvm_irq_ack_notifier {
void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
 };
 
+struct kvm_guest_msix_entry {
+   u32 vector;
+   u16 entry;
+   u16 flags;
+};
+
 struct kvm_assigned_dev_kernel {
struct kvm_irq_ack_notifier ack_notifier;
struct work_struct interrupt_work;
@@ -326,13 +332,17 @@ struct kvm_assigned_dev_kernel {
int assigned_dev_id;
int host_busnr;
int host_devfn;
+   unsigned int entries_nr;
int host_irq;
bool host_irq_disabled;
+   struct msix_entry *host_msix_entries;
int guest_irq;
+   struct kvm_guest_msix_entry *guest_msix_entries;
 #define KVM_ASSIGNED_DEV_GUEST_INTX(1 << 0)
 #define KVM_ASSIGNED_DEV_GUEST_MSI (1 << 1)
 #define KVM_ASSIGNED_DEV_HOST_INTX (1 << 8)
 #define KVM_ASSIGNED_DEV_HOST_MSI  (1 << 9)
+#define KVM_ASSIGNED_DEV_MSIX  ((1 << 2) | (1 << 10))
unsigned long irq_requested_type;
int irq_source_id;
int flags;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1ecbe23..fd33def 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1593,6 +1593,88 @@ static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu 
*vcpu, sigset_t *sigset)
return 0;
 }
 
+#ifdef __KVM_HAVE_MSIX
+static int kvm_vm_ioctl_set_msix_nr(struct kvm *kvm,
+   struct kvm_assigned_msix_nr *entry_nr)
+{
+   int r = 0;
+   struct kvm_assigned_dev_kernel *adev;
+
+   mutex_lock(&kvm->lock);
+
+   adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
+ entry_nr->assigned_dev_id);
+   if (!adev) {
+   r = -EINVAL;
+   goto msix_nr_out;
+   }
+
+   if (adev->entries_nr == 0) {
+   adev->entries_nr = entry_nr->entry_nr;
+   if (adev->entries_nr == 0 ||
+   adev->entries_nr >= KVM_MAX_MSIX_PER_DEV) {
+   r = -EINVAL;
+   goto msix_nr_out;
+   }
+
+   adev->host_msix_entries = kzalloc(sizeof(struct msix_entry) *
+   entry_nr->entry_nr,
+   GFP_KERNEL);
+   if (!adev->host_msix_entries) {
+   r = -ENOMEM;
+   goto msix_nr_out;
+   }
+   adev->guest_msix_entries = kzalloc(
+   sizeof(struct kvm_guest_msix_entry) *
+   entry_nr->entry_nr, GFP_KERNEL);
+   if (!adev->guest_msix_entries) {
+   kfree(adev->host_msix_entries);
+   r = -ENOMEM;
+   goto msix_nr_out;
+   }
+   } else /* Not allowed set MSI-X number twice */
+   r = -EINVAL;
+msix_nr_out:
+   mutex_unlock(&kvm->lock);
+   return r;
+}
+
+static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
+  struct kvm_assigned_msix_entry *entry)
+{
+ 

[PATCH 31/43] KVM: MMU: do not free active mmu pages in free_mmu_pages()

2009-05-18 Thread Avi Kivity
From: Gleb Natapov 

free_mmu_pages() should only undo what alloc_mmu_pages() does.
Free mmu pages from the generic VM destruction function, kvm_destroy_vm().

Signed-off-by: Gleb Natapov 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/mmu.c  |8 
 virt/kvm/kvm_main.c |2 ++
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b6caf13..9256484 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2638,14 +2638,6 @@ EXPORT_SYMBOL_GPL(kvm_disable_tdp);
 
 static void free_mmu_pages(struct kvm_vcpu *vcpu)
 {
-   struct kvm_mmu_page *sp;
-
-   while (!list_empty(&vcpu->kvm->arch.active_mmu_pages)) {
-   sp = container_of(vcpu->kvm->arch.active_mmu_pages.next,
- struct kvm_mmu_page, link);
-   kvm_mmu_zap_page(vcpu->kvm, sp);
-   cond_resched();
-   }
free_page((unsigned long)vcpu->arch.mmu.pae_root);
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 231ca3c..7379eab 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1032,6 +1032,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
 #endif
 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm);
+#else
+   kvm_arch_flush_shadow(kvm);
 #endif
kvm_arch_destroy_vm(kvm);
mmdrop(mm);
-- 
1.6.0.6



[PATCH 17/43] KVM: ia64: fix compilation error in kvm_get_lowest_prio_vcpu

2009-05-18 Thread Avi Kivity
From: Yang Zhang 

Modify the argument of kvm_get_lowest_prio_vcpu() to make it consistent with
its declaration.

Signed-off-by: Yang Zhang 
Signed-off-by: Marcelo Tosatti 
---
 arch/ia64/kvm/kvm-ia64.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index d20a5db..774f0d7 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1837,7 +1837,7 @@ int kvm_apic_match_logical_addr(struct kvm_lapic *apic, 
u8 mda)
 }
 
 struct kvm_vcpu *kvm_get_lowest_prio_vcpu(struct kvm *kvm, u8 vector,
-  unsigned long bitmap)
+  unsigned long *bitmap)
 {
struct kvm_vcpu *lvcpu = kvm->vcpus[0];
int i;
-- 
1.6.0.6



[PATCH 12/43] KVM: x86: paravirt skip pit-through-ioapic boot check

2009-05-18 Thread Avi Kivity
From: Marcelo Tosatti 

Skip the test that checks whether the PIT is properly routed when using the
IOAPIC; the check is aimed at buggy hardware.

Signed-off-by: Marcelo Tosatti 
Signed-off-by: Avi Kivity 
---
 arch/x86/kernel/kvm.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 33019dd..057173d 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MMU_QUEUE_SIZE 1024
 
@@ -230,6 +231,9 @@ static void paravirt_ops_setup(void)
pv_mmu_ops.lazy_mode.enter = kvm_enter_lazy_mmu;
pv_mmu_ops.lazy_mode.leave = kvm_leave_lazy_mmu;
}
+#ifdef CONFIG_X86_IO_APIC
+   no_timer_check = 1;
+#endif
 }
 
 void __init kvm_guest_init(void)
-- 
1.6.0.6



[PATCH 15/43] KVM: PIT: remove usage of count_load_time for channel 0

2009-05-18 Thread Avi Kivity
From: Marcelo Tosatti 

We can infer the elapsed time from hrtimer_expires_remaining().

Signed-off-by: Marcelo Tosatti 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/i8254.c |   37 +++--
 1 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 3eddae6..09ae841 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -98,6 +98,32 @@ static int pit_get_gate(struct kvm *kvm, int channel)
return kvm->arch.vpit->pit_state.channels[channel].gate;
 }
 
+static s64 __kpit_elapsed(struct kvm *kvm)
+{
+   s64 elapsed;
+   ktime_t remaining;
+   struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state;
+
+   remaining = hrtimer_expires_remaining(&ps->pit_timer.timer);
+   if (ktime_to_ns(remaining) < 0)
+   remaining = ktime_set(0, 0);
+
+   elapsed = ps->pit_timer.period;
+   if (ktime_to_ns(remaining) <= ps->pit_timer.period)
+   elapsed = ps->pit_timer.period - ktime_to_ns(remaining);
+
+   return elapsed;
+}
+
+static s64 kpit_elapsed(struct kvm *kvm, struct kvm_kpit_channel_state *c,
+   int channel)
+{
+   if (channel == 0)
+   return __kpit_elapsed(kvm);
+
+   return ktime_to_ns(ktime_sub(ktime_get(), c->count_load_time));
+}
+
 static int pit_get_count(struct kvm *kvm, int channel)
 {
struct kvm_kpit_channel_state *c =
@@ -107,7 +133,7 @@ static int pit_get_count(struct kvm *kvm, int channel)
 
WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
 
-   t = ktime_to_ns(ktime_sub(ktime_get(), c->count_load_time));
+   t = kpit_elapsed(kvm, c, channel);
d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
 
switch (c->mode) {
@@ -137,7 +163,7 @@ static int pit_get_out(struct kvm *kvm, int channel)
 
WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
 
-   t = ktime_to_ns(ktime_sub(ktime_get(), c->count_load_time));
+   t = kpit_elapsed(kvm, c, channel);
d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
 
switch (c->mode) {
@@ -208,8 +234,6 @@ static int __pit_timer_fn(struct kvm_kpit_state *ps)
wake_up_interruptible(&vcpu0->wq);
 
hrtimer_add_expires_ns(&pt->timer, pt->period);
-   if (pt->period)
-   ps->channels[0].count_load_time = ktime_get();
 
return (pt->period == 0 ? 0 : 1);
 }
@@ -305,11 +329,12 @@ static void pit_load_count(struct kvm *kvm, int channel, 
u32 val)
if (val == 0)
val = 0x1;
 
-   ps->channels[channel].count_load_time = ktime_get();
ps->channels[channel].count = val;
 
-   if (channel != 0)
+   if (channel != 0) {
+   ps->channels[channel].count_load_time = ktime_get();
return;
+   }
 
/* Two types of timer
 * mode 1 is one shot, mode 2 is period, otherwise del timer */
-- 
1.6.0.6



[PATCH 16/43] KVM: unify part of generic timer handling

2009-05-18 Thread Avi Kivity
From: Marcelo Tosatti 

Hide the internals of vcpu awakening / injection from the in-kernel
emulated timers. This makes future changes in this logic easier and
decreases the distance to more generic timer handling.
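
For reference, the kvm_timer.h and timer.c hunks are cut off in this archive.
Judging from the fields the i8254.c hunk uses (pt->timer, pt->period,
pt->pending, pt->t_ops, pt->kvm, pt->vcpu_id and the shared kvm_timer_fn
handler), the consolidated structure presumably looks roughly like the sketch
below; the exact layout and any extra members are assumptions, not a quote of
the patch:

	/* sketch only -- reconstructed from the i8254.c usage shown below */
	struct kvm_timer {
		struct hrtimer timer;		/* underlying hrtimer */
		s64 period;			/* period in ns, 0 for one-shot */
		atomic_t pending;		/* accumulated, uninjected ticks */
		bool reinject;
		struct kvm_timer_ops *t_ops;	/* e.g. kpit_ops below */
		struct kvm *kvm;
		int vcpu_id;			/* vcpu to wake on expiry */
	};

	struct kvm_timer_ops {
		bool (*is_periodic)(struct kvm_timer *);
	};

	/* shared expiry handler installed as pt->timer.function */
	enum hrtimer_restart kvm_timer_fn(struct hrtimer *data);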

Signed-off-by: Marcelo Tosatti 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/Makefile|2 +-
 arch/x86/kvm/i8254.c |   57 ++--
 arch/x86/kvm/i8254.h |   11 +
 arch/x86/kvm/kvm_timer.h |   18 +
 arch/x86/kvm/lapic.c |   94 +++---
 arch/x86/kvm/lapic.h |9 +---
 arch/x86/kvm/timer.c |   46 ++
 7 files changed, 129 insertions(+), 108 deletions(-)
 create mode 100644 arch/x86/kvm/kvm_timer.h
 create mode 100644 arch/x86/kvm/timer.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index d3ec292..b43c4ef 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -14,7 +14,7 @@ endif
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
 
 kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
-   i8254.o
+   i8254.o timer.o
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 09ae841..4e2e3f2 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -219,25 +219,6 @@ static void pit_latch_status(struct kvm *kvm, int channel)
}
 }
 
-static int __pit_timer_fn(struct kvm_kpit_state *ps)
-{
-   struct kvm_vcpu *vcpu0 = ps->pit->kvm->vcpus[0];
-   struct kvm_kpit_timer *pt = &ps->pit_timer;
-
-   if (!atomic_inc_and_test(&pt->pending))
-   set_bit(KVM_REQ_PENDING_TIMER, &vcpu0->requests);
-
-   if (!pt->reinject)
-   atomic_set(&pt->pending, 1);
-
-   if (vcpu0 && waitqueue_active(&vcpu0->wq))
-   wake_up_interruptible(&vcpu0->wq);
-
-   hrtimer_add_expires_ns(&pt->timer, pt->period);
-
-   return (pt->period == 0 ? 0 : 1);
-}
-
 int pit_has_pending_timer(struct kvm_vcpu *vcpu)
 {
struct kvm_pit *pit = vcpu->kvm->arch.vpit;
@@ -258,21 +239,6 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier 
*kian)
spin_unlock(&ps->inject_lock);
 }
 
-static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
-{
-   struct kvm_kpit_state *ps;
-   int restart_timer = 0;
-
-   ps = container_of(data, struct kvm_kpit_state, pit_timer.timer);
-
-   restart_timer = __pit_timer_fn(ps);
-
-   if (restart_timer)
-   return HRTIMER_RESTART;
-   else
-   return HRTIMER_NORESTART;
-}
-
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
 {
struct kvm_pit *pit = vcpu->kvm->arch.vpit;
@@ -286,15 +252,26 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
 }
 
-static void destroy_pit_timer(struct kvm_kpit_timer *pt)
+static void destroy_pit_timer(struct kvm_timer *pt)
 {
pr_debug("pit: execute del timer!\n");
hrtimer_cancel(&pt->timer);
 }
 
+static bool kpit_is_periodic(struct kvm_timer *ktimer)
+{
+   struct kvm_kpit_state *ps = container_of(ktimer, struct kvm_kpit_state,
+pit_timer);
+   return ps->is_periodic;
+}
+
+struct kvm_timer_ops kpit_ops = {
+   .is_periodic = kpit_is_periodic,
+};
+
 static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
 {
-   struct kvm_kpit_timer *pt = &ps->pit_timer;
+   struct kvm_timer *pt = &ps->pit_timer;
s64 interval;
 
interval = muldiv64(val, NSEC_PER_SEC, KVM_PIT_FREQ);
@@ -304,7 +281,13 @@ static void create_pit_timer(struct kvm_kpit_state *ps, 
u32 val, int is_period)
/* TODO The new value only affected after the retriggered */
hrtimer_cancel(&pt->timer);
pt->period = (is_period == 0) ? 0 : interval;
-   pt->timer.function = pit_timer_fn;
+   ps->is_periodic = is_period;
+
+   pt->timer.function = kvm_timer_fn;
+   pt->t_ops = &kpit_ops;
+   pt->kvm = ps->pit->kvm;
+   pt->vcpu_id = 0;
+
atomic_set(&pt->pending, 0);
ps->irq_ack = 1;
 
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 521accf..bbd863f 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -3,14 +3,6 @@
 
 #include "iodev.h"
 
-struct kvm_kpit_timer {
-   struct hrtimer timer;
-   int irq;
-   s64 period; /* unit: ns */
-   atomic_t pending;
-   bool reinject;
-};
-
 struct kvm_kpit_channel_state {
u32 count; /* can be 65536 */
u16 latched_count;
@@ -29,7 +21,8 @@ struct kvm_kpit_channel_state {
 
 struct kvm_kpit_state {
struct kvm_kpit_channel_state channels[3];
-   struct kvm_kpit_timer pit_timer;
+   struct kvm_timer pit_timer;
+   bool is_periodic;
u32speaker_data_on;
struct mutex lock;
struct kvm_pit *pit;
diff --git a/arch/x86/kvm/kvm_timer.h b

[PATCH 06/43] KVM: Update intr delivery func to accept unsigned long* bitmap

2009-05-18 Thread Avi Kivity
From: Sheng Yang 

The bitmap will be used with bit ops, and can easily be extended if
KVM_MAX_VCPUS is increased.
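
A minimal illustration of the change (not part of the patch): DECLARE_BITMAP()
expands to an array of unsigned longs, so passing a pointer lets the mask grow
past BITS_PER_LONG vcpus, and both sides simply use the generic bit ops:

	/* caller: build a properly sized destination mask */
	DECLARE_BITMAP(deliver_bitmask, KVM_MAX_VCPUS);

	bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS);
	__set_bit(vcpu_id, deliver_bitmask);
	vcpu = kvm_get_lowest_prio_vcpu(kvm, vector, deliver_bitmask);

	/* callee: test_bit(next, bitmap) instead of test_bit(next, &bitmap),
	 * where bitmap is now an unsigned long * rather than a copied word */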

Signed-off-by: Sheng Yang 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/lapic.c |8 
 virt/kvm/ioapic.h|2 +-
 virt/kvm/irq_comm.c  |2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index f0b67f2..6aa8d20 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -409,7 +409,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
delivery_mode,
 }
 
 static struct kvm_lapic *kvm_apic_round_robin(struct kvm *kvm, u8 vector,
-  unsigned long bitmap)
+  unsigned long *bitmap)
 {
int last;
int next;
@@ -421,7 +421,7 @@ static struct kvm_lapic *kvm_apic_round_robin(struct kvm 
*kvm, u8 vector,
do {
if (++next == KVM_MAX_VCPUS)
next = 0;
-   if (kvm->vcpus[next] == NULL || !test_bit(next, &bitmap))
+   if (kvm->vcpus[next] == NULL || !test_bit(next, bitmap))
continue;
apic = kvm->vcpus[next]->arch.apic;
if (apic && apic_enabled(apic))
@@ -437,7 +437,7 @@ static struct kvm_lapic *kvm_apic_round_robin(struct kvm 
*kvm, u8 vector,
 }
 
 struct kvm_vcpu *kvm_get_lowest_prio_vcpu(struct kvm *kvm, u8 vector,
-   unsigned long bitmap)
+   unsigned long *bitmap)
 {
struct kvm_lapic *apic;
 
@@ -508,7 +508,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
}
 
if (delivery_mode == APIC_DM_LOWEST) {
-   target = kvm_get_lowest_prio_vcpu(vcpu->kvm, vector, lpr_map);
+   target = kvm_get_lowest_prio_vcpu(vcpu->kvm, vector, &lpr_map);
if (target != NULL)
__apic_accept_irq(target->arch.apic, delivery_mode,
  vector, level, trig_mode);
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index f395798..7275f87 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -65,7 +65,7 @@ static inline struct kvm_ioapic *ioapic_irqchip(struct kvm 
*kvm)
 }
 
 struct kvm_vcpu *kvm_get_lowest_prio_vcpu(struct kvm *kvm, u8 vector,
-  unsigned long bitmap);
+  unsigned long *bitmap);
 void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode);
 int kvm_ioapic_init(struct kvm *kvm);
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index e8ff89c..d4421cd 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -55,7 +55,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
switch (entry->fields.delivery_mode) {
case IOAPIC_LOWEST_PRIORITY:
vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm,
-   entry->fields.vector, *deliver_bitmask);
+   entry->fields.vector, deliver_bitmask);
*deliver_bitmask = 1 << vcpu->vcpu_id;
break;
case IOAPIC_FIXED:
-- 
1.6.0.6



[PATCH 19/43] KVM: MMU: remove call to kvm_mmu_pte_write from walk_addr

2009-05-18 Thread Avi Kivity
From: Joerg Roedel 

There is no reason to update the shadow pte here because the guest pte is only
being marked dirty.

Signed-off-by: Joerg Roedel 
Signed-off-by: Marcelo Tosatti 
---
 arch/x86/kvm/paging_tmpl.h |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6bd7020..855eb71 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -209,7 +209,6 @@ walk:
if (ret)
goto walk;
pte |= PT_DIRTY_MASK;
-   kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte), 0);
walker->ptes[walker->level - 1] = pte;
}
 
-- 
1.6.0.6



[PATCH 21/43] KVM: ioapic/msi interrupt delivery consolidation

2009-05-18 Thread Avi Kivity
From: Gleb Natapov 

ioapic_deliver() and kvm_set_msi() contain duplicated code. Move the common
code into a new ioapic_deliver_entry() function and call it from both places.
both places.

Signed-off-by: Gleb Natapov 
Signed-off-by: Marcelo Tosatti 
---
 include/linux/kvm_host.h |2 +-
 virt/kvm/ioapic.c|   61 --
 virt/kvm/ioapic.h|4 +-
 virt/kvm/irq_comm.c  |   32 ++--
 4 files changed, 38 insertions(+), 61 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3b91ec9..ec9d078 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -364,7 +364,7 @@ void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int 
irq,
 void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask);
 
 #ifdef __KVM_HAVE_IOAPIC
-void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
+void kvm_get_intr_delivery_bitmask(struct kvm *kvm,
   union kvm_ioapic_redirect_entry *entry,
   unsigned long *deliver_bitmask);
 #endif
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index d4a7948..b71c044 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -142,54 +142,57 @@ static void ioapic_write_indirect(struct kvm_ioapic 
*ioapic, u32 val)
}
 }
 
-static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
+int ioapic_deliver_entry(struct kvm *kvm, union kvm_ioapic_redirect_entry *e)
 {
-   union kvm_ioapic_redirect_entry entry = ioapic->redirtbl[irq];
DECLARE_BITMAP(deliver_bitmask, KVM_MAX_VCPUS);
-   struct kvm_vcpu *vcpu;
-   int vcpu_id, r = -1;
+   int i, r = -1;
 
-   ioapic_debug("dest=%x dest_mode=%x delivery_mode=%x "
-"vector=%x trig_mode=%x\n",
-entry.fields.dest, entry.fields.dest_mode,
-entry.fields.delivery_mode, entry.fields.vector,
-entry.fields.trig_mode);
-
-   /* Always delivery PIT interrupt to vcpu 0 */
-#ifdef CONFIG_X86
-   if (irq == 0) {
-bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS);
-   __set_bit(0, deliver_bitmask);
-} else
-#endif
-   kvm_get_intr_delivery_bitmask(ioapic, &entry, deliver_bitmask);
+   kvm_get_intr_delivery_bitmask(kvm, e, deliver_bitmask);
 
if (find_first_bit(deliver_bitmask, KVM_MAX_VCPUS) >= KVM_MAX_VCPUS) {
ioapic_debug("no target on destination\n");
-   return 0;
+   return r;
}
 
-   while ((vcpu_id = find_first_bit(deliver_bitmask, KVM_MAX_VCPUS))
+   while ((i = find_first_bit(deliver_bitmask, KVM_MAX_VCPUS))
< KVM_MAX_VCPUS) {
-   __clear_bit(vcpu_id, deliver_bitmask);
-   vcpu = ioapic->kvm->vcpus[vcpu_id];
+   struct kvm_vcpu *vcpu = kvm->vcpus[i];
+   __clear_bit(i, deliver_bitmask);
if (vcpu) {
if (r < 0)
r = 0;
-   r += kvm_apic_set_irq(vcpu,
-   entry.fields.vector,
-   entry.fields.trig_mode,
-   entry.fields.delivery_mode);
+   r += kvm_apic_set_irq(vcpu, e->fields.vector,
+   e->fields.delivery_mode,
+   e->fields.trig_mode);
} else
ioapic_debug("null destination vcpu: "
 "mask=%x vector=%x delivery_mode=%x\n",
-entry.fields.deliver_bitmask,
-entry.fields.vector,
-entry.fields.delivery_mode);
+e->fields.deliver_bitmask,
+e->fields.vector, e->fields.delivery_mode);
}
return r;
 }
 
+static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
+{
+   union kvm_ioapic_redirect_entry entry = ioapic->redirtbl[irq];
+
+   ioapic_debug("dest=%x dest_mode=%x delivery_mode=%x "
+"vector=%x trig_mode=%x\n",
+entry.fields.dest, entry.fields.dest_mode,
+entry.fields.delivery_mode, entry.fields.vector,
+entry.fields.trig_mode);
+
+#ifdef CONFIG_X86
+   /* Always delivery PIT interrupt to vcpu 0 */
+   if (irq == 0) {
+   entry.fields.dest_mode = 0; /* Physical mode. */
+   entry.fields.dest_id = ioapic->kvm->vcpus[0]->vcpu_id;
+   }
+#endif
+   return ioapic_deliver_entry(ioapic->kvm, &entry);
+}
+
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
 {
u32 old_irr = ioapic->irr;
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index c8032ab..bedeea5 100644
--- a/virt/kvm/ioapic.h
+++ b/

[PATCH 28/43] KVM: ia64: Drop in SN2 replacement of fast path ITC emulation fault handler

2009-05-18 Thread Avi Kivity
From: Jes Sorensen 

Copy in the SN2 RTC-based ITC emulation for the fast exit path. The two
versions have the same size, so a drop-in replacement is simpler than patching
the branch instruction to hit the SN2 version.

Signed-off-by: Jes Sorensen 
Acked-by: Xiantao Zhang 
Signed-off-by: Avi Kivity 
---
 arch/ia64/include/asm/kvm_host.h |2 ++
 arch/ia64/kvm/kvm-ia64.c |   32 +++-
 arch/ia64/kvm/optvfault.S|   30 ++
 arch/ia64/kvm/vmm.c  |   12 
 4 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 1243995..589536f 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -580,6 +580,8 @@ struct kvm_vmm_info{
kvm_vmm_entry   *vmm_entry;
kvm_tramp_entry *tramp_entry;
unsigned long   vmm_ivt;
+   unsigned long   patch_mov_ar;
+   unsigned long   patch_mov_ar_sn2;
 };
 
 int kvm_highest_pending_irq(struct kvm_vcpu *vcpu);
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index ebbd995..4623a90 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1671,8 +1671,37 @@ out:
return 0;
 }
 
+
+/*
+ * On SN2, the ITC isn't stable, so copy in fast path code to use the
+ * SN2 RTC, replacing the ITC based default verion.
+ */
+static void kvm_patch_vmm(struct kvm_vmm_info *vmm_info,
+ struct module *module)
+{
+   unsigned long new_ar, new_ar_sn2;
+   unsigned long module_base;
+
+   if (!ia64_platform_is("sn2"))
+   return;
+
+   module_base = (unsigned long)module->module_core;
+
+   new_ar = kvm_vmm_base + vmm_info->patch_mov_ar - module_base;
+   new_ar_sn2 = kvm_vmm_base + vmm_info->patch_mov_ar_sn2 - module_base;
+
+   printk(KERN_INFO "kvm: Patching ITC emulation to use SGI SN2 RTC "
+  "as source\n");
+
+   /*
+* Copy the SN2 version of mov_ar into place. They are both
+* the same size, so 6 bundles is sufficient (6 * 0x10).
+*/
+   memcpy((void *)new_ar, (void *)new_ar_sn2, 0x60);
+}
+
 static int kvm_relocate_vmm(struct kvm_vmm_info *vmm_info,
-   struct module *module)
+   struct module *module)
 {
unsigned long module_base;
unsigned long vmm_size;
@@ -1694,6 +1723,7 @@ static int kvm_relocate_vmm(struct kvm_vmm_info *vmm_info,
return -EFAULT;
 
memcpy((void *)kvm_vmm_base, (void *)module_base, vmm_size);
+   kvm_patch_vmm(vmm_info, module);
kvm_flush_icache(kvm_vmm_base, vmm_size);
 
/*Recalculate kvm_vmm_info based on new VMM*/
diff --git a/arch/ia64/kvm/optvfault.S b/arch/ia64/kvm/optvfault.S
index 32254ce..f793be3 100644
--- a/arch/ia64/kvm/optvfault.S
+++ b/arch/ia64/kvm/optvfault.S
@@ -11,6 +11,7 @@
 
 #include 
 #include 
+#include 
 
 #include "vti.h"
 #include "asm-offsets.h"
@@ -140,6 +141,35 @@ GLOBAL_ENTRY(kvm_asm_mov_from_ar)
;;
 END(kvm_asm_mov_from_ar)
 
+/*
+ * Special SGI SN2 optimized version of mov_from_ar using the SN2 RTC
+ * clock as it's source for emulating the ITC. This version will be
+ * copied on top of the original version if the host is determined to
+ * be an SN2.
+ */
+GLOBAL_ENTRY(kvm_asm_mov_from_ar_sn2)
+   add r18=VMM_VCPU_ITC_OFS_OFFSET, r21
+   movl r19 = (KVM_VMM_BASE+(1

[PATCH 18/43] KVM: Merge kvm_ioapic_get_delivery_bitmask into kvm_get_intr_delivery_bitmask

2009-05-18 Thread Avi Kivity
From: Sheng Yang 

Gleb fixed the bitmap ops usage in kvm_ioapic_get_delivery_bitmask.

Sheng merged the two functions and fixed several issues in
kvm_get_intr_delivery_bitmask (a rough sketch of the merged result follows the
list):
1. deliver_bitmask is a bitmap rather than an unsigned long integer.
2. The lowest priority target bitmap was calculated incorrectly.
3. Prevent a potential NULL dereference.
4. The declaration in include/kvm_host.h caused a powerpc compilation warning.
5. Add a warning for guest broadcast interrupts with lowest priority delivery mode.
6. Remove the duplicate bitmap cleanup in the caller of kvm_get_intr_delivery_bitmask.
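
The tail of the irq_comm.c hunk is cut off in this archive; below is a rough
sketch of the merged function, reconstructed from the removed
kvm_ioapic_get_delivery_bitmask() in the diff and from the fix list above (an
approximation, not the literal patch text):

	void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
					   union kvm_ioapic_redirect_entry *entry,
					   unsigned long *deliver_bitmask)
	{
		struct kvm *kvm = ioapic->kvm;
		struct kvm_vcpu *vcpu;
		int i;

		bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS);	/* fix 6: zeroed here */

		if (entry->fields.dest_mode == 0) {		/* physical mode */
			for (i = 0; i < KVM_MAX_VCPUS; i++) {
				vcpu = kvm->vcpus[i];
				if (!vcpu || !vcpu->arch.apic)	/* fix 3: NULL checks */
					continue;
				if (entry->fields.dest_id == 0xFF ||	/* broadcast */
				    kvm_apic_match_physical_addr(vcpu->arch.apic,
								 entry->fields.dest_id))
					__set_bit(i, deliver_bitmask);	/* fix 1: bit ops */
			}
		} else if (entry->fields.dest_id != 0) {	/* logical mode */
			for (i = 0; i < KVM_MAX_VCPUS; i++) {
				vcpu = kvm->vcpus[i];
				if (vcpu && vcpu->arch.apic &&
				    kvm_apic_match_logical_addr(vcpu->arch.apic,
								entry->fields.dest_id))
					__set_bit(i, deliver_bitmask);
			}
		}
		/* lowest-priority narrowing (fix 2) and the broadcast-with-
		 * lowest-priority warning (fix 5) follow in the real patch */
	}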

Signed-off-by: Gleb Natapov 
Signed-off-by: Sheng Yang 
Signed-off-by: Marcelo Tosatti 
---
 virt/kvm/ioapic.c   |   46 +++---
 virt/kvm/ioapic.h   |5 +++--
 virt/kvm/irq_comm.c |   49 +++--
 3 files changed, 49 insertions(+), 51 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 7c2cb2b..ea268a8 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -161,45 +161,6 @@ static void ioapic_inj_nmi(struct kvm_vcpu *vcpu)
kvm_vcpu_kick(vcpu);
 }
 
-void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
-u8 dest_mode, unsigned long *mask)
-{
-   int i;
-   struct kvm *kvm = ioapic->kvm;
-   struct kvm_vcpu *vcpu;
-
-   ioapic_debug("dest %d dest_mode %d\n", dest, dest_mode);
-
-   *mask = 0;
-   if (dest_mode == 0) {   /* Physical mode. */
-   if (dest == 0xFF) { /* Broadcast. */
-   for (i = 0; i < KVM_MAX_VCPUS; ++i)
-   if (kvm->vcpus[i] && kvm->vcpus[i]->arch.apic)
-   *mask |= 1 << i;
-   return;
-   }
-   for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-   vcpu = kvm->vcpus[i];
-   if (!vcpu)
-   continue;
-   if (kvm_apic_match_physical_addr(vcpu->arch.apic, 
dest)) {
-   if (vcpu->arch.apic)
-   *mask = 1 << i;
-   break;
-   }
-   }
-   } else if (dest != 0)   /* Logical mode, MDA non-zero. */
-   for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-   vcpu = kvm->vcpus[i];
-   if (!vcpu)
-   continue;
-   if (vcpu->arch.apic &&
-   kvm_apic_match_logical_addr(vcpu->arch.apic, dest))
-   *mask |= 1 << vcpu->vcpu_id;
-   }
-   ioapic_debug("mask %x\n", *mask);
-}
-
 static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 {
union kvm_ioapic_redirect_entry entry = ioapic->redirtbl[irq];
@@ -213,13 +174,12 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int 
irq)
 entry.fields.delivery_mode, entry.fields.vector,
 entry.fields.trig_mode);
 
-   bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS);
-
/* Always delivery PIT interrupt to vcpu 0 */
 #ifdef CONFIG_X86
-   if (irq == 0)
+   if (irq == 0) {
+bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS);
__set_bit(0, deliver_bitmask);
-   else
+} else
 #endif
kvm_get_intr_delivery_bitmask(ioapic, &entry, deliver_bitmask);
 
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index 7275f87..c8032ab 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -70,7 +70,8 @@ void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int 
trigger_mode);
 int kvm_ioapic_init(struct kvm *kvm);
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level);
 void kvm_ioapic_reset(struct kvm_ioapic *ioapic);
-void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
-u8 dest_mode, unsigned long *mask);
+void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
+  union kvm_ioapic_redirect_entry *entry,
+  unsigned long *deliver_bitmask);
 
 #endif
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index d165e05..1c6ff6d 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -47,15 +47,54 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic 
*ioapic,
   union kvm_ioapic_redirect_entry *entry,
   unsigned long *deliver_bitmask)
 {
+   int i;
+   struct kvm *kvm = ioapic->kvm;
struct kvm_vcpu *vcpu;
 
-   kvm_ioapic_get_delivery_bitmask(ioapic, entry->fields.dest_id,
-   entry->fields.dest_mode,
-   deliver_bitmask);
+   bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS);
+
+   if (entry->fields.dest_mo

[PATCH 10/43] KVM: Enable MSI-X for KVM assigned device

2009-05-18 Thread Avi Kivity
From: Sheng Yang 

This patch finally enables MSI-X.

What we need for MSI-X:
1. Intercept one page in the device's MMIO region, so that we can get the
guest's desired MSI-X table and set up the real one. This is now done by the
guest and transferred to the kernel using the ioctls KVM_SET_MSIX_NR and
KVM_SET_MSIX_ENTRY (a hypothetical userspace sketch follows at the end of this
message).

2. Information for incoming interrupts. One device can now have more than one
interrupt, and they are all handled by one workqueue structure, so we need to
identify them. The gsi_msg_pending_bitmap enabled by the previous patch gets
this done.

3. A mapping from host IRQ to guest gsi, as well as from guest gsi to the real
MSI/MSI-X message address/data. We use the same entry number for the host and
the guest here, so that it's easy to find the correlated guest gsi.

What we still lack:
1. The PCI spec says nothing may live in the same page of the MMIO region as
the MSI-X table, except the pending bits. The patch ignores the pending bits
as a first step (so they are always 0 - nothing pending).

2. The PCI spec allows the MSI-X table to be changed dynamically. That means
the OS can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it.
The patch doesn't support this, but Linux doesn't work that way either.

3. The patch doesn't implement MSI-X "mask all" and "mask single entry". I
would implement the former in driver/pci/msi.c later. For a single entry,
userspace has the responsibility to handle it.
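
For illustration, a hypothetical userspace sequence using the structures and
ioctl names described above (the field names of kvm_assigned_msix_entry and
the final enable step are assumptions, not copied from the patch):

	struct kvm_assigned_msix_nr msix_nr = {
		.assigned_dev_id = dev_id,
		.entry_nr        = nr_entries,	/* number of MSI-X vectors */
	};
	/* may only be done once per device ("not allowed to set the
	 * MSI-X number twice") */
	ioctl(vm_fd, KVM_SET_MSIX_NR, &msix_nr);

	for (i = 0; i < nr_entries; i++) {
		struct kvm_assigned_msix_entry entry = {
			.assigned_dev_id = dev_id,
			.entry           = i,			/* same entry number on host and guest */
			.gsi             = guest_gsi[i],	/* guest gsi for this vector */
		};
		ioctl(vm_fd, KVM_SET_MSIX_ENTRY, &entry);
	}

	/* MSI-X is then enabled through the assigned-IRQ ioctl with
	 * KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX set (see the kvm.h hunk below). */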

Signed-off-by: Sheng Yang 
Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm.h |1 +
 include/linux/kvm.h|8 
 virt/kvm/kvm_main.c|   98 +---
 3 files changed, 101 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index dc3f6cf..125be8b 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -16,6 +16,7 @@
 #define __KVM_HAVE_MSI
 #define __KVM_HAVE_USER_NMI
 #define __KVM_HAVE_GUEST_DEBUG
+#define __KVM_HAVE_MSIX
 
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 78cdee8..640835e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -409,6 +409,9 @@ struct kvm_trace_rec {
 #ifdef __KVM_HAVE_DEVICE_ASSIGNMENT
 #define KVM_CAP_DEVICE_DEASSIGNMENT 27
 #endif
+#ifdef __KVM_HAVE_MSIX
+#define KVM_CAP_DEVICE_MSIX 28
+#endif
 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */
 #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30
 
@@ -611,6 +614,11 @@ struct kvm_assigned_irq {
 #define KVM_DEV_IRQ_ASSIGN_MSI_ACTION  KVM_DEV_IRQ_ASSIGN_ENABLE_MSI
 #define KVM_DEV_IRQ_ASSIGN_ENABLE_MSI  (1 << 0)
 
+#define KVM_DEV_IRQ_ASSIGN_MSIX_ACTION  (KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX |\
+   KVM_DEV_IRQ_ASSIGN_MASK_MSIX)
+#define KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX  (1 << 1)
+#define KVM_DEV_IRQ_ASSIGN_MASK_MSIX(1 << 2)
+
 struct kvm_assigned_msix_nr {
__u32 assigned_dev_id;
__u16 entry_nr;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7c20e06..034ed43 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -236,13 +236,33 @@ static void kvm_free_assigned_irq(struct kvm *kvm,
 * now, the kvm state is still legal for probably we also have to wait
 * interrupt_work done.
 */
-   disable_irq_nosync(assigned_dev->host_irq);
-   cancel_work_sync(&assigned_dev->interrupt_work);
+   if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_MSIX) {
+   int i;
+   for (i = 0; i < assigned_dev->entries_nr; i++)
+   disable_irq_nosync(assigned_dev->
+  host_msix_entries[i].vector);
+
+   cancel_work_sync(&assigned_dev->interrupt_work);
 
-   free_irq(assigned_dev->host_irq, (void *)assigned_dev);
+   for (i = 0; i < assigned_dev->entries_nr; i++)
+   free_irq(assigned_dev->host_msix_entries[i].vector,
+(void *)assigned_dev);
 
-   if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI)
-   pci_disable_msi(assigned_dev->dev);
+   assigned_dev->entries_nr = 0;
+   kfree(assigned_dev->host_msix_entries);
+   kfree(assigned_dev->guest_msix_entries);
+   pci_disable_msix(assigned_dev->dev);
+   } else {
+   /* Deal with MSI and INTx */
+   disable_irq_nosync(assigned_dev->host_irq);
+   cancel_work_sync(&assigned_dev->interrupt_work);
+
+   free_irq(assigned_dev->host_irq, (void *)assigned_dev);
+
+   if (assigned_dev->irq_requested_type &
+   KVM_ASSIGNED_DEV_HOST_MSI)
+   pci_disable_msi(assigned_dev->dev);
+   }
 
assigned_dev->irq_requested_type = 0;
 }
@@ -373,6 +393,60 @@ static int assigned_device_update_msi(struct kvm *kvm,
 }
 #endif
 
+#ifdef __KVM_HAVE_MSIX
+static int assigned_device_update

[PATCH 25/43] KVM: ia64: Map in SN2 RTC registers to the VMM module

2009-05-18 Thread Avi Kivity
From: Jes Sorensen 

On SN2, map in the SN2 RTC registers to the VMM module, needed for ITC
emulation.

Signed-off-by: Jes Sorensen 
Acked-by: Xiantao Zhang 
Signed-off-by: Avi Kivity 
---
 arch/ia64/include/asm/kvm_host.h |2 +
 arch/ia64/include/asm/pgtable.h  |2 +
 arch/ia64/kvm/kvm-ia64.c |   43 +
 3 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 5608488..1243995 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -371,6 +371,7 @@ struct kvm_vcpu_arch {
int last_run_cpu;
int vmm_tr_slot;
int vm_tr_slot;
+   int sn_rtc_tr_slot;
 
 #define KVM_MP_STATE_RUNNABLE  0
 #define KVM_MP_STATE_UNINITIALIZED 1
@@ -465,6 +466,7 @@ struct kvm_arch {
unsigned long   vmm_init_rr;
 
int online_vcpus;
+   int is_sn2;
 
struct kvm_ioapic *vioapic;
struct kvm_vm_stat stat;
diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 7a9bff4..0a9cc73 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -146,6 +146,8 @@
 #define PAGE_GATE  __pgprot(__ACCESS_BITS | _PAGE_PL_0 | _PAGE_AR_X_RX)
 #define PAGE_KERNEL__pgprot(__DIRTY_BITS  | _PAGE_PL_0 | _PAGE_AR_RWX)
 #define PAGE_KERNELRX  __pgprot(__ACCESS_BITS | _PAGE_PL_0 | _PAGE_AR_RX)
+#define PAGE_KERNEL_UC __pgprot(__DIRTY_BITS  | _PAGE_PL_0 | _PAGE_AR_RWX | \
+_PAGE_MA_UC)
 
 # ifndef __ASSEMBLY__
 
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index acf43ec..14a3fab 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -41,6 +41,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "misc.h"
 #include "vti.h"
@@ -119,8 +122,7 @@ void kvm_arch_hardware_enable(void *garbage)
unsigned long saved_psr;
int slot;
 
-   pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base),
-   PAGE_KERNEL));
+   pte = pte_val(mk_pte_phys(__pa(kvm_vmm_base), PAGE_KERNEL));
local_irq_save(saved_psr);
slot = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT);
local_irq_restore(saved_psr);
@@ -425,6 +427,23 @@ static int handle_switch_rr6(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
return 1;
 }
 
+static int kvm_sn2_setup_mappings(struct kvm_vcpu *vcpu)
+{
+   unsigned long pte, rtc_phys_addr, map_addr;
+   int slot;
+
+   map_addr = KVM_VMM_BASE + (1UL << KVM_VMM_SHIFT);
+   rtc_phys_addr = LOCAL_MMR_OFFSET | SH_RTC;
+   pte = pte_val(mk_pte_phys(rtc_phys_addr, PAGE_KERNEL_UC));
+   slot = ia64_itr_entry(0x3, map_addr, pte, PAGE_SHIFT);
+   vcpu->arch.sn_rtc_tr_slot = slot;
+   if (slot < 0) {
+   printk(KERN_ERR "Mayday mayday! RTC mapping failed!\n");
+   slot = 0;
+   }
+   return slot;
+}
+
 int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 {
 
@@ -563,18 +582,29 @@ static int kvm_insert_vmm_mapping(struct kvm_vcpu *vcpu)
if (r < 0)
goto out;
vcpu->arch.vm_tr_slot = r;
+
+#if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC)
+   if (kvm->arch.is_sn2) {
+   r = kvm_sn2_setup_mappings(vcpu);
+   if (r < 0)
+   goto out;
+   }
+#endif
+
r = 0;
 out:
return r;
-
 }
 
 static void kvm_purge_vmm_mapping(struct kvm_vcpu *vcpu)
 {
-
+   struct kvm *kvm = vcpu->kvm;
ia64_ptr_entry(0x3, vcpu->arch.vmm_tr_slot);
ia64_ptr_entry(0x3, vcpu->arch.vm_tr_slot);
-
+#if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC)
+   if (kvm->arch.is_sn2)
+   ia64_ptr_entry(0x3, vcpu->arch.sn_rtc_tr_slot);
+#endif
 }
 
 static int kvm_vcpu_pre_transition(struct kvm_vcpu *vcpu)
@@ -800,6 +830,9 @@ struct  kvm *kvm_arch_create_vm(void)
 
if (IS_ERR(kvm))
return ERR_PTR(-ENOMEM);
+
+   kvm->arch.is_sn2 = ia64_platform_is("sn2");
+
kvm_init_vm(kvm);
 
kvm->arch.online_vcpus = 0;
-- 
1.6.0.6



[PATCH 32/43] KVM: x86: Ignore reads to EVNTSEL MSRs

2009-05-18 Thread Avi Kivity
From: Amit Shah 

We ignore writes to the performance counters and performance event
selector registers already. Kaspersky antivirus reads the eventsel
MSR causing it to crash with the current behaviour.

Return 0 as data when the eventsel registers are read to stop the
crash.

Signed-off-by: Amit Shah 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a27a64c..0131b5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -891,6 +891,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 
*pdata)
case MSR_IA32_LASTINTFROMIP:
case MSR_IA32_LASTINTTOIP:
case MSR_VM_HSAVE_PA:
+   case MSR_P6_EVNTSEL0:
+   case MSR_P6_EVNTSEL1:
data = 0;
break;
case MSR_MTRRcap:
-- 
1.6.0.6



[PATCH 22/43] KVM: consolidate ioapic/ipi interrupt delivery logic

2009-05-18 Thread Avi Kivity
From: Gleb Natapov 

Use kvm_apic_match_dest() in kvm_get_intr_delivery_bitmask() instead
of duplicating the same code. Use kvm_get_intr_delivery_bitmask() in
apic_send_ipi() to figure out ipi destination instead of reimplementing
the logic.
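
A minimal sketch of the consolidated matching loop (the real code is in the
irq_comm.c hunk, which is cut off in this archive; here 'source' is the
sending LAPIC, or NULL for ioapic-originated interrupts -- an assumption):

	for (i = 0; i < KVM_MAX_VCPUS; i++) {
		struct kvm_vcpu *vcpu = kvm->vcpus[i];

		if (!vcpu || !kvm_apic_present(vcpu))
			continue;
		if (kvm_apic_match_dest(vcpu, source, short_hand,
					dest_id, dest_mode))
			__set_bit(i, deliver_bitmask);
	}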

Signed-off-by: Gleb Natapov 
Signed-off-by: Marcelo Tosatti 
---
 arch/ia64/kvm/kvm-ia64.c |8 +
 arch/ia64/kvm/lapic.h|3 ++
 arch/x86/kvm/lapic.c |   69 --
 arch/x86/kvm/lapic.h |2 +
 include/linux/kvm_host.h |5 ---
 virt/kvm/ioapic.c|5 ++-
 virt/kvm/ioapic.h|   10 --
 virt/kvm/irq_comm.c  |   76 +
 8 files changed, 71 insertions(+), 107 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 99d6d17..8eea9cb 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1852,6 +1852,14 @@ struct kvm_vcpu *kvm_get_lowest_prio_vcpu(struct kvm 
*kvm, u8 vector,
return lvcpu;
 }
 
+int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
+   int short_hand, int dest, int dest_mode)
+{
+   return (dest_mode == 0) ?
+   kvm_apic_match_physical_addr(target, dest) :
+   kvm_apic_match_logical_addr(target, dest);
+}
+
 static int find_highest_bits(int *dat)
 {
u32  bits, bitnum;
diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
index cbcfaa6..31602e7 100644
--- a/arch/ia64/kvm/lapic.h
+++ b/arch/ia64/kvm/lapic.h
@@ -20,6 +20,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu);
 
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
 int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
+int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
+   int short_hand, int dest, int dest_mode);
+bool kvm_apic_present(struct kvm_vcpu *vcpu);
 int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 dmode, u8 trig);
 
 #endif
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a42f968..998862a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -260,7 +260,7 @@ static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
 
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest)
 {
-   return kvm_apic_id(apic) == dest;
+   return dest == 0xff || kvm_apic_id(apic) == dest;
 }
 
 int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda)
@@ -289,37 +289,34 @@ int kvm_apic_match_logical_addr(struct kvm_lapic *apic, 
u8 mda)
return result;
 }
 
-static int apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
+int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
   int short_hand, int dest, int dest_mode)
 {
int result = 0;
struct kvm_lapic *target = vcpu->arch.apic;
 
apic_debug("target %p, source %p, dest 0x%x, "
-  "dest_mode 0x%x, short_hand 0x%x",
+  "dest_mode 0x%x, short_hand 0x%x\n",
   target, source, dest, dest_mode, short_hand);
 
ASSERT(!target);
switch (short_hand) {
case APIC_DEST_NOSHORT:
-   if (dest_mode == 0) {
+   if (dest_mode == 0)
/* Physical mode. */
-   if ((dest == 0xFF) || (dest == kvm_apic_id(target)))
-   result = 1;
-   } else
+   result = kvm_apic_match_physical_addr(target, dest);
+   else
/* Logical mode. */
result = kvm_apic_match_logical_addr(target, dest);
break;
case APIC_DEST_SELF:
-   if (target == source)
-   result = 1;
+   result = (target == source);
break;
case APIC_DEST_ALLINC:
result = 1;
break;
case APIC_DEST_ALLBUT:
-   if (target != source)
-   result = 1;
+   result = (target != source);
break;
default:
printk(KERN_WARNING "Bad dest shorthand value %x\n",
@@ -492,38 +489,26 @@ static void apic_send_ipi(struct kvm_lapic *apic)
unsigned int delivery_mode = icr_low & APIC_MODE_MASK;
unsigned int vector = icr_low & APIC_VECTOR_MASK;
 
-   struct kvm_vcpu *target;
-   struct kvm_vcpu *vcpu;
-   DECLARE_BITMAP(lpr_map, KVM_MAX_VCPUS);
+   DECLARE_BITMAP(deliver_bitmask, KVM_MAX_VCPUS);
int i;
 
-   bitmap_zero(lpr_map, KVM_MAX_VCPUS);
apic_debug("icr_high 0x%x, icr_low 0x%x, "
   "short_hand 0x%x, dest 0x%x, trig_mode 0x%x, level 0x%x, "
   "dest_mode 0x%x, delivery_mode 0x%x, vector 0x%x\n",
   icr_high, icr_low, short_hand, dest,
   trig_mode, level, dest_mode, delivery_mode, vector);
 
-   for (i = 0; i < KVM_MAX_VCPUS; i++)

[PATCH 20/43] KVM: APIC: kvm_apic_set_irq deliver all kinds of interrupts

2009-05-18 Thread Avi Kivity
From: Gleb Natapov 

Get rid of ioapic_inj_irq() and ioapic_inj_nmi() functions.

Signed-off-by: Gleb Natapov 
Signed-off-by: Marcelo Tosatti 
---
 arch/ia64/include/asm/kvm_host.h |1 -
 arch/ia64/kvm/kvm-ia64.c |8 +++---
 arch/ia64/kvm/lapic.h|2 +-
 arch/x86/kvm/lapic.c |   47 +++--
 arch/x86/kvm/lapic.h |2 +-
 virt/kvm/ioapic.c|   40 +---
 virt/kvm/irq_comm.c  |1 +
 7 files changed, 42 insertions(+), 59 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 4542651..5608488 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -585,7 +585,6 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int kvm_pal_emul(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
 void kvm_sal_emul(struct kvm_vcpu *vcpu);
 
-static inline void kvm_inject_nmi(struct kvm_vcpu *vcpu) {}
 #endif /* __ASSEMBLY__*/
 
 #endif
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 774f0d7..99d6d17 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -292,13 +292,13 @@ static void vcpu_deliver_ipi(struct kvm_vcpu *vcpu, 
uint64_t dm,
 {
switch (dm) {
case SAPIC_FIXED:
-   kvm_apic_set_irq(vcpu, vector, 0);
+   kvm_apic_set_irq(vcpu, vector, dm, 0);
break;
case SAPIC_NMI:
-   kvm_apic_set_irq(vcpu, 2, 0);
+   kvm_apic_set_irq(vcpu, 2, dm, 0);
break;
case SAPIC_EXTINT:
-   kvm_apic_set_irq(vcpu, 0, 0);
+   kvm_apic_set_irq(vcpu, 0, dm, 0);
break;
case SAPIC_INIT:
case SAPIC_PMI:
@@ -1813,7 +1813,7 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
put_cpu();
 }
 
-int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 trig)
+int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 dmode, u8 trig)
 {
 
struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd);
diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
index 6d6cbcb..cbcfaa6 100644
--- a/arch/ia64/kvm/lapic.h
+++ b/arch/ia64/kvm/lapic.h
@@ -20,6 +20,6 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu);
 
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
 int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
-int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 trig);
+int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 dmode, u8 trig);
 
 #endif
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 27ca43e..a42f968 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -196,20 +196,30 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_find_highest_irr);
 
-int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 trig)
+static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
+int vector, int level, int trig_mode);
+
+int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 dmode, u8 trig)
 {
struct kvm_lapic *apic = vcpu->arch.apic;
+   int lapic_dmode;
 
-   if (!apic_test_and_set_irr(vec, apic)) {
-   /* a new pending irq is set in IRR */
-   if (trig)
-   apic_set_vector(vec, apic->regs + APIC_TMR);
-   else
-   apic_clear_vector(vec, apic->regs + APIC_TMR);
-   kvm_vcpu_kick(apic->vcpu);
-   return 1;
+   switch (dmode) {
+   case IOAPIC_LOWEST_PRIORITY:
+   lapic_dmode = APIC_DM_LOWEST;
+   break;
+   case IOAPIC_FIXED:
+   lapic_dmode = APIC_DM_FIXED;
+   break;
+   case IOAPIC_NMI:
+   lapic_dmode = APIC_DM_NMI;
+   break;
+   default:
+   printk(KERN_DEBUG"Ignoring delivery mode %d\n", dmode);
+   return 0;
+   break;
}
-   return 0;
+   return __apic_accept_irq(apic, lapic_dmode, vec, 1, trig);
 }
 
 static inline int apic_find_highest_isr(struct kvm_lapic *apic)
@@ -327,7 +337,7 @@ static int apic_match_dest(struct kvm_vcpu *vcpu, struct 
kvm_lapic *source,
 static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 int vector, int level, int trig_mode)
 {
-   int orig_irr, result = 0;
+   int result = 0;
struct kvm_vcpu *vcpu = apic->vcpu;
 
switch (delivery_mode) {
@@ -337,10 +347,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
delivery_mode,
if (unlikely(!apic_enabled(apic)))
break;
 
-   orig_irr = apic_test_and_set_irr(vector, apic);
-   if (orig_irr && trig_mode) {
-   apic_debug("level trig mode repeatedly for vector %d",
-  vector);
+   result = 

[PATCH 23/43] KVM: change the way how lowest priority vcpu is calculated

2009-05-18 Thread Avi Kivity
From: Gleb Natapov 

The new way does not require an additional loop over the vcpus to find the one
with the lowest priority, as one is chosen during delivery bitmap
construction.
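
Roughly, the delivery loop can now keep the current winner as it walks the
vcpus instead of doing a second pass (a sketch of the idea, not the literal
irq_comm.c hunk, which is not fully shown in this archive):

	struct kvm_vcpu *lowest = NULL;

	for (i = 0; i < KVM_MAX_VCPUS; i++) {
		struct kvm_vcpu *vcpu = kvm->vcpus[i];

		if (!vcpu || !kvm_apic_match_dest(vcpu, src, short_hand,
						  dest, dest_mode))
			continue;
		if (!lowest || kvm_apic_compare_prio(vcpu, lowest) < 0)
			lowest = vcpu;	/* smaller apic_arb_prio wins */
	}
	if (lowest)
		__set_bit(lowest->vcpu_id, deliver_bitmask);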

Signed-off-by: Gleb Natapov 
Signed-off-by: Marcelo Tosatti 
---
 arch/ia64/kvm/kvm-ia64.c|   15 +---
 arch/ia64/kvm/lapic.h   |1 +
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/lapic.c|   43 +-
 virt/kvm/ioapic.h   |3 +-
 virt/kvm/irq_comm.c |   19 ++---
 6 files changed, 22 insertions(+), 61 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 8eea9cb..1887a93 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1836,20 +1836,9 @@ int kvm_apic_match_logical_addr(struct kvm_lapic *apic, 
u8 mda)
return 0;
 }
 
-struct kvm_vcpu *kvm_get_lowest_prio_vcpu(struct kvm *kvm, u8 vector,
-  unsigned long *bitmap)
+int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 {
-   struct kvm_vcpu *lvcpu = kvm->vcpus[0];
-   int i;
-
-   for (i = 1; i < kvm->arch.online_vcpus; i++) {
-   if (!kvm->vcpus[i])
-   continue;
-   if (lvcpu->arch.xtp > kvm->vcpus[i]->arch.xtp)
-   lvcpu = kvm->vcpus[i];
-   }
-
-   return lvcpu;
+   return vcpu1->arch.xtp - vcpu2->arch.xtp;
 }
 
 int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
index 31602e7..e42109e 100644
--- a/arch/ia64/kvm/lapic.h
+++ b/arch/ia64/kvm/lapic.h
@@ -22,6 +22,7 @@ int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 
dest);
 int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
 int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
int short_hand, int dest, int dest_mode);
+int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2);
 bool kvm_apic_present(struct kvm_vcpu *vcpu);
 int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 dmode, u8 trig);
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f0faf58..4627627 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -286,6 +286,7 @@ struct kvm_vcpu_arch {
u64 shadow_efer;
u64 apic_base;
struct kvm_lapic *apic;/* kernel irqchip context */
+   int32_t apic_arb_prio;
int mp_state;
int sipi_vector;
u64 ia32_misc_enable_msr;
@@ -400,7 +401,6 @@ struct kvm_arch{
struct hlist_head irq_ack_notifier_list;
int vapics_in_nmi_mode;
 
-   int round_robin_prev_vcpu;
unsigned int tss_addr;
struct page *apic_access_page;
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 998862a..814466f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -338,8 +338,9 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
delivery_mode,
struct kvm_vcpu *vcpu = apic->vcpu;
 
switch (delivery_mode) {
-   case APIC_DM_FIXED:
case APIC_DM_LOWEST:
+   vcpu->arch.apic_arb_prio++;
+   case APIC_DM_FIXED:
/* FIXME add logic for vcpu on reset */
if (unlikely(!apic_enabled(apic)))
break;
@@ -416,43 +417,9 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
delivery_mode,
return result;
 }
 
-static struct kvm_lapic *kvm_apic_round_robin(struct kvm *kvm, u8 vector,
-  unsigned long *bitmap)
+int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 {
-   int last;
-   int next;
-   struct kvm_lapic *apic = NULL;
-
-   last = kvm->arch.round_robin_prev_vcpu;
-   next = last;
-
-   do {
-   if (++next == KVM_MAX_VCPUS)
-   next = 0;
-   if (kvm->vcpus[next] == NULL || !test_bit(next, bitmap))
-   continue;
-   apic = kvm->vcpus[next]->arch.apic;
-   if (apic && apic_enabled(apic))
-   break;
-   apic = NULL;
-   } while (next != last);
-   kvm->arch.round_robin_prev_vcpu = next;
-
-   if (!apic)
-   printk(KERN_DEBUG "vcpu not ready for apic_round_robin\n");
-
-   return apic;
-}
-
-struct kvm_vcpu *kvm_get_lowest_prio_vcpu(struct kvm *kvm, u8 vector,
-   unsigned long *bitmap)
-{
-   struct kvm_lapic *apic;
-
-   apic = kvm_apic_round_robin(kvm, vector, bitmap);
-   if (apic)
-   return apic->vcpu;
-   return NULL;
+   return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
 }
 
 static void apic_set_eoi(struct kvm_lapic *apic)
@@ -908,6 +875,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
vcpu->arch.apic_base |= MSR

[PATCH 43/43] KVM: Fix interrupt unhalting a vcpu when it shouldn't

2009-05-18 Thread Avi Kivity
From: Gleb Natapov 

kvm_vcpu_block() unhalts the vcpu on an interrupt/timer without checking
whether the interrupt window is actually open.
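
The kvm_main.c hunk is not visible in this archive; the intent is that the
wake-up condition in kvm_vcpu_block() only treats a pending interrupt as a
reason to unhalt when the interrupt window is open, along these lines (the
exact shape of the check is an assumption):

	if ((kvm_arch_interrupt_allowed(vcpu) &&
	     kvm_cpu_has_interrupt(vcpu)) ||
	    kvm_arch_vcpu_runnable(vcpu)) {
		set_bit(KVM_REQ_UNHALT, &vcpu->requests);
		break;
	}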

Signed-off-by: Gleb Natapov 
Signed-off-by: Avi Kivity 
---
 arch/ia64/kvm/kvm-ia64.c|6 ++
 arch/powerpc/kvm/powerpc.c  |6 ++
 arch/s390/kvm/interrupt.c   |6 ++
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/svm.c  |   10 ++
 arch/x86/kvm/vmx.c  |8 +++-
 arch/x86/kvm/x86.c  |5 +
 include/linux/kvm_host.h|1 +
 virt/kvm/kvm_main.c |3 ++-
 9 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index d2a90fd..3bf0a34 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1963,6 +1963,12 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu)
return 0;
 }
 
+int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
+{
+   /* do real check here */
+   return 1;
+}
+
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 {
return vcpu->arch.timer_fired;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 9057335..2cf915e 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -41,6 +41,12 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
return !!(v->arch.pending_exceptions);
 }
 
+int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
+{
+   /* do real check here */
+   return 1;
+}
+
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
return !(v->arch.msr & MSR_WE);
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 0189356..4ed4c3a 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -318,6 +318,12 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu)
return rc;
 }
 
+int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
+{
+   /* do real check here */
+   return 1;
+}
+
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 {
return 0;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4627627..8351c4d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -521,7 +521,7 @@ struct kvm_x86_ops {
void (*inject_pending_irq)(struct kvm_vcpu *vcpu);
void (*inject_pending_vectors)(struct kvm_vcpu *vcpu,
   struct kvm_run *run);
-
+   int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
int (*get_mt_mask_shift)(void);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index aa528db..de74104 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2270,6 +2270,15 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu)
vmcb->control.intercept_cr_write |= INTERCEPT_CR8_MASK;
 }
 
+static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+   struct vmcb *vmcb = svm->vmcb;
+   return (vmcb->save.rflags & X86_EFLAGS_IF) &&
+   !(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) &&
+   (svm->vcpu.arch.hflags & HF_GIF_MASK);
+}
+
 static void svm_intr_assist(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
@@ -2649,6 +2658,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.exception_injected = svm_exception_injected,
.inject_pending_irq = svm_intr_assist,
.inject_pending_vectors = do_interrupt_requests,
+   .interrupt_allowed = svm_interrupt_allowed,
 
.set_tss_addr = svm_set_tss_addr,
.get_tdp_level = get_npt_level,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index da6461d..b9e06b0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2490,6 +2490,12 @@ static void vmx_update_window_states(struct kvm_vcpu 
*vcpu)
 GUEST_INTR_STATE_MOV_SS)));
 }
 
+static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
+{
+   vmx_update_window_states(vcpu);
+   return vcpu->arch.interrupt_window_open;
+}
+
 static void do_interrupt_requests(struct kvm_vcpu *vcpu,
   struct kvm_run *kvm_run)
 {
@@ -3691,7 +3697,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.exception_injected = vmx_exception_injected,
.inject_pending_irq = vmx_intr_assist,
.inject_pending_vectors = do_interrupt_requests,
-
+   .interrupt_allowed = vmx_interrupt_allowed,
.set_tss_addr = vmx_set_tss_addr,
.get_tdp_level = get_ept_level,
.get_mt_mask_shift = vmx_get_mt_mask_shift,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aa8b585..ab61ea6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4471,3 +4471,8 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
smp_call_function_single(ipi_pcpu, vcpu_kick_intr, vcpu, 0);
put_cpu

[PATCH 40/43] KVM: VMX: Zero ept module parameter if ept is not present

2009-05-18 Thread Avi Kivity
This allows reading back the hardware capability.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9b97c8e..2f65120 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -265,7 +265,7 @@ static inline int cpu_has_vmx_ept(void)
 
 static inline int vm_need_ept(void)
 {
-   return (cpu_has_vmx_ept() && enable_ept);
+   return enable_ept;
 }
 
 static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
@@ -1205,6 +1205,9 @@ static __init int setup_vmcs_config(struct vmcs_config 
*vmcs_conf)
if (!cpu_has_vmx_vpid())
enable_vpid = 0;
 
+   if (!cpu_has_vmx_ept())
+   enable_ept = 0;
+
min = 0;
 #ifdef CONFIG_X86_64
min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
-- 
1.6.0.6



[PATCH 42/43] KVM: Timer event should not unconditionally unhalt vcpu.

2009-05-18 Thread Avi Kivity
From: Gleb Natapov 

Currently timer events are processed before entering guest mode. Move them to
the main vcpu event loop, since timer events should be processed even while
the vcpu is halted. A timer may cause an interrupt/NMI to be injected, and
only then will the vcpu be unhalted.

Signed-off-by: Gleb Natapov 
Signed-off-by: Avi Kivity 
---
 arch/ia64/kvm/kvm-ia64.c |6 ++--
 arch/x86/kvm/x86.c   |   57 +++--
 virt/kvm/kvm_main.c  |5 ++-
 3 files changed, 40 insertions(+), 28 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 4623a90..d2a90fd 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -488,10 +488,10 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
hrtimer_cancel(p_ht);
vcpu->arch.ht_active = 0;
 
-   if (test_and_clear_bit(KVM_REQ_UNHALT, &vcpu->requests))
+   if (test_and_clear_bit(KVM_REQ_UNHALT, &vcpu->requests) ||
+   kvm_cpu_has_pending_timer(vcpu))
if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
-   vcpu->arch.mp_state =
-   KVM_MP_STATE_RUNNABLE;
+   vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 
if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE)
return -EINTR;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0131b5f..aa8b585 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3129,9 +3129,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
}
}
 
-   clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
-   kvm_inject_pending_timer_irqs(vcpu);
-
preempt_disable();
 
kvm_x86_ops->prepare_guest_switch(vcpu);
@@ -3231,6 +3228,7 @@ out:
return r;
 }
 
+
 static int __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
int r;
@@ -3257,29 +3255,42 @@ static int __vcpu_run(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
kvm_vcpu_block(vcpu);
down_read(&vcpu->kvm->slots_lock);
if (test_and_clear_bit(KVM_REQ_UNHALT, &vcpu->requests))
-   if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
+   {
+   switch(vcpu->arch.mp_state) {
+   case KVM_MP_STATE_HALTED:
vcpu->arch.mp_state =
-   KVM_MP_STATE_RUNNABLE;
-   if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE)
-   r = -EINTR;
+   KVM_MP_STATE_RUNNABLE;
+   case KVM_MP_STATE_RUNNABLE:
+   break;
+   case KVM_MP_STATE_SIPI_RECEIVED:
+   default:
+   r = -EINTR;
+   break;
+   }
+   }
}
 
-   if (r > 0) {
-   if (dm_request_for_irq_injection(vcpu, kvm_run)) {
-   r = -EINTR;
-   kvm_run->exit_reason = KVM_EXIT_INTR;
-   ++vcpu->stat.request_irq_exits;
-   }
-   if (signal_pending(current)) {
-   r = -EINTR;
-   kvm_run->exit_reason = KVM_EXIT_INTR;
-   ++vcpu->stat.signal_exits;
-   }
-   if (need_resched()) {
-   up_read(&vcpu->kvm->slots_lock);
-   kvm_resched(vcpu);
-   down_read(&vcpu->kvm->slots_lock);
-   }
+   if (r <= 0)
+   break;
+
+   clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
+   if (kvm_cpu_has_pending_timer(vcpu))
+   kvm_inject_pending_timer_irqs(vcpu);
+
+   if (dm_request_for_irq_injection(vcpu, kvm_run)) {
+   r = -EINTR;
+   kvm_run->exit_reason = KVM_EXIT_INTR;
+   ++vcpu->stat.request_irq_exits;
+   }
+   if (signal_pending(current)) {
+   r = -EINTR;
+   kvm_run->exit_reason = KVM_EXIT_INTR;
+   ++vcpu->stat.signal_exits;
+   }
+   if (need_resched()) {
+   up_read(&vcpu->kvm->slots_lock);
+   kvm_resched(vcpu);
+   down_read(&vcpu->kvm->slots_lock);
}
}
 
diff --git a/vir

[PATCH 24/43] KVM: APIC: get rid of deliver_bitmask

2009-05-18 Thread Avi Kivity
From: Gleb Natapov 

Deliver the interrupt during the destination matching loop.
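
The kvm_types.h hunk (10 new lines) is not shown here; the new struct
kvm_lapic_irq presumably carries the parameters that kvm_apic_set_irq() used
to take individually, along the lines of the sketch below (the exact field
list is an assumption):

	struct kvm_lapic_irq {
		u32 vector;
		u32 delivery_mode;
		u32 dest_mode;
		u32 level;
		u32 trig_mode;
		u32 shorthand;
		u32 dest_id;
	};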

Signed-off-by: Gleb Natapov 
Acked-by: Xiantao Zhang 
Signed-off-by: Marcelo Tosatti 
---
 arch/ia64/kvm/kvm-ia64.c  |   33 -
 arch/ia64/kvm/lapic.h |4 +-
 arch/x86/kvm/lapic.c  |   59 ++---
 arch/x86/kvm/lapic.h  |3 +-
 include/linux/kvm_types.h |   10 ++
 virt/kvm/ioapic.c |   57 ++--
 virt/kvm/ioapic.h |6 +--
 virt/kvm/irq_comm.c   |   71 ++--
 8 files changed, 108 insertions(+), 135 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 1887a93..acf43ec 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -283,6 +283,18 @@ static int handle_sal_call(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
 
 }
 
+static int __apic_accept_irq(struct kvm_vcpu *vcpu, uint64_t vector)
+{
+   struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd);
+
+   if (!test_and_set_bit(vector, &vpd->irr[0])) {
+   vcpu->arch.irq_new_pending = 1;
+   kvm_vcpu_kick(vcpu);
+   return 1;
+   }
+   return 0;
+}
+
 /*
  *  offset: address offset to IPI space.
  *  value:  deliver value.
@@ -292,20 +304,20 @@ static void vcpu_deliver_ipi(struct kvm_vcpu *vcpu, 
uint64_t dm,
 {
switch (dm) {
case SAPIC_FIXED:
-   kvm_apic_set_irq(vcpu, vector, dm, 0);
break;
case SAPIC_NMI:
-   kvm_apic_set_irq(vcpu, 2, dm, 0);
+   vector = 2;
break;
case SAPIC_EXTINT:
-   kvm_apic_set_irq(vcpu, 0, dm, 0);
+   vector = 0;
break;
case SAPIC_INIT:
case SAPIC_PMI:
default:
printk(KERN_ERR"kvm: Unimplemented Deliver reserved IPI!\n");
-   break;
+   return;
}
+   __apic_accept_irq(vcpu, vector);
 }
 
 static struct kvm_vcpu *lid_to_vcpu(struct kvm *kvm, unsigned long id,
@@ -1813,17 +1825,9 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
put_cpu();
 }
 
-int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 dmode, u8 trig)
+int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
 {
-
-   struct vpd *vpd = to_host(vcpu->kvm, vcpu->arch.vpd);
-
-   if (!test_and_set_bit(vec, &vpd->irr[0])) {
-   vcpu->arch.irq_new_pending = 1;
-   kvm_vcpu_kick(vcpu);
-   return 1;
-   }
-   return 0;
+   return __apic_accept_irq(vcpu, irq->vector);
 }
 
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest)
@@ -1844,6 +1848,7 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct 
kvm_vcpu *vcpu2)
 int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
int short_hand, int dest, int dest_mode)
 {
+   struct kvm_lapic *target = vcpu->arch.apic;
return (dest_mode == 0) ?
kvm_apic_match_physical_addr(target, dest) :
kvm_apic_match_logical_addr(target, dest);
diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
index e42109e..ee541ce 100644
--- a/arch/ia64/kvm/lapic.h
+++ b/arch/ia64/kvm/lapic.h
@@ -23,7 +23,7 @@ int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 
mda);
 int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
int short_hand, int dest, int dest_mode);
 int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2);
-bool kvm_apic_present(struct kvm_vcpu *vcpu);
-int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 dmode, u8 trig);
+int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
+#define kvm_apic_present(x) (true)
 
 #endif
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 814466f..dd934d2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -199,27 +199,12 @@ EXPORT_SYMBOL_GPL(kvm_lapic_find_highest_irr);
 static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 int vector, int level, int trig_mode);
 
-int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 dmode, u8 trig)
+int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
 {
struct kvm_lapic *apic = vcpu->arch.apic;
-   int lapic_dmode;
 
-   switch (dmode) {
-   case IOAPIC_LOWEST_PRIORITY:
-   lapic_dmode = APIC_DM_LOWEST;
-   break;
-   case IOAPIC_FIXED:
-   lapic_dmode = APIC_DM_FIXED;
-   break;
-   case IOAPIC_NMI:
-   lapic_dmode = APIC_DM_NMI;
-   break;
-   default:
-   printk(KERN_DEBUG"Ignoring delivery mode %d\n", dmode);
-   return 0;
-   break;
-   }
-   return __apic_accept_irq(apic, lapic_dmode, vec, 1, trig);
+   return __apic_accept_irq(apic, irq->deli

[PATCH 39/43] KVM: VMX: Zero the vpid module parameter if vpid is not supported

2009-05-18 Thread Avi Kivity
This allows reading back how the hardware is configured.
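
(For instance, with the parameters made readable and renamed earlier in this
series, reading /sys/module/kvm_intel/parameters/vpid and seeing 0 now tells
you that the hardware lacks VPID support even if the module was loaded with
vpid=1; this is an illustration, not part of the patch.)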

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f4b6c4b..9b97c8e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1202,6 +1202,9 @@ static __init int setup_vmcs_config(struct vmcs_config 
*vmcs_conf)
  vmx_capability.ept, vmx_capability.vpid);
}
 
+   if (!cpu_has_vmx_vpid())
+   enable_vpid = 0;
+
min = 0;
 #ifdef CONFIG_X86_64
min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
@@ -2082,7 +2085,7 @@ static void allocate_vpid(struct vcpu_vmx *vmx)
int vpid;
 
vmx->vpid = 0;
-   if (!enable_vpid || !cpu_has_vmx_vpid())
+   if (!enable_vpid)
return;
spin_lock(&vmx_vpid_lock);
vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
-- 
1.6.0.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 37/43] KVM: VMX: Simplify module parameter names

2009-05-18 Thread Avi Kivity
Instead of 'enable_vpid=1', use a simple 'vpid=1'.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 85f4fd5..a69ba6b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -42,13 +42,13 @@ static int bypass_guest_pf = 1;
 module_param(bypass_guest_pf, bool, S_IRUGO);
 
 static int enable_vpid = 1;
-module_param(enable_vpid, bool, 0444);
+module_param_named(vpid, enable_vpid, bool, 0444);
 
 static int flexpriority_enabled = 1;
-module_param(flexpriority_enabled, bool, S_IRUGO);
+module_param_named(flexpriority, flexpriority_enabled, bool, S_IRUGO);
 
 static int enable_ept = 1;
-module_param(enable_ept, bool, S_IRUGO);
+module_param_named(ept, enable_ept, bool, S_IRUGO);
 
 static int emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
-- 
1.6.0.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 41/43] KVM: VMX: Fold vm_need_ept() into callers

2009-05-18 Thread Avi Kivity
Trivial.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |   33 ++---
 1 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2f65120..da6461d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -263,11 +263,6 @@ static inline int cpu_has_vmx_ept(void)
SECONDARY_EXEC_ENABLE_EPT);
 }
 
-static inline int vm_need_ept(void)
-{
-   return enable_ept;
-}
-
 static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
 {
return ((cpu_has_vmx_virtualize_apic_accesses()) &&
@@ -382,7 +377,7 @@ static inline void ept_sync_global(void)
 
 static inline void ept_sync_context(u64 eptp)
 {
-   if (vm_need_ept()) {
+   if (enable_ept) {
if (cpu_has_vmx_invept_context())
__invept(VMX_EPT_EXTENT_CONTEXT, eptp, 0);
else
@@ -392,7 +387,7 @@ static inline void ept_sync_context(u64 eptp)
 
 static inline void ept_sync_individual_addr(u64 eptp, gpa_t gpa)
 {
-   if (vm_need_ept()) {
+   if (enable_ept) {
if (cpu_has_vmx_invept_individual_addr())
__invept(VMX_EPT_EXTENT_INDIVIDUAL_ADDR,
eptp, gpa);
@@ -491,7 +486,7 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
}
if (vcpu->arch.rmode.active)
eb = ~0;
-   if (vm_need_ept())
+   if (enable_ept)
eb &= ~(1u << PF_VECTOR); /* bypass_guest_pf = 0 */
vmcs_write32(EXCEPTION_BITMAP, eb);
 }
@@ -1502,7 +1497,7 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
 {
vpid_sync_vcpu_all(to_vmx(vcpu));
-   if (vm_need_ept())
+   if (enable_ept)
ept_sync_context(construct_eptp(vcpu->arch.mmu.root_hpa));
 }
 
@@ -1587,7 +1582,7 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned 
long cr0)
}
 #endif
 
-   if (vm_need_ept())
+   if (enable_ept)
ept_update_paging_mode_cr0(&hw_cr0, cr0, vcpu);
 
vmcs_writel(CR0_READ_SHADOW, cr0);
@@ -1616,7 +1611,7 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned 
long cr3)
u64 eptp;
 
guest_cr3 = cr3;
-   if (vm_need_ept()) {
+   if (enable_ept) {
eptp = construct_eptp(cr3);
vmcs_write64(EPT_POINTER, eptp);
ept_sync_context(eptp);
@@ -1637,7 +1632,7 @@ static void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned 
long cr4)
KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
 
vcpu->arch.cr4 = cr4;
-   if (vm_need_ept())
+   if (enable_ept)
ept_update_paging_mode_cr4(&hw_cr4, vcpu);
 
vmcs_writel(CR4_READ_SHADOW, cr4);
@@ -1999,7 +1994,7 @@ static int init_rmode_identity_map(struct kvm *kvm)
pfn_t identity_map_pfn;
u32 tmp;
 
-   if (!vm_need_ept())
+   if (!enable_ept)
return 1;
if (unlikely(!kvm->arch.ept_identity_pagetable)) {
printk(KERN_ERR "EPT: identity-mapping pagetable "
@@ -2163,7 +2158,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
CPU_BASED_CR8_LOAD_EXITING;
 #endif
}
-   if (!vm_need_ept())
+   if (!enable_ept)
exec_control |= CPU_BASED_CR3_STORE_EXITING |
CPU_BASED_CR3_LOAD_EXITING  |
CPU_BASED_INVLPG_EXITING;
@@ -2176,7 +2171,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
if (vmx->vpid == 0)
exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
-   if (!vm_need_ept())
+   if (!enable_ept)
exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
}
@@ -2637,7 +2632,7 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
error_code = vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
if (is_page_fault(intr_info)) {
/* EPT won't cause page fault directly */
-   if (vm_need_ept())
+   if (enable_ept)
BUG();
cr2 = vmcs_readl(EXIT_QUALIFICATION);
KVMTRACE_3D(PAGE_FAULT, vcpu, error_code, (u32)cr2,
@@ -3187,7 +3182,7 @@ static int vmx_handle_exit(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
 
/* Access CR3 don't cause VMExit in paging mode, so we need
 * to sync with guest real CR3. */
-   if (vm_need_ept() && is_paging(vcpu)) {
+   if (enable_ept && is_paging(vcpu)) {
vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
ept_load_pdptrs(vcpu);
}
@@ -3602,7 +3597,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id

[PATCH 35/43] KVM: VMX: Make module parameters readable

2009-05-18 Thread Avi Kivity
Useful to see how the module was loaded.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2c0a2ed..469787c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -39,19 +39,19 @@ MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
 
 static int bypass_guest_pf = 1;
-module_param(bypass_guest_pf, bool, 0);
+module_param(bypass_guest_pf, bool, S_IRUGO);
 
 static int enable_vpid = 1;
-module_param(enable_vpid, bool, 0);
+module_param(enable_vpid, bool, 0444);
 
 static int flexpriority_enabled = 1;
-module_param(flexpriority_enabled, bool, 0);
+module_param(flexpriority_enabled, bool, S_IRUGO);
 
 static int enable_ept = 1;
-module_param(enable_ept, bool, 0);
+module_param(enable_ept, bool, S_IRUGO);
 
 static int emulate_invalid_guest_state = 0;
-module_param(emulate_invalid_guest_state, bool, 0);
+module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
 struct vmcs {
u32 revision_id;
-- 
1.6.0.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 30/43] KVM: Device assignment framework rework

2009-05-18 Thread Avi Kivity
From: Sheng Yang 

After discussion with Marcelo, we decided to rework the device assignment
framework together. The old problem was that the kernel logic was unnecessarily
complex, so Marcelo suggested splitting it in a more elegant way:

1. Split host IRQ assignment and guest IRQ assignment, and let userspace
determine the combination. Also discard the msi2intx parameter; userspace can
specify KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to
enable MSI-to-INTx conversion.

2. Split assign IRQ and deassign IRQ. Introduce two new ioctls:
KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ.

This patch also fixes the reversed _IOR vs _IOW in the definitions (by
deprecating the old interface).
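
Purely as an illustration of the new interface (not part of this patch), a
userspace caller might request a host MSI delivered to the guest as INTx
roughly as sketched below; vm_fd, dev_id and the IRQ numbers are hypothetical:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: assign a host MSI, delivered to the guest as INTx.
 * Error handling omitted; all values are illustrative. */
static int assign_dev_irq(int vm_fd, __u32 dev_id, __u32 host_irq, __u32 guest_irq)
{
        struct kvm_assigned_irq assigned_irq = {
                .assigned_dev_id = dev_id,
                .host_irq        = host_irq,
                .guest_irq       = guest_irq,
                .flags           = KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX,
        };

        return ioctl(vm_fd, KVM_ASSIGN_DEV_IRQ, &assigned_irq);
}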

[avi: replace homemade bitcount() by hweight_long()]

Signed-off-by: Marcelo Tosatti 
Signed-off-by: Sheng Yang 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c   |1 +
 include/linux/kvm.h  |   26 ++-
 include/linux/kvm_host.h |5 -
 virt/kvm/kvm_main.c  |  486 --
 4 files changed, 276 insertions(+), 242 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cac6e63..a27a64c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1022,6 +1022,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_SYNC_MMU:
case KVM_CAP_REINJECT_CONTROL:
case KVM_CAP_IRQ_INJECT_STATUS:
+   case KVM_CAP_ASSIGN_DEV_IRQ:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 640835e..644e3a9 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -412,6 +412,7 @@ struct kvm_trace_rec {
 #ifdef __KVM_HAVE_MSIX
 #define KVM_CAP_DEVICE_MSIX 28
 #endif
+#define KVM_CAP_ASSIGN_DEV_IRQ 29
 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */
 #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30
 
@@ -485,8 +486,10 @@ struct kvm_irq_routing {
 #define KVM_ASSIGN_PCI_DEVICE _IOR(KVMIO, 0x69, \
   struct kvm_assigned_pci_dev)
 #define KVM_SET_GSI_ROUTING   _IOW(KVMIO, 0x6a, struct kvm_irq_routing)
+/* deprecated, replaced by KVM_ASSIGN_DEV_IRQ */
 #define KVM_ASSIGN_IRQ _IOR(KVMIO, 0x70, \
struct kvm_assigned_irq)
+#define KVM_ASSIGN_DEV_IRQ     _IOW(KVMIO, 0x70, struct kvm_assigned_irq)
 #define KVM_REINJECT_CONTROL  _IO(KVMIO, 0x71)
 #define KVM_DEASSIGN_PCI_DEVICE _IOW(KVMIO, 0x72, \
 struct kvm_assigned_pci_dev)
@@ -494,6 +497,7 @@ struct kvm_irq_routing {
_IOW(KVMIO, 0x73, struct kvm_assigned_msix_nr)
 #define KVM_ASSIGN_SET_MSIX_ENTRY \
_IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry)
+#define KVM_DEASSIGN_DEV_IRQ   _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
 
 /*
  * ioctls for vcpu fds
@@ -584,6 +588,8 @@ struct kvm_debug_guest {
 #define KVM_TRC_STLB_INVAL   (KVM_TRC_HANDLER + 0x18)
 #define KVM_TRC_PPC_INSTR(KVM_TRC_HANDLER + 0x19)
 
+#define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
+
 struct kvm_assigned_pci_dev {
__u32 assigned_dev_id;
__u32 busnr;
@@ -594,6 +600,17 @@ struct kvm_assigned_pci_dev {
};
 };
 
+#define KVM_DEV_IRQ_HOST_INTX    (1 << 0)
+#define KVM_DEV_IRQ_HOST_MSI     (1 << 1)
+#define KVM_DEV_IRQ_HOST_MSIX    (1 << 2)
+
+#define KVM_DEV_IRQ_GUEST_INTX   (1 << 8)
+#define KVM_DEV_IRQ_GUEST_MSI    (1 << 9)
+#define KVM_DEV_IRQ_GUEST_MSIX   (1 << 10)
+
+#define KVM_DEV_IRQ_HOST_MASK   0x00ff
+#define KVM_DEV_IRQ_GUEST_MASK   0xff00
+
 struct kvm_assigned_irq {
__u32 assigned_dev_id;
__u32 host_irq;
@@ -609,15 +626,6 @@ struct kvm_assigned_irq {
};
 };
 
-#define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
-
-#define KVM_DEV_IRQ_ASSIGN_MSI_ACTION  KVM_DEV_IRQ_ASSIGN_ENABLE_MSI
-#define KVM_DEV_IRQ_ASSIGN_ENABLE_MSI  (1 << 0)
-
-#define KVM_DEV_IRQ_ASSIGN_MSIX_ACTION  (KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX |\
-   KVM_DEV_IRQ_ASSIGN_MASK_MSIX)
-#define KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX  (1 << 1)
-#define KVM_DEV_IRQ_ASSIGN_MASK_MSIX    (1 << 2)
 
 struct kvm_assigned_msix_nr {
__u32 assigned_dev_id;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index fb60f31..40e49ed 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -339,11 +339,6 @@ struct kvm_assigned_dev_kernel {
struct msix_entry *host_msix_entries;
int guest_irq;
struct kvm_guest_msix_entry *guest_msix_entries;
-#define KVM_ASSIGNED_DEV_GUEST_INTX    (1 << 0)
-#define KVM_ASSIGNED_DEV_GUEST_MSI (1 << 1)
-#define KVM_ASSIGNED_DEV_HOST_INTX (1 << 8)
-#define KVM_ASSIGNED_DEV_HOST_MSI  (1 << 9)
-#define KVM_ASSIGNED_DEV_MSIX  ((1 << 2) | (1 << 10))
unsigned long irq_requested_type;
int irq_source_id;
int flags;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 034ed43..231ca3c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_

[PATCH 36/43] KVM: VMX: Rename kvm_handle_exit() to vmx_handle_exit()

2009-05-18 Thread Avi Kivity
It is a static vmx-specific function.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 469787c..85f4fd5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3162,7 +3162,7 @@ static const int kvm_vmx_max_exit_handlers =
  * The guest has exited.  See if we can fix it or if we need userspace
  * assistance.
  */
-static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
+static int vmx_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
u32 exit_reason = vmcs_read32(VM_EXIT_REASON);
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -3681,7 +3681,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.tlb_flush = vmx_flush_tlb,
 
.run = vmx_vcpu_run,
-   .handle_exit = kvm_handle_exit,
+   .handle_exit = vmx_handle_exit,
.skip_emulated_instruction = skip_emulated_instruction,
.patch_hypercall = vmx_patch_hypercall,
.get_irq = vmx_get_irq,
-- 
1.6.0.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 29/43] KVM: make 'lapic_timer_ops' and 'kpit_ops' static

2009-05-18 Thread Avi Kivity
From: Hannes Eder 

Fix this sparse warnings:
  arch/x86/kvm/lapic.c:916:22: warning: symbol 'lapic_timer_ops' was not 
declared. Should it be static?
  arch/x86/kvm/i8254.c:268:22: warning: symbol 'kpit_ops' was not declared. 
Should it be static?

Signed-off-by: Hannes Eder 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/i8254.c |2 +-
 arch/x86/kvm/lapic.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 4e2e3f2..cf09bb6 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -265,7 +265,7 @@ static bool kpit_is_periodic(struct kvm_timer *ktimer)
return ps->is_periodic;
 }
 
-struct kvm_timer_ops kpit_ops = {
+static struct kvm_timer_ops kpit_ops = {
.is_periodic = kpit_is_periodic,
 };
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dd934d2..4d76bb6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -913,7 +913,7 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
kvm_apic_local_deliver(apic, APIC_LVT0);
 }
 
-struct kvm_timer_ops lapic_timer_ops = {
+static struct kvm_timer_ops lapic_timer_ops = {
.is_periodic = lapic_is_periodic,
 };
 
-- 
1.6.0.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 27/43] KVM: ia64: SN2 adjust emulated ITC frequency to match RTC frequency

2009-05-18 Thread Avi Kivity
From: Jes Sorensen 

On SN2 do not pass down the real ITC frequency, but rather patch the
values to match the SN2 RTC frequency.
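
As a rough worked example of the ratio patching (hypothetical numbers, not real
SN2 values): with a SAL platform base frequency of 200 MHz and
sn_rtc_cycles_per_second = 25 MHz, factor = (3 * 200 MHz + 12.5 MHz) / 25 MHz
= 24, so the guest is handed num/den = 3/24 = 1/8, i.e. an emulated ITC that
ticks at the 25 MHz RTC rate relative to the platform base frequency rather
than at the real ITC rate.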

Signed-off-by: Jes Sorensen 
Acked-by: Xiantao Zhang 
Signed-off-by: Avi Kivity 
---
 arch/ia64/kvm/kvm_fw.c |   28 +++-
 1 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/kvm/kvm_fw.c b/arch/ia64/kvm/kvm_fw.c
index a8ae52e..e4b8231 100644
--- a/arch/ia64/kvm/kvm_fw.c
+++ b/arch/ia64/kvm/kvm_fw.c
@@ -21,6 +21,9 @@
 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "vti.h"
 #include "misc.h"
@@ -188,12 +191,35 @@ static struct ia64_pal_retval pal_freq_base(struct 
kvm_vcpu *vcpu)
return result;
 }
 
-static struct ia64_pal_retval pal_freq_ratios(struct kvm_vcpu *vcpu)
+/*
+ * On the SGI SN2, the ITC isn't stable. Emulation backed by the SN2
+ * RTC is used instead. This function patches the ratios from SAL
+ * to match the RTC before providing them to the guest.
+ */
+static void sn2_patch_itc_freq_ratios(struct ia64_pal_retval *result)
 {
+   struct pal_freq_ratio *ratio;
+   unsigned long sal_freq, sal_drift, factor;
+
+   result->status = ia64_sal_freq_base(SAL_FREQ_BASE_PLATFORM,
+   &sal_freq, &sal_drift);
+   ratio = (struct pal_freq_ratio *)&result->v2;
+   factor = ((sal_freq * 3) + (sn_rtc_cycles_per_second / 2)) /
+   sn_rtc_cycles_per_second;
+
+   ratio->num = 3;
+   ratio->den = factor;
+}
 
+static struct ia64_pal_retval pal_freq_ratios(struct kvm_vcpu *vcpu)
+{
struct ia64_pal_retval result;
 
PAL_CALL(result, PAL_FREQ_RATIOS, 0, 0, 0);
+
+   if (vcpu->kvm->arch.is_sn2)
+   sn2_patch_itc_freq_ratios(&result);
+
return result;
 }
 
-- 
1.6.0.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 33/43] KVM: SVM: Remove duplicate code in svm_do_inject_vector()

2009-05-18 Thread Avi Kivity
From: Gleb Natapov 

svm_do_inject_vector() reimplements pop_irq().

Signed-off-by: Gleb Natapov 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/svm.c |   10 +-
 1 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1f8510c..5b35ebd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2346,15 +2346,7 @@ static void kvm_reput_irq(struct vcpu_svm *svm)
 
 static void svm_do_inject_vector(struct vcpu_svm *svm)
 {
-   struct kvm_vcpu *vcpu = &svm->vcpu;
-   int word_index = __ffs(vcpu->arch.irq_summary);
-   int bit_index = __ffs(vcpu->arch.irq_pending[word_index]);
-   int irq = word_index * BITS_PER_LONG + bit_index;
-
-   clear_bit(bit_index, &vcpu->arch.irq_pending[word_index]);
-   if (!vcpu->arch.irq_pending[word_index])
-   clear_bit(word_index, &vcpu->arch.irq_summary);
-   svm_inject_irq(svm, irq);
+   svm_inject_irq(svm, pop_irq(&svm->vcpu));
 }
 
 static void do_interrupt_requests(struct kvm_vcpu *vcpu,
-- 
1.6.0.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 00/43] KVM updates for the 2.6.31 merge window (batch 1/4)

2009-05-18 Thread Avi Kivity
Following is my queue for the 2.6.31 merge window that is now approaching.
Please review.

Amit Shah (1):
  KVM: x86: Ignore reads to EVNTSEL MSRs

Avi Kivity (9):
  KVM: VMX: Don't use highmem pages for the msr and pio bitmaps
  KVM: VMX: Don't intercept MSR_KERNEL_GS_BASE
  KVM: VMX: Make module parameters readable
  KVM: VMX: Rename kvm_handle_exit() to vmx_handle_exit()
  KVM: VMX: Simplify module parameter names
  KVM: VMX: Annotate module parameters as __read_mostly
  KVM: VMX: Zero the vpid module parameter if vpid is not supported
  KVM: VMX: Zero ept module parameter if ept is not present
  KVM: VMX: Fold vm_need_ept() into callers

Christian Borntraeger (1):
  KVM: declare ioapic functions only on affected hardware

Gleb Natapov (10):
  KVM: APIC: kvm_apic_set_irq deliver all kinds of interrupts
  KVM: ioapic/msi interrupt delivery consolidation
  KVM: consolidate ioapic/ipi interrupt delivery logic
  KVM: change the way how lowest priority vcpu is calculated
  KVM: APIC: get rid of deliver_bitmask
  KVM: MMU: do not free active mmu pages in free_mmu_pages()
  KVM: SVM: Remove duplicate code in svm_do_inject_vector()
  KVM: reuse (pop|push)_irq from svm.c in vmx.c
  KVM: Timer event should not unconditionally unhalt vcpu.
  KVM: Fix interrupt unhalting a vcpu when it shouldn't

Hannes Eder (1):
  KVM: make 'lapic_timer_ops' and 'kpit_ops' static

Jes Sorensen (4):
  KVM: ia64: Map in SN2 RTC registers to the VMM module
  KVM: ia64: Create inline function kvm_get_itc() to centralize ITC
reading.
  KVM: ia64: SN2 adjust emulated ITC frequency to match RTC frequency
  KVM: ia64: Drop in SN2 replacement of fast path ITC emulation fault
handler

Joerg Roedel (1):
  KVM: MMU: remove call to kvm_mmu_pte_write from walk_addr

Marcelo Tosatti (4):
  KVM: x86: paravirt skip pit-through-ioapic boot check
  KVM: PIT: remove unused scheduled variable
  KVM: PIT: remove usage of count_load_time for channel 0
  KVM: unify part of generic timer handling

Matt T. Yourst (1):
  KVM: x86: silence preempt warning on kvm_write_guest_time

Sheng Yang (10):
  KVM: Split IOAPIC structure
  KVM: Unify the delivery of IOAPIC and MSI interrupts
  KVM: Change API of kvm_ioapic_get_delivery_bitmask
  KVM: Update intr delivery func to accept unsigned long* bitmap
  KVM: bit ops for deliver_bitmap
  KVM: Ioctls for init MSI-X entry
  KVM: Add MSI-X interrupt injection logic
  KVM: Enable MSI-X for KVM assigned device
  KVM: Merge kvm_ioapic_get_delivery_bitmask into
kvm_get_intr_delivery_bitmask
  KVM: Device assignment framework rework

Yang Zhang (1):
  KVM: ia64: fix compilation error in kvm_get_lowest_prio_vcpu

 arch/ia64/include/asm/kvm_host.h |5 +-
 arch/ia64/include/asm/pgtable.h  |2 +
 arch/ia64/kvm/kvm-ia64.c |  159 +++---
 arch/ia64/kvm/kvm_fw.c   |   28 ++-
 arch/ia64/kvm/lapic.h|6 +-
 arch/ia64/kvm/optvfault.S|   30 ++
 arch/ia64/kvm/vcpu.c |   20 +-
 arch/ia64/kvm/vmm.c  |   12 +-
 arch/powerpc/kvm/powerpc.c   |6 +
 arch/s390/kvm/interrupt.c|6 +
 arch/x86/include/asm/kvm.h   |1 +
 arch/x86/include/asm/kvm_host.h  |4 +-
 arch/x86/kernel/kvm.c|4 +
 arch/x86/kvm/Makefile|2 +-
 arch/x86/kvm/i8254.c |   95 +++---
 arch/x86/kvm/i8254.h |   12 +-
 arch/x86/kvm/kvm_timer.h |   18 +
 arch/x86/kvm/lapic.c |  251 +--
 arch/x86/kvm/lapic.h |   12 +-
 arch/x86/kvm/mmu.c   |8 -
 arch/x86/kvm/paging_tmpl.h   |1 -
 arch/x86/kvm/svm.c   |   43 +--
 arch/x86/kvm/timer.c |   46 +++
 arch/x86/kvm/vmx.c   |  186 ++-
 arch/x86/kvm/x86.c   |   67 +++--
 arch/x86/kvm/x86.h   |   18 +
 include/linux/kvm.h  |   40 +++-
 include/linux/kvm_host.h |   15 +-
 include/linux/kvm_types.h|   27 ++
 virt/kvm/ioapic.c|  153 ++
 virt/kvm/ioapic.h|   27 +--
 virt/kvm/irq_comm.c  |  109 ---
 virt/kvm/kvm_main.c  |  634 +++---
 33 files changed, 1240 insertions(+), 807 deletions(-)
 create mode 100644 arch/x86/kvm/kvm_timer.h
 create mode 100644 arch/x86/kvm/timer.c

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 38/43] KVM: VMX: Annotate module parameters as __read_mostly

2009-05-18 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a69ba6b..f4b6c4b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -38,19 +38,19 @@
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
 
-static int bypass_guest_pf = 1;
+static int __read_mostly bypass_guest_pf = 1;
 module_param(bypass_guest_pf, bool, S_IRUGO);
 
-static int enable_vpid = 1;
+static int __read_mostly enable_vpid = 1;
 module_param_named(vpid, enable_vpid, bool, 0444);
 
-static int flexpriority_enabled = 1;
+static int __read_mostly flexpriority_enabled = 1;
 module_param_named(flexpriority, flexpriority_enabled, bool, S_IRUGO);
 
-static int enable_ept = 1;
+static int __read_mostly enable_ept = 1;
 module_param_named(ept, enable_ept, bool, S_IRUGO);
 
-static int emulate_invalid_guest_state = 0;
+static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
 struct vmcs {
-- 
1.6.0.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 34/43] KVM: reuse (pop|push)_irq from svm.c in vmx.c

2009-05-18 Thread Avi Kivity
From: Gleb Natapov 

The prioritized bit vector manipulation functions are useful in both vmx and
svm.

Signed-off-by: Gleb Natapov 
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/svm.c |   25 -
 arch/x86/kvm/vmx.c |   17 ++---
 arch/x86/kvm/x86.h |   18 ++
 3 files changed, 24 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5b35ebd..aa528db 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -19,6 +19,7 @@
 #include "irq.h"
 #include "mmu.h"
 #include "kvm_cache_regs.h"
+#include "x86.h"
 
 #include 
 #include 
@@ -132,24 +133,6 @@ static inline u32 svm_has(u32 feat)
return svm_features & feat;
 }
 
-static inline u8 pop_irq(struct kvm_vcpu *vcpu)
-{
-   int word_index = __ffs(vcpu->arch.irq_summary);
-   int bit_index = __ffs(vcpu->arch.irq_pending[word_index]);
-   int irq = word_index * BITS_PER_LONG + bit_index;
-
-   clear_bit(bit_index, &vcpu->arch.irq_pending[word_index]);
-   if (!vcpu->arch.irq_pending[word_index])
-   clear_bit(word_index, &vcpu->arch.irq_summary);
-   return irq;
-}
-
-static inline void push_irq(struct kvm_vcpu *vcpu, u8 irq)
-{
-   set_bit(irq, vcpu->arch.irq_pending);
-   set_bit(irq / BITS_PER_LONG, &vcpu->arch.irq_summary);
-}
-
 static inline void clgi(void)
 {
asm volatile (__ex(SVM_CLGI));
@@ -1116,7 +1099,7 @@ static int pf_interception(struct vcpu_svm *svm, struct 
kvm_run *kvm_run)
if (!irqchip_in_kernel(kvm) &&
is_external_interrupt(exit_int_info)) {
event_injection = true;
-   push_irq(&svm->vcpu, exit_int_info & SVM_EVTINJ_VEC_MASK);
+   kvm_push_irq(&svm->vcpu, exit_int_info & SVM_EVTINJ_VEC_MASK);
}
 
fault_address  = svm->vmcb->control.exit_info_2;
@@ -2336,7 +2319,7 @@ static void kvm_reput_irq(struct vcpu_svm *svm)
if ((control->int_ctl & V_IRQ_MASK)
&& !irqchip_in_kernel(svm->vcpu.kvm)) {
control->int_ctl &= ~V_IRQ_MASK;
-   push_irq(&svm->vcpu, control->int_vector);
+   kvm_push_irq(&svm->vcpu, control->int_vector);
}
 
svm->vcpu.arch.interrupt_window_open =
@@ -2346,7 +2329,7 @@ static void kvm_reput_irq(struct vcpu_svm *svm)
 
 static void svm_do_inject_vector(struct vcpu_svm *svm)
 {
-   svm_inject_irq(svm, pop_irq(&svm->vcpu));
+   svm_inject_irq(svm, kvm_pop_irq(&svm->vcpu));
 }
 
 static void do_interrupt_requests(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b5eae7a..2c0a2ed 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2489,18 +2489,6 @@ static void vmx_update_window_states(struct kvm_vcpu 
*vcpu)
 GUEST_INTR_STATE_MOV_SS)));
 }
 
-static void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
-{
-   int word_index = __ffs(vcpu->arch.irq_summary);
-   int bit_index = __ffs(vcpu->arch.irq_pending[word_index]);
-   int irq = word_index * BITS_PER_LONG + bit_index;
-
-   clear_bit(bit_index, &vcpu->arch.irq_pending[word_index]);
-   if (!vcpu->arch.irq_pending[word_index])
-   clear_bit(word_index, &vcpu->arch.irq_summary);
-   kvm_queue_interrupt(vcpu, irq);
-}
-
 static void do_interrupt_requests(struct kvm_vcpu *vcpu,
   struct kvm_run *kvm_run)
 {
@@ -2534,7 +2522,7 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 
if (vcpu->arch.interrupt_window_open) {
if (vcpu->arch.irq_summary && !vcpu->arch.interrupt.pending)
-   kvm_do_inject_irq(vcpu);
+   kvm_queue_interrupt(vcpu, kvm_pop_irq(vcpu));
 
if (vcpu->arch.interrupt.pending)
vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
@@ -2619,8 +2607,7 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
 
if (!irqchip_in_kernel(vcpu->kvm) && is_external_interrupt(vect_info)) {
int irq = vect_info & VECTORING_INFO_VECTOR_MASK;
-   set_bit(irq, vcpu->arch.irq_pending);
-   set_bit(irq / BITS_PER_LONG, &vcpu->arch.irq_summary);
+   kvm_push_irq(vcpu, irq);
}
 
if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 6a4be78..2ab6791 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -19,4 +19,22 @@ static inline void kvm_clear_interrupt_queue(struct kvm_vcpu 
*vcpu)
vcpu->arch.interrupt.pending = false;
 }
 
+static inline u8 kvm_pop_irq(struct kvm_vcpu *vcpu)
+{
+   int word_index = __ffs(vcpu->arch.irq_summary);
+   int bit_index = __ffs(vcpu->arch.irq_pending[word_index]);
+   int irq = word_index * BITS_PER_LONG + bit_index;
+
+   clear_bit(bit_index, &vcpu->arch.irq_pending[word_index]);
+   if (!vcpu->arch.irq_pending[word_index])

[PATCH 26/43] KVM: ia64: Create inline function kvm_get_itc() to centralize ITC reading.

2009-05-18 Thread Avi Kivity
From: Jes Sorensen 

Move all reading of special register 'AR_ITC' into two functions, one
in the kernel and one in the VMM module. When running on SN2, base the
result on the RTC rather than the system ITC, as the ITC isn't
synchronized.

Signed-off-by: Jes Sorensen 
Acked-by: Xiantao Zhang 
Signed-off-by: Avi Kivity 
---
 arch/ia64/kvm/kvm-ia64.c |   18 ++
 arch/ia64/kvm/vcpu.c |   20 ++--
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 14a3fab..ebbd995 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -68,6 +68,16 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
 };
 
+static unsigned long kvm_get_itc(struct kvm_vcpu *vcpu)
+{
+#if defined(CONFIG_IA64_SGI_SN2) || defined(CONFIG_IA64_GENERIC)
+   if (vcpu->kvm->arch.is_sn2)
+   return rtc_time();
+   else
+#endif
+   return ia64_getreg(_IA64_REG_AR_ITC);
+}
+
 static void kvm_flush_icache(unsigned long start, unsigned long len)
 {
int l;
@@ -457,7 +467,7 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 
if (irqchip_in_kernel(vcpu->kvm)) {
 
-   vcpu_now_itc = ia64_getreg(_IA64_REG_AR_ITC) + 
vcpu->arch.itc_offset;
+   vcpu_now_itc = kvm_get_itc(vcpu) + vcpu->arch.itc_offset;
 
if (time_after(vcpu_now_itc, vpd->itm)) {
vcpu->arch.timer_check = 1;
@@ -929,7 +939,7 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
RESTORE_REGS(saved_gp);
 
vcpu->arch.irq_new_pending = 1;
-   vcpu->arch.itc_offset = regs->saved_itc - ia64_getreg(_IA64_REG_AR_ITC);
+   vcpu->arch.itc_offset = regs->saved_itc - kvm_get_itc(vcpu);
set_bit(KVM_REQ_RESUME, &vcpu->requests);
 
vcpu_put(vcpu);
@@ -1210,7 +1220,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
regs->cr_iip = PALE_RESET_ENTRY;
 
/*Initialize itc offset for vcpus*/
-   itc_offset = 0UL - ia64_getreg(_IA64_REG_AR_ITC);
+   itc_offset = 0UL - kvm_get_itc(vcpu);
for (i = 0; i < kvm->arch.online_vcpus; i++) {
v = (struct kvm_vcpu *)((char *)vcpu +
sizeof(struct kvm_vcpu_data) * i);
@@ -1472,7 +1482,7 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
}
for (i = 0; i < 4; i++)
regs->insvc[i] = vcpu->arch.insvc[i];
-   regs->saved_itc = vcpu->arch.itc_offset + ia64_getreg(_IA64_REG_AR_ITC);
+   regs->saved_itc = vcpu->arch.itc_offset + kvm_get_itc(vcpu);
SAVE_REGS(xtp);
SAVE_REGS(metaphysical_rr0);
SAVE_REGS(metaphysical_rr4);
diff --git a/arch/ia64/kvm/vcpu.c b/arch/ia64/kvm/vcpu.c
index a18ee17..a2c6c15 100644
--- a/arch/ia64/kvm/vcpu.c
+++ b/arch/ia64/kvm/vcpu.c
@@ -788,13 +788,29 @@ void vcpu_set_fpreg(struct kvm_vcpu *vcpu, unsigned long 
reg,
setfpreg(reg, val, regs);   /* FIXME: handle NATs later*/
 }
 
+/*
+ * The Altix RTC is mapped specially here for the vmm module
+ */
+#define SN_RTC_BASE(u64 *)(KVM_VMM_BASE+(1UL= VMX(vcpu, last_itc)) {
VMX(vcpu, last_itc) = guest_itc;
@@ -809,7 +825,7 @@ static void vcpu_set_itc(struct kvm_vcpu *vcpu, u64 val)
struct kvm_vcpu *v;
struct kvm *kvm;
int i;
-   long itc_offset = val - ia64_getreg(_IA64_REG_AR_ITC);
+   long itc_offset = val - kvm_get_itc(vcpu);
unsigned long vitv = VCPU(vcpu, itv);
 
kvm = (struct kvm *)KVM_VM_BASE;
-- 
1.6.0.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 13/43] KVM: declare ioapic functions only on affected hardware

2009-05-18 Thread Avi Kivity
From: Christian Borntraeger 

Since "KVM: Unify the delivery of IOAPIC and MSI interrupts"
I get the following warnings:

  CC [M]  arch/s390/kvm/kvm-s390.o
In file included from arch/s390/kvm/kvm-s390.c:22:
include/linux/kvm_host.h:357: warning: 'struct kvm_ioapic' declared inside 
parameter list
include/linux/kvm_host.h:357: warning: its scope is only this definition or 
declaration, which is probably not what you want

This patch limits the IOAPIC function declarations to architectures that have an IOAPIC.

Signed-off-by: Christian Borntraeger 
Signed-off-by: Avi Kivity 
---
 include/linux/kvm_host.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3832243..3b91ec9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -363,9 +363,11 @@ void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int 
irq,
  struct kvm_irq_mask_notifier *kimn);
 void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask);
 
+#ifdef __KVM_HAVE_IOAPIC
 void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
   union kvm_ioapic_redirect_entry *entry,
   unsigned long *deliver_bitmask);
+#endif
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
-- 
1.6.0.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio net regression

2009-05-18 Thread Avi Kivity

Antoine Martin wrote:

Hi,

Here is another one, any ideas?
These oopses do look quite deep. Is it normal to end up in tcp_send_ack
from pdflush??

I think it can happen anywhere, part of the net softirq.


Cheers
Antoine

[929492.154634] pdflush: page allocation failure. order:0, mode:0x20

You're out of memory.  How much memory did you allocate to the guest?  
Did you balloon it?



[929492.154637] Pid: 291, comm: pdflush Not tainted 2.6.29.2 #5
[929492.154639] Call Trace:
[929492.154641][]
__alloc_pages_internal+0x3e1/0x401
[929492.154649]  [] try_fill_recv+0xa1/0x182
[929492.154652]  [] virtnet_poll+0x533/0x5ab
[929492.154655]  [] net_rx_action+0x70/0x143
[929492.154658]  [] __do_softirq+0x83/0x123
[929492.154661]  [] call_softirq+0x1c/0x28
[929492.154664]  [] do_softirq+0x3c/0x85
[929492.154666]  [] irq_exit+0x3f/0x7a
[929492.154668]  [] do_IRQ+0x12b/0x14f
[929492.154670]  [] ret_from_intr+0x0/0x29
[929492.154672][]
__set_page_dirty_buffers+0x0/0x8f
[929492.154677]  [] bget_one+0x0/0xb
[929492.154680]  [] walk_page_buffers+0x2/0x8b
[929492.154682]  [] ext3_ordered_writepage+0xae/0x134
[929492.154685]  [] __writepage+0xa/0x25
[929492.154687]  [] write_cache_pages+0x206/0x322
[929492.154689]  [] __writepage+0x0/0x25
[929492.154691]  [] do_writepages+0x27/0x2d
[929492.154694]  [] __writeback_single_inode+0x1a7/0x3b5
[929492.154696]  [] __switch_to+0xb4/0x38c
[929492.154698]  [] generic_sync_sb_inodes+0x2a7/0x458
[929492.154701]  [] writeback_inodes+0x8d/0xe6
[929492.154704]  [] _spin_lock+0x5/0x7
[929492.155056]  [] wb_kupdate+0x9f/0x116
[929492.155058]  [] pdflush+0x14b/0x202
[929492.155061]  [] wb_kupdate+0x0/0x116
[929492.155063]  [] pdflush+0x0/0x202
[929492.155065]  [] pdflush+0x0/0x202
[929492.155068]  [] kthread+0x47/0x73
[929492.155070]  [] child_rip+0xa/0x20
[929492.155072]  [] kthread+0x0/0x73
[929492.183142]  [] child_rip+0x0/0x20
[929492.183145] Mem-Info:
[929492.183147] DMA per-cpu:
[929492.183149] CPU0: hi:0, btch:   1 usd:   0
[929492.183151] DMA32 per-cpu:
[929492.183154] CPU0: hi:  186, btch:  31 usd: 184
[929492.183158] Active_anon:2755 active_file:39849 inactive_anon:2972
[929492.183159]  inactive_file:70353 unevictable:0 dirty:4172
writeback:1580 unstable:0
[929492.183161]  free:734 slab:5619 mapped:15047 pagetables:927 bounce:0
[929492.183166] DMA free:1968kB min:28kB low:32kB high:40kB
active_anon:0kB inactive_anon:40kB active_file:2116kB
inactive_file:1880kB unevictable:0kB present:5448kB pages_scanned:0
all_unreclaimable? no
[929492.183169] lowmem_reserve[]: 0 489 489 489
[929492.183176] DMA32 free:968kB min:2812kB low:3512kB high:4216kB
active_anon:11020kB inactive_anon:11848kB active_file:157280kB
inactive_file:279532kB unevictable:0kB present:500896kB pages_scanned:0
all_unreclaimable? no
[929492.183180] lowmem_reserve[]: 0 0 0 0
[929492.183183] DMA: 6*4kB 2*8kB 3*16kB 1*32kB 1*64kB 2*128kB 0*256kB
1*512kB 1*1024kB 0*2048kB 0*4096kB = 1976kB
[929492.183235] DMA32: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 3*128kB 2*256kB
0*512kB 0*1024kB 0*2048kB 0*4096kB = 968kB
[929492.183244] 110992 total pagecache pages
[929492.183246] 739 pages in swap cache
[929492.183248] Swap cache stats: add 8996, delete 8257, find 92604/93191
[929492.183250] Free swap  = 1040016kB
[929492.183252] Total swap = 1048568kB
[929492.186003] 131056 pages RAM
[929492.186006] 4799 pages reserved
[929492.186007] 44697 pages shared
[929492.186008] 90516 pages non-shared
[930274.380075] eth0: no IPv6 routers present

Strange, seems to be a bit of free memory here.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: ia64: fix up compilation error

2009-05-18 Thread Zhang, Yang
add iommu_flags for struct kvm_arch in ia64

Signed-off-by: Yang Zhang 
Signed-off-by: Xiantao Zhang 

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 589536f..012aca0 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -468,6 +468,7 @@ struct kvm_arch {
int online_vcpus;
int is_sn2;
 
+   int iommu_flags;
struct kvm_ioapic *vioapic;
struct kvm_vm_stat stat;
struct kvm_sal_data rdv_sal_data;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 716a4ec..4eba715 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -395,7 +395,7 @@ struct kvm_arch{
struct list_head active_mmu_pages;
struct list_head assigned_dev_head;
struct iommu_domain *iommu_domain;
-#define KVM_IOMMU_CACHE_COHERENCY  0x1
+
int iommu_flags;
struct kvm_pic *vpic;
struct kvm_ioapic *vioapic;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2b8df0c..ec0f004 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -42,6 +42,8 @@
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID0
 
+#define KVM_IOMMU_CACHE_COHERENCY  0x1
+
 struct kvm_vcpu;
 extern struct kmem_cache *kvm_vcpu_cache;
 

0001-kvm-ia64-fix-up-compilation-error.patch
Description: 0001-kvm-ia64-fix-up-compilation-error.patch


Re: kvm 84 update

2009-05-18 Thread Avi Kivity

Brent A Nelson wrote:
Well, I just tried disabling smp on my guest, and live migration 
worked fine.  I see that there are already SMP complaints with live 
migration in KVM:84 in the bug database, so I guess this is just 
another, "Me, too."


Is this expected to be fixed in the soon-to-be-released KVM:85?


There's a fix in kvm-86.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 2/2] Intel-IOMMU, intr-remap: source-id checking

2009-05-18 Thread Han, Weidong
Ingo Molnar wrote:
> * Han, Weidong  wrote:
> 
>> Siddha, Suresh B wrote:
>>> On Wed, 2009-05-06 at 23:16 -0700, Han, Weidong wrote:
 @@ -634,6 +694,44 @@ static int ir_parse_ioapic_scope(struct
   acpi_dmar_header *header, " 0x%Lx\n",
   scope->enumeration_id, drhd->address);
 
 +  bus = pci_find_bus(drhd->segment, scope->bus);
 +  path = (struct acpi_dmar_pci_path *)(scope + 1); +  
 count =
 (scope->length - +  sizeof(struct 
 acpi_dmar_device_scope))
 +  / sizeof(struct acpi_dmar_pci_path);
 +
 +  while (count) {
 +  if (pdev)
 +  pci_dev_put(pdev);
 +
 +  if (!bus)
 +  break;
 +
 +  pdev = pci_get_slot(bus,
 +  PCI_DEVFN(path->dev, path->fn));
 +  if (!pdev)
 +  break;
>>> 
>>> ir_parse_ioapic_scope() happens very early in the boot. So, I
>>> don't think we can do the pci related discovery here.
>>> 
>> 
>> Thanks for your pointing it out. It should enable the source-id
>> checking for io-apic's after the pci subsystem is up. I will
>> change it.
> 
> Note, there's ways to do early PCI quirks too, check
> arch/x86/kernel/early-quirks.c. It's done by reading the PCI
> configuration space directly via a careful early-capable subset of
> the PCI config space APIs.
> 
> But it's a method of last resort.
> 

Thanks for the reminder. We can use direct PCI access here, as follows. It's
easy and clean, and I think it's better than enabling the source-id checking
for IO-APICs only after the PCI subsystem is up. I will send out updated
patches after some tests.
 
@@ -634,6 +695,24 @@ static int ir_parse_ioapic_scope(struct acpi_dmar_header 
*header,
   " 0x%Lx\n", scope->enumeration_id,
   drhd->address);

+   bus = scope->bus;
+   path = (struct acpi_dmar_pci_path *)(scope + 1);
+   count = (scope->length -
+sizeof(struct acpi_dmar_device_scope))
+   / sizeof(struct acpi_dmar_pci_path);
+
+   while (--count > 0) {
+   /* Access PCI directly because the PCI
+* subsystem isn't initialized yet.
+*/
+   bus = read_pci_config_byte(bus, path->dev,
+   path->fn, PCI_SECONDARY_BUS);
+   path++;
+   }
+
+   ir_ioapic[ir_ioapic_num].bus = bus;
+   ir_ioapic[ir_ioapic_num].devfn =
+   PCI_DEVFN(path->dev, path->fn);
ir_ioapic[ir_ioapic_num].iommu = iommu;
ir_ioapic[ir_ioapic_num].id = scope->enumeration_id;
ir_ioapic_num++;




--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: ia64: fix up compilation error

2009-05-18 Thread Avi Kivity

Zhang, Yang wrote:

add iommu_flags for struct kvm_arch in ia64

Thanks, I already fixed it (in the same way).

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv6 4/4] virtio_pci: optional MSI-X support

2009-05-18 Thread Michael S. Tsirkin
On Mon, May 18, 2009 at 12:30:49AM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>> This implements optional MSI-X support in virtio_pci.
>> MSI-X is used whenever the host supports at least 2 MSI-X
>> vectors: 1 for configuration changes and 1 for virtqueues.
>> Per-virtqueue vectors are allocated if enough vectors
>> available.
> 
>
> I'm not sure I understand how the vq -> msi mapping works.  Do we  
> actually support an arbitrary mapping, or just either linear or n:1?

Arbitrary mapping.

> I don't mind the driver being limited, but the device interface should  
> be flexible.  We'll want to deal with limited vector availability soon.

I agree.

The code in qemu lets you specify, for each queue, which MSIX vector you
want to use, or a special value if you don't want signalling. You also
specify which MSIX vector you want to use for config change
notifications, or a special value if you want to e.g. poll.

I think that's as flexible as it gets.

The API within guest is much simpler, but it does not need to be stable.
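
To make the device interface described above a bit more concrete, here is a
rough guest-side sketch, assuming the register names from this patch series
(VIRTIO_PCI_QUEUE_SEL, VIRTIO_MSI_QUEUE_VECTOR, VIRTIO_MSI_NO_VECTOR) and the
usual kernel I/O helpers; it is illustrative only, not the final API:

/* Try to bind MSI-X vector 'vec' to virtqueue 'index'.  The device
 * signals failure by reading back as VIRTIO_MSI_NO_VECTOR, in which
 * case the driver can fall back to a shared vector or to no vector. */
static int vp_bind_queue_vector(void __iomem *ioaddr, u16 index, u16 vec)
{
        iowrite16(index, ioaddr + VIRTIO_PCI_QUEUE_SEL);
        iowrite16(vec, ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
        if (ioread16(ioaddr + VIRTIO_MSI_QUEUE_VECTOR) == VIRTIO_MSI_NO_VECTOR)
                return -EBUSY;
        return 0;
}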

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Shared memory device with interrupt support

2009-05-18 Thread Gregory Haskins
Avi Kivity wrote:
> Cam Macdonell wrote:
>>>
>>> If my understanding is correct both the VM's who wants to
>>> communicate would gives this path in the command line with one of
>>> them specifying as "server".
>>
>> Exactly, the one with the "server" in the parameter list will wait
>> for a connection before booting.
>
> hm, we may be able to eliminate the server from the fast path, at the
> cost of some complexity.
>
> When a guest connects to the server, the server creates an eventfd and
> passes using SCM_RIGHTS to all other connected guests.  The server
> also passes the eventfds of currently connected guests to the new
> guest.  From now on, the server does not participate in anything; when
> a quest wants to send an interrupt to one or more other guests, its
> qemu just writes to the eventfds() of the corresponding guests; their
> qemus will inject the interrupt, without any server involvement.
>
> Now, anyone who has been paying attention will have their alarms going
> off at the word eventfd.  And yes, if the host supports irqfd, the
> various qemus can associate those eventfds with an irq and pretty much
> forget about them.  When a qemu triggers an irqfd, the interrupt will
> be injected directly without the target qemu's involvement.

I'll just add that you could tie the irqfd to an iosignalfd to eliminate
the involvement of qemu on either side as well.  I'm not sure if that
really works with the design of this particular device (e.g. perhaps
qemu is needed for other reasons besides signaling), but it is a neat
demonstration of the flexibility of the newly emerging kvm-eventfd
interfaces.

-Greg
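
As a side note, the fd passing described above is standard SCM_RIGHTS over a
UNIX domain socket; a minimal sketch of the server side (function name and fds
are hypothetical, socket setup and error handling omitted):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one eventfd descriptor to an already-connected guest's qemu. */
static int send_eventfd(int conn_fd, int event_fd)
{
        char dummy = 0;
        char ctrl[CMSG_SPACE(sizeof(int))];
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        struct msghdr msg = {
                .msg_iov = &iov, .msg_iovlen = 1,
                .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
        };
        struct cmsghdr *cmsg;

        memset(ctrl, 0, sizeof(ctrl));
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &event_fd, sizeof(int));

        return sendmsg(conn_fd, &msg, 0);
}

The receiving qemu would pull the descriptor out of the ancillary data with
recvmsg(), and with irqfd could then hand it straight to the kernel.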




signature.asc
Description: OpenPGP digital signature


Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs

2009-05-18 Thread Michael S. Tsirkin
On Sun, May 17, 2009 at 11:47:53PM +0300, Avi Kivity wrote:
> Alex Williamson wrote:
>> On Wed, 2009-05-13 at 08:33 -0600, Alex Williamson wrote:
>>   
>>> On Wed, 2009-05-13 at 08:15 -0600, Alex Williamson wrote:
>>> 
 On Wed, 2009-05-13 at 16:55 +0300, Michael S. Tsirkin wrote:
   
> Very surprising: I haven't seen any driver disable MSI except on device
> destructor path. Is this a Linux guest?
> 
 Yes, Debian 2.6.26 kernel.  I'll check if it behaves the same on newer
 upstream kernels and try to figure out why it's doing it.
   
>>> Updating the guest to 2.6.29 seems to fix the interrupt toggling.  So
>>> it's either something in older kernels or something debian introduced,
>>> but that seems unlikely.
>>> 
>>
>> For the curious, this was fixed prior to 2.6.27-rc1 by this:
>>
>> commit ce6fce4295ba727b36fdc73040e444bd1aae64cd
>> Author: Matthew Wilcox
>> Date:   Fri Jul 25 15:42:58 2008 -0600
>>
>> PCI MSI: Don't disable MSIs if the mask bit isn't supported
>> David Vrabel has a device which generates an interrupt storm on 
>> the INTx
>> pin if we disable MSI interrupts altogether.  Masking interrupts is only
>> a performance optimisation, so we can ignore the request to mask the
>> interrupt.
>>
>> It looks like without the maskbit attribute on MSI, the default way to
>> mask an MSI interrupt was to toggle the MSI enable bit.  This was
>> introduced in 58e0543e8f355b32f0778a18858b255adb7402ae, so its lifespan
>> was probably 2.6.21 - 2.6.26.
>>
>>   
>
> On the other hand, if the host device supports this maskbit attribute,  
> we might want to support it.  I'm not sure exactly how though.
>
> If we trap msi entry writes, we're inviting the guest to exit every time  
> it wants to disable interrupts.  If we don't, we're inviting spurious  
> interrupts, which will cause unwanted exits and injections.

Avi, I think you are mixing up MSI and MSI-X. MSI does not have any tables:
all changes go through configuration writes, which AFAIK we always trap.
Isn't that right?

On the other hand, in MSI-X mask bit is mandatory, not optional
so we'll have to support it for assigned devices at some point.

If we are worried about speed of masking/unmasking MSI-X interrupts for
assigned devices (older kernels used to mask them, recent kernels leave
this to drivers) we will probably need to have MSI-X support in the
kernel, and have kernel examine the mask bit before injecting the
interrupt, just like real devices do.

> Maybe we ought to let the guest write to the msi tables without  
> trapping, and in the injection logic do something like
>
>msi_entry = *msi_entry_ptr;
>mb();
>if (msi_entry != msi->last_msi_entry)
> msi_reconfigure();
>if (msi_enabled(msi_entry))
> insert_the_needle();

I don't really understand why you need the reconfigure
and the tracking of the last msi entry here.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] If interrupt injection is not possible do not scan IRR.

2009-05-18 Thread Gleb Natapov
Forget to remove debug output before submitting. Resending.

Signed-off-by: Gleb Natapov 
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 1ccb50c..d32ceac 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -218,6 +218,11 @@ int kvm_pic_read_irq(struct kvm *kvm)
struct kvm_pic *s = pic_irqchip(kvm);
 
pic_lock(s);
+   if (!s->output) {
+   pic_unlock(s);
+   return -1;
+   }
+   s->output = 0;
irq = pic_get_irq(&s->pics[0]);
if (irq >= 0) {
pic_intack(&s->pics[0], irq);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 96dfbb6..e93405a 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -78,7 +78,6 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
if (vector == -1) {
if (kvm_apic_accept_pic_intr(v)) {
s = pic_irqchip(v->kvm);
-   s->output = 0;  /* PIC */
vector = kvm_pic_read_irq(v->kvm);
}
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 44e87a5..854e8c9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3174,10 +3174,10 @@ static void inject_pending_irq(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run)
vcpu->arch.nmi_injected = true;
kvm_x86_ops->set_nmi(vcpu);
}
-   } else if (kvm_cpu_has_interrupt(vcpu)) {
-   if (kvm_x86_ops->interrupt_allowed(vcpu)) {
-   kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
-   false);
+   } else if (kvm_x86_ops->interrupt_allowed(vcpu)) {
+   int vec = kvm_cpu_get_interrupt(vcpu);
+   if (vec != -1) {
+   kvm_queue_interrupt(vcpu, vec, false);
kvm_x86_ops->set_irq(vcpu);
}
}
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: KVM_HYPERCALL

2009-05-18 Thread Kumar, Venkat
Hi Avi - Yes, the control is reaching neither "kvm_handle_exit" nor
"handle_vmcall" after the hypercall is made from the guest.
If I am not wrong, the "KVM_HYPERCALL" instruction is expected to work, isn't 
it?

Thx,

Venkat


-Original Message-
From: Avi Kivity [mailto:a...@redhat.com] 
Sent: Monday, May 18, 2009 1:56 AM
To: Kumar, Venkat
Cc: kvm@vger.kernel.org
Subject: Re: KVM_HYPERCALL

Kumar, Venkat wrote:
> I am making a hypercall "kvm_hypercall0" with number 0 from a Linux guest. 
> But I don't see the control coming to "handle_vmcall" or even 
> "kvm_handle_exit". What could be the reason?
>   

No idea.  kvm_handle_exit() is called very frequently, even without 
hypercalls.  Are you sure you don't see it called?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: KVM_HYPERCALL

2009-05-18 Thread Avi Kivity

Kumar, Venkat wrote:

Hi Avi - Yes, the control is reaching neither "kvm_handle_exit" nor
"handle_vmcall" after the hypercall is made from the guest.
If I am not wrong, the "KVM_HYPERCALL" instruction is expected to work, isn't
it?


Yes, it should.  Are you sure the guest is executing this instruction?
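
If it helps to narrow this down, a throwaway guest-side check could look
roughly like the sketch below; kvm_hypercall0() is the guest helper from
asm/kvm_para.h, nr 0 is arbitrary, and an unknown nr is normally expected to
just return an error (such as -KVM_ENOSYS) from the host:

#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <asm/kvm_para.h>

/* Throwaway guest module: issue one vmcall so the host-side path
 * (handle_vmcall -> kvm_emulate_hypercall) should be reached. */
static int __init hc_test_init(void)
{
        long ret = kvm_hypercall0(0);

        printk(KERN_INFO "kvm_hypercall0(0) returned %ld\n", ret);
        return 0;
}
module_init(hc_test_init);

MODULE_LICENSE("GPL");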

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Set bit 1 in disabled processor's _STA

2009-05-18 Thread Glauber Costa
On Mon, May 18, 2009 at 08:53:58AM +0300, Gleb Natapov wrote:
> On Mon, May 18, 2009 at 08:44:54AM +0300, Avi Kivity wrote:
> > Gleb Natapov wrote:
> >> On Sun, May 17, 2009 at 11:07:31PM +0300, Avi Kivity wrote:
> >>   
> >>> Gleb Natapov wrote:
> >>> 
>  Theoretically we can provide different values for different OSes, but
>  this is just a guess work since there is no any documentation how CPU
>  hot-plug should work on x86.
>  
> >>> ACPI in fact supports this, but I hope we don't have to do that.
> >>>
> >>> 
> >> ACPI way is what I am talking about. Implement _OS object.
> >>   
> >
> > /*
> > * The story of _OSI(Linux)
> > *
> > * From pre-history through Linux-2.6.22,
> > * Linux responded TRUE upon a BIOS OSI(Linux) query.
> > *
> > * Unfortunately, reference BIOS writers got wind of this
> > * and put OSI(Linux) in their example code, quickly exposing
> > * this string as ill-conceived and opening the door to
> > * an un-bounded number of BIOS incompatibilities.
> > *
> > * For example, OSI(Linux) was used on resume to re-POST a
> > * video card on one system, because Linux at that time
> > * could not do a speedy restore in its native driver.
> > * But then upon gaining quick native restore capability,
> > * Linux has no way to tell the BIOS to skip the time-consuming
> > * POST -- putting Linux at a permanent performance disadvantage.
> > * On another system, the BIOS writer used OSI(Linux)
> > * to infer native OS support for IPMI!  On other systems,
> > * OSI(Linux) simply got in the way of Linux claiming to
> > * be compatible with other operating systems, exposing
> > * BIOS issues such as skipped device initialization.
> > *
> > * So "Linux" turned out to be a really poor chose of
> > * OSI string, and from Linux-2.6.23 onward we respond FALSE.
> > *
> > * BIOS writers should NOT query _OSI(Linux) on future systems.
> > * Linux will complain on the console when it sees it, and return FALSE.
> > * To get Linux to return TRUE for your system  will require
> > * a kernel source update to add a DMI entry,
> > * or boot with "acpi_osi=Linux"
> > */
> >
> > // Looks like no real content in this message?
> >
> Now I recall something on LKML about this. Well, in this case Linux
> shouldn't have used ACPI to invent its own way to do cpu hot-plug.
It didn't.
History shows that this method is what is used in some unisys machines,
which seems to be the only ones implementing this around.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs

2009-05-18 Thread Avi Kivity

Michael S. Tsirkin wrote:

On Sun, May 17, 2009 at 11:47:53PM +0300, Avi Kivity wrote:
  

Alex Williamson wrote:


On Wed, 2009-05-13 at 08:33 -0600, Alex Williamson wrote:
  
  

On Wed, 2009-05-13 at 08:15 -0600, Alex Williamson wrote:



On Wed, 2009-05-13 at 16:55 +0300, Michael S. Tsirkin wrote:
  
  

Very surprising: I haven't seen any driver disable MSI expect on device
destructor path. Is this a linux guest?



Yes, Debian 2.6.26 kernel.  I'll check if it behaves the same on newer
upstream kernels and try to figure out why it's doing it.
  
  

Updating the guest to 2.6.29 seems to fix the interrupt toggling.  So
it's either something in older kernels or something debian introduced,
but that seems unlikely.



For the curious, this was fixed prior to 2.6.27-rc1 by this:

commit ce6fce4295ba727b36fdc73040e444bd1aae64cd
Author: Matthew Wilcox
Date:   Fri Jul 25 15:42:58 2008 -0600

PCI MSI: Don't disable MSIs if the mask bit isn't supported
David Vrabel has a device which generates an interrupt storm on the INTx
pin if we disable MSI interrupts altogether.  Masking interrupts is only
a performance optimisation, so we can ignore the request to mask the
interrupt.

It looks like without the maskbit attribute on MSI, the default way to
mask an MSI interrupt was to toggle the MSI enable bit.  This was
introduced in 58e0543e8f355b32f0778a18858b255adb7402ae, so its lifespan
was probably 2.6.21 - 2.6.26.

  
  
On the other hand, if the host device supports this maskbit attribute,  
we might want to support it.  I'm not sure exactly how though.


If we trap msi entry writes, we're inviting the guest to exit every time  
it wants to disable interrupts.  If we don't, we're inviting spurious  
interrupts, which will cause unwanted exits and injections.



Avi, I think you are mixing up MSI and MSI-X. MSI does not have any tables:
all changes go through configuration writes, which AFAIK we always trap.
Isn't that right?
  


Right.


On the other hand, in MSI-X mask bit is mandatory, not optional
so we'll have to support it for assigned devices at some point.

If we are worried about speed of masking/unmasking MSI-X interrupts for
assigned devices (older kernels used to mask them, recent kernels leave
this to drivers) we will probably need to have MSI-X support in the
kernel, and have kernel examine the mask bit before injecting the
interrupt, just like real devices do.
  


Yes.  Let's actually quantify this though.  Several times per second 
doesn't qualify.


Maybe we ought to let the guest write to the msi tables without  
trapping, and in the injection logic do something like


   msi_entry = *msi_entry_ptr;
   mb();
   if (msi_entry != msi->last_msi_entry)
           msi_reconfigure();
   if (msi_enabled(msi_entry))
           insert_the_needle();



I don't really understand why you need the reconfigure
and the tracking of the last msi entry here.
  


The msi entry can change the vector and delivery mode, no?  This way we 
can cache some stuff instead of decoding it each time.  For example, if 
the entry points at a specific vcpu, we can cache the vcpu pointer, 
instead of searching for the apic ID.
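
As an aside, a hedged sketch of the "examine the mask bit before injecting"
idea from the quoted text above: per the PCI spec, each 16-byte MSI-X table
entry ends with a vector-control dword whose bit 0 is the per-vector mask.
The struct and helpers below are invented for this sketch and are not
existing KVM code.

#include <stdbool.h>
#include <stdint.h>

/* layout of one MSI-X table entry as defined by the PCI spec */
struct msix_table_entry {
	uint32_t addr_lo;
	uint32_t addr_hi;
	uint32_t data;
	uint32_t vector_ctrl;	/* bit 0: per-vector mask */
};

#define MSIX_VECTOR_MASKED	0x1u

static bool msix_vector_masked(const volatile struct msix_table_entry *e)
{
	return e->vector_ctrl & MSIX_VECTOR_MASKED;
}

/*
 * Inject only if the guest has not masked the vector.  The race discussed
 * later in the thread (guest masks after the test, before the injection)
 * remains, just as it does on real hardware.
 */
static void maybe_inject(const volatile struct msix_table_entry *e,
			 void (*inject)(uint32_t addr_lo, uint32_t addr_hi,
					uint32_t data))
{
	if (!msix_vector_masked(e))
		inject(e->addr_lo, e->addr_hi, e->data);
}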


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Shared memory device with interrupt support

2009-05-18 Thread Avi Kivity

Gregory Haskins wrote:

I'll just add that you could tie the irqfd to an iosignalfd to eliminate
the involvement of qemu on either side as well.  I'm not sure if that
really works with the design of this particular device (e.g. perhaps
qemu is needed for other reasons besides signaling), but it is a neat
demonstration of the flexibility of the newly emerging kvm-eventfd
interfaces.
  


If we have an iosignalfd for point-to-point (say, a pio port with the 
guest ID) we can do direct guest-to-guest signalling.  For broadcast or 
multicast, we need to exit to qemu to handle the loop.
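
For illustration, a hedged sketch of the point-to-point wiring described
above: one eventfd registered as an ioeventfd (a pio write) in one VM and as
an irqfd (an interrupt source) in the other, so a guest pio write raises an
interrupt in the peer without a round trip through qemu.  It uses the
KVM_IOEVENTFD/KVM_IRQFD ioctls as they later appeared in <linux/kvm.h>; at
the time of this thread those interfaces were still being merged, and the
port number and GSI below are arbitrary examples.

#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* vm_a_fd and vm_b_fd are the VM file descriptors (from KVM_CREATE_VM) */
static int wire_guest_to_guest(int vm_a_fd, int vm_b_fd)
{
	int efd = eventfd(0, 0);
	if (efd < 0)
		return -1;

	/* VM A: a write to pio port 0x510 signals the eventfd in-kernel */
	struct kvm_ioeventfd io = {
		.addr  = 0x510,
		.len   = 2,
		.fd    = efd,
		.flags = KVM_IOEVENTFD_FLAG_PIO,
	};
	if (ioctl(vm_a_fd, KVM_IOEVENTFD, &io) < 0)
		return -1;

	/* VM B: the same eventfd injects GSI 5 when signalled */
	struct kvm_irqfd irq = {
		.fd  = efd,
		.gsi = 5,
	};
	if (ioctl(vm_b_fd, KVM_IRQFD, &irq) < 0)
		return -1;

	return 0;
}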


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Set bit 1 in disabled processor's _STA

2009-05-18 Thread Gleb Natapov
On Mon, May 18, 2009 at 08:39:43AM -0300, Glauber Costa wrote:
> > Now I recall something on LKML about this. Well, in this case Linux
> > shouldn't have used ACPI to invent its own way to do cpu hot-plug.
> It didn't.
> History shows that this method is what is used in some unisys machines,
> which seems to be the only ones implementing this around.
The questions are: what is "this" that Linux currently implements, how does
Windows expect CPU hot-plug to work, is there any real x86 hardware that
supports CPU hot-plug, and what should we do about all this?

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] Nested SVM: Improve interrupt injection

2009-05-18 Thread Alexander Graf


On 17.05.2009, at 08:48, Gleb Natapov wrote:


On Fri, May 15, 2009 at 10:22:20AM +0200, Alexander Graf wrote:

static void svm_set_irq(struct kvm_vcpu *vcpu, int irq)
{
struct vcpu_svm *svm = to_svm(vcpu);

-   nested_svm_intr(svm);
+   if(!(svm->vcpu.arch.hflags & HF_GIF_MASK))
+   return;


Why would this function be called if HF_GIF_MASK is not set? This
check is done in svm_interrupt_allowed().


Looks like I was doing something odd - WARN_ON doesn't trigger (which  
is reasonable).


I think I put the check in when I still had nested_svm_intr() before,  
because that would unset GIF.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM-Autotest: server problem

2009-05-18 Thread Lucas Meneghel Rodrigues
On Mon, 2009-05-18 at 04:01 -0400, Michael Goldish wrote:
> Hi Lucas,
> 
> Since I consider you our Autotest reference I direct the following question 
> to you.
> 
> Currently our Autotest servers run tests in client mode using the same 
> control file on all hosts. We want to move on to dispatching tests from the 
> server, using a server control file, so that each host runs several test 
> execution pipelines. As far as I know this should be straightforward using 
> at.run_test(), subcommand() and parallel(), as explained in 
> http://autotest.kernel.org/wiki/ServerControlHowto. Theoretically there 
> should be no problem running several pipelines on each host because several 
> independent copies of the Autotest client can be installed in unique 
> temporary directories on each host (using set_install_in_tmpdir()).
> 
> Though me managed to run two tests in parallel on a single host, the server 
> seems to have trouble parsing the results. Depending on what test tags we 
> specify, the server either displays none or some of the results, but never 
> all of them. Also, there seems to be a difference between what the server 
> displays during execution, and what it displays after execution has 
> completed. In one of the configurations we've tried the server displayed the 
> test results while they were still executing, but as soon as they were 
> completed, the results disappeared and the only visible results remaining 
> were those of the Autotest client installation.
> 
> Is this a known issue, or is it more likely that I made a mistake somewhere? 
> Is there a known fix or workaround? Could this functionality (running tests 
> in parallel on the same host, from the server) be unsupported?

We've had bugfixes going into the development branch (and 0.10 for that
matter) that fixed multi-machine results parsing (the changeset below is one
example).

http://autotest.kernel.org/changeset/2894

So yes, it's very likely that this is a bug that happened on the
autotest tree we've used as a base for kvm-autotest. 

I've run tests with my patches that sync our tests with upstream, and
had good results (the test doesn't break and generates all results as
expected). I would like to work with you to try this using a current
autotest development tree + the upstream conversion patches.

> I haven't provided any code because I've temporarily lost contact with the 
> server I was experimenting on. If you find it useful I'll provide some code 
> as soon as I regain access.

That'd be great, let's work this out this week.

> Note: this message is unrelated to the one I posted yesterday to the KVM 
> list. Yesterday's message refers to the possibility of running tests in 
> parallel using a client control file, not a server one.
> 
> Thanks,
> Michael
-- 
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM_HYPERCALL

2009-05-18 Thread Gregory Haskins
Kumar, Venkat wrote:
> Hi Avi - Yes, control is not reaching either "kvm_handle_exit" or
> "handle_vmcall" after the hypercall is made from the guest.
> If I am not wrong, the "KVM_HYPERCALL" instruction is expected to work, isn't 
> it?
>   

Hi Venkat,
  I have used this method of IO many times, so I can attest it does work
in general.   I suspect that there may be some issue in either the
configuration of your environment or perhaps your particular usage of it.

What guest and host versions are you running?  Can you confirm that your
guest has the proper pv-ops support enabled?  If so, do you check the
cpuid for the presence of your HC feature (or at least for the presence of
others, like MMU_OP, if you are not explicitly advertising your own)?  See
kvm_para_has_feature().  If that returns 'true' then you at least know that
the hypercall patching has been performed on your guest.

-Greg
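
For illustration, a minimal guest-side sketch of the check Greg describes
(not taken from the thread; it assumes a Linux guest with KVM paravirt
support and uses KVM_FEATURE_CLOCKSOURCE as the example feature bit in place
of MMU_OP):

/*
 * Hedged sketch: verify the guest sees KVM's paravirt CPUID leaves before
 * relying on hypercalls, then issue a nr == 0 hypercall, which should land
 * in handle_vmcall() on an Intel host.
 */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kvm_para.h>

static int __init hc_check_init(void)
{
	if (!kvm_para_available()) {
		printk(KERN_INFO "hc_check: not running under KVM\n");
		return -ENODEV;
	}

	/* any advertised feature bit proves the paravirt plumbing is in place */
	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE))
		printk(KERN_INFO "hc_check: KVM paravirt features present\n");

	printk(KERN_INFO "hc_check: kvm_hypercall0(0) returned %ld\n",
	       kvm_hypercall0(0));
	return 0;
}

static void __exit hc_check_exit(void)
{
}

module_init(hc_check_init);
module_exit(hc_check_exit);
MODULE_LICENSE("GPL");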






Re: [PATCH, RFC] virtio_blk: add cache flush command

2009-05-18 Thread Christoph Hellwig
On Mon, May 11, 2009 at 01:29:06PM -0500, Anthony Liguori wrote:
> >It don't think we realistically can.
> 
> Maybe two fds?  One open in O_SYNC and one not.  Is such a thing sane?

O_SYNC and O_DIRECT currently behave in exactly the same way.  In none
of the filesystems I've looked at is there an explicit cache flush,
but all of them send down a barrier, which gives you an implicit cache flush
with the way barriers are implemented.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH, RFC] virtio_blk: add cache flush command

2009-05-18 Thread Christoph Hellwig
On Tue, May 12, 2009 at 11:35:23AM +0300, Avi Kivity wrote:
> >The cache size on disks is constantly growing, and if you lose cache
> >it doesn't really matter how much you lose but what you lose.
> >  
> 
> Software errors won't cause data loss on a real disk (firmware bugs 
> will, but the firmware is less likely to crash than the host OS).

An OS crash or a hardware crash really makes no difference for writeback
caches.  The case we are trying to protect against here is any crash,
and a hardware-induced one or a power failure is probably more likely
in either case than a software failure in either the OS or firmware.

> >If you care about data integrity in case of crashes qcow2 doesn't work
> >at all.
> >  
> 
> Do you known of any known corruptors in qcow2 with cache=writethrough?

It's not related to cache=writethrough.  The issue is that there are no
transactional guarantees in qcow2.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH, RFC] virtio_blk: add cache flush command

2009-05-18 Thread Christoph Hellwig
On Tue, May 12, 2009 at 04:18:36PM +0200, Christian Borntraeger wrote:
> Am Tuesday 12 May 2009 15:54:14 schrieb Rusty Russell:
> > On Mon, 11 May 2009 06:09:08 pm Christoph Hellwig wrote:
> > > Do we need a new feature flag for this command or can we expect that
> > > all previous barrier support was buggy enough anyway?
> > 
> > You mean reuse the VIRTIO_BLK_F_BARRIER for this as well?  Seems fine.
> > 
> > AFAIK only lguest offered that, and lguest has no ABI.  Best would be to 
> > implement VIRTIO_BLK_T_FLUSH as well; it's supposed to be demo code, and it 
> > should be easy).
> 
> It is also used by kuli 
> (http://www.ibm.com/developerworks/linux/linux390/kuli.html)
> and kuli used fdatasync. Since kuli is on offical webpages it takes a while
> to update that code. If you reuse the VIRTIO_BLK_F_BARRIER flag, that would
> trigger some VIRTIO_BLK_T_FLUSH commands being sent to the kuli host, right?
> I think the if/else logic of kuli would interpret that as a read request. I
> am voting for a new feature flag :-)

Ok, next version will have a new feature flag.  I actually have some
other bits I want to redesign so it might take a bit longer to get it
out.
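
For illustration only, a sketch of how a guest driver might gate flush
submission on a dedicated feature bit rather than reusing
VIRTIO_BLK_F_BARRIER, as discussed above.  The feature-bit and command
numbers are placeholders invented for this sketch, not the values that were
eventually standardized.

#include <linux/virtio.h>
#include <linux/virtio_config.h>
#include <linux/virtio_blk.h>

/* placeholder numbers for the sketch only */
#define VIRTIO_BLK_F_FLUSH_EXAMPLE	10
#define VIRTIO_BLK_T_FLUSH_EXAMPLE	4

/* fill a flush request header only if the host negotiated the new bit */
static bool vblk_prepare_flush(struct virtio_device *vdev,
			       struct virtio_blk_outhdr *hdr)
{
	if (!virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH_EXAMPLE))
		return false;	/* no explicit cache flush available */

	hdr->type   = VIRTIO_BLK_T_FLUSH_EXAMPLE;
	hdr->ioprio = 0;
	hdr->sector = 0;	/* the flush applies to the whole write cache */
	return true;
}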

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2] Shared memory device with interrupt support

2009-05-18 Thread Kumar, Venkat
Cam - I got your patch to work, but without notifications: memory is shared
between the two VMs as expected, but the notifications aren't working.

I bring up the two VMs with the options "-ivshmem shrmem,1024,/dev/shm/shrmem,server"
and "-ivshmem shrmem,1024,/dev/shm/shrmem" respectively.

When I make an "ioctl" from one of the VMs to inject an interrupt into the other
VM, I get an error in "qemu_chr_write" with a return value of -1; the "write"
call in "send_all" is failing with return value -1.

Am I missing something here?

Thx,

Venkat
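
For reference, a minimal host-side listener sketch for the 1-byte doorbell
messages described in Cam's notes quoted below (the socket path, and the
assumption that the host plays the "server" end of the connection, are
illustrative and not taken from the patch):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(int argc, char **argv)
{
	/* hypothetical default path; pass the real one as argv[1] */
	const char *path = argc > 1 ? argv[1] : "/tmp/ivshmem_socket";
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int lsock, csock;
	unsigned char msg;

	strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
	unlink(path);			/* remove any stale socket */

	lsock = socket(AF_UNIX, SOCK_STREAM, 0);
	if (lsock < 0 ||
	    bind(lsock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lsock, 1) < 0) {
		perror("ivshmem listener");
		return 1;
	}

	csock = accept(lsock, NULL, NULL);	/* the guest side connects here */
	if (csock < 0) {
		perror("accept");
		return 1;
	}

	/* each doorbell write in a guest arrives as a single byte */
	while (read(csock, &msg, 1) == 1)
		printf("doorbell message: 0x%02x\n", msg);

	close(csock);
	close(lsock);
	return 0;
}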


-Original Message-
From: Cam Macdonell [mailto:c...@cs.ualberta.ca]
Sent: Saturday, May 16, 2009 9:01 AM
To: Kumar, Venkat
Cc: kvm@vger.kernel.org list
Subject: Re: [PATCH v2] Shared memory device with interrupt support


On 15-May-09, at 8:54 PM, Kumar, Venkat wrote:

> Cam,
>
> A questions on interrupts as well.
> What is "unix:path" that needs to be passed in the argument list?
> Can it be any string?

It has to be a valid path on the host.  It will create a unix domain
socket on that path.

>
> If my understanding is correct, both the VMs that want to
> communicate would give this path on the command line, with one of
> them specified as "server".

Exactly, the one with the "server" in the parameter list will wait for
a connection before booting.

Cam

>
> Thx,
> Venkat
>
>
>
>
>
>
>Support an inter-vm shared memory device that maps a shared-
> memory object
> as a PCI device in the guest.  This patch also supports interrupts
> between
> guest by communicating over a unix domain socket.  This patch
> applies to the
> qemu-kvm repository.
>
> This device now creates a qemu character device and sends 1-byte
> messages to
> trigger interrupts.  Writes are triggered by writing to the "Doorbell"
> register
> on the shared memory PCI device.  The lower 8-bits of the value
> written to this
> register are sent as the 1-byte message so different meanings of
> interrupts can
> be supported.
>
> Interrupts are only supported between 2 VMs currently.  One VM must
> act as the
> server by adding "server" to the command-line argument.  Shared
> memory devices
> are created with the following command-line:
>
> -ivshmem <name>,<size>,[unix:<path>][,server]
>
> Interrupts can also be used between host and guest as well by
> implementing a
> listener on the host.
>
> Cam
>
> ---
> Makefile.target |3 +
> hw/ivshmem.c|  421 ++
> +
> hw/pc.c |6 +
> hw/pc.h |3 +
> qemu-options.hx |   14 ++
> sysemu.h|8 +
> vl.c|   14 ++
> 7 files changed, 469 insertions(+), 0 deletions(-)
> create mode 100644 hw/ivshmem.c
>
> diff --git a/Makefile.target b/Makefile.target
> index b68a689..3190bba 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -643,6 +643,9 @@ OBJS += pcnet.o
> OBJS += rtl8139.o
> OBJS += e1000.o
>
> +# Inter-VM PCI shared memory
> +OBJS += ivshmem.o
> +
> # Generic watchdog support and some watchdog devices
> OBJS += watchdog.o
> OBJS += wdt_ib700.o wdt_i6300esb.o
> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
> new file mode 100644
> index 000..95e2268
> --- /dev/null
> +++ b/hw/ivshmem.c
> @@ -0,0 +1,421 @@
> +/*
> + * Inter-VM Shared Memory PCI device.
> + *
> + * Author:
> + *  Cam Macdonell 
> + *
> + * Based On: cirrus_vga.c and rtl8139.c
> + *
> + * This code is licensed under the GNU GPL v2.
> + */
> +
> +#include "hw.h"
> +#include "console.h"
> +#include "pc.h"
> +#include "pci.h"
> +#include "sysemu.h"
> +
> +#include "qemu-common.h"
> +#include 
> +
> +#define PCI_COMMAND_IOACCESS0x0001
> +#define PCI_COMMAND_MEMACCESS   0x0002
> +#define PCI_COMMAND_BUSMASTER   0x0004
> +
> +//#define DEBUG_IVSHMEM
> +
> +#ifdef DEBUG_IVSHMEM
> +#define IVSHMEM_DPRINTF(fmt, args...)\
> +do {printf("IVSHMEM: " fmt, ##args); } while (0)
> +#else
> +#define IVSHMEM_DPRINTF(fmt, args...)
> +#endif
> +
> +typedef struct IVShmemState {
> +uint16_t intrmask;
> +uint16_t intrstatus;
> +uint16_t doorbell;
> +uint8_t *ivshmem_ptr;
> +unsigned long ivshmem_offset;
> +unsigned int ivshmem_size;
> +unsigned long bios_offset;
> +unsigned int bios_size;
> +target_phys_addr_t base_ctrl;
> +int it_shift;
> +PCIDevice *pci_dev;
> +CharDriverState * chr;
> +unsigned long map_addr;
> +unsigned long map_end;
> +int ivshmem_mmio_io_addr;
> +} IVShmemState;
> +
> +typedef struct PCI_IVShmemState {
> +PCIDevice dev;
> +IVShmemState ivshmem_state;
> +} PCI_IVShmemState;
> +
> +typedef struct IVShmemDesc {
> +char name[1024];
> +char * chrdev;
> +int size;
> +} IVShmemDesc;
> +
> +
> +/* registers for the Inter-VM shared memory device */
> +enum ivshmem_registers {
> +IntrMask = 0,
> +IntrStatus = 16,
> +Doorbell = 32
> +};
> +
> +static int num_ivshmem_devices = 0;
> +static IVShmemDesc ivshmem_desc;
> +
> +static void ivshmem_map(PCIDevice *pci_dev, int region_num,
> +  

Re: [PATCH 0/2] Add serial number support for virtio_blk, V2

2009-05-18 Thread Christoph Hellwig
On Wed, May 13, 2009 at 01:06:57PM -0400, john cooper wrote:
> [Resend of earlier patch: 1/2 rebased to qemu-kvm,
> 2/2 minor tweak]

patch 1/2 seems to be missing.

> Equivalent functionality currently exists for IDE
> and SCSI, however it is not yet implemented for
> virtio.

So why can't we re-use the existing interfaces instead of inventing a
new one?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] cleanup arch/x86/kvm/Makefile

2009-05-18 Thread Christoph Hellwig
Use proper foo-y style list additions to clean up all the conditionals,
move module selection after compound object selection, and remove the
superfluous comment.


Signed-off-by: Christoph Hellwig 

Index: linux-2.6/arch/x86/kvm/Makefile
===================================================================
--- linux-2.6.orig/arch/x86/kvm/Makefile	2009-05-18 13:18:03.783784453 +0200
+++ linux-2.6/arch/x86/kvm/Makefile	2009-05-18 14:13:27.983658979 +0200
@@ -1,22 +1,16 @@
-#
-# Makefile for Kernel-based Virtual Machine module
-#
-
-common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
-coalesced_mmio.o irq_comm.o)
-ifeq ($(CONFIG_KVM_TRACE),y)
-common-objs += $(addprefix ../../../virt/kvm/, kvm_trace.o)
-endif
-ifeq ($(CONFIG_IOMMU_API),y)
-common-objs += $(addprefix ../../../virt/kvm/, iommu.o)
-endif
 
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
 
-kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
-   i8254.o
-obj-$(CONFIG_KVM) += kvm.o
-kvm-intel-objs = vmx.o
-obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
-kvm-amd-objs = svm.o
-obj-$(CONFIG_KVM_AMD) += kvm-amd.o
+kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
+   coalesced_mmio.o irq_comm.o)
+kvm-$(CONFIG_KVM_TRACE)+= $(addprefix ../../../virt/kvm/, kvm_trace.o)
+kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
+
+kvm-y  += x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
+  i8254.o
+kvm-intel-y+= vmx.o
+kvm-amd-y  += svm.o
+
+obj-$(CONFIG_KVM)  += kvm.o
+obj-$(CONFIG_KVM_INTEL)+= kvm-intel.o
+obj-$(CONFIG_KVM_AMD)  += kvm-amd.o
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs

2009-05-18 Thread Michael S. Tsirkin
On Mon, May 18, 2009 at 02:36:41PM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>> On Sun, May 17, 2009 at 11:47:53PM +0300, Avi Kivity wrote:
>>   
>>> Alex Williamson wrote:
>>> 
 On Wed, 2009-05-13 at 08:33 -0600, Alex Williamson wrote:
 
> On Wed, 2009-05-13 at 08:15 -0600, Alex Williamson wrote:
> 
>> On Wed, 2009-05-13 at 16:55 +0300, Michael S. Tsirkin wrote:
>> 
>>> Very surprising: I haven't seen any driver disable MSI expect on device
>>> destructor path. Is this a linux guest?
>>> 
>> Yes, Debian 2.6.26 kernel.  I'll check if it behaves the same on newer
>> upstream kernels and try to figure out why it's doing it.
>> 
> Updating the guest to 2.6.29 seems to fix the interrupt toggling.  So
> it's either something in older kernels or something debian introduced,
> but that seems unlikely.
> 
 For the curious, this was fixed prior to 2.6.27-rc1 by this:

 commit ce6fce4295ba727b36fdc73040e444bd1aae64cd
 Author: Matthew Wilcox
 Date:   Fri Jul 25 15:42:58 2008 -0600

 PCI MSI: Don't disable MSIs if the mask bit isn't supported
 David Vrabel has a device which generates an interrupt 
 storm on the INTx
 pin if we disable MSI interrupts altogether.  Masking interrupts is 
 only
 a performance optimisation, so we can ignore the request to mask the
 interrupt.

 It looks like without the maskbit attribute on MSI, the default way to
 mask an MSI interrupt was to toggle the MSI enable bit.  This was
 introduced in 58e0543e8f355b32f0778a18858b255adb7402ae, so its lifespan
 was probably 2.6.21 - 2.6.26.

 
>>> On the other hand, if the host device supports this maskbit 
>>> attribute,  we might want to support it.  I'm not sure exactly how 
>>> though.
>>>
>>> If we trap msi entry writes, we're inviting the guest to exit every 
>>> time  it wants to disable interrupts.  If we don't, we're inviting 
>>> spurious  interrupts, which will cause unwanted exits and injections.
>>> 
>>
>> Avi, I think you are mixing up MSI and MSI-X. MSI does not have any tables:
>> all changes go through configuration writes, which AFAIK we always trap.
>> Isn't that right?
>>   
>
> Right.
>
>> On the other hand, in MSI-X mask bit is mandatory, not optional
>> so we'll have to support it for assigned devices at some point.
>>
>> If we are worried about speed of masking/unmasking MSI-X interrupts for
>> assigned devices (older kernels used to mask them, recent kernels leave
>> this to drivers) we will probably need to have MSI-X support in the
>> kernel, and have kernel examine the mask bit before injecting the
>> interrupt, just like real devices do.
>>   
>
> Yes.

Actually, if we do that, we'll need to address a race where a driver
has updated the mask bit in the window after we tested it
and before we inject the interrupt. Not sure how to do this.

> Let's actually quantify this though.  Several times per second  
> doesn't qualify.

Oh, I didn't actually imply that we need to optimize this path.
You seemed worried about the speed of masking the interrupt,
and I commented that to optimize it we'll have to move it
to the kernel.

>>> Maybe we ought to let the guest write to the msi tables without   
>>> trapping, and in the injection logic do something like
>>>
>>>msi_entry = *msi_entry_ptr;
>>>mb();
>>>if (msi_entry != msi->last_msi_entry)
>>> msi_reconfigure();
>>>if (msi_enabled(msi_entry))
>>> insert_the_needle();
>>> 
>>
>> I don't really understand why you need the reconfigure
>> and tracking last msi entry here.
>>   
>
> The msi entry can change the vector and delivery mode, no?  This way we  
> can cache some stuff instead of decoding it each time.  For example, if  
> the entry points at a specific vcpu, we can cache the vcpu pointer,  
> instead of searching for the apic ID.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: KVM_HYPERCALL

2009-05-18 Thread Kumar, Venkat
Ok. With KVM-85 it works. I was using KVM-84 earlier.

Thx,

Venkat


-Original Message-
From: Avi Kivity [mailto:a...@redhat.com] 
Sent: Monday, May 18, 2009 5:03 PM
To: Kumar, Venkat
Cc: kvm@vger.kernel.org
Subject: Re: KVM_HYPERCALL

Kumar, Venkat wrote:
> Hi Avi - Yes, control is not reaching either "kvm_handle_exit" or
> "handle_vmcall" after the hypercall is made from the guest.
> If I am not wrong, the "KVM_HYPERCALL" instruction is expected to work, isn't 
> it?
>   

Yes, it should.  Are you sure the guest is executing this instruction?

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs

2009-05-18 Thread Avi Kivity

Michael S. Tsirkin wrote:



On the other hand, in MSI-X mask bit is mandatory, not optional
so we'll have to support it for assigned devices at some point.

If we are worried about speed of masking/unmasking MSI-X interrupts for
assigned devices (older kernels used to mask them, recent kernels leave
this to drivers) we will probably need to have MSI-X support in the
kernel, and have kernel examine the mask bit before injecting the
interrupt, just like real devices do.
  
  

Yes.



Actually, if we do that, we'll need to address a race where a driver
has updated the mask bit in the window after we tested it
and before we inject the interrupt. Not sure how to do this.
  


The driver can't tell if the interrupt came first, so it's a valid race 
(real hardware has the same race).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Set bit 1 in disabled processor's _STA

2009-05-18 Thread Glauber Costa
On Mon, May 18, 2009 at 02:40:18PM +0300, Gleb Natapov wrote:
> On Mon, May 18, 2009 at 08:39:43AM -0300, Glauber Costa wrote:
> > > Now I recall something on LKML about this. Well, in this case Linux
> > > shouldn't have used ACPI to invent its own way to do cpu hot-plug.
> > It didn't.
> > History shows that this method is what is used in some unisys machines,
> > which seems to be the only ones implementing this around.
> The questions are: What is "this" that linux currently implements, how
> windows expects CPU hot-plug to work, are there any real x86 hardware
> that supports CPU hot-plug and what should we do about all this.
As I said, there are Unisys machines that implement cpu hotplug.
The way they do it is the way the Linux kernel currently expects, and it is
the same way we implement it in our BIOS.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Set bit 1 in disabled processor's _STA

2009-05-18 Thread Gleb Natapov
On Mon, May 18, 2009 at 09:40:03AM -0300, Glauber Costa wrote:
> On Mon, May 18, 2009 at 02:40:18PM +0300, Gleb Natapov wrote:
> > On Mon, May 18, 2009 at 08:39:43AM -0300, Glauber Costa wrote:
> > > > Now I recall something on LKML about this. Well, in this case Linux
> > > > shouldn't have used ACPI to invent its own way to do cpu hot-plug.
> > > It didn't.
> > > History shows that this method is what is used in some unisys machines,
> > > which seems to be the only ones implementing this around.
> > The questions are: What is "this" that linux currently implements, how
> > windows expects CPU hot-plug to work, are there any real x86 hardware
> > that supports CPU hot-plug and what should we do about all this.
> as I said, there are unisys machines that implements cpu hotplug.
> The way they do it, is the way Linux kernel currently expects. The same way
> we implement on our BIOS.
I read that document too. And "based on some input from Natalie of Unisys"
does not sound promising. Are those Unisys machines certified to run
Windows 2008? Because if they are, they surely do something different
from what we are doing.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6] Add rudimentary Hyper-V guest support

2009-05-18 Thread Alexander Graf


On 17.05.2009, at 23:08, Avi Kivity wrote:


Alexander Graf wrote:
In order to find out why things were slow with nested SVM I hacked  
intercept reporting into debugfs in my local tree and found pretty  
interesting results (using NPT):




[...]

So apparently the most intercepts come from the SVM helper calls  
(clgi, stgi, vmload, vmsave). I guess I need to get back to the  
"emulate when GIF=0" approach to get things fast.


There's only a limited potential here (a factor of three, reducing 6  
exits to 2, less the emulation overhead).  There's a lot more to be  
gained from nested npt, since you'll avoid most of the original  
exits in the first place.


I think the reverse is the case. Look at those numbers (w2k8 bootup):

http://pastebin.ca/1423596

The only thing nested NPT would achieve is a reduction of #NPF exits.  
But they are absolutely in the minority today already. Normal #PF's do  
get directly passed to the guest already.


Of course, this all depends on the workload. For kernbench style  
benchmarks nested NPT probably gives you a bigger win, but anything  
doing IO is slowed down way more than it has to now.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Set bit 1 in disabled processor's _STA

2009-05-18 Thread Glauber Costa
On Mon, May 18, 2009 at 03:42:10PM +0300, Gleb Natapov wrote:
> On Mon, May 18, 2009 at 09:40:03AM -0300, Glauber Costa wrote:
> > On Mon, May 18, 2009 at 02:40:18PM +0300, Gleb Natapov wrote:
> > > On Mon, May 18, 2009 at 08:39:43AM -0300, Glauber Costa wrote:
> > > > > Now I recall something on LKML about this. Well, in this case Linux
> > > > > shouldn't have used ACPI to invent its own way to do cpu hot-plug.
> > > > It didn't.
> > > > History shows that this method is what is used in some unisys machines,
> > > > which seems to be the only ones implementing this around.
> > > The questions are: What is "this" that linux currently implements, how
> > > windows expects CPU hot-plug to work, are there any real x86 hardware
> > > that supports CPU hot-plug and what should we do about all this.
> > as I said, there are unisys machines that implements cpu hotplug.
> > The way they do it, is the way Linux kernel currently expects. The same way
> > we implement on our BIOS.
> I read that document too. And "based on some input from Natalie of Unisys"
> does not sound promising.
I'm not saying it is promising. Nothing unspecified is. A Google search for the
mails exchanged at the time this was developed also points in the direction
that this is probably all we have.

> Are those Unisys machines certified to run
> Windows 2008? Because if they are, they surely do something different
> from what we are doing.
Unless they have a time machine, it is unlikely. Windows 2008 did not exist
at the time.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/6] Emulator: Inject #PF when page was not found

2009-05-18 Thread Alexander Graf


On 17.05.2009, at 22:58, Avi Kivity wrote:


Alexander Graf wrote:


I can't think why it was done for writes.  Normally, a guest page  
fault would be trapped and reflected a long time before emulation,  
in FNAME(page_fault)(), after walk_addr().


Can you give some details on the situation?  What instruction was  
executed, and why kvm tried to emulate it?


I remember it was something about accessing the apic with npt.  
Maybe the real problem was the restricted bit checking that made  
the emulated instruction behave differently from the real mmu.


The apic should not be mapped by Hyper-V's shadow page tables, so  
this should have been handled by page_fault().


I think I only had to include this to find out that the restricted bit  
was checked for, so I got a blue screen in the guest :-).

Hyper-V works fine without this patch on NPT.

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] Nested SVM: Implement INVLPGA

2009-05-18 Thread Alexander Graf


On 15.05.2009, at 15:43, Joerg Roedel wrote:


On Fri, May 15, 2009 at 10:22:19AM +0200, Alexander Graf wrote:

SVM adds another way to do INVLPG by ASID which Hyper-V makes use of,
so let's implement it!

For now we just do the same thing invlpg does, as asid switching
means we flush the mmu anyways. That might change one day though.

Signed-off-by: Alexander Graf 
---
arch/x86/kvm/svm.c |   14 +-
1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 30e6b43..b2c6cf3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1785,6 +1785,18 @@ static int clgi_interception(struct vcpu_svm  
*svm, struct kvm_run *kvm_run)

return 1;
}

+static int invlpga_interception(struct vcpu_svm *svm, struct  
kvm_run *kvm_run)

+{
+   struct kvm_vcpu *vcpu = &svm->vcpu;
+   nsvm_printk("INVLPGA\n");
+   svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+   skip_emulated_instruction(&svm->vcpu);
+
+   kvm_mmu_reset_context(vcpu);
+   kvm_mmu_load(vcpu);
+   return 1;
+}
+


Hmm, since we flush the TLB on every nested-guest entry I think we can
make this function a nop.


Well, we flush the TLB on every VMRUN, but this is still 100% within
the 2nd level guest, so I think we should do something, no?


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/17] Use statfs to determine size of huge pages

2009-05-18 Thread Anthony Liguori

Joerg Roedel wrote:

On Sun, May 17, 2009 at 08:00:40PM +0300, Avi Kivity wrote:
  

Anthony Liguori wrote:


From: Joerg Roedel 

The current method of finding out the size of huge pages does not work
reliably anymore. Current Linux supports more than one huge page size
but /proc/meminfo only shows one of the supported sizes.
The real page size in use can be found by calling statfs.  This
patch changes qemu to use statfs instead of parsing /proc/meminfo.
  
  

Since we don't support 1GB pages in stable-0.10, this is unneeded.



This patch is needed to run current KVM on a hugetlbfs backed with
1GB pages. Therefore I think this patch is needed. It is an improvement
over the /proc/meminfo parsing anyway and is not strictly related to
kvm kernel support for 1GB pages.
  


Are there any userspace support requirements for 1GB pages?

That is, if you had a 2.6.31 kernel and stable-0.10, would 1GB pages 
work (assuming this patch is backported)?


This patch could still be considered a feature vs. bug fix but I'm 
mostly curious.


Regards,

Anthony Liguori


Joerg

  



--
Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


virtio-net zero-copy

2009-05-18 Thread Raju Srivastava
Greetings,

Could someone let me know if current virtio-net supports zero-copy? I
see some discussion here:
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/28061/
(copyless virtio net thoughts) and it looks like the copyless
virtio-net is not supported by KVM yet. If this is true, then is there
any plan to add the zero copy to the virtio-net?

I use KVM in 10G network environment and found the CPU usage is
saturated when network traffic becomes heavy. The native Linux under
the same load only consumes less than 10% CPU.

Thanks in advance,
Raju
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/17] Use statfs to determine size of huge pages

2009-05-18 Thread Joerg Roedel
On Mon, May 18, 2009 at 08:10:28AM -0500, Anthony Liguori wrote:
> Joerg Roedel wrote:
>> On Sun, May 17, 2009 at 08:00:40PM +0300, Avi Kivity wrote:
>>   
>>> Anthony Liguori wrote:
>>> 
 From: Joerg Roedel 

 The current method of finding out the size of huge pages does not work
 reliably anymore. Current Linux supports more than one huge page size
 but /proc/meminfo only show one of the supported sizes.
 To find out the real page size used can be found by calling statfs. This
 patch changes qemu to use statfs instead of parsing /proc/meminfo.
 
>>> Since we don't support 1GB pages in stable-0.10, this is unneeded.
>>> 
>>
>> This patch is needed to run current KVM on a hugetlbfs backed with
>> 1GB pages. Therefore I think this patch is needed. It is an improvement
>> over the /proc/meminfo parsing anyway and is not strictly related to
>> kvm kernel support for 1GB pages.
>>   
>
> Is there any userspace support requirements for 1GB pages?

The /proc/meminfo parsing code breaks when current KVM is run with
a -mempath on a 1GB backed hugetlbfs.

> That is, if you had a 2.6.31 kernel and stable-0.10, would 1GB pages  
> work (assuming this patch is backported)?

With this patch and kvm kernel support 1GB pages will work. Another
patch is needed to make it easier to enable the pdpe1gb cpuid bit in
the guest.
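
For illustration, a minimal user-space sketch of the statfs() approach the
patch describes; the hugetlbfs mount point used here is an assumption, not
taken from the patch:

#include <stdio.h>
#include <sys/vfs.h>

int main(void)
{
	/* hypothetical -mem-path mount point, adjust to your setup */
	const char *path = "/dev/hugepages";
	struct statfs fs;

	if (statfs(path, &fs) != 0) {
		perror("statfs");
		return 1;
	}

	/* on hugetlbfs, f_bsize reports the huge page size backing the mount */
	printf("huge page size on %s: %ld bytes\n", path, (long)fs.f_bsize);
	return 0;
}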

Joerg

-- 
   | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System| 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis München
   | Registergericht München, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6] Add rudimentary Hyper-V guest support

2009-05-18 Thread Avi Kivity

Alexander Graf wrote:


There's only a limited potential here (a factor of three, reducing 6 
exits to 2, less the emulation overhead).  There's a lot more to be 
gained from nested npt, since you'll avoid most of the original exits 
in the first place.


I think the reverse is the case. Look at those numbers (w2k8 bootup):

http://pastebin.ca/1423596

The only thing nested NPT would achieve is a reduction of #NPF exits. 
But they are absolutely in the minority today already. Normal #PF's do 
get directly passed to the guest already.


#NPF exits are caused when guest/host mappings change, which they don't, 
or by mmio (which happens both for guest and nguest).


I don't understand how you can pass #PFs directly to the guest.  Surely 
the guest has enabled pagefault interception, and you need to set up its 
vmcb?


Of course, this all depends on the workload. For kernbench style 
benchmarks nested NPT probably gives you a bigger win, but anything 
doing IO is slowed down way more than it has to now.


What is causing 17K pio exits/sec?  What port numbers?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6] Add rudimentary Hyper-V guest support

2009-05-18 Thread Alexander Graf


On 18.05.2009, at 15:29, Avi Kivity wrote:


Alexander Graf wrote:


There's only a limited potential here (a factor of three, reducing  
6 exits to 2, less the emulation overhead).  There's a lot more to  
be gained from nested npt, since you'll avoid most of the original  
exits in the first place.


I think the reversed is the case. Look at those numbers (w2k8  
bootup):


http://pastebin.ca/1423596

The only thing nested NPT would achieve is a reduction of #NPF  
exits. But they are absolutely in the minority today already.  
Normal #PF's do get directly passed to the guest already.


#NPF exits are caused when guest/host mappings change, which they  
don't, or by mmio (which happens both for guest and nguest).


I don't understand how you can pass #PFs directly to the guest.   
Surely the guest has enabled pagefault interception, and you need to  
set up its vmcb?


Ugh - looks like I totally forgot to include #PF exits in my stats,  
which is why I didn't see them.






Of course, this all depends on the workload. For kernbench style  
benchmarks nested NPT probably gives you a bigger win, but anything  
doing IO is slowed down way more than it has to now.


What is causing 17K pio exits/sec?  What port numbers?


Any hints on how to easily find that out? For someone who's too stupid  
to get kvmtrace working :-).


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6] Add rudimentary Hyper-V guest support

2009-05-18 Thread Avi Kivity

Alexander Graf wrote:
Of course, this all depends on the workload. For kernbench style 
benchmarks nested NPT probably gives you a bigger win, but anything 
doing IO is slowed down way more than it has to now.


What is causing 17K pio exits/sec?  What port numbers?


Any hints on how to easily find that out? For someone who's too stupid 
to get kvmtrace working :-).


You can always printk() every 1000 loops, but kvmtrace is actually 
pretty easy to use.  Compile it in, run your guest (pinning to one cpu 
deconfuses the output), run kvmtrace -o blah, then use 
'./kvmtrace_format formats' as a filter on the binary output.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs

2009-05-18 Thread Michael S. Tsirkin
On Mon, May 18, 2009 at 03:33:20PM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>>>
 On the other hand, in MSI-X mask bit is mandatory, not optional
 so we'll have to support it for assigned devices at some point.

 If we are worried about speed of masking/unmasking MSI-X interrupts for
 assigned devices (older kernels used to mask them, recent kernels leave
 this to drivers) we will probably need to have MSI-X support in the
 kernel, and have kernel examine the mask bit before injecting the
 interrupt, just like real devices do.
 
>>> Yes.
>>> 
>>
>> Actually, if we do that, we'll need to address a race where a driver
>> has updated the mask bit in the window after we tested it
>> and before we inject the interrupt. Not sure how to do this.
>>   
>
> The driver can't tell if the interrupt came first, so it's a valid race  
> (real hardware has the same race).

The driver for a real device can do a read followed by sync_irq to flush
out interrupts.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs

2009-05-18 Thread Avi Kivity

Michael S. Tsirkin wrote:

On Mon, May 18, 2009 at 03:33:20PM +0300, Avi Kivity wrote:
  

Michael S. Tsirkin wrote:


On the other hand, in MSI-X mask bit is mandatory, not optional
so we'll have to support it for assigned devices at some point.

If we are worried about speed of masking/unmasking MSI-X interrupts for
assigned devices (older kernels used to mask them, recent kernels leave
this to drivers) we will probably need to have MSI-X support in the
kernel, and have kernel examine the mask bit before injecting the
interrupt, just like real devices do.

  

Yes.



Actually, if we do that, we'll need to address a race where a driver
has updated the mask bit in the window after we tested it
and before we inject the interrupt. Not sure how to do this.
  
  
The driver can't tell if the interrupt came first, so it's a valid race  
(real hardware has the same race).



The driver for real device can do a read followed by sync_irq to flush
out interrupts.
  


If it generates the interrupt after masking it in the msi-x entry, we'll 
see it.  If it generates the interrupt before masking it, it may or may 
not receive the interrupt, even on real hardware.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio-net zero-copy

2009-05-18 Thread Avi Kivity

Raju Srivastava wrote:

Greetings,

Could someone let me know if current virtio-net supports zero-copy? I
see some discussion here:
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/28061/
(copyless virtio net thoughts) and it looks like the copyless
virtio-net is not supported by KVM yet. 


That is correct.


If this is true, then is there
any plan to add the zero copy to the virtio-net?
  


Yes, but it will be a difficult journey.


I use KVM in 10G network environment and found the CPU usage is
saturated when network traffic becomes heavy. The native Linux under
the same load only consumes less than 10% CPU.
  


Running the latest guest and host software (both qemu and kernel) will 
improve things somewhat.



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/17] Stop/start cpus before/after devices

2009-05-18 Thread Marcelo Tosatti
On Sun, May 17, 2009 at 07:59:51PM +0300, Avi Kivity wrote:
> Anthony Liguori wrote:
>> From: Yaniv Kamay 
>>
>> Stop cpus before devices when stopping the VM, start cpus after devices
>> when starting VM.  Otherwise a vcpu could access a stopped device.
>>
>>   
>
> IIRC we decided this was unnecessary since everything is under  
> qemu_mutex.  However we might want to be on the safe side here and be  
> similar to master.

Yes, it is not necessary. Since upstream is already on the safe side
(and the plan is to merge with it), IMO better leave this patch out.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] cleanup arch/x86/kvm/Makefile

2009-05-18 Thread Avi Kivity

Christoph Hellwig wrote:

Use proper foo-y style list additions to cleanup all the conditionals,
move module selection after compound object selection and remove the
superfluous comment.

  


I think you're patching the wrong tree.


-kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
-   i8254.o
  


I have a timer.o here, for example.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

