Re: [PATCH 8/8] KVM: Emulation MSI-X mask bits for assigned devices

2010-10-21 Thread Sheng Yang
On Thursday 21 October 2010 16:39:07 Michael S. Tsirkin wrote:
> On Thu, Oct 21, 2010 at 04:30:02PM +0800, Sheng Yang wrote:
> > On Wednesday 20 October 2010 16:26:32 Sheng Yang wrote:
> > > This patch enables per-vector masking for assigned devices using MSI-X.
> > 
> > The basic idea of the kernel's and QEmu's responsibilities is:
> > 
> > 1. Because QEmu owns the irq routing table, changes to the table
> > should still go to QEmu, as we did in msix_mmio_write().
> > 
> > 2. The other things can be done in the kernel, for performance. Here we
> > cover reading (the entry converted from the routing table and the mask
> > bit state of enabled MSI-X entries) and writing the mask bit for enabled
> > MSI-X entries. Originally we only handled the mask bit in the kernel, but
> > later we found that the Linux kernel reads the MSI-X MMIO just after
> > every write to the mask bit, in order to flush the write. So we added
> > reading of MSI data/addr as well.
> > 
> > 3. Accesses to a disabled entry's mask bit go to QEmu, because they may
> > result in disabling/enabling MSI-X. Explained later.
> > 
> > 4. Only QEmu has knowledge of the PCI configuration space, so it is up to
> > QEmu to decide whether to enable/disable MSI-X for the device.
> 
> Config space yes, but it's a simple global yes/no after all.
> 
> > 5. There is a distinction between enabled entries and disabled entries
> > of the MSI-X table.
> 
> That's my point. There's no such thing as 'enabled entries'
> in the spec. There are only masked and unmasked entries.
>
> Current interface deals with gsi numbers so qemu had to work around
> this. The hack used there is removing gsi for masked vector which has 0
> address and data.  It works because this is what linux and windows
> guests happen to do, but it is out of spec: vector/data value for a
> masked entry have no meaning.

Well, I just realized there is something unnatural about an entry whose
data/address is 0. So I checked the spec again, and found that the mask bit
should be set after reset.
So once this is fixed, I think an unmasked entry with 0 address/data
shouldn't exist anymore.
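
For orientation, here is a minimal sketch of the MSI-X table entry layout this
discussion revolves around. It assumes the standard 16-byte entry (address
low/high, data, vector control with the mask bit in bit 0). The struct and
helper names are made up for illustration and do not come from the patch, and
zeroing address/data on reset is this sketch's own choice for an emulated
table, not something the thread states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One MSI-X table entry: four 32-bit fields, 16 bytes in total.
 * Hypothetical name, not taken from the patch. */
struct msix_entry_sketch {
    uint32_t addr_lo;      /* Message Address */
    uint32_t addr_hi;      /* Message Upper Address */
    uint32_t data;         /* Message Data */
    uint32_t vector_ctrl;  /* Vector Control; bit 0 is the mask bit */
};

#define MSIX_VECTOR_CTRL_MASKBIT 0x1u

/* Put an emulated entry into its post-reset state: mask bit set.
 * Zeroing address/data is an assumption of this sketch. */
static void msix_entry_reset(struct msix_entry_sketch *e)
{
    assert(e != NULL);
    e->addr_lo = 0;
    e->addr_hi = 0;
    e->data = 0;
    e->vector_ctrl = MSIX_VECTOR_CTRL_MASKBIT;
}

/* Non-zero if the entry is currently masked. */
static int msix_entry_is_masked(const struct msix_entry_sketch *e)
{
    assert(e != NULL);
    return (e->vector_ctrl & MSIX_VECTOR_CTRL_MASKBIT) != 0;
}
```

With the mask bit set on reset, an entry whose address/data is still 0 is
always masked until the guest has programmed it and cleared the bit.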
 
> Since you are building a new interface, can design it without
> constraints...

One constraint is pci_enable_msix(). We have to use it to allocate an IRQ for
each entry, as well as to program the entry in the real hardware.
pci_enable_msix() is only a yes/no choice. We can't add new enabled entries
after pci_enable_msix(), and through the kernel API we can only
enable/disable/mask/unmask an IRQ, not an entry in the MSI-X table. And we
still have to allocate a new IRQ for a new entry. When the guest unmasks a
"disabled entry", we have to disable and enable MSI-X again in order to use
the new entry. That's why the "enabled/disabled entry" concept exists.

So even if the guest only unmasks one entry, it's totally different work for
KVM underneath. This logic won't change no matter where the MMIO handling is.
And in fact I don't like this kind of tricky thing in the kernel...

> > The entries we used for pci_enable_msix() (not necessarily in sequence
> > number order) are already enabled; the others are disabled. When the
> > device's MSI-X is enabled and the guest wants to enable a disabled
> > entry, we go back to QEmu because this vector doesn't exist in the
> > routing table. Also, pci_enable_msix() in the kernel doesn't allow us to
> > enable vectors one by one, only all at once. So we have to disable MSI-X
> > first, then enable it with the new entries, which contain the new vector
> > the guest wants to use. This situation only happens while the device is
> > being initialized. After that, the kernel can know and handle the mask
> > bit of the enabled entries.
> > 
> > I've also considered handling all MMIO operations in the kernel, and
> > changing irq routing in the kernel directly. But as long as irq routing
> > is owned by QEmu, I think it's better to leave it to QEmu...
> 
> Yes, this is my suggestion, except we don't need no routing :)
> To inject MSI you just need address/data pairs.
> Look at kvm_set_msi: given address/data you can just inject
> the interrupt. No need for table lookups.

You still need to look up the data/address pair in the guest MSI-X table. The
routing table used here is just a replacement for that table, because we can
construct the entry from the routing table. There are two choices: use the
routing table, or create a new MSI-X table.

Still, the key question is who owns the routing/MSI-X table. If the kernel
owns it, it would be straightforward to intercept all the MMIO in the kernel;
but if it's QEmu, we still need to go back to QEmu for it.
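
The point about injecting from an address/data pair alone can be illustrated
with a small decoder. This is a sketch assuming the usual x86 MSI encoding
(destination APIC ID in address bits 19:12, vector in data bits 7:0, delivery
mode in data bits 10:8). The type and function names are hypothetical, and
this is not the kvm_set_msi() code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two values a guest programs per vector. */
struct msi_msg_sketch {
    uint32_t addr_lo;  /* low 32 bits of the message address */
    uint32_t data;     /* message data */
};

/* x86 MSI addresses live in the 0xFEExxxxx window. */
static int msi_addr_is_x86(const struct msi_msg_sketch *m)
{
    return (m->addr_lo & 0xfff00000u) == 0xfee00000u;
}

/* Destination APIC ID: address bits 19:12. */
static unsigned msi_dest_id(const struct msi_msg_sketch *m)
{
    assert(msi_addr_is_x86(m));
    return (m->addr_lo >> 12) & 0xff;
}

/* Interrupt vector: data bits 7:0. */
static unsigned msi_vector(const struct msi_msg_sketch *m)
{
    return m->data & 0xff;
}

/* Delivery mode: data bits 10:8 (0 means fixed). */
static unsigned msi_delivery_mode(const struct msi_msg_sketch *m)
{
    return (m->data >> 8) & 0x7;
}
```

Everything needed to deliver the interrupt is carried in the pair itself; what
still has to be stored somewhere is the pair for each guest vector, which is
the ownership question raised above.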
 
> > Notice the mask/unmask bits must be handled together, either in the
> > kernel or in userspace. If the kernel handles an enabled vector's mask
> > bit directly, it gets out of sync with QEmu's records. That doesn't
> > matter as long as QEmu doesn't access the related record. And the only
> > place where QEmu wants to consult an enabled entry's mask bit state is
> > on writes to MSI addr/data. The write should be discarded if the entry
> > is unmasked.
> > This checking has alrea

buildbot failure in qemu-kvm on disable_kvm_i386_out_of_tree

2010-10-21 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_i386_out_of_tree on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_out_of_tree/builds/549

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this 
build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


buildbot failure in qemu-kvm on disable_kvm_x86_64_out_of_tree

2010-10-21 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_x86_64_out_of_tree on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_out_of_tree/builds/549

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this 
build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on disable_kvm_i386_debian_5_0

2010-10-21 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_i386_debian_5_0 on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_debian_5_0/builds/601

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this 
build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on disable_kvm_x86_64_debian_5_0

2010-10-21 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_x86_64_debian_5_0 on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_debian_5_0/builds/600

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this 
build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



Re: [PATCH 2/2] KVM: pre-allocate one more dirty bitmap to avoid vmalloc() in x86's kvm_vm_ioctl_get_dirty_log()

2010-10-21 Thread Marcelo Tosatti
On Thu, Oct 21, 2010 at 05:40:33PM +0900, Takuya Yoshikawa wrote:
> Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate, with
> vmalloc(), a bitmap which will be used for the next round of logging. This
> has been hurting VGA updates and live migration: vmalloc() consumes extra
> system time, triggers TLB flushes, etc.
> 
> This patch resolves this issue by pre-allocating one more bitmap and switching
> between two bitmaps during dirty logging.
> 
> Performance improvement:
>   I measured performance for the case of VGA update by trace-cmd.

>   So the result was 1.5 times faster than the original.
> 
> 
>   In the case of live migration, the improvement ratio depends on the workload
>   and the guest memory size.
> 
> Note:
>   This does not change other architectures' logic, but the allocation size
>   doubles. This will increase the actual memory consumption only when
>   the new size changes the number of pages allocated by vmalloc().
> 
> Signed-off-by: Takuya Yoshikawa 
> Signed-off-by: Fernando Luis Vazquez Cao 

Looks good to me.
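
The two-bitmap idea in the quoted description can be sketched in userspace C
as follows. This is only an illustration under stated assumptions: it is
single-threaded, uses calloc() in place of vmalloc(), and all names are
hypothetical. The real kernel code also has to synchronize the switch with
concurrent writers, which this sketch ignores.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Two bitmaps allocated once; `active` selects the one being written. */
struct dirty_log_sketch {
    unsigned long *bitmaps[2];
    int active;
    size_t nlongs;
};

static int dirty_log_init(struct dirty_log_sketch *d, size_t nlongs)
{
    d->bitmaps[0] = calloc(nlongs, sizeof(unsigned long));
    d->bitmaps[1] = calloc(nlongs, sizeof(unsigned long));
    if (!d->bitmaps[0] || !d->bitmaps[1]) {
        free(d->bitmaps[0]);
        free(d->bitmaps[1]);
        return -1;
    }
    d->active = 0;
    d->nlongs = nlongs;
    return 0;
}

/* Record that `page` was written. */
static void dirty_log_mark(struct dirty_log_sketch *d, size_t page)
{
    size_t bits = 8 * sizeof(unsigned long);

    assert(page / bits < d->nlongs);
    d->bitmaps[d->active][page / bits] |= 1UL << (page % bits);
}

/* Return the bitmap collected so far and continue logging into the other
 * one, cleared first. No allocation happens on this path. */
static unsigned long *dirty_log_switch(struct dirty_log_sketch *d)
{
    unsigned long *collected = d->bitmaps[d->active];

    d->active ^= 1;
    memset(d->bitmaps[d->active], 0, d->nlongs * sizeof(unsigned long));
    return collected;
}

static void dirty_log_free(struct dirty_log_sketch *d)
{
    free(d->bitmaps[0]);
    free(d->bitmaps[1]);
}
```

The cost moves from an allocation per call to a memset per call plus twice the
bitmap memory, which matches the trade-off stated in the note above.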



Re: VM with two interfaces

2010-10-21 Thread Nirmal Guhan
On Thu, Oct 21, 2010 at 9:23 AM, Nirmal Guhan  wrote:
> Hi,
>
> Am trying to create a VM using qemu-kvm with two interfaces(fedora12
> is the host and vm) and running into an issue. Given below is the
> command :
>
> qemu-kvm -net nic,macaddr=$macaddress,model=pcnet -net
> tap,script=/etc/qemu-ifup -net nic,model=pcnet -net
> tap,script=/etc/qemu-ifup -m 1024 -hda ./vdisk.img -kernel
> ./bzImage-1019 -append "ip=x.y.z.w:a.b.c.d:p.q.r.s:a.b.c.d
> ip=x.y.z.u:a.b.c.d:p.q.r.s:a.b.c.d root=/dev/nfs rw
> nfsroot=x.y.z.v:/blahblahblah"
>
> On boot, both eth0 and eth1 come up but the vm tries to send dhcp and
> rarp requests instead of using the command line IP addresses. DHCP
> would fail in my case.
>
> With just one interface, dhcp is not attempted and nfs mount of root works 
> fine.
>
> Any clue on what could be wrong here?
>
> Thanks,
> Nirmal
>
Can someone help please? Hard pressed on time... sorry

-Nirmal


Re: TSC host kernel message with kvm.git as host kernel and qemu-kvm.git as userspace

2010-10-21 Thread Zachary Amsden

On 10/21/2010 09:49 AM, Lucas Meneghel Rodrigues wrote:

Hi folks,

I was doing some work at one of our host machines, the one that runs
upstream (Fedora 14, kvm.git as the kernel and qemu-kvm.git as
userspace), and I noticed the kernel message:

kvm: unreliable cycle conversion on adjustable rate TSC
kvm: SMP vm created on host with unstable TSC: guest TSC will not be
reliable
   


Yes, the message is a warning; without TSC trapping, SMP VMs will have 
unreliable TSC.  In this case, you are advised to use kvmclock or 
another non-TSC clocksource in the guest.
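
One way to check which clocksource a Linux guest is actually using is to read
the sysfs file that names it. The helper below is a sketch with a hypothetical
name; it takes the path as a parameter so it can be tried on any file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the clocksource named in the file at `path` is "tsc",
 * 0 if it is something else, and -1 if the file cannot be read. */
static int clocksource_is_tsc(const char *path)
{
    char name[64];
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f) {
        return -1;
    }
    if (!fgets(name, sizeof(name), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    name[strcspn(name, "\n")] = '\0';  /* strip the trailing newline */
    return strcmp(name, "tsc") == 0;
}
```

In a Linux guest the file to pass is
/sys/devices/system/clocksource/clocksource0/current_clocksource; when
kvmclock is in use it reads kvm-clock.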


Zach


2 hosts sharing iSCSI

2010-10-21 Thread Francesc Guasch
Hi. I want to build two KVM servers sharing an iSCSI
storage backend. The goal is that if one server fails, the
other starts all the guests.

I've been searching for documents about best practices
with KVM and iSCSI. I already read

http://pve.proxmox.com/wiki/Storage_Model

as advised, but I'd appreciate it if someone could point me
to some specifics about iSCSI. I am not sure whether I should
build DRBD on top of it or use something like
heartbeat. Or maybe the iSCSI vendor should provide me
with some software to do this?

Any hint would be appreciated, thank you very much.



TSC host kernel message with kvm.git as host kernel and qemu-kvm.git as userspace

2010-10-21 Thread Lucas Meneghel Rodrigues
Hi folks, 

I was doing some work at one of our host machines, the one that runs
upstream (Fedora 14, kvm.git as the kernel and qemu-kvm.git as
userspace), and I noticed the kernel message:

kvm: unreliable cycle conversion on adjustable rate TSC
kvm: SMP vm created on host with unstable TSC: guest TSC will not be
reliable

The host being tested in this case was RHEL 5.5 i386. Command line of
the install test:

/usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor 
unix:'/tmp/monitor-humanmonitor1-20101021-160403-PrKT',server,nowait -qmp 
unix:'/tmp/monitor-qmpmonitor1-20101021-160403-PrKT',server,nowait -serial 
unix:'/tmp/serial-20101021-160403-PrKT',server,nowait -drive 
file='/tmp/kvm_autotest_root/images/rhel5-32.qcow2',index=0,if=ide,cache=none 
-device rtl8139,mac=9a:47:46:1a:e6:b9,netdev=idAcPzhW -netdev 
user,id=idAcPzhW,tftp='/usr/local/autotest/tests/kvm/images/rhel55-32/tftpboot',bootfile='/pxelinux.0',hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:12323
 -m 1024 -smp 2 -drive 
file='/tmp/kvm_autotest_root/isos/linux/RHEL-5.5-i386-DVD.iso',media=cdrom,index=2
 -drive 
file='/tmp/kvm_autotest_root/images/rhel55-32/ks.iso',media=cdrom,index=1 -vnc 
:0  -boot d -boot cn

Marcelo advised me to report this to the kvm list and copy Zach
on the e-mail. There, done :)

Lucas



Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-21 Thread Andrew Beekhof

On 10/21/2010 06:25 PM, Anthony Liguori wrote:

Hi Andrew,

On 10/21/2010 10:43 AM, Andrew Beekhof wrote:

In that case we've done a bad job of the wiki.
Windows and other distributions are a key part of the Matahari vision.

Matahari is two things
- an architecture, and
- an implementation of the most common API sets

Each set of APIs (ie. host, network, services) is an independent
daemon/agent which attaches to a common QMF broker (more on that later).

While some of these might be platform specific (packaging would be one
likely candidate), the intention is to be distro/platform agnostic
wherever possible.

Take netcf for example, instead of re-inventing the wheel we wrote the
windows port for netcf.


So what's this about QMF you ask?
Again, rather than invent our own message protocol we're leveraging an
existing standard that supports windows and linux, is fast, reliable
and secure.

It's also pluggable and discoverable - so simply starting a new agent
that connects to the matahari broker makes its API available. Any QMF
client/console can also interrogate the guest to see what agents and
API calls are available.

Even better there's going to be a virtio-serial transport. So we can
access the same agents in the same way with or without host-to-guest
networking. This was a key requirement for us because of EC2-like
cloud scenarios where we don't have access to the physical host.


I did get this much and I think I'm doing a poor job explaining myself.

I think Matahari is tackling the same space that many other frameworks
are. For instance, everything you say above is (supposed to be) true for
something like OpenWBEM, Pegasus, etc.


I think the scope is also different.
Our focus is on satisfying concrete needs rather than nebulous 
all-encompassing goals.




The advantage I see in Matahari is that 1) it can take advantage of
virtio-serial 2) it's significantly lighter than CIM 3) it's community
driven.

So there's no doubt in my mind that if you need a way to inventory
physical and virtual systems, something like Matahari becomes a very
appealing option to do that.

But that's not the problem space I'm trying to tackle.


Neither are we really.

I came to Matahari from clustering (I wrote Pacemaker), so service 
management is my original area of interest.


But for the sake of argument, assume inventory was our sole focus.
There is, by design, a place in the architecture for third-party agents.

Just because the agent wasn't built from matahari.git doesn't mean it 
can't make use of "our" bus.



An example of the problem I'm trying to tackle is guest reboot.


Reboot was one of the first things we implemented :-)



On x86, there is no ACPI interface for requesting a guest reboot. Other
platforms do provide this and we usually try to emulate platform
interfaces whenever possible. In order to implement host-initiated
reboot (aka virDomainReboot) we need to have a mechanism to execute the
reboot command in the guest that's initiated by the hypervisor.

This is not the same thing as a remote system's management interface.
This is something that should be dead-simple and trivially portable.

I think there's symbiotic relationship that a QEMU-based guest agent
could play with a Matahari based agent. But my initial impression is
that the types of problems we're trying to tackle are fundamentally
different than the problem that Matahari is


Honestly, we're interested in whatever the teams that want to work with 
us are interested in.  Our primary mission is to consolidate common 
functionality and avoid N implementations of essentially the same thing.


And I suspect that many people would be interested in having what you're 
working on exposed remotely.



So if you'd like to do this agent as part of Matahari, great.
But if you want to keep it separate and just want to leverage the 
design, that's also not a problem.


Or if the core functionality was in a library, we'd happily write the 
glue to make it accessible via QMF and dbus.


There's really a number of levels on which we could work together - if 
you're interested.



[PATCH 6/7] kvm: writeback SMP TSCs on migration only

2010-10-21 Thread Marcelo Tosatti
commit 6389c45441269baa2873e6feafebd17105ddeaf6
Author: Jan Kiszka 
Date:   Mon Mar 1 18:17:26 2010 +0100

qemu-kvm: Cleanup/fix TSC and PV clock writeback

Signed-off-by: Marcelo Tosatti 
---
 target-i386/kvm.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 06474d6..e2f7e2e 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -817,7 +817,15 @@ static int kvm_put_msrs(CPUState *env, int level)
 kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
 #endif
 if (level == KVM_PUT_FULL_STATE) {
-kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
+/*
+ * KVM is yet unable to synchronize TSC values of multiple VCPUs on
+ * writeback. Until this is fixed, we only write the offset to SMP
+ * guests after migration, desynchronizing the VCPUs, but avoiding
+ * huge jump-backs that would occur without any writeback at all.
+ */
+if (smp_cpus == 1 || env->tsc != 0) {
+kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
+}
 kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
   env->system_time_msr);
 kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
-- 
1.7.2.1



[PATCH 7/7] kvm: save/restore x86-64 MSRs on x86-64 kernels

2010-10-21 Thread Marcelo Tosatti
Signed-off-by: Marcelo Tosatti 
---
 target-i386/kvm.c |   30 --
 1 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index e2f7e2e..ae0a034 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -53,6 +54,8 @@
 #define BUS_MCEERR_AO 5
 #endif
 
+static int lm_capable_kernel;
+
 #ifdef KVM_CAP_EXT_CPUID
 
 static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
@@ -523,6 +526,11 @@ int kvm_arch_init(KVMState *s, int smp_cpus)
 {
 int ret;
 
+struct utsname utsname;
+
+uname(&utsname);
+lm_capable_kernel = strcmp(utsname.machine, "x86_64") == 0;
+
 /* create vm86 tss.  KVM uses vm86 mode to emulate 16-bit code
  * directly.  In order to use vm86 mode, a TSS is needed.  Since this
  * must be part of guest physical memory, we need to allocate it.  Older
@@ -810,11 +818,12 @@ static int kvm_put_msrs(CPUState *env, int level)
 if (kvm_has_msr_hsave_pa(env))
 kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
 #ifdef TARGET_X86_64
-/* FIXME if lm capable */
-kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
-kvm_msr_entry_set(&msrs[n++], MSR_KERNELGSBASE, env->kernelgsbase);
-kvm_msr_entry_set(&msrs[n++], MSR_FMASK, env->fmask);
-kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
+if (lm_capable_kernel) {
+kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
+kvm_msr_entry_set(&msrs[n++], MSR_KERNELGSBASE, env->kernelgsbase);
+kvm_msr_entry_set(&msrs[n++], MSR_FMASK, env->fmask);
+kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
+}
 #endif
 if (level == KVM_PUT_FULL_STATE) {
 /*
@@ -1046,11 +1055,12 @@ static int kvm_get_msrs(CPUState *env)
 msrs[n++].index = MSR_VM_HSAVE_PA;
 msrs[n++].index = MSR_IA32_TSC;
 #ifdef TARGET_X86_64
-/* FIXME lm_capable_kernel */
-msrs[n++].index = MSR_CSTAR;
-msrs[n++].index = MSR_KERNELGSBASE;
-msrs[n++].index = MSR_FMASK;
-msrs[n++].index = MSR_LSTAR;
+if (lm_capable_kernel) {
+msrs[n++].index = MSR_CSTAR;
+msrs[n++].index = MSR_KERNELGSBASE;
+msrs[n++].index = MSR_FMASK;
+msrs[n++].index = MSR_LSTAR;
+}
 #endif
 msrs[n++].index = MSR_KVM_SYSTEM_TIME;
 msrs[n++].index = MSR_KVM_WALL_CLOCK;
-- 
1.7.2.1
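
The kernel check added in kvm_arch_init() above can be tried on its own with a
few lines of C. This is a sketch with hypothetical helper names; unlike the
patch, it checks the return value of uname().

```c
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

/* Pure comparison, split out so it can be exercised without uname(). */
static int machine_is_x86_64(const char *machine)
{
    assert(machine != NULL);
    return strcmp(machine, "x86_64") == 0;
}

/* Returns 1 if the running kernel reports x86_64, 0 if it reports
 * something else, and -1 if uname() fails. */
static int host_kernel_is_x86_64(void)
{
    struct utsname u;

    if (uname(&u) < 0) {
        return -1;
    }
    return machine_is_x86_64(u.machine);
}
```

A 64-bit capable CPU running a 32-bit kernel reports a 32-bit machine string
here, which is the case the lm_capable_kernel flag is meant to catch.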



[PATCH 3/7] Fix build on !KVM_CAP_MCE

2010-10-21 Thread Marcelo Tosatti
From: Hidetoshi Seto 

This patch removes the following warnings:

target-i386/kvm.c: In function 'kvm_put_msrs':
target-i386/kvm.c:782: error: unused variable 'i'
target-i386/kvm.c: In function 'kvm_get_msrs':
target-i386/kvm.c:1083: error: label at end of compound statement

Signed-off-by: Hidetoshi Seto 
Signed-off-by: Marcelo Tosatti 
---
 target-i386/kvm.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 9144f74..587ee19 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -783,7 +783,7 @@ static int kvm_put_msrs(CPUState *env, int level)
 struct kvm_msr_entry entries[100];
 } msr_data;
 struct kvm_msr_entry *msrs = msr_data.entries;
-int i, n = 0;
+int n = 0;
 
 kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_CS, env->sysenter_cs);
 kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
@@ -805,6 +805,7 @@ static int kvm_put_msrs(CPUState *env, int level)
 }
 #ifdef KVM_CAP_MCE
 if (env->mcg_cap) {
+int i;
 if (level == KVM_PUT_RESET_STATE)
 kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
 else if (level == KVM_PUT_FULL_STATE) {
@@ -1089,9 +1090,9 @@ static int kvm_get_msrs(CPUState *env)
 if (msrs[i].index >= MSR_MC0_CTL &&
 msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 4) {
 env->mce_banks[msrs[i].index - MSR_MC0_CTL] = msrs[i].data;
-break;
 }
 #endif
+break;
 }
 }
 
-- 
1.7.2.1
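
The "label at end of compound statement" message comes from a C rule: a label
must be followed by a statement. If the only statements after a case label sit
inside an #ifdef that is compiled out, the label ends up directly before the
closing brace. The sketch below reproduces the fixed shape under assumptions:
HAVE_MCE_SKETCH is a hypothetical macro standing in for KVM_CAP_MCE, and the
function is not the real kvm_get_msrs().

```c
#include <assert.h>

/* Hypothetical feature macro standing in for KVM_CAP_MCE. It is left
 * undefined here to model a build without MCE support. */
/* #define HAVE_MCE_SKETCH 1 */

static int classify_msr(unsigned index)
{
    int handled = 0;

    switch (index) {
    case 1:
        handled = 1;
        break;
    default:
#ifdef HAVE_MCE_SKETCH
        if (index >= 100) {
            handled = 2;
        }
#endif
        /* Outside the #ifdef, so the label above is always followed by a
         * statement, even when the block is compiled out. */
        break;
    }
    return handled;
}

static void classify_msr_demo(void)
{
    assert(classify_msr(1) == 1);
    assert(classify_msr(100) == 0);  /* MCE branch is compiled out */
}
```

Moving the break inside the #ifdef block would bring the build error back
whenever the macro is undefined.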



[PATCH 0/7] [PULL] qemu-kvm.git uq/master queue

2010-10-21 Thread Marcelo Tosatti
The following changes since commit 633aa0acfe2c4d3e56acfe28c912796bf54de6d3:

  Fix pci hotplug to generate level triggered interrupt. (2010-10-20 17:23:28 -0500)

are available in the git repository at:
  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

Hidetoshi Seto (3):
  x86, mce: ignore SRAO only when MCG_SER_P is available
  x86, mce: broadcast mce depending on the cpu version
  Fix build on !KVM_CAP_MCE

Marcelo Tosatti (4):
  kvm: add save/restore of MSR_VM_HSAVE_PA
  kvm: factor out kvm_has_msr_star
  kvm: writeback SMP TSCs on migration only
  kvm: save/restore x86-64 MSRs on x86-64 kernels

 target-i386/kvm.c |  132 +++-
 1 files changed, 99 insertions(+), 33 deletions(-)


[PATCH 4/7] kvm: add save/restore of MSR_VM_HSAVE_PA

2010-10-21 Thread Marcelo Tosatti
commit 2bba4446746add456ceeb0e8359a43032a2ea333
Author: Alexander Graf 
Date:   Thu Dec 18 15:38:32 2008 +0100

Enable nested SVM support in userspace

Signed-off-by: Marcelo Tosatti 
---
 target-i386/kvm.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 587ee19..e6c9a1d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -790,6 +790,7 @@ static int kvm_put_msrs(CPUState *env, int level)
 kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
 if (kvm_has_msr_star(env))
kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
+kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
 #ifdef TARGET_X86_64
 /* FIXME if lm capable */
 kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
@@ -1015,6 +1016,7 @@ static int kvm_get_msrs(CPUState *env)
 msrs[n++].index = MSR_IA32_SYSENTER_EIP;
 if (kvm_has_msr_star(env))
msrs[n++].index = MSR_STAR;
+msrs[n++].index = MSR_VM_HSAVE_PA;
 msrs[n++].index = MSR_IA32_TSC;
 #ifdef TARGET_X86_64
 /* FIXME lm_capable_kernel */
@@ -1071,6 +1073,9 @@ static int kvm_get_msrs(CPUState *env)
 case MSR_IA32_TSC:
 env->tsc = msrs[i].data;
 break;
+case MSR_VM_HSAVE_PA:
+env->vm_hsave = msrs[i].data;
+break;
 case MSR_KVM_SYSTEM_TIME:
 env->system_time_msr = msrs[i].data;
 break;
-- 
1.7.2.1



[PATCH 1/7] x86, mce: ignore SRAO only when MCG_SER_P is available

2010-10-21 Thread Marcelo Tosatti
From: Hidetoshi Seto 

And restructure this block to call kvm_mce_in_exception() only when it is
required.

Signed-off-by: Hidetoshi Seto 
Signed-off-by: Marcelo Tosatti 
---
 target-i386/kvm.c |   16 ++--
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 512d533..b813953 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -239,12 +239,16 @@ static void kvm_do_inject_x86_mce(void *_data)
 struct kvm_x86_mce_data *data = _data;
 int r;
 
-/* If there is an MCE excpetion being processed, ignore this SRAO MCE */
-r = kvm_mce_in_exception(data->env);
-if (r == -1)
-fprintf(stderr, "Failed to get MCE status\n");
-else if (r && !(data->mce->status & MCI_STATUS_AR))
-return;
+/* If there is an MCE exception being processed, ignore this SRAO MCE */
+if ((data->env->mcg_cap & MCG_SER_P) &&
+!(data->mce->status & MCI_STATUS_AR)) {
+r = kvm_mce_in_exception(data->env);
+if (r == -1) {
+fprintf(stderr, "Failed to get MCE status\n");
+} else if (r) {
+return;
+}
+}
 
 r = kvm_set_mce(data->env, data->mce);
 if (r < 0) {
-- 
1.7.2.1



[PATCH 2/7] x86, mce: broadcast mce depending on the cpu version

2010-10-21 Thread Marcelo Tosatti
From: Hidetoshi Seto 

There is no reason why an SRAO event received by the main thread
should be the only one that is broadcast.

According to the x86 ASDM vol.3A 15.10.4.1,
MCE signal is broadcast on processor version 06H_EH or later.

This change is required to handle SRAR in smp guests.

Signed-off-by: Hidetoshi Seto 
Signed-off-by: Marcelo Tosatti 
---
 target-i386/kvm.c |   29 -
 1 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index b813953..9144f74 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1636,6 +1636,28 @@ static void hardware_memory_error(void)
 exit(1);
 }
 
+#ifdef KVM_CAP_MCE
+static void kvm_mce_broadcast_rest(CPUState *env)
+{
+CPUState *cenv;
+int family, model, cpuver = env->cpuid_version;
+
+family = (cpuver >> 8) & 0xf;
+model = ((cpuver >> 12) & 0xf0) + ((cpuver >> 4) & 0xf);
+
+/* Broadcast MCA signal for processor version 06H_EH and above */
+if ((family == 6 && model >= 14) || family > 6) {
+for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
+if (cenv == env) {
+continue;
+}
+kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
+   MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1);
+}
+}
+}
+#endif
+
 int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
 {
 #if defined(KVM_CAP_MCE)
@@ -1693,6 +1715,7 @@ int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
 fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
 abort();
 }
+kvm_mce_broadcast_rest(env);
 } else
 #endif
 {
@@ -1715,7 +1738,6 @@ int kvm_on_sigbus(int code, void *addr)
 void *vaddr;
 ram_addr_t ram_addr;
 target_phys_addr_t paddr;
-CPUState *cenv;
 
 /* Hope we are lucky for AO MCE */
 vaddr = addr;
@@ -1731,10 +1753,7 @@ int kvm_on_sigbus(int code, void *addr)
 kvm_inject_x86_mce(first_cpu, 9, status,
MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
(MCM_ADDR_PHYS << 6) | 0xc, 1);
-for (cenv = first_cpu->next_cpu; cenv != NULL; cenv = cenv->next_cpu) {
-kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
-   MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1);
-}
+kvm_mce_broadcast_rest(first_cpu);
 } else
 #endif
 {
-- 
1.7.2.1
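
The family/model arithmetic in kvm_mce_broadcast_rest() above can be checked
in isolation. The helpers below repeat the same bit operations under
hypothetical names; the sample CPUID version values are chosen only to
exercise both sides of the 06H_EH threshold.

```c
#include <assert.h>
#include <stdint.h>

/* Base family: bits 11:8 of the CPUID version value. */
static int cpu_family(uint32_t cpuver)
{
    return (cpuver >> 8) & 0xf;
}

/* Model: extended model (bits 19:16) as the high nibble,
 * base model (bits 7:4) as the low nibble. */
static int cpu_model(uint32_t cpuver)
{
    return ((cpuver >> 12) & 0xf0) + ((cpuver >> 4) & 0xf);
}

/* Same threshold as the patch: 06H_EH and above. */
static int mce_is_broadcast(uint32_t cpuver)
{
    int family = cpu_family(cpuver);
    int model = cpu_model(cpuver);

    return (family == 6 && model >= 14) || family > 6;
}

static void mce_broadcast_demo(void)
{
    assert(cpu_family(0x000106a5) == 6);
    assert(cpu_model(0x000106a5) == 0x1a);  /* extended model 1, model 0xa */
    assert(mce_is_broadcast(0x000106a5));
    assert(cpu_model(0x000006d8) == 13);    /* just below the threshold */
    assert(!mce_is_broadcast(0x000006d8));
    assert(mce_is_broadcast(0x000006e8));   /* family 6, model 14 */
}
```

Note that only the base family field is examined, as in the patch, so the
extended family bits play no part in the decision.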



[PATCH 5/7] kvm: factor out kvm_has_msr_star

2010-10-21 Thread Marcelo Tosatti
And add kvm_has_msr_hsave_pa(), to avoid warnings on older
kernels without support.

Signed-off-by: Marcelo Tosatti 
---
 target-i386/kvm.c |   41 ++---
 1 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index e6c9a1d..06474d6 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -438,23 +438,26 @@ void kvm_arch_reset_vcpu(CPUState *env)
 }
 }
 
-static int kvm_has_msr_star(CPUState *env)
+int has_msr_star;
+int has_msr_hsave_pa;
+
+static void kvm_supported_msrs(CPUState *env)
 {
-static int has_msr_star;
+static int kvm_supported_msrs;
 int ret;
 
 /* first time */
-if (has_msr_star == 0) {
+if (kvm_supported_msrs == 0) {
 struct kvm_msr_list msr_list, *kvm_msr_list;
 
-has_msr_star = -1;
+kvm_supported_msrs = -1;
 
 /* Obtain MSR list from KVM.  These are the MSRs that we must
  * save/restore */
 msr_list.nmsrs = 0;
 ret = kvm_ioctl(env->kvm_state, KVM_GET_MSR_INDEX_LIST, &msr_list);
 if (ret < 0 && ret != -E2BIG) {
-return 0;
+return;
 }
 /* Old kernel modules had a bug and could write beyond the provided
memory. Allocate at least a safe amount of 1K. */
@@ -470,7 +473,11 @@ static int kvm_has_msr_star(CPUState *env)
 for (i = 0; i < kvm_msr_list->nmsrs; i++) {
 if (kvm_msr_list->indices[i] == MSR_STAR) {
 has_msr_star = 1;
-break;
+continue;
+}
+if (kvm_msr_list->indices[i] == MSR_VM_HSAVE_PA) {
+has_msr_hsave_pa = 1;
+continue;
 }
 }
 }
@@ -478,9 +485,19 @@ static int kvm_has_msr_star(CPUState *env)
 free(kvm_msr_list);
 }
 
-if (has_msr_star == 1)
-return 1;
-return 0;
+return;
+}
+
+static int kvm_has_msr_hsave_pa(CPUState *env)
+{
+kvm_supported_msrs(env);
+return has_msr_hsave_pa;
+}
+
+static int kvm_has_msr_star(CPUState *env)
+{
+kvm_supported_msrs(env);
+return has_msr_star;
 }
 
 static int kvm_init_identity_map_page(KVMState *s)
@@ -790,7 +807,8 @@ static int kvm_put_msrs(CPUState *env, int level)
 kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
 if (kvm_has_msr_star(env))
kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
-kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
+if (kvm_has_msr_hsave_pa(env))
+kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
 #ifdef TARGET_X86_64
 /* FIXME if lm capable */
 kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
@@ -1016,7 +1034,8 @@ static int kvm_get_msrs(CPUState *env)
 msrs[n++].index = MSR_IA32_SYSENTER_EIP;
 if (kvm_has_msr_star(env))
msrs[n++].index = MSR_STAR;
-msrs[n++].index = MSR_VM_HSAVE_PA;
+if (kvm_has_msr_hsave_pa(env))
+msrs[n++].index = MSR_VM_HSAVE_PA;
 msrs[n++].index = MSR_IA32_TSC;
 #ifdef TARGET_X86_64
 /* FIXME lm_capable_kernel */
-- 
1.7.2.1
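
The probe-once pattern this patch introduces (scan the MSR list a single time, cache one flag per optional MSR, expose each flag through a small accessor) can be sketched on its own. This is only an illustration, not the QEMU code: struct msr_list and the host_has_* helpers are hypothetical names, and the list is passed in by the caller rather than fetched with KVM_GET_MSR_INDEX_LIST.

```c
#include <stddef.h>
#include <stdint.h>

#define MSR_STAR        0xc0000081u
#define MSR_VM_HSAVE_PA 0xc0010117u

/* Hypothetical stand-in for the list returned by KVM_GET_MSR_INDEX_LIST. */
struct msr_list {
    size_t nmsrs;
    const uint32_t *indices;
};

static int probed;              /* 0 = not probed yet, 1 = already probed */
static int has_msr_star;
static int has_msr_hsave_pa;

/* Scan the list once and remember which optional MSRs are present. */
static void probe_supported_msrs(const struct msr_list *list)
{
    size_t i;

    if (probed) {
        return;
    }
    probed = 1;

    /* No early break: several MSRs are looked for in the same pass. */
    for (i = 0; i < list->nmsrs; i++) {
        if (list->indices[i] == MSR_STAR) {
            has_msr_star = 1;
        } else if (list->indices[i] == MSR_VM_HSAVE_PA) {
            has_msr_hsave_pa = 1;
        }
    }
}

static int host_has_msr_star(const struct msr_list *list)
{
    probe_supported_msrs(list);
    return has_msr_star;
}

static int host_has_msr_hsave_pa(const struct msr_list *list)
{
    probe_supported_msrs(list);
    return has_msr_hsave_pa;
}
```

Callers then guard each optional entry with its accessor, as the patch does for MSR_VM_HSAVE_PA in kvm_put_msrs() and kvm_get_msrs().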



Re: [PATCH] KVM: Move KVM context switch into own function

2010-10-21 Thread Marcelo Tosatti
On Wed, Oct 20, 2010 at 05:56:17PM +0200, Andi Kleen wrote:
> From: Andi Kleen 
> 
> gcc 4.5 with some special options is able to duplicate the VMX
> context switch asm in vmx_vcpu_run(). This results in a compile error
> because the inline asm sequence uses a non-local label. The non-local
> label is needed because other code wants to set up the return address.
> 
> This patch moves the asm code into its own function and marks
> it explicitly noinline to avoid this problem.
> 
> It would probably be better to just move it into a .S file.
> 
> The diff looks worse than the change really is, it's all just
> code movement and no logic change.
> 
> Signed-off-by: Andi Kleen 

Applied, thanks.



Re: [PATCH] KVM: x86: Add missing inline tag to kvm_read_and_reset_pf_reason

2010-10-21 Thread Marcelo Tosatti
On Wed, Oct 20, 2010 at 06:34:54PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka 
> 
> May otherwise generate build warnings about an unused
> kvm_read_and_reset_pf_reason if included without CONFIG_KVM_GUEST
> enabled.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  arch/x86/include/asm/kvm_para.h |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)

Applied, thanks.



Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-21 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
> So there's no doubt in my mind that if you need a way to inventory
> physical and virtual systems, something like Matahari becomes a very
> appealing option to do that.
> 
> But that's not the problem space I'm trying to tackle.
> 
> An example of the problem I'm trying to tackle is guest reboot.

Matahari already has shutdown and reboot methods.

Inventory, reboot, filesystem freeze, cut'n'paste, etc. all involve
communication between host and guest.  The main point is to consolidate
effort and avoid a sprawl of agents (which agent do I install to do
reboot?).

thanks,
-chris


Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-21 Thread Anthony Liguori

Hi Andrew,

On 10/21/2010 10:43 AM, Andrew Beekhof wrote:

In that case we've done a bad job of the wiki.
Windows and other distributions are a key part of the Matahari vision.

Matahari is two things
- an architecture, and
- an implementation of the most common API sets

Each set of APIs (ie. host, network, services) is an independent 
daemon/agent which attaches to a common QMF broker (more on that later).


While some of these might be platform specific (packaging would be one 
likely candidate), the intention is to be distro/platform agnostic 
wherever possible.


Take netcf for example, instead of re-inventing the wheel we wrote the 
windows port for netcf.



So what's this about QMF you ask?
Again, rather than invent our own message protocol we're leveraging an 
existing standard that supports windows and linux, is fast, reliable 
and secure.


It's also pluggable and discoverable - so simply starting a new agent 
that connects to the matahari broker makes its API available.  Any 
QMF client/console can also interrogate the guest to see what agents 
and API calls are available.


Even better there's going to be a virtio-serial transport.  So we can 
access the same agents in the same way with or without host-to-guest 
networking.  This was a key requirement for us because of EC2-like 
cloud scenarios where we don't have access to the physical host.


I did get this much and I think I'm doing a poor job explaining myself.

I think Matahari is tackling the same space that many other frameworks 
are.  For instance, everything you say above is (supposed to be) true 
for something like OpenWBEM, Pegasus, etc.


The advantage I see in Matahari is that 1) it can take advantage of 
virtio-serial 2) it's significantly lighter than CIM 3) it's community 
driven.


So there's no doubt in my mind that if you need a way to inventory 
physical and virtual systems, something like Matahari becomes a very 
appealing option to do that.


But that's not the problem space I'm trying to tackle.

An example of the problem I'm trying to tackle is guest reboot.

On x86, there is no ACPI interface for requesting a guest reboot.  Other 
platforms do provide this and we usually try to emulate platform 
interfaces whenever possible.  In order to implement host-initiated 
reboot (aka virDomainReboot) we need to have a mechanism to execute the 
reboot command in the guest that's initiated by the hypervisor.


This is not the same thing as a remote system's management interface.  
This is something that should be dead-simple and trivially portable.


I think there's a symbiotic relationship that a QEMU-based guest agent 
could have with a Matahari-based agent.  But my initial impression is 
that the types of problems we're trying to tackle are fundamentally 
different from the problems Matahari is tackling, even if it 
superficially appears that there's an overlap.


For instance, communicating window locations to enable "coherence" mode 
in the QEMU GUI is something that makes no sense as a network management 
interface.  This is something that only makes sense between QEMU and the 
guest agent.


Regards,

Anthony Liguori



That's probably enough for the moment, I'd better go make dinner :-)



It
exposes interfaces for manipulation of RPM packages, relies on netcf,
etc.

FYI netcf is not Fedora specific. There is a Win32 backend for it
too. It does need porting to other Linux distros, but that's simply
an internal implementation issue. The goal of netcf is to be the
libvirt of network config mgmt - a portable API for all OS network
config tasks. Further, Matahari itself is also being ported to Win32
and can be ported to other Linux distros too.


Yeah, I'm aware of the goals of netcf but that hasn't materialized into
a port to other distros.

Let me be clear, I don't think this is a problem for libvirt,
NetworkManager, or even Matahari.

But for a QEMU guest agent where we terminate the APIs within QEMU
itself, I do think it creates a pretty nasty portability barrier.

There's nothing wrong with this if the goal of Matahari is to provide a
robust agent for Fedora-based Linux distributions but I don't think it
meets the requirements of a QEMU guest agent.

I don't think we can overly optimize for one Linux distribution either
so a mentality of letting other platforms contribute their own support
probably won't work.

That is not the goal of Matahari. It is intended to be generically
applicable to *all* guest OS. Obviously in areas where every distro
does different things, then it will need porting for each different
impl. You have to start somewhere and it started with Fedora. This
is all true of any guest agent solution.


There's two approaches that could be taken for a guest agent. You could
provide very low level interfaces (read a file, execute a command, read
a registry key). This makes for a very portable guest agent at the cost
of complexity in interacting with the agent. The agent doesn't ever
really need to change much the cl

VM with two interfaces

2010-10-21 Thread Nirmal Guhan
Hi,

I am trying to create a VM with two interfaces using qemu-kvm (Fedora 12
is both the host and the guest) and am running into an issue. The
command is given below:

qemu-kvm -net nic,macaddr=$macaddress,model=pcnet -net
tap,script=/etc/qemu-ifup -net nic,model=pcnet -net
tap,script=/etc/qemu-ifup -m 1024 -hda ./vdisk.img -kernel
./bzImage-1019 -append "ip=x.y.z.w:a.b.c.d:p.q.r.s:a.b.c.d
ip=x.y.z.u:a.b.c.d:p.q.r.s:a.b.c.d root=/dev/nfs rw
nfsroot=x.y.z.v:/blahblahblah"

On boot, both eth0 and eth1 come up but the vm tries to send dhcp and
rarp requests instead of using the command line IP addresses. DHCP
would fail in my case.

With just one interface, dhcp is not attempted and nfs mount of root works fine.

Any clue on what could be wrong here?

Thanks,
Nirmal


Re: [PATCH] x86, mce: broadcast mce depending on the cpu version

2010-10-21 Thread Marcelo Tosatti
On Thu, Oct 21, 2010 at 05:47:06PM +0900, Hidetoshi Seto wrote:
> There is no reason why an SRAO event received by the main thread
> should be the only one that is broadcast.
> 
> According to the x86 ASDM vol.3A 15.10.4.1,
> MCE signal is broadcast on processor version 06H_EH or later.
> 
> This change is required to handle SRAR in smp guests.
> 
> Signed-off-by: Hidetoshi Seto 
> ---
>  target-i386/kvm.c |   29 -
>  1 files changed, 24 insertions(+), 5 deletions(-)

Applied all to uq/master, thanks.



Re: [PATCHv2] Add support for async page fault to qemu

2010-10-21 Thread Marcelo Tosatti
On Thu, Oct 21, 2010 at 05:08:45PM +0200, Gleb Natapov wrote:
> Add save/restore of MSR for migration and cpuid bit.
> 
> Signed-off-by: Gleb Natapov 
> ---
>  v1->v2
>   - use vmstate subsection to migrate new msr.
> 
> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> index 59aacd0..145c863 100644
> --- a/qemu-kvm-x86.c
> +++ b/qemu-kvm-x86.c
> @@ -678,6 +678,9 @@ static int get_msr_entry(struct kvm_msr_entry *entry, 
> CPUState *env)
>  env->mcg_ctl = entry->data;
>  break;
>  #endif
> +case MSR_KVM_ASYNC_PF_EN:
> +env->async_pf_en_msr = entry->data;
> +break;
>  default:
>  #ifdef KVM_CAP_MCE
>  if (entry->index >= MSR_MC0_CTL &&
> @@ -967,6 +970,7 @@ void kvm_arch_load_regs(CPUState *env, int level)
>  }
>  kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, 
> env->system_time_msr);
>  kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, 
> env->wall_clock_msr);
> +kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, 
> env->async_pf_en_msr);
>  }
>  #ifdef KVM_CAP_MCE
>  if (env->mcg_cap) {
> @@ -1186,6 +1190,7 @@ void kvm_arch_save_regs(CPUState *env)
>  #endif
>  msrs[n++].index = MSR_KVM_SYSTEM_TIME;
>  msrs[n++].index = MSR_KVM_WALL_CLOCK;
> +msrs[n++].index = MSR_KVM_ASYNC_PF_EN;
>  
>  #ifdef KVM_CAP_MCE
>  if (env->mcg_cap) {
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 8b6efed..6d1d6a0 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -669,6 +669,7 @@ typedef struct CPUX86State {
>  #endif
>  uint64_t system_time_msr;
>  uint64_t wall_clock_msr;
> +uint64_t async_pf_en_msr;
>  
>  uint64_t tsc;
>  
> diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
> index d63fdcb..0ee1f88 100644
> --- a/target-i386/cpuid.c
> +++ b/target-i386/cpuid.c
> @@ -73,7 +73,7 @@ static const char *ext3_feature_name[] = {
>  };
>  
>  static const char *kvm_feature_name[] = {
> -"kvmclock", "kvm_nopiodelay", "kvm_mmu", NULL, NULL, NULL, NULL, NULL,
> +"kvmclock", "kvm_nopiodelay", "kvm_mmu", NULL, "kvm_asyncpf", NULL, 
> NULL, NULL,
>  NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index f4fc063..0eb1e90 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -151,6 +151,9 @@ struct kvm_para_features {
>  #ifdef KVM_CAP_PV_MMU
>  { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP },
>  #endif
> +#ifdef KVM_CAP_ASYNC_PF
> +{ KVM_CAP_ASYNC_PF, KVM_FEATURE_ASYNC_PF },
> +#endif
>  { -1, -1 }
>  };
>  
> @@ -672,6 +675,7 @@ static int kvm_put_msrs(CPUState *env, int level)
>  kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
>env->system_time_msr);
>  kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, 
> env->wall_clock_msr);
> +kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, 
> env->async_pf_en_msr);
>  }
>  
>  msr_data.info.nmsrs = n;
> @@ -880,6 +884,7 @@ static int kvm_get_msrs(CPUState *env)
>  #endif
>  msrs[n++].index = MSR_KVM_SYSTEM_TIME;
>  msrs[n++].index = MSR_KVM_WALL_CLOCK;
> +msrs[n++].index = MSR_KVM_ASYNC_PF_EN;
>  
>  msr_data.info.nmsrs = n;
>  ret = kvm_vcpu_ioctl(env, KVM_GET_MSRS, &msr_data);
> @@ -926,6 +931,9 @@ static int kvm_get_msrs(CPUState *env)
>  case MSR_VM_HSAVE_PA:
>  env->vm_hsave = msrs[i].data;
>  break;
> + case MSR_KVM_ASYNC_PF_EN:
> +env->async_pf_en_msr = msrs[i].data;
> +break;
>  }

I think this is going to break the build if MSR_KVM_ASYNC_PF_EN is not
defined.

Please regenerate against uq/master.
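
One common way to keep a case label like this building against older headers is either to wrap it in #ifdef or to supply a fallback definition. The sketch below shows the fallback variant under stated assumptions: struct cpu_state and store_msr() are hypothetical stand-ins for CPUState and the switch in kvm_get_msrs(), and nothing in this thread says which approach the regenerated patch takes. The value 0x4b564d02 is the index the Linux KVM paravirt headers assign to MSR_KVM_ASYNC_PF_EN.

```c
#include <stdint.h>

/* Fallback for kernel headers that predate async page fault support. */
#ifndef MSR_KVM_ASYNC_PF_EN
#define MSR_KVM_ASYNC_PF_EN 0x4b564d02
#endif

struct cpu_state {              /* hypothetical, stands in for CPUState */
    uint64_t async_pf_en_msr;
};

/* Store one value read back from the kernel.
 * Returns 1 if the MSR index was handled, 0 otherwise. */
static int store_msr(struct cpu_state *env, uint32_t index, uint64_t data)
{
    switch (index) {
    case MSR_KVM_ASYNC_PF_EN:
        env->async_pf_en_msr = data;
        return 1;
    default:
        return 0;
    }
}
```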


Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-21 Thread Andrew Beekhof

On 10/21/2010 03:32 PM, Anthony Liguori wrote:

On 10/21/2010 08:18 AM, Daniel P. Berrange wrote:

On Thu, Oct 21, 2010 at 08:09:44AM -0500, Anthony Liguori wrote:

Hi Andrew,

On 10/21/2010 05:22 AM, Andrew Beekhof wrote:

Hello from the Matahari tech-lead...
Is there any documentation on the capabilities provided by the guest
agent Anthony is creating? Perhaps we can combine efforts.

Mike should be posting today or tomorrow.


Also happy to provide more information on Matahari if anyone is
interested.

I'd really like to hear more about Matahari's long term vision.

For a QEMU guest agent, we need something that is very portable. The
interfaces it provides need to be reasonably guest agnostic and we need
to support a wide range of guests including Windows, Linux, *BSD, etc.

From the little bit I've read about Matahari, it seems to be pretty
specific and pretty oriented towards Fedora-like distributions.


In that case we've done a bad job of the wiki.
Windows and other distributions are a key part of the Matahari vision.

Matahari is two things
- an architecture, and
- an implementation of the most common API sets

Each set of APIs (ie. host, network, services) is an independent 
daemon/agent which attaches to a common QMF broker (more on that later).


While some of these might be platform specific (packaging would be one 
likely candidate), the intention is to be distro/platform agnostic 
wherever possible.


Take netcf for example, instead of re-inventing the wheel we wrote the 
windows port for netcf.



So what's this about QMF you ask?
Again, rather than invent our own message protocol we're leveraging an 
existing standard that supports windows and linux, is fast, reliable and 
secure.


It's also pluggable and discoverable - so simply starting a new agent 
that connects to the matahari broker makes its API available.  Any QMF 
client/console can also interrogate the guest to see what agents and API 
calls are available.


Even better there's going to be a virtio-serial transport.  So we can 
access the same agents in the same way with or without host-to-guest 
networking.  This was a key requirement for us because of EC2-like cloud 
scenarios where we don't have access to the physical host.



That's probably enough for the moment, I'd better go make dinner :-)



It
exposes interfaces for manipulation of RPM packages, relies on netcf,
etc.

FYI netcf is not Fedora specific. There is a Win32 backend for it
too. It does need porting to other Linux distros, but that's simply
an internal implementation issue. The goal of netcf is to be the
libvirt of network config mgmt - a portable API for all OS network
config tasks. Further, Matahari itself is also being ported to Win32
and can be ported to other Linux distros too.


Yeah, I'm aware of the goals of netcf but that hasn't materialized into
a port to other distros.

Let me be clear, I don't think this is a problem for libvirt,
NetworkManager, or even Matahari.

But for a QEMU guest agent where we terminate the APIs within QEMU
itself, I do think it creates a pretty nasty portability barrier.


There's nothing wrong with this if the goal of Matahari is to provide a
robust agent for Fedora-based Linux distributions but I don't think it
meets the requirements of a QEMU guest agent.

I don't think we can overly optimize for one Linux distribution either
so a mentality of letting other platforms contribute their own support
probably won't work.

That is not the goal of Matahari. It is intended to be generically
applicable to *all* guest OS. Obviously in areas where every distro
does different things, then it will need porting for each different
impl. You have to start somewhere and it started with Fedora. This
is all true of any guest agent solution.


There are two approaches that could be taken for a guest agent. You could
provide very low level interfaces (read a file, execute a command, read
a registry key). This makes for a very portable guest agent at the cost
of complexity in interacting with the agent. The agent doesn't ever
really need to change much, but the client (QEMU) needs to handle many
different types of guests, and add new functionality based on the
supported primitives.

Another approach is to put the complexity in the agent and simplify the
management interface. For system's management applications, this is
probably the right approach. For virtualization, I think this is a bad
approach.

Very specifically, netcf only really needs to read and write
configuration files and potentially run a command. Instead of linking
against netcf in the guest, we should link against netcf in QEMU so that
we don't have to constantly change the guest agent.
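
To make the low-level approach concrete, here is a minimal sketch in which the agent exposes a single primitive and the guest-specific logic lives on the client side. Everything in it is hypothetical (agent_read_file(), host_get_first_line(), and the direct function call standing in for the host-to-guest transport); it is not taken from any posted agent code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical agent primitive: copy up to cap-1 bytes of a file into buf
 * and NUL-terminate it. Returns the byte count, or -1 on failure. */
static long agent_read_file(const char *path, char *buf, size_t cap)
{
    FILE *f;
    size_t n;

    if (cap == 0) {
        return -1;
    }
    f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    n = fread(buf, 1, cap - 1, f);
    buf[n] = '\0';
    fclose(f);
    return (long)n;
}

/* Client-side logic built on the primitive: fetch the first line of a
 * guest file. New queries like this need no change to the agent. */
static int host_get_first_line(const char *path, char *out, size_t cap)
{
    if (agent_read_file(path, out, cap) < 0) {
        return -1;
    }
    out[strcspn(out, "\r\n")] = '\0';   /* cut at the first line break */
    return 0;
}
```

The trade-off described above shows up directly: the agent stays tiny and portable, while knowledge of which file to read on which guest has to live in the client.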

Regards,

Anthony Liguori



Regards,
Daniel







[PATCHv2] Add support for async page fault to qemu

2010-10-21 Thread Gleb Natapov
Add save/restore of MSR for migration and cpuid bit.

Signed-off-by: Gleb Natapov 
---
 v1->v2
  - use vmstate subsection to migrate new msr.

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 59aacd0..145c863 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -678,6 +678,9 @@ static int get_msr_entry(struct kvm_msr_entry *entry, 
CPUState *env)
 env->mcg_ctl = entry->data;
 break;
 #endif
+case MSR_KVM_ASYNC_PF_EN:
+env->async_pf_en_msr = entry->data;
+break;
 default:
 #ifdef KVM_CAP_MCE
 if (entry->index >= MSR_MC0_CTL &&
@@ -967,6 +970,7 @@ void kvm_arch_load_regs(CPUState *env, int level)
 }
 kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, 
env->system_time_msr);
 kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
+kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, 
env->async_pf_en_msr);
 }
 #ifdef KVM_CAP_MCE
 if (env->mcg_cap) {
@@ -1186,6 +1190,7 @@ void kvm_arch_save_regs(CPUState *env)
 #endif
 msrs[n++].index = MSR_KVM_SYSTEM_TIME;
 msrs[n++].index = MSR_KVM_WALL_CLOCK;
+msrs[n++].index = MSR_KVM_ASYNC_PF_EN;
 
 #ifdef KVM_CAP_MCE
 if (env->mcg_cap) {
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 8b6efed..6d1d6a0 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -669,6 +669,7 @@ typedef struct CPUX86State {
 #endif
 uint64_t system_time_msr;
 uint64_t wall_clock_msr;
+uint64_t async_pf_en_msr;
 
 uint64_t tsc;
 
diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index d63fdcb..0ee1f88 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -73,7 +73,7 @@ static const char *ext3_feature_name[] = {
 };
 
 static const char *kvm_feature_name[] = {
-"kvmclock", "kvm_nopiodelay", "kvm_mmu", NULL, NULL, NULL, NULL, NULL,
+"kvmclock", "kvm_nopiodelay", "kvm_mmu", NULL, "kvm_asyncpf", NULL, NULL, 
NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f4fc063..0eb1e90 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -151,6 +151,9 @@ struct kvm_para_features {
 #ifdef KVM_CAP_PV_MMU
 { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP },
 #endif
+#ifdef KVM_CAP_ASYNC_PF
+{ KVM_CAP_ASYNC_PF, KVM_FEATURE_ASYNC_PF },
+#endif
 { -1, -1 }
 };
 
@@ -672,6 +675,7 @@ static int kvm_put_msrs(CPUState *env, int level)
 kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
   env->system_time_msr);
 kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
+kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, 
env->async_pf_en_msr);
 }
 
 msr_data.info.nmsrs = n;
@@ -880,6 +884,7 @@ static int kvm_get_msrs(CPUState *env)
 #endif
 msrs[n++].index = MSR_KVM_SYSTEM_TIME;
 msrs[n++].index = MSR_KVM_WALL_CLOCK;
+msrs[n++].index = MSR_KVM_ASYNC_PF_EN;
 
 msr_data.info.nmsrs = n;
 ret = kvm_vcpu_ioctl(env, KVM_GET_MSRS, &msr_data);
@@ -926,6 +931,9 @@ static int kvm_get_msrs(CPUState *env)
 case MSR_VM_HSAVE_PA:
 env->vm_hsave = msrs[i].data;
 break;
+   case MSR_KVM_ASYNC_PF_EN:
+env->async_pf_en_msr = msrs[i].data;
+break;
 }
 }
 
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 4398801..bf14067 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -373,6 +373,24 @@ static int cpu_post_load(void *opaque, int version_id)
 return 0;
 }
 
+static bool async_pf_msr_needed(void *opaque)
+{
+CPUState *cpu = opaque;
+
+return cpu->async_pf_en_msr != 0;
+}
+
+static const VMStateDescription vmstate_async_pf_msr = {
+.name = "cpu/async_pf_msr",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_UINT64(async_pf_en_msr, CPUState),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_cpu = {
 .name = "cpu",
 .version_id = CPU_SAVE_VERSION,
@@ -476,6 +494,14 @@ static const VMStateDescription vmstate_cpu = {
 VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12),
 VMSTATE_END_OF_LIST()
 /* The above list is not sorted /wrt version numbers, watch out! */
+},
+.subsections = (VMStateSubsection []) {
+{
+.vmsd = &vmstate_async_pf_msr,
+.needed = async_pf_msr_needed,
+} , {
+/* empty */
+}
 }
 };
 
--
Gleb.
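
The idea behind the subsection in this patch is that the extra state is only put on the wire when its "needed" predicate returns true, so a guest that never enabled async page faults produces the same migration stream as before. The following is a simplified model of that rule, not QEMU's vmstate implementation: struct subsection and count_sent_subsections() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct cpu_state {              /* hypothetical, stands in for CPUState */
    uint64_t async_pf_en_msr;
};

struct subsection {             /* loosely modelled on VMStateSubsection */
    const char *name;
    bool (*needed)(const struct cpu_state *cpu);
};

/* Same test as in the patch: only migrate the MSR if the guest set it. */
static bool async_pf_msr_needed(const struct cpu_state *cpu)
{
    return cpu->async_pf_en_msr != 0;
}

static const struct subsection cpu_subsections[] = {
    { "cpu/async_pf_msr", async_pf_msr_needed },
    { NULL, NULL },             /* empty terminator entry */
};

/* Count the subsections that would be written for this CPU state. */
static size_t count_sent_subsections(const struct cpu_state *cpu)
{
    size_t sent = 0;
    const struct subsection *sub;

    for (sub = cpu_subsections; sub->name != NULL; sub++) {
        if (sub->needed(cpu)) {
            sent++;
        }
    }
    return sent;
}
```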


Re: [PATCH v2 02/22] bitops: rename generic little-endian bitops functions

2010-10-21 Thread Arnd Bergmann
On Thursday 21 October 2010, Akinobu Mita wrote:
> As a preparation for providing little-endian bitops for all architectures,
> this removes the generic_ prefix from little-endian bitops function names
> in asm-generic/bitops/le.h.
> 
> s/generic_find_next_le_bit/find_next_le_bit/
> s/generic_find_next_zero_le_bit/find_next_zero_le_bit/
> s/generic_find_first_zero_le_bit/find_first_zero_le_bit/
> s/generic___test_and_set_le_bit/__test_and_set_le_bit/
> s/generic___test_and_clear_le_bit/__test_and_clear_le_bit/
> s/generic_test_le_bit/test_le_bit/
> s/generic___set_le_bit/__set_le_bit/
> s/generic___clear_le_bit/__clear_le_bit/
> s/generic_test_and_set_le_bit/test_and_set_le_bit/
> s/generic_test_and_clear_le_bit/test_and_clear_le_bit/
> 
> Signed-off-by: Akinobu Mita 
> Cc: Hans-Christian Egtvedt 
> Cc: Geert Uytterhoeven 
> Cc: Roman Zippel 
> Cc: Andreas Schwab 
> Cc: linux-m...@lists.linux-m68k.org
> Cc: Greg Ungerer 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: Andy Grover 
> Cc: rds-de...@oss.oracle.com
> Cc: "David S. Miller" 
> Cc: net...@vger.kernel.org
> Cc: Avi Kivity 
> Cc: Marcelo Tosatti 
> Cc: kvm@vger.kernel.org

Acked-by: Arnd Bergmann 
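
For readers unfamiliar with what the renamed helpers compute: in a little-endian bitmap, bit nr lives in byte nr / 8 at position nr % 8, regardless of the CPU's native byte order. The sketch below shows that numbering with byte-wise access. The le_bitmap_* names are hypothetical and this is not the kernel implementation, which works on unsigned long words.

```c
#include <stddef.h>

/* Test bit nr in a bitmap whose bits are numbered in little-endian order:
 * bit nr is stored in byte nr / 8, at bit position nr % 8 of that byte. */
static int le_bitmap_test_bit(size_t nr, const void *addr)
{
    const unsigned char *bytes = addr;

    return (bytes[nr >> 3] >> (nr & 7)) & 1;
}

/* Set bit nr using the same numbering (non-atomic). */
static void le_bitmap_set_bit(size_t nr, void *addr)
{
    unsigned char *bytes = addr;

    bytes[nr >> 3] |= (unsigned char)(1u << (nr & 7));
}
```

Because the access is byte-wise, the result is the same on little-endian and big-endian hosts, which is the property on-disk formats and the KVM dirty bitmap rely on.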


[PATCH v2 02/22] bitops: rename generic little-endian bitops functions

2010-10-21 Thread Akinobu Mita
As a preparation for providing little-endian bitops for all architectures,
this removes the generic_ prefix from little-endian bitops function names
in asm-generic/bitops/le.h.

s/generic_find_next_le_bit/find_next_le_bit/
s/generic_find_next_zero_le_bit/find_next_zero_le_bit/
s/generic_find_first_zero_le_bit/find_first_zero_le_bit/
s/generic___test_and_set_le_bit/__test_and_set_le_bit/
s/generic___test_and_clear_le_bit/__test_and_clear_le_bit/
s/generic_test_le_bit/test_le_bit/
s/generic___set_le_bit/__set_le_bit/
s/generic___clear_le_bit/__clear_le_bit/
s/generic_test_and_set_le_bit/test_and_set_le_bit/
s/generic_test_and_clear_le_bit/test_and_clear_le_bit/

Signed-off-by: Akinobu Mita 
Cc: Hans-Christian Egtvedt 
Cc: Geert Uytterhoeven 
Cc: Roman Zippel 
Cc: Andreas Schwab 
Cc: linux-m...@lists.linux-m68k.org
Cc: Greg Ungerer 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: linuxppc-...@lists.ozlabs.org
Cc: Andy Grover 
Cc: rds-de...@oss.oracle.com
Cc: "David S. Miller" 
Cc: net...@vger.kernel.org
Cc: Avi Kivity 
Cc: Marcelo Tosatti 
Cc: kvm@vger.kernel.org
---
No change from previous submission

 arch/avr32/kernel/avr32_ksyms.c  |4 ++--
 arch/avr32/lib/findbit.S |4 ++--
 arch/m68k/include/asm/bitops_mm.h|8 
 arch/m68k/include/asm/bitops_no.h|2 +-
 arch/powerpc/include/asm/bitops.h|   11 ++-
 include/asm-generic/bitops/ext2-non-atomic.h |   12 ++--
 include/asm-generic/bitops/le.h  |   26 +-
 include/asm-generic/bitops/minix-le.h|   10 +-
 lib/find_next_bit.c  |9 -
 net/rds/cong.c   |6 +++---
 virt/kvm/kvm_main.c  |2 +-
 11 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/arch/avr32/kernel/avr32_ksyms.c b/arch/avr32/kernel/avr32_ksyms.c
index 11e310c..c63b943 100644
--- a/arch/avr32/kernel/avr32_ksyms.c
+++ b/arch/avr32/kernel/avr32_ksyms.c
@@ -58,8 +58,8 @@ EXPORT_SYMBOL(find_first_zero_bit);
 EXPORT_SYMBOL(find_next_zero_bit);
 EXPORT_SYMBOL(find_first_bit);
 EXPORT_SYMBOL(find_next_bit);
-EXPORT_SYMBOL(generic_find_next_le_bit);
-EXPORT_SYMBOL(generic_find_next_zero_le_bit);
+EXPORT_SYMBOL(find_next_le_bit);
+EXPORT_SYMBOL(find_next_zero_le_bit);
 
 /* I/O primitives (lib/io-*.S) */
 EXPORT_SYMBOL(__raw_readsb);
diff --git a/arch/avr32/lib/findbit.S b/arch/avr32/lib/findbit.S
index 997b33b..6880d85 100644
--- a/arch/avr32/lib/findbit.S
+++ b/arch/avr32/lib/findbit.S
@@ -123,7 +123,7 @@ ENTRY(find_next_bit)
brgt1b
retal   r11
 
-ENTRY(generic_find_next_le_bit)
+ENTRY(find_next_le_bit)
lsr r8, r10, 5
sub r9, r11, r10
retle   r11
@@ -153,7 +153,7 @@ ENTRY(generic_find_next_le_bit)
brgt1b
retal   r11
 
-ENTRY(generic_find_next_zero_le_bit)
+ENTRY(find_next_zero_le_bit)
lsr r8, r10, 5
sub r9, r11, r10
retle   r11
diff --git a/arch/m68k/include/asm/bitops_mm.h 
b/arch/m68k/include/asm/bitops_mm.h
index b4ecdaa..f1010ab 100644
--- a/arch/m68k/include/asm/bitops_mm.h
+++ b/arch/m68k/include/asm/bitops_mm.h
@@ -366,9 +366,9 @@ static inline int minix_test_bit(int nr, const void *vaddr)
 #define ext2_clear_bit(nr, addr)   __test_and_clear_bit((nr) ^ 24, 
(unsigned long *)(addr))
 #define ext2_clear_bit_atomic(lock, nr, addr)  test_and_clear_bit((nr) ^ 24, 
(unsigned long *)(addr))
 #define ext2_find_next_zero_bit(addr, size, offset) \
-   generic_find_next_zero_le_bit((unsigned long *)addr, size, offset)
+   find_next_zero_le_bit((unsigned long *)addr, size, offset)
 #define ext2_find_next_bit(addr, size, offset) \
-   generic_find_next_le_bit((unsigned long *)addr, size, offset)
+   find_next_le_bit((unsigned long *)addr, size, offset)
 
 static inline int ext2_test_bit(int nr, const void *vaddr)
 {
@@ -398,7 +398,7 @@ static inline int ext2_find_first_zero_bit(const void 
*vaddr, unsigned size)
return (p - addr) * 32 + res;
 }
 
-static inline unsigned long generic_find_next_zero_le_bit(const unsigned long 
*addr,
+static inline unsigned long find_next_zero_le_bit(const unsigned long *addr,
unsigned long size, unsigned long offset)
 {
const unsigned long *p = addr + (offset >> 5);
@@ -440,7 +440,7 @@ static inline int ext2_find_first_bit(const void *vaddr, 
unsigned size)
return (p - addr) * 32 + res;
 }
 
-static inline unsigned long generic_find_next_le_bit(const unsigned long *addr,
+static inline unsigned long find_next_le_bit(const unsigned long *addr,
unsigned long size, unsigned long offset)
 {
const unsigned long *p = addr + (offset >> 5);
diff --git a/arch/m68k/include/asm/bitops_no.h 
b/arch/m68k/include/asm/bitops_no.h
index 9d3cbe5..292e1ce 100644
--- a/arch/m68k/include/asm/bitops_no.h
+++ b/arch/m68k/include/asm/bitops_no.h
@@ -325,7 +32

[PATCH v2 09/22] kvm: stop including asm-generic/bitops/le.h

2010-10-21 Thread Akinobu Mita
No need to include asm-generic/bitops/le.h as all architectures
provide little-endian bit operations now.

Signed-off-by: Akinobu Mita 
Cc: Avi Kivity 
Cc: Marcelo Tosatti 
Cc: kvm@vger.kernel.org
---
No change from previous submission

 virt/kvm/kvm_main.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2d9927c..e5d190f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -52,7 +52,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "coalesced_mmio.h"
 
-- 
1.7.1.231.gd0b16



Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-21 Thread Anthony Liguori

On 10/21/2010 08:18 AM, Daniel P. Berrange wrote:

On Thu, Oct 21, 2010 at 08:09:44AM -0500, Anthony Liguori wrote:
   

Hi Andrew,

On 10/21/2010 05:22 AM, Andrew Beekhof wrote:
 

Hello from the Matahari tech-lead...
Is there any documentation on the capabilities provided by the guest
agent Anthony is creating?  Perhaps we can combine efforts.
   

Mike should be posting today or tomorrow.

 

Also happy to provide more information on Matahari if anyone is
interested.
   

I'd really like to hear more about Matahari's long term vision.

For a QEMU guest agent, we need something that is very portable.  The
interfaces it provides need to be reasonably guest agnostic and we need
to support a wide range of guests including Windows, Linux, *BSD, etc.

 From the little bit I've read about Matahari, it seems to be pretty
specific and pretty oriented towards Fedora-like distributions.  It
exposes interfaces for manipulation of RPM packages, relies on netcf, etc.
 

FYI netcf is not Fedora specific. There is a Win32 backend for it
too. It does need porting to other Linux distros, but that's simply
an internal implementation issue. The goal of netcf is to be the
libvirt of network config mgmt - a portable API for all OS network
config tasks. Further, Matahari itself is also being ported to Win32
and can be ported to other Linux distros too.
   


Yeah, I'm aware of the goals of netcf, but that hasn't materialized into a 
port to other distros.


Let me be clear, I don't think this is a problem for libvirt, 
NetworkManager, or even Matahari.


But for a QEMU guest agent where we terminate the APIs within QEMU 
itself, I do think it creates a pretty nasty portability barrier.



There's nothing wrong with this if the goal of Matahari is to provide a
robust agent for Fedora-based Linux distributions but I don't think it
meets the requirements of a QEMU guest agent.

I don't think we can overly optimize for one Linux distribution either
so a mentality of letting other platforms contribute their own support
probably won't work.
 

That is not the goal of Matahari. It is intended to be generically
applicable to *all* guest OS. Obviously in areas where every distro
does different things, then it will need porting for each different
impl. You have to start somewhere and it started with Fedora. This
is all true of any guest agent solution.
   


There are two approaches that could be taken for a guest agent.  You could 
provide very low level interfaces (read a file, execute a command, read 
a registry key).  This makes for a very portable guest agent at the cost 
of complexity in interacting with the agent.  The agent rarely needs to 
change, but the client (QEMU) needs to handle many different types of 
guests and add new functionality based on the supported primitives.


Another approach is to put the complexity in the agent and simplify the 
management interface.  For systems management applications, this is 
probably the right approach.  For virtualization, I think this is a bad 
approach.


Very specifically, netcf only really needs to read and write 
configuration files and potentially run a command.  Instead of linking 
against netcf in the guest, we should link against netcf in QEMU so that 
we don't have to constantly change the guest agent.
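
To make the first approach a bit more concrete, here is a minimal sketch of
what a primitive-only agent core might look like in C.  Every name here
(agent_op, agent_parse_op, agent_read_file) and the command strings are
hypothetical and made up for illustration; none of this is an existing QEMU
interface.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical primitive set: illustrative only, not a real QEMU API. */
enum agent_op {
    AGENT_READ_FILE,
    AGENT_WRITE_FILE,
    AGENT_EXEC,
    AGENT_UNKNOWN
};

/* Map a command name received from the host to a primitive. */
static enum agent_op agent_parse_op(const char *name)
{
    if (strcmp(name, "read-file") == 0)
        return AGENT_READ_FILE;
    if (strcmp(name, "write-file") == 0)
        return AGENT_WRITE_FILE;
    if (strcmp(name, "exec") == 0)
        return AGENT_EXEC;
    return AGENT_UNKNOWN;
}

/*
 * Read up to cap - 1 bytes of a file into buf and NUL-terminate it.
 * Returns the number of bytes read, or -1 on error.
 */
static long agent_read_file(const char *path, char *buf, size_t cap)
{
    FILE *f;
    size_t n;

    if (cap == 0)
        return -1;
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, cap - 1, f);
    buf[n] = '\0';
    fclose(f);
    return (long)n;
}
```

The point of the sketch is that the guest side stays this small: anything
distro-specific (which file to read, how to interpret it) would live on the
QEMU side, built out of these primitives.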


Regards,

Anthony Liguori



Regards,
Daniel
   




Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-21 Thread Daniel P. Berrange
On Thu, Oct 21, 2010 at 08:09:44AM -0500, Anthony Liguori wrote:
> Hi Andrew,
> 
> On 10/21/2010 05:22 AM, Andrew Beekhof wrote:
> >
> >Hello from the Matahari tech-lead...
> >Is there any documentation on the capabilities provided by the guest agent 
> >Anthony is creating?  Perhaps we can combine efforts.
> 
> Mike should be posting today or tomorrow.
> 
> >Also happy to provide more information on Matahari if anyone is 
> >interested.
> 
> I'd really like to hear more about Matahari's long term vision.
> 
> For a QEMU guest agent, we need something that is very portable.  The 
> interfaces it provides need to be reasonably guest agnostic and we need 
> to support a wide range of guests including Windows, Linux, *BSD, etc.
> 
> From the little bit I've read about Matahari, it seems to be pretty 
> specific and pretty oriented towards Fedora-like distributions.  It 
> exposes interfaces for manipulation of RPM packages, relies on netcf, etc.

FYI netcf is not Fedora specific. There is a Win32 backend for it
too. It does need porting to other Linux distros, but that's simply
an internal implementation issue. The goal of netcf is to be the
libvirt of network config mgmt - a portable API for all OS network
config tasks. Further, Matahari itself is also being ported to Win32
and can be ported to other Linux distros too.

> There's nothing wrong with this if the goal of Matahari is to provide a 
> robust agent for Fedora-based Linux distributions but I don't think it 
> meets the requirements of a QEMU guest agent.
> 
> I don't think we can overly optimize for one Linux distribution either 
> so a mentality of letting other platforms contribute their own support 
> probably won't work.

That is not the goal of Matahari. It is intended to be generically
applicable to *all* guest OS. Obviously in areas where every distro
does different things, then it will need porting for each different
impl. You have to start somewhere and it started with Fedora. This
is all true of any guest agent solution. 

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-21 Thread Anthony Liguori

Hi Andrew,

On 10/21/2010 05:22 AM, Andrew Beekhof wrote:


Hello from the Matahari tech-lead...
Is there any documentation on the capabilities provided by the guest agent 
Anthony is creating?  Perhaps we can combine efforts.


Mike should be posting today or tomorrow.

Also happy to provide more information on Matahari if anyone is 
interested.


I'd really like to hear more about Matahari's long term vision.

For a QEMU guest agent, we need something that is very portable.  The 
interfaces it provides need to be reasonably guest agnostic and we need 
to support a wide range of guests including Windows, Linux, *BSD, etc.


From the little bit I've read about Matahari, it seems to be pretty 
specific and pretty oriented towards Fedora-like distributions.  It 
exposes interfaces for manipulation of RPM packages, relies on netcf, etc.


There's nothing wrong with this if the goal of Matahari is to provide a 
robust agent for Fedora-based Linux distributions but I don't think it 
meets the requirements of a QEMU guest agent.


I don't think we can overly optimize for one Linux distribution either 
so a mentality of letting other platforms contribute their own support 
probably won't work.


Regards,

Anthony Liguori


-- Andrew


Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-21 Thread Dor Laor

On 10/21/2010 03:02 PM, Anthony Liguori wrote:

On 10/21/2010 02:45 AM, Paolo Bonzini wrote:

On 10/21/2010 03:14 AM, Alexander Graf wrote:

I agree that some agent code for basic stuff like live snapshot
sync with the filesystem is small enough and worth to host within
qemu. Maybe we do need more than one project?


No, please. That's exactly what I don't want to see. The
libvirt/qemu/virt-man split is killing us already. How is this going
to become with 20 driver packs for the guest?


Agreed. Not relying on Mata Hari and reinventing a dbus/WMI interface
would be yet another case of QEMU NIH.


I think we're about 10 steps ahead of where we should be right now.

The first step is just identifying what interfaces we need in a guest
agent. So far, I think we can get away with a very small number of
interfaces (mainly read/write files, execute command).


The same argument also works on the backend BTW, it can be virtio
serial but also a Xen pvconsole and that wheel should not be
reinvented either.

The guest agent should be a pluggable architecture, and QEMU can
provide plugins for sync, spice, "info balloon" and everything else it
needs.


virtio-serial is essentially our plugin interface. The core QEMU agent
would use org.qemu.guest-agent and a spice agent could use
org.spice-space.guest-agent.

The QEMU agent should have an interface that terminates in QEMU itself.


I agree that for some things with semi-transactional actions, like 
the fs-freeze on live snapshot, we need a small, self-contained code base.




Regards,

Anthony Liguori


Paolo




Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-21 Thread Anthony Liguori

On 10/21/2010 02:45 AM, Paolo Bonzini wrote:

On 10/21/2010 03:14 AM, Alexander Graf wrote:

I agree that some agent code for basic stuff like live snapshot
sync with the filesystem is small enough and worth to host within
qemu. Maybe we do need more than one project?


No, please. That's exactly what I don't want to see. The
libvirt/qemu/virt-man split is killing us already. How is this going
to become with 20 driver packs for the guest?


Agreed.  Not relying on Mata Hari and reinventing a dbus/WMI interface 
would be yet another case of QEMU NIH.


I think we're about 10 steps ahead of where we should be right now.

The first step is just identifying what interfaces we need in a guest 
agent.  So far, I think we can get away with a very small number of 
interfaces (mainly read/write files, execute command).


  The same argument also works on the backend BTW, it can be virtio 
serial but also a Xen pvconsole and that wheel should not be 
reinvented either.


The guest agent should be a pluggable architecture, and QEMU can 
provide plugins for sync, spice, "info balloon" and everything else it 
needs.


virtio-serial is essentially our plugin interface.  The core QEMU agent 
would use org.qemu.guest-agent and a spice agent could use 
org.spice-space.guest-agent.


The QEMU agent should have an interface that terminates in QEMU itself.
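
As a rough illustration of the per-agent channel naming, here is a small C
sketch of how a guest-side agent might resolve its named virtio-serial port.
It assumes a Linux guest that exposes named ports under /dev/virtio-ports/;
the helper name agent_port_path is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Assumed location of named virtio-serial ports in a Linux guest. */
#define VIRTIO_PORT_DIR "/dev/virtio-ports/"

/*
 * Build the device path for a named channel.  Returns a pointer to a
 * static buffer (not thread-safe), or NULL if the name is missing,
 * empty, contains '/', or does not fit.
 */
static const char *agent_port_path(const char *name)
{
    static char path[128];
    int n;

    if (name == NULL || name[0] == '\0' || strchr(name, '/') != NULL)
        return NULL;
    n = snprintf(path, sizeof(path), "%s%s", VIRTIO_PORT_DIR, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return NULL;
    return path;
}
```

Each agent would open only its own channel, so the QEMU agent and a spice
agent never have to share a wire protocol.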

Regards,

Anthony Liguori


Paolo




[ANNOUNCE] kvm-kmod-2.6.36

2010-10-21 Thread Jan Kiszka
Kernel 2.6.36 is baked, and so is the corresponding kvm-kmod now. A few
fixes went in since -rc6, see below.

This release also marks the end of life for the kvm-kmod-2.6.35 series -
although there are a few unreleased kvm-kmod fixes in git. But the very
same patches can be obtained via .36 now, and there is currently no
2.6.35.8 with KVM updates in sight.

KVM changes since kvm-kmod-2.6.36-rc6:
 - x86: Fix fs/gs reload oops with invalid ldt
 - x86: Move TSC reset out of vmcb_init
 - x86: Fix SVM VMCB reset

kvm-kmod changes:
 - x86: make sure kvm_get_desc_base() doesn't sign extend
 - Improve accuracy of tboot_enabled wrapping


Re: [PATCH 7/8] KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing

2010-10-21 Thread Michael S. Tsirkin
On Thu, Oct 21, 2010 at 11:47:18AM +0200, Avi Kivity wrote:
>  On 10/21/2010 11:24 AM, Michael S. Tsirkin wrote:
> >On Thu, Oct 21, 2010 at 11:27:30AM +0200, Avi Kivity wrote:
> >>   On 10/21/2010 08:46 AM, Sheng Yang wrote:
> >>  >>   >   +r = -EOPNOTSUPP;
> >>  >>
> >>  >>   If the guest assigned the device to another guest, it allows the nested
> >>  >>   guest to kill the non-nested guest.  Need to exit in a graceful fashion.
> >>  >
> >>  >Don't understand... It wouldn't result in kill but return to QEmu/userspace.
> >>
> >>  What would qemu do on EOPNOTSUPP?  It has no way of knowing that
> >>  this was triggered by an unsupported msix access.  What can it do?
> >>
> >>  Best to just ignore the write.
> >>
> >>  If you're worried about debugging, we can have a trace_kvm_discard()
> >>  tracepoint that logs the address and a type enum field that explains
> >>  why an access was discarded.
> >
> >The issue is that the same page is used for mask and entry programming.
> 
> Yeah.  For that use the normal mmio exit_reason.  I was referring to
> misaligned writes.

Yes, I think we can just drop them if we like. Might be a good idea to
stick a trace point there for debugging.

> I'm not happy with partial emulation, but I'm not happy either with
> yet another interface to communicate the decoded MSI BAR writes to
> userspace.  Shall we just reprogram the irq routing table?

Well we don't need to touch the routing table at all.
We have the MSI mask, address and data in kernel.
That is enough to interrupt the guest.

For vhost-net, we would need an interface that maps an irqfd
to a vector # (not gsi), or alternatively map gsi to
a pair device/vector number. This is easy to implement.

For VFIO, we have a problem if we try to work especially without
interrupt remapping as we can't just program all entries for all
devices.  So solution could be some combination of requiring interrupt
remapping, looking at addresses and looking at the pending bit.

New vectors are added/removed rarely. So maybe we can get away with 
1. interface to read MSIX table from userspace (good for debugging anyway)
2. an eventfd to signal on MSIX table writes


> 
> -- 
> error compiling committee.c: too many arguments to function


Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-21 Thread Dor Laor

On 10/21/2010 12:22 PM, Andrew Beekhof wrote:

On 10/20/2010 02:16 PM, Dor Laor wrote:

On 10/20/2010 10:21 AM, Alexander Graf wrote:


On 19.10.2010, at 17:14, Chris Wright wrote:

Guest Agent
- have one coming RSN (poke Anthony for details)


Would there be a chance to have a single agent for everyone, so that
we actually form a Qemu agent instead of a dozen individual ones? I'm
mainly thinking Spice here.


More important than the number of instances is the usage of common
framework. Here is the link to the Matahari project:
https://fedorahosted.org/matahari/wiki/API


Hello from the Matahari tech-lead...
Is there any documentation on the capabilities provided by the guest agent
Anthony is creating? Perhaps we can combine efforts.


http://repo.or.cz/w/qemu/mdroth.git/tree

He said that they'll publish the info in a week.



Also happy to provide more information on Matahari if anyone is interested.


Go ahead and supply it on upstream kvm-devel / qemu-devel list



-- Andrew




Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-21 Thread Andrew Beekhof

On 10/20/2010 02:16 PM, Dor Laor wrote:

On 10/20/2010 10:21 AM, Alexander Graf wrote:


On 19.10.2010, at 17:14, Chris Wright wrote:

Guest Agent
- have one coming RSN (poke Anthony for details)


Would there be a chance to have a single agent for everyone, so that
we actually form a Qemu agent instead of a dozen individual ones? I'm
mainly thinking Spice here.


More important than the number of instances is the usage of common
framework. Here is the link to the Matahari project:
https://fedorahosted.org/matahari/wiki/API


Hello from the Matahari tech-lead...
Is there any documentation on the capabilities provided by the guest agent 
Anthony is creating?  Perhaps we can combine efforts.


Also happy to provide more information on Matahari if anyone is interested.

-- Andrew


[PATCH 2/4] KVM: SVM: Move svm->host_gs_base into a separate structure

2010-10-21 Thread Avi Kivity
More members will join it soon.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/svm.c |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9d703e2..451590e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -124,7 +124,9 @@ struct vcpu_svm {
u64 next_rip;
 
u64 host_user_msrs[NR_HOST_SAVE_USER_MSRS];
-   u64 host_gs_base;
+   struct {
+   u64 gs_base;
+   } host;
 
u32 *msrpm;
 
@@ -1353,14 +1355,14 @@ static void svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
 static void load_host_msrs(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_X86_64
-   wrmsrl(MSR_GS_BASE, to_svm(vcpu)->host_gs_base);
+   wrmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base);
 #endif
 }
 
 static void save_host_msrs(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_X86_64
-   rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host_gs_base);
+   rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base);
 #endif
 }
 
-- 
1.7.3.1



[PATCH 1/4] KVM: SVM: Move guest register save out of interrupts disabled section

2010-10-21 Thread Avi Kivity
Saving guest registers is just a memory copy, and does not need to be in the
critical section.  Move outside the critical section to improve latency a
bit.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/svm.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2e57450..9d703e2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3412,11 +3412,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
);
 
-   vcpu->arch.cr2 = svm->vmcb->save.cr2;
-   vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
-   vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
-   vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
-
load_host_msrs(vcpu);
kvm_load_ldt(ldt_selector);
loadsegment(fs, fs_selector);
@@ -3433,6 +3428,11 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
stgi();
 
+   vcpu->arch.cr2 = svm->vmcb->save.cr2;
+   vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
+   vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
+   vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
+
sync_cr8_to_lapic(vcpu);
 
svm->next_rip = 0;
-- 
1.7.3.1



[PATCH 0/4] Lightweight svm vmload/vmsave (almost)

2010-10-21 Thread Avi Kivity
This patchset moves svm towards a lightweight vmload/vmsave path.  It was
hindered by CVE-2010-3698 which was discovered during its development, and
by the lack of per-cpu IDT in Linux, which makes it more or less useless.
However, even so it's a slight improvement, and merging it will reduce the
work needed when we do have per-cpu IDT.

Avi Kivity (4):
  KVM: SVM: Move guest register save out of interrupts disabled section
  KVM: SVM: Move svm->host_gs_base into a separate structure
  KVM: SVM: Move fs/gs/ldt save/restore to heavyweight exit path
  KVM: SVM: Fold save_host_msrs() and load_host_msrs() into their
callers

 arch/x86/kvm/svm.c |   61 +++-
 1 files changed, 27 insertions(+), 34 deletions(-)

-- 
1.7.3.1



[PATCH 3/4] KVM: SVM: Move fs/gs/ldt save/restore to heavyweight exit path

2010-10-21 Thread Avi Kivity
ldt is never used in the kernel context; same goes for fs (x86_64) and gs
(i386).  So save/restore them in the heavyweight exit path instead
of the lightweight path.

By itself, this doesn't buy us much, but it paves the way for moving vmload
and vmsave to the heavyweight exit path, since they modify the same registers.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/svm.c |   34 +++---
 1 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 451590e..ec392d6 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -125,6 +125,9 @@ struct vcpu_svm {
 
u64 host_user_msrs[NR_HOST_SAVE_USER_MSRS];
struct {
+   u16 fs;
+   u16 gs;
+   u16 ldt;
u64 gs_base;
} host;
 
@@ -184,6 +187,9 @@ static int nested_svm_vmexit(struct vcpu_svm *svm);
 static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
  bool has_error_code, u32 error_code);
 
+static void save_host_msrs(struct kvm_vcpu *vcpu);
+static void load_host_msrs(struct kvm_vcpu *vcpu);
+
 static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
 {
return container_of(vcpu, struct vcpu_svm, vcpu);
@@ -996,6 +1002,11 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
svm->asid_generation = 0;
}
 
+   save_host_msrs(vcpu);
+   savesegment(fs, svm->host.fs);
+   savesegment(gs, svm->host.gs);
+   svm->host.ldt = kvm_read_ldt();
+
for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
rdmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
 }
@@ -1006,6 +1017,14 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
int i;
 
++vcpu->stat.host_state_reload;
+   kvm_load_ldt(svm->host.ldt);
+   loadsegment(fs, svm->host.fs);
+#ifdef CONFIG_X86_64
+   load_gs_index(svm->host.gs);
+   wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gs);
+#else
+   loadsegment(gs, svm->host.gs);
+#endif
for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
 }
@@ -3314,9 +3333,6 @@ static void svm_cancel_injection(struct kvm_vcpu *vcpu)
 static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
-   u16 fs_selector;
-   u16 gs_selector;
-   u16 ldt_selector;
 
svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
@@ -3333,10 +3349,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
sync_lapic_to_cr8(vcpu);
 
-   save_host_msrs(vcpu);
-   savesegment(fs, fs_selector);
-   savesegment(gs, gs_selector);
-   ldt_selector = kvm_read_ldt();
svm->vmcb->save.cr2 = vcpu->arch.cr2;
 
clgi();
@@ -3415,14 +3427,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
);
 
load_host_msrs(vcpu);
-   kvm_load_ldt(ldt_selector);
-   loadsegment(fs, fs_selector);
-#ifdef CONFIG_X86_64
-   load_gs_index(gs_selector);
-   wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gs);
-#else
-   loadsegment(gs, gs_selector);
-#endif
 
reload_tss(vcpu);
 
-- 
1.7.3.1



[PATCH 4/4] KVM: SVM: Fold save_host_msrs() and load_host_msrs() into their callers

2010-10-21 Thread Avi Kivity
This abstraction only serves to obfuscate.  Remove.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/svm.c |   25 ++---
 1 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec392d6..fad4038 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -187,9 +187,6 @@ static int nested_svm_vmexit(struct vcpu_svm *svm);
 static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
  bool has_error_code, u32 error_code);
 
-static void save_host_msrs(struct kvm_vcpu *vcpu);
-static void load_host_msrs(struct kvm_vcpu *vcpu);
-
 static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
 {
return container_of(vcpu, struct vcpu_svm, vcpu);
@@ -1002,7 +999,9 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
svm->asid_generation = 0;
}
 
-   save_host_msrs(vcpu);
+#ifdef CONFIG_X86_64
+   rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base);
+#endif
savesegment(fs, svm->host.fs);
savesegment(gs, svm->host.gs);
svm->host.ldt = kvm_read_ldt();
@@ -1371,20 +1370,6 @@ static void svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
update_db_intercept(vcpu);
 }
 
-static void load_host_msrs(struct kvm_vcpu *vcpu)
-{
-#ifdef CONFIG_X86_64
-   wrmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base);
-#endif
-}
-
-static void save_host_msrs(struct kvm_vcpu *vcpu)
-{
-#ifdef CONFIG_X86_64
-   rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base);
-#endif
-}
-
 static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
 {
if (sd->next_asid > sd->max_asid) {
@@ -3426,7 +3411,9 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
);
 
-   load_host_msrs(vcpu);
+#ifdef CONFIG_X86_64
+   wrmsrl(MSR_GS_BASE, svm->host.gs_base);
+#endif
 
reload_tss(vcpu);
 
-- 
1.7.3.1



Re: [SeaBIOS] [PATCH] mark irq9 active high in DSDT

2010-10-21 Thread Avi Kivity

 On 10/21/2010 04:00 AM, Kevin O'Connor wrote:

On Wed, Oct 20, 2010 at 11:34:41AM +0200, Gleb Natapov wrote:
>  In PIIX4 SCI (irq9) is active high. Seabios marks it so in interrupt
>  override table, but some OSes (FreeBSD) require the same information to
>  be present in DSDT too. Make it so.
>
>  Signed-off-by: Gleb Natapov

Thanks.


How do we manage the stable series wrt this issue?

qemu-kvm-0.12.5 has a regression within the stable series that this 
patch fixes.  qemu 0.12.5 does not, but only because it does not emulate 
polarity in the I/O APIC correctly.


There are several paths we could take:

- do nothing, bug is fixed in mainline
- release a seabios 0.x.1 for qemu 0.13.1 with this patch
- same, plus seabios 0.y.1 for qemu 0.12.6 with this patch
- skip qemu (which is not truly affected), patch qemu-kvm's copy of 
seabios for both 0.12.z and 0.13.z


The third option is the most "correct" from a release engineering point 
of view, but involves more work for everyone.  The fourth is quick pain 
relief but is a little forky.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 7/8] KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing

2010-10-21 Thread Avi Kivity

 On 10/21/2010 11:24 AM, Michael S. Tsirkin wrote:

On Thu, Oct 21, 2010 at 11:27:30AM +0200, Avi Kivity wrote:
>   On 10/21/2010 08:46 AM, Sheng Yang wrote:
>  >>   >   +   r = -EOPNOTSUPP;
>  >>
>  >>   If the guest assigned the device to another guest, it allows the nested
>  >>   guest to kill the non-nested guest.  Need to exit in a graceful fashion.
>  >
>  >Don't understand... It wouldn't result in kill but return to QEmu/userspace.
>
>  What would qemu do on EOPNOTSUPP?  It has no way of knowing that
>  this was triggered by an unsupported msix access.  What can it do?
>
>  Best to just ignore the write.
>
>  If you're worried about debugging, we can have a trace_kvm_discard()
>  tracepoint that logs the address and a type enum field that explains
>  why an access was discarded.

The issue is that the same page is used for mask and entry programming.


Yeah.  For that use the normal mmio exit_reason.  I was referring to 
misaligned writes.


I'm not happy with partial emulation, but I'm not happy either with yet 
another interface to communicate the decoded MSI BAR writes to 
userspace.  Shall we just reprogram the irq routing table?



--
error compiling committee.c: too many arguments to function



Re: [PATCH 7/8] KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing

2010-10-21 Thread Michael S. Tsirkin
On Thu, Oct 21, 2010 at 11:27:30AM +0200, Avi Kivity wrote:
>  On 10/21/2010 08:46 AM, Sheng Yang wrote:
> >>  >  +  r = -EOPNOTSUPP;
> >>
> >>  If the guest assigned the device to another guest, it allows the nested
> >>  guest to kill the non-nested guest.  Need to exit in a graceful fashion.
> >
> >Don't understand... It wouldn't result in kill but return to QEmu/userspace.
> 
> What would qemu do on EOPNOTSUPP?  It has no way of knowing that
> this was triggered by an unsupported msix access.  What can it do?
> 
> Best to just ignore the write.
> 
> If you're worried about debugging, we can have a trace_kvm_discard()
> tracepoint that logs the address and a type enum field that explains
> why an access was discarded.

The issue is that the same page is used for mask and entry programming.

> >>  >  +  if (addr % PCI_MSIX_ENTRY_SIZE != PCI_MSIX_ENTRY_VECTOR_CTRL) {
> >>  >  +  /* Only allow entry modification when entry was masked */
> >>  >  +  if (!entry_masked) {
> >>  >  +  printk(KERN_WARNING
> >>  >  +  "KVM: guest try to write unmasked MSI-X entry. "
> >>  >  +  "addr 0x%llx, len %d, val 0x%lx\n",
> >>  >  +  addr, len, new_val);
> >>  >  +  r = 0;
> >>
> >>  What does the spec says about this situation?
> >
> >As Michael pointed out. The spec said the result is "undefined" indeed.
> 
> Ok.  Then we should silently discard the write instead of allowing
> the guest to flood host dmesg.
> 
> >>
> >>  >  +  goto out;
> >>  >  +  }
> >>  >  +  if (new_val & ~1ul) {
> >>
> >>  Is there a #define for this bit?
> >
> >Sorry I didn't find it. mask_msi_irq() also use the number 1... Maybe we can add
> >one.
> 
> Yes please.
> 
> 
> -- 
> error compiling committee.c: too many arguments to function


Re: [PATCH 7/8] KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing

2010-10-21 Thread Avi Kivity

 On 10/21/2010 08:46 AM, Sheng Yang wrote:

>  >  +   r = -EOPNOTSUPP;
>
>  If the guest assigned the device to another guest, it allows the nested
>  guest to kill the non-nested guest.  Need to exit in a graceful fashion.

Don't understand... It wouldn't result in kill but return to QEmu/userspace.


What would qemu do on EOPNOTSUPP?  It has no way of knowing that this 
was triggered by an unsupported msix access.  What can it do?


Best to just ignore the write.

If you're worried about debugging, we can have a trace_kvm_discard() 
tracepoint that logs the address and a type enum field that explains why 
an access was discarded.
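
A rough sketch of the "classify, then silently discard" idea, written as
plain userspace C rather than kernel code.  The enum and the function name
are hypothetical; the reason code is what a tracepoint such as
trace_kvm_discard() could log.  It assumes the MSI-X table may only be
accessed with aligned 4-byte or 8-byte transactions.

```c
#include <stdint.h>

/* Hypothetical reason codes for a discarded MSI-X table access. */
enum msix_discard_reason {
    MSIX_ACCESS_OK = 0,
    MSIX_DISCARD_BAD_LEN,     /* not a 4-byte or 8-byte access */
    MSIX_DISCARD_MISALIGNED   /* address not aligned to the access size */
};

/* Decide whether an access should be emulated or quietly dropped. */
static enum msix_discard_reason msix_check_access(uint64_t addr, int len)
{
    if (len != 4 && len != 8)
        return MSIX_DISCARD_BAD_LEN;
    if (addr & (uint64_t)(len - 1))
        return MSIX_DISCARD_MISALIGNED;
    return MSIX_ACCESS_OK;
}
```

On anything other than MSIX_ACCESS_OK the write handler would log the reason
through the tracepoint and return success, so neither the guest nor qemu
sees an error.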



>  >  +   if (addr % PCI_MSIX_ENTRY_SIZE != PCI_MSIX_ENTRY_VECTOR_CTRL) {
>  >  +   /* Only allow entry modification when entry was masked */
>  >  +   if (!entry_masked) {
>  >  +   printk(KERN_WARNING
>  >  +   "KVM: guest try to write unmasked MSI-X entry. "
>  >  +   "addr 0x%llx, len %d, val 0x%lx\n",
>  >  +   addr, len, new_val);
>  >  +   r = 0;
>
>  What does the spec says about this situation?

As Michael pointed out. The spec said the result is "undefined" indeed.


Ok.  Then we should silently discard the write instead of allowing the 
guest to flood host dmesg.



>
>  >  +   goto out;
>  >  +   }
>  >  +   if (new_val & ~1ul) {
>
>  Is there a #define for this bit?

Sorry I didn't find it. mask_msi_irq() also use the number 1... Maybe we can add
one.


Yes please.
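
Something along these lines, perhaps.  This is only a sketch: the MSIX_*
names are hypothetical stand-ins, and the values assume the usual MSI-X
table layout (16-byte entries, Vector Control dword at offset 12, bit 0 of
Vector Control being the per-vector mask bit).

```c
#include <stdint.h>

/* Hypothetical names; values assume the standard MSI-X table layout. */
#define MSIX_ENTRY_SIZE          16u
#define MSIX_ENTRY_VECTOR_CTRL   12u
#define MSIX_ENTRY_CTRL_MASKBIT  0x1u

/* True if the table offset addresses an entry's Vector Control dword. */
static int msix_is_vector_ctrl(uint64_t offset)
{
    return offset % MSIX_ENTRY_SIZE == MSIX_ENTRY_VECTOR_CTRL;
}

/* True if the value sets any bit other than the mask bit. */
static int msix_ctrl_has_reserved_bits(uint32_t val)
{
    return (val & ~MSIX_ENTRY_CTRL_MASKBIT) != 0;
}

/* True if the value masks the vector. */
static int msix_ctrl_is_masked(uint32_t val)
{
    return (val & MSIX_ENTRY_CTRL_MASKBIT) != 0;
}
```

With a named bit, the open-coded test against ~1ul in the patch reads as
msix_ctrl_has_reserved_bits(new_val), which says what it means.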


--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] KVM: use kmalloc() for small dirty bitmaps

2010-10-21 Thread Takuya Yoshikawa

(2010/10/21 18:04), Avi Kivity wrote:

On 10/21/2010 10:45 AM, Takuya Yoshikawa wrote:

Well, 4K single buffer is 128MB worth of RAM. This won't help live migration 
much, but will help vga dirty logging, which is active at all times and uses 
much smaller memory slots. So I think it's worthwhile.





Thanks, the patch I sent now might be better.

Which one do you prefer?


Well, the two combined :)

Let's apply the double-buffer first and follow with kmalloc() conversion.



OK, thanks :)

I'll send the kmalloc() patch later!

  Takuya


Re: [PATCH RFC] KVM: use kmalloc() for small dirty bitmaps

2010-10-21 Thread Avi Kivity

 On 10/21/2010 10:45 AM, Takuya Yoshikawa wrote:
Well, 4K single buffer is 128MB worth of RAM. This won't help live 
migration much, but will help vga dirty logging, which is active at 
all times and uses much smaller memory slots. So I think it's 
worthwhile.






Thanks, the patch I sent now might be better.

Which one do you prefer?


Well, the two combined :)

Let's apply the double-buffer first and follow with kmalloc() conversion.

--
error compiling committee.c: too many arguments to function



[PATCH] x86, mce: broadcast mce depending on the cpu version

2010-10-21 Thread Hidetoshi Seto
There is no reason why an SRAO event received by the main thread
should be the only one that is broadcast.

According to the x86 ASDM vol.3A 15.10.4.1,
MCE signal is broadcast on processor version 06H_EH or later.

This change is required to handle SRAR in smp guests.

Signed-off-by: Hidetoshi Seto 
---
 target-i386/kvm.c |   29 -
 1 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index a0d0603..00bb083 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1637,6 +1637,28 @@ static void hardware_memory_error(void)
 exit(1);
 }
 
+#ifdef KVM_CAP_MCE
+static void kvm_mce_broadcast_rest(CPUState *env)
+{
+CPUState *cenv;
+int family, model, cpuver = env->cpuid_version;
+
+family = (cpuver >> 8) & 0xf;
+model = ((cpuver >> 12) & 0xf0) + ((cpuver >> 4) & 0xf);
+
+/* Broadcast MCA signal for processor version 06H_EH and above */
+if ((family == 6 && model >= 14) || family > 6) {
+for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
+if (cenv == env) {
+continue;
+}
+kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
+   MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1);
+}
+}
+}
+#endif
+
 int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
 {
 #if defined(KVM_CAP_MCE)
@@ -1694,6 +1716,7 @@ int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
 fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
 abort();
 }
+kvm_mce_broadcast_rest(env);
 } else
 #endif
 {
@@ -1716,7 +1739,6 @@ int kvm_on_sigbus(int code, void *addr)
 void *vaddr;
 ram_addr_t ram_addr;
 target_phys_addr_t paddr;
-CPUState *cenv;
 
 /* Hope we are lucky for AO MCE */
 vaddr = addr;
@@ -1732,10 +1754,7 @@ int kvm_on_sigbus(int code, void *addr)
 kvm_inject_x86_mce(first_cpu, 9, status,
MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
(MCM_ADDR_PHYS << 6) | 0xc, 1);
-for (cenv = first_cpu->next_cpu; cenv != NULL; cenv = cenv->next_cpu) {
-kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
-   MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1);
-}
+kvm_mce_broadcast_rest(first_cpu);
 } else
 #endif
 {
-- 
1.6.5.2
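
The version check in kvm_mce_broadcast_rest() above can be sketched in isolation as follows. The helper names are hypothetical; the bit extraction mirrors the patch (family from bits 11:8, model from the extended model in bits 19:16 combined with bits 7:4 of the CPUID version dword).

```c
#include <stdbool.h>
#include <stdint.h>

/* Family from bits 11:8 (the extended family field is ignored, as in the patch). */
static unsigned cpu_family(uint32_t cpuver)
{
    return (cpuver >> 8) & 0xf;
}

/* Extended model (bits 19:16) forms the high nibble, model (bits 7:4) the low. */
static unsigned cpu_model(uint32_t cpuver)
{
    return ((cpuver >> 12) & 0xf0) + ((cpuver >> 4) & 0xf);
}

/* True for processor version 06H_EH and above. */
static bool mce_is_broadcast(uint32_t cpuver)
{
    unsigned family = cpu_family(cpuver);

    return (family == 6 && cpu_model(cpuver) >= 14) || family > 6;
}
```

For example, a version dword of 0x000106A5 decodes to family 6, model 0x1A, so the MCE would be broadcast to the other vcpus, while 0x000006D8 (family 6, model 0x0D) would not.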




[PATCH] x86, mce: ignore SRAO only when MCG_SER_P is available

2010-10-21 Thread Hidetoshi Seto
Also restructure this block to call kvm_mce_in_exception() only when it is
required.

Signed-off-by: Hidetoshi Seto 
---
 target-i386/kvm.c |   16 ++--
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 786eeeb..a0d0603 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -239,12 +239,16 @@ static void kvm_do_inject_x86_mce(void *_data)
 struct kvm_x86_mce_data *data = _data;
 int r;
 
-/* If there is an MCE excpetion being processed, ignore this SRAO MCE */
-r = kvm_mce_in_exception(data->env);
-if (r == -1)
-fprintf(stderr, "Failed to get MCE status\n");
-else if (r && !(data->mce->status & MCI_STATUS_AR))
-return;
+/* If there is an MCE exception being processed, ignore this SRAO MCE */
+if ((data->env->mcg_cap & MCG_SER_P) &&
+!(data->mce->status & MCI_STATUS_AR)) {
+r = kvm_mce_in_exception(data->env);
+if (r == -1) {
+fprintf(stderr, "Failed to get MCE status\n");
+} else if (r) {
+return;
+}
+}
 
 r = kvm_set_mce(data->env, data->mce);
 if (r < 0) {
-- 
1.6.5.2
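
The decision made by the restructured block can be written as a small predicate. This is a sketch, not the QEMU code: mce_should_ignore() is a hypothetical name, and the bit positions of MCG_SER_P (bit 24 of IA32_MCG_CAP) and MCI_STATUS_AR (bit 55 of IA32_MCi_STATUS) are stated here as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_SER_P     (1ULL << 24) /* assumed: software error recovery supported */
#define MCI_STATUS_AR (1ULL << 55) /* assumed: action required */

/*
 * in_exception mirrors the result of kvm_mce_in_exception(): 1 if an MCE is
 * already being handled, 0 if not, -1 if the status could not be read.
 */
static bool mce_should_ignore(uint64_t mcg_cap, uint64_t status,
                              int in_exception)
{
    if (!(mcg_cap & MCG_SER_P))
        return false;          /* no recovery support: always inject */
    if (status & MCI_STATUS_AR)
        return false;          /* action required: must not be dropped */
    return in_exception > 0;   /* drop SRAO only while an MCE is in progress */
}
```

Note that a failed status read (-1) falls through to injection, matching the patch, which only prints an error in that case.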



Re: [PATCH 8/8] KVM: Emulation MSI-X mask bits for assigned devices

2010-10-21 Thread Michael S. Tsirkin
On Thu, Oct 21, 2010 at 04:30:02PM +0800, Sheng Yang wrote:
> On Wednesday 20 October 2010 16:26:32 Sheng Yang wrote:
> > This patch enable per-vector mask for assigned devices using MSI-X.
> 
> The basic idea of kernel and QEmu's responsibilities are:
> 
> 1. Because QEmu owned the irq routing table, so the change of table should
> still go to the QEmu, like we did in msix_mmio_write().
> 
> 2. And the others things can be done in kernel, for performance. Here we
> covered the reading(converted entry from routing table and mask bit state of
> enabled MSI-X entries), and writing the mask bit for enabled MSI-X entries.
> Originally we only has mask bit handled in kernel, but later we found that
> Linux kernel would read MSI-X mmio just after every writing to mask bit, in
> order to flush the writing. So we add reading MSI data/addr as well.
> 
> 3. Disabled entries's mask bit accessing would go to QEmu, because it may
> result in disable/enable MSI-X. Explained later.
> 
> 4. Only QEmu has knowledge of PCI configuration space, so it's QEmu to 
> decide enable/disable MSI-X for device.
> .

Config space yes, but it's a simple global yes/no after all.

> 5. There is an distinction between enabled entry and disabled entry of MSI-X 
> table.

That's my point. There's no such thing as 'enabled entries'
in the spec. There are only masked and unmasked entries.

Current interface deals with gsi numbers so qemu had to work around
this. The hack used there is removing gsi for masked vector which has 0
address and data.  It works because this is what linux and windows
guests happen to do, but it is out of spec: vector/data value for a
masked entry have no meaning.

Since you are building a new interface, you can design it without
constraints...

> The entries we had used for pci_enable_msix()(not necessary in sequence
> number) are already enabled, the others are disabled. When device's MSI-X is
> enabled and guest want to enable an disabled entry, we would go back to QEmu
> because this vector didn't exist in the routing table. Also due to
> pci_enable_msix() in kernel didn't allow us to enable vectors one by one, but
> all at once. So we have to disable MSI-X first, then enable it with new
> entries, which contained the new vector guest want to use. This situation is
> only happen when device is being initialized. After that, kernel can know and
> handle the mask bit of the enabled entry.
> 
> I've also considered handle all MMIO operation in kernel, and changing irq
> routing in kernel directly. But as long as irq routing is owned by QEmu, I
> think it's better to leave to it...

Yes, this is my suggestion, except we don't need no routing :)
To inject MSI you just need address/data pairs.
Look at kvm_set_msi: given address/data you can just inject
the interrupt. No need for table lookups.
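
To illustrate why an address/data pair is enough, here is a sketch that decodes the fields needed for delivery. It assumes the usual x86 MSI message layout (destination ID in address bits 19:12, vector in data bits 7:0, delivery mode in data bits 10:8); the struct and function names are hypothetical and unrelated to kvm_set_msi()'s actual implementation.

```c
#include <stdint.h>

/* Fields needed to deliver an MSI on x86 (assumed layout, see above). */
struct msi_msg_fields {
    uint8_t dest_id;       /* address bits 19:12 */
    uint8_t vector;        /* data bits 7:0 */
    uint8_t delivery_mode; /* data bits 10:8 */
};

/* Decode the low address dword and the data dword of one MSI-X entry. */
static struct msi_msg_fields msi_decode(uint32_t address_lo, uint32_t data)
{
    struct msi_msg_fields f;

    f.dest_id = (address_lo >> 12) & 0xff;
    f.vector = data & 0xff;
    f.delivery_mode = (data >> 8) & 0x7;
    return f;
}
```

Everything the interrupt controller needs comes out of the two dwords the guest wrote into the table entry, so no gsi or routing table lookup is involved.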

> Notice the mask/unmask bits must be handled together, either in kernel or in
> userspace. Because if kernel has handled enabled vector's mask bit directly,
> it would be unsync with QEmu's records. It doesn't matter when QEmu don't
> access the related record. And the only place QEmu want to consult it's
> enabled entries' mask bit state is writing to MSI addr/data. The writing
> should be discarded if the entry is unmasked. This checking has already been
> done by kernel in this patchset, so we are fine here.
> 
> If we want to access the enabled entries' mask bit in the future, we can
> directly access device's MMIO.

We really must implement this for correctness, btw. If you do not pass
reads to the device, messages intended for the masked entry
might still be in flight.

> That's the reason why I have followed Michael's advice to use
> mask/unmask directly.
> Hope this would make the patches more clear. I meant to add comments for this 
> changeset, but miss it later.
> 
> --
> regards
> Yang, Sheng
> 
> > 
> > Signed-off-by: Sheng Yang 
> > ---
> >  Documentation/kvm/api.txt |   22 
> >  arch/x86/kvm/x86.c        |    6 
> >  include/linux/kvm.h       |    8 +-
> >  virt/kvm/assigned-dev.c   |   60 +
> >  4 files changed, 95 insertions(+), 1 deletions(-)
> > 
> > diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
> > index d82d637..f324a50 100644
> > --- a/Documentation/kvm/api.txt
> > +++ b/Documentation/kvm/api.txt
> > @@ -1087,6 +1087,28 @@ of 4 instructions that make up a hypercall.
> >  If any additional field gets added to this structure later on, a bit for
> > that additional piece of information will be set in the flags bitmap.
> > 
> > +4.47 KVM_ASSIGN_REG_MSIX_MMIO
> > +
> > +Capability: KVM_CAP_DEVICE_MSIX_MASK
> > +Architectures: x86
> > +Type: vm ioctl
> > +Parameters: struct kvm_assigned_msix_mmio (in)
> > +Returns: 0 on success, !0 on error
> > +
> > +struct kvm_assigned_msix_mmio {
> > +   /* Assigned device's ID */
> > +   __u32 assigned_dev_id;
> > 

Re: [PATCH RFC] KVM: use kmalloc() for small dirty bitmaps

2010-10-21 Thread Takuya Yoshikawa

(2010/10/21 17:38), Avi Kivity wrote:

On 10/21/2010 03:51 AM, Takuya Yoshikawa wrote:

(2010/10/20 18:34), Takuya Yoshikawa wrote:

** NOT TESTED WELL YET **

Currently, we are using vmalloc() for all dirty bitmaps even though
they are small enough, say 256 bytes or less.

So we use kmalloc() if dirty bitmap size is less than or equal to
PAGE_SIZE.



Ah, I forgot about the plan to do double buffering.

Making the size of bitmap twice may reduce the benefit of this patch.


Well, 4K single buffer is 128MB worth of RAM. This won't help live migration 
much, but will help vga dirty logging, which is active at all times and uses 
much smaller memory slots. So I think it's worthwhile.




Thanks, the patch I sent now might be better.

Which one do you prefer?


[PATCH 2/2] KVM: pre-allocate one more dirty bitmap to avoid vmalloc() in x86's kvm_vm_ioctl_get_dirty_log()

2010-10-21 Thread Takuya Yoshikawa
Currently x86's kvm_vm_ioctl_get_dirty_log() allocates a bitmap with
vmalloc() on every call, to be used for the next round of logging. This hurts
both VGA updates and live migration: vmalloc() consumes extra system time,
triggers TLB flushes, etc.

This patch resolves this issue by pre-allocating one more bitmap and switching
between two bitmaps during dirty logging.

Performance improvement:
  I measured performance for the case of VGA update by trace-cmd.

 - Without this patch
   |  kvm_vm_ioctl_get_dirty_log() {
   |mutex_lock() {
0.195 us   |  _cond_resched();
0.683 us   |}
0.207 us   |_raw_spin_lock();
   |kvm_mmu_slot_remove_write_access() {
...   ...
2.916 us   |}
   |vmalloc() {
...   ...
  + 43.731 us  |}
0.222 us   |memset();
   |T.1632() {
   |  __kmalloc() {
...   ...
2.870 us   |  }
3.257 us   |}
   |synchronize_srcu_expedited() {
...   ...
  ! 143.147 us |}
0.480 us   |kfree();
   |copy_to_user() {
0.196 us   |  _cond_resched();
0.635 us   |}
   |vfree() {
...   ...
  + 12.103 us  |  }
  + 12.508 us  |}
0.218 us   |mutex_unlock();
  ! 211.323 us |  }


 - With this patch

   |  kvm_vm_ioctl_get_dirty_log() {
   |mutex_lock() {
0.199 us   |  _cond_resched();
0.703 us   |}
0.222 us   |_raw_spin_lock();
   |kvm_mmu_slot_remove_write_access() {
...   ...
2.179 us   |}
0.225 us   |memset();
   |T.1634() {
   |  __kmalloc() {
...   ...
2.367 us   |  }
2.791 us   |}
   |synchronize_srcu_expedited() {
...   ...
  ! 125.299 us |}
0.263 us   |kfree();
   |copy_to_user() {
0.196 us   |  _cond_resched();
0.647 us   |}
0.214 us   |mutex_unlock();
  ! 135.223 us |  }

  So the call ran about 1.5 times faster than the original (211.323 us vs.
  135.223 us).


  In the case of live migration, the improvement ratio depends on the workload
  and the guest memory size.

Note:
  This does not change other architectures' logic, but the allocation size
  doubles. This will increase the actual memory consumption only when
  the new size changes the number of pages allocated by vmalloc().

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Fernando Luis Vazquez Cao 
---
 arch/x86/kvm/x86.c   |   16 +---
 include/linux/kvm_host.h |1 +
 virt/kvm/kvm_main.c  |   11 +--
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f3f86b2..c4d2e0b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3171,18 +3171,15 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
kvm_mmu_slot_remove_write_access(kvm, log->slot);
spin_unlock(&kvm->mmu_lock);
 
-   r = -ENOMEM;
-   dirty_bitmap = vmalloc(n);
-   if (!dirty_bitmap)
-   goto out;
+   dirty_bitmap = memslot->dirty_bitmap_head;
+   if (memslot->dirty_bitmap == dirty_bitmap)
+   dirty_bitmap += n / sizeof(long);
memset(dirty_bitmap, 0, n);
 
r = -ENOMEM;
slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-   if (!slots) {
-   vfree(dirty_bitmap);
+   if (!slots)
goto out;
-   }
memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
 
@@ -3193,11 +3190,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
kfree(old_slots);
 
r = -EFAULT;
-   if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) {
-   vfree(dirty_bitmap);
+   if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
goto out;
-   }
-   vfree(dirty_bitmap);
} else {
r = -EFAULT;
if (clear_user(log->dirty_bitmap, n))
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0b89d00..7c956d8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -119,6 +119,7 @@ struct kvm_memory_slot {
unsigned long flags;
unsigned long *rmap;
unsigned long *dirty_bitmap;
+   unsigned long *dirty_bitmap_head;
struct {
unsigned long rmap_pde;
int write_count;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 62ae13f..b15d1eb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -447,8 +447,9 @@ static void kvm_destroy_
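
The buffer switch in the x86.c hunk above can be sketched in user space as follows. This is a simplified illustration under stated assumptions: struct slot, slot_init() and slot_next_bitmap() are hypothetical names, calloc() stands in for the kernel allocator, and the RCU-style publication of the new memslots array is left out.

```c
#include <stdlib.h>
#include <string.h>

struct slot {
    unsigned long *dirty_bitmap;      /* half currently being written */
    unsigned long *dirty_bitmap_head; /* start of the 2 * n byte allocation */
    size_t n;                         /* bytes per bitmap, multiple of sizeof(long) */
};

/* Allocate both halves at once; the first half starts out active. */
static int slot_init(struct slot *s, size_t n)
{
    s->dirty_bitmap_head = calloc(2, n);
    if (!s->dirty_bitmap_head)
        return -1;
    s->dirty_bitmap = s->dirty_bitmap_head;
    s->n = n;
    return 0;
}

/*
 * Return the idle half, cleared and ready to become the active bitmap.
 * The caller copies the old half out and then switches s->dirty_bitmap.
 */
static unsigned long *slot_next_bitmap(const struct slot *s)
{
    unsigned long *next = s->dirty_bitmap_head;

    if (s->dirty_bitmap == next)
        next += s->n / sizeof(long);
    memset(next, 0, s->n);
    return next;
}
```

Because the second half already exists, the per-call vmalloc()/vfree() pair seen in the first trace disappears; only the memset() of the idle half remains.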

Re: [PATCH RFC] KVM: use kmalloc() for small dirty bitmaps

2010-10-21 Thread Avi Kivity

 On 10/21/2010 03:51 AM, Takuya Yoshikawa wrote:

(2010/10/20 18:34), Takuya Yoshikawa wrote:

** NOT TESTED WELL YET **

Currently, we are using vmalloc() for all dirty bitmaps even though
they are small enough, say 256 bytes or less.

So we use kmalloc() if dirty bitmap size is less than or equal to
PAGE_SIZE.



Ah, I forgot about the plan to do double buffering.

Making the size of bitmap twice may reduce the benefit of this patch.


Well, 4K single buffer is 128MB worth of RAM.  This won't help live 
migration much, but will help vga dirty logging, which is active at all 
times and uses much smaller memory slots.  So I think it's worthwhile.
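
The 128MB figure follows from one dirty bit per guest page. A small sketch of the arithmetic, assuming 4KB pages (the function name is hypothetical):

```c
#include <stdint.h>

/* RAM covered by a dirty bitmap that tracks one bit per guest page. */
static uint64_t dirty_bitmap_coverage(uint64_t bitmap_bytes, uint64_t page_size)
{
    return bitmap_bytes * 8 * page_size;
}
```

A 4096-byte bitmap holds 32768 bits, and 32768 pages of 4KB each are 128MB. By the same arithmetic, the 256-byte bitmap mentioned in the patch description covers 8MB, which is the order of magnitude of a VGA memory slot.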


--
error compiling committee.c: too many arguments to function



[PATCH 1/2] KVM: introduce wrapper function for creating/destroying dirty bitmaps

2010-10-21 Thread Takuya Yoshikawa
This makes it easy to change the way of allocating/freeing dirty bitmaps.

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Fernando Luis Vazquez Cao 
---
 virt/kvm/kvm_main.c |   30 +++---
 1 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1aeeb7f..62ae13f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -442,6 +442,15 @@ out_err_nodisable:
return ERR_PTR(r);
 }
 
+static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+   if (!memslot->dirty_bitmap)
+   return;
+
+   vfree(memslot->dirty_bitmap);
+   memslot->dirty_bitmap = NULL;
+}
+
 /*
  * Free any memory in @free but not in @dont.
  */
@@ -454,7 +463,7 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
vfree(free->rmap);
 
if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
-   vfree(free->dirty_bitmap);
+   kvm_destroy_dirty_bitmap(free);
 
 
for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
@@ -465,7 +474,6 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
}
 
free->npages = 0;
-   free->dirty_bitmap = NULL;
free->rmap = NULL;
 }
 
@@ -527,6 +535,18 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
return 0;
 }
 
+static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+   unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
+
+   memslot->dirty_bitmap = vmalloc(dirty_bytes);
+   if (!memslot->dirty_bitmap)
+   return -ENOMEM;
+
+   memset(memslot->dirty_bitmap, 0, dirty_bytes);
+   return 0;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -661,12 +681,8 @@ skip_lpage:
 
/* Allocate page dirty bitmap if needed */
if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
-   unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(&new);
-
-   new.dirty_bitmap = vmalloc(dirty_bytes);
-   if (!new.dirty_bitmap)
+   if (kvm_create_dirty_bitmap(&new) < 0)
goto out_free;
-   memset(new.dirty_bitmap, 0, dirty_bytes);
/* destroy any largepage mappings for dirty tracking */
if (old.npages)
flush_shadow = 1;
-- 
1.7.0.4



Re: [PATCH 8/8] KVM: Emulation MSI-X mask bits for assigned devices

2010-10-21 Thread Sheng Yang
On Wednesday 20 October 2010 16:26:32 Sheng Yang wrote:
> This patch enable per-vector mask for assigned devices using MSI-X.

The basic idea of kernel and QEmu's responsibilities are:

1. Because QEmu owned the irq routing table, so the change of table should
still go to the QEmu, like we did in msix_mmio_write().

2. And the others things can be done in kernel, for performance. Here we
covered the reading(converted entry from routing table and mask bit state of
enabled MSI-X entries), and writing the mask bit for enabled MSI-X entries.
Originally we only has mask bit handled in kernel, but later we found that
Linux kernel would read MSI-X mmio just after every writing to mask bit, in
order to flush the writing. So we add reading MSI data/addr as well.

3. Disabled entries's mask bit accessing would go to QEmu, because it may
result in disable/enable MSI-X. Explained later.

4. Only QEmu has knowledge of PCI configuration space, so it's QEmu to 
decide enable/disable MSI-X for device.
.
5. There is an distinction between enabled entry and disabled entry of MSI-X
table. The entries we had used for pci_enable_msix()(not necessary in sequence
number) are already enabled, the others are disabled. When device's MSI-X is
enabled and guest want to enable an disabled entry, we would go back to QEmu
because this vector didn't exist in the routing table. Also due to
pci_enable_msix() in kernel didn't allow us to enable vectors one by one, but
all at once. So we have to disable MSI-X first, then enable it with new
entries, which contained the new vector guest want to use. This situation is
only happen when device is being initialized. After that, kernel can know and
handle the mask bit of the enabled entry.

I've also considered handle all MMIO operation in kernel, and changing irq
routing in kernel directly. But as long as irq routing is owned by QEmu, I
think it's better to leave to it...

Notice the mask/unmask bits must be handled together, either in kernel or in
userspace. Because if kernel has handled enabled vector's mask bit directly,
it would be unsync with QEmu's records. It doesn't matter when QEmu don't
access the related record. And the only place QEmu want to consult it's
enabled entries' mask bit state is writing to MSI addr/data. The writing
should be discarded if the entry is unmasked. This checking has already been
done by kernel in this patchset, so we are fine here.

If we want to access the enabled entries' mask bit in the future, we can
directly access device's MMIO. That's the reason why I have followed Michael's
advice to use mask/unmask directly.

Hope this would make the patches more clear. I meant to add comments for this 
changeset, but miss it later.

--
regards
Yang, Sheng

> 
> Signed-off-by: Sheng Yang 
> ---
>  Documentation/kvm/api.txt |   22 
>  arch/x86/kvm/x86.c        |    6 
>  include/linux/kvm.h       |    8 +-
>  virt/kvm/assigned-dev.c   |   60 +
>  4 files changed, 95 insertions(+), 1 deletions(-)
> 
> diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
> index d82d637..f324a50 100644
> --- a/Documentation/kvm/api.txt
> +++ b/Documentation/kvm/api.txt
> @@ -1087,6 +1087,28 @@ of 4 instructions that make up a hypercall.
>  If any additional field gets added to this structure later on, a bit for
> that additional piece of information will be set in the flags bitmap.
> 
> +4.47 KVM_ASSIGN_REG_MSIX_MMIO
> +
> +Capability: KVM_CAP_DEVICE_MSIX_MASK
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct kvm_assigned_msix_mmio (in)
> +Returns: 0 on success, !0 on error
> +
> +struct kvm_assigned_msix_mmio {
> + /* Assigned device's ID */
> + __u32 assigned_dev_id;
> + /* MSI-X table MMIO address */
> + __u64 base_addr;
> + /* Must be 0 */
> + __u32 flags;
> + /* Must be 0, reserved for future use */
> + __u64 reserved;
> +};
> +
> +This ioctl would enable in-kernel MSI-X emulation, which would handle MSI-X
> +mask bit in the kernel.
> +
>  5. The kvm_run structure
> 
>  Application code obtains a pointer to the kvm_run structure by
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fc62546..ba07a2f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1927,6 +1927,8 @@ int kvm_dev_ioctl_check_extension(long ext)
>   case KVM_CAP_X86_ROBUST_SINGLESTEP:
>   case KVM_CAP_XSAVE:
>   case KVM_CAP_ENABLE_CAP:
> + case KVM_CAP_DEVICE_MSIX_EXT:
> + case KVM_CAP_DEVICE_MSIX_MASK:
>   r = 1;
>   break;
>   case KVM_CAP_COALESCED_MMIO:
> @@ -2717,6 +2719,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>   return -EINVAL;
> 
>   switch (cap->cap) {
> + case KVM_CAP_DEVICE_MSIX_EXT:
> + vcpu->kvm->arch.msix_flags_enabled = true;
> + r = 0;
> + break;
>   default:
>   r = -EINVAL;
> 

Re: [PATCH 0/8][v2] MSI-X mask emulation support for assigned device

2010-10-21 Thread Michael S. Tsirkin
On Thu, Oct 21, 2010 at 03:10:19PM +0800, Sheng Yang wrote:
> On Thursday 21 October 2010 03:02:24 Marcelo Tosatti wrote:
> > On Wed, Oct 20, 2010 at 04:26:24PM +0800, Sheng Yang wrote:
> > > Here is v2.
> > > 
> > > Changelog:
> > > 
> > > v1->v2
> > > 
> > > The major change from v1 is I've added the in-kernel MSI-X mask emulation
> > > support, as well as adding shortcuts for reading MSI-X table.
> > > 
> > > I've taken Michael's advice to use mask/unmask directly, but unsure about
> > > exporting irq_to_desc() for module...
> > > 
> > > Also add flush_work() according to Marcelo's comments.
> > > 
> > > Sheng Yang (8):
> > >   PCI: MSI: Move MSI-X entry definition to pci_regs.h
> > >   irq: Export irq_to_desc() to modules
> > >   KVM: x86: Enable ENABLE_CAP capability for x86
> > >   KVM: Move struct kvm_io_device to kvm_host.h
> > >   KVM: Add kvm_get_irq_routing_entry() func
> > >   KVM: assigned dev: Preparation for mask support in userspace
> > >   KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing
> > >   KVM: Emulation MSI-X mask bits for assigned devices
> > 
> > Why does the current scheme, without msix per-vector mask support, is
> > functional at all? Luck?
> 
> Well, I believe we are lucky... We just ignored the operation in the past.
> 
> I had raised this issue when Michael begin to work on MSI-X support in QEmu
> long ago, but then I was busy on some other things. Until now when Eddie want
> to add MSI-X in-kernel acceleration, we back to it...
> 
> And about the "flush_work()" you commented, I still think even for the native
> device, it's possible that the short time after OS write to the mask bit, the
> interrupt may be delivered, if the device already send the message out on the
> bus(I just guess, haven't observed)... The spec didn't say that the finish of
> writing mask bit behavior would also get all message on the bus delivered. So
> I think leave the work a little late would also be fine.

Yes ... but e.g. bus read afterwards would have to flush the
write up the bus...

> --
> regards
> Yang, Sheng


[PATCH] Fix build on !KVM_CAP_MCE

2010-10-21 Thread Hidetoshi Seto
This patch fixes the following build errors:

target-i386/kvm.c: In function 'kvm_put_msrs':
target-i386/kvm.c:782: error: unused variable 'i'
target-i386/kvm.c: In function 'kvm_get_msrs':
target-i386/kvm.c:1083: error: label at end of compound statement

Signed-off-by: Hidetoshi Seto 
---
 target-i386/kvm.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 512d533..786eeeb 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -779,7 +779,7 @@ static int kvm_put_msrs(CPUState *env, int level)
 struct kvm_msr_entry entries[100];
 } msr_data;
 struct kvm_msr_entry *msrs = msr_data.entries;
-int i, n = 0;
+int n = 0;
 
 kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_CS, env->sysenter_cs);
 kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
@@ -801,6 +801,7 @@ static int kvm_put_msrs(CPUState *env, int level)
 }
 #ifdef KVM_CAP_MCE
 if (env->mcg_cap) {
+int i;
 if (level == KVM_PUT_RESET_STATE)
 kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
 else if (level == KVM_PUT_FULL_STATE) {
@@ -1085,9 +1086,9 @@ static int kvm_get_msrs(CPUState *env)
 if (msrs[i].index >= MSR_MC0_CTL &&
 msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 4) {
 env->mce_banks[msrs[i].index - MSR_MC0_CTL] = msrs[i].data;
-break;
 }
 #endif
+break;
 }
 }
 
-- 
1.6.5.2



Re: [GIT PULL net-2.6] vhost-net: access_ok fix

2010-10-21 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 19 Oct 2010 16:59:01 +0200

> David,
> Not sure if it's too late for 2.6.36 - in case it's not, the following tree
> includes a last minute bugfix for vhost-net, found by code inspection.
> It is on top of net-2.6.
> Thanks!
> 
> The following changes since commit b0057c51db66c5f0f38059f242c57d61c4741d89:
> 
>   tg3: restore rx_dropped accounting (2010-10-11 16:06:24 -0700)
> 
> are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net

Even though it's too late, I've pulled this.


Re: [Qemu-devel] KVM call minutes for Oct 19

2010-10-21 Thread Paolo Bonzini

On 10/21/2010 03:14 AM, Alexander Graf wrote:

I agree that some agent code for basic stuff like live snapshot
sync with the filesystem is small enough and worth to host within
qemu. Maybe we do need more than one project?


No, please. That's exactly what I don't want to see. The
libvirt/qemu/virt-man split is killing us already. How is this going
to become with 20 driver packs for the guest?


Agreed.  Not relying on Mata Hari and reinventing a dbus/WMI interface 
would be yet another case of QEMU NIH.  The same argument also works on 
the backend BTW, it can be virtio serial but also a Xen pvconsole and 
that wheel should not be reinvented either.


The guest agent should be a pluggable architecture, and QEMU can provide 
plugins for sync, spice, "info balloon" and everything else it needs.


Paolo


Re: [PATCH 7/8] KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing

2010-10-21 Thread Sheng Yang
On Thursday 21 October 2010 06:35:11 Michael S. Tsirkin wrote:
> On Wed, Oct 20, 2010 at 04:26:31PM +0800, Sheng Yang wrote:
> > It would be work with KVM_CAP_DEVICE_MSIX_MASK, which we would enable in
> > the last patch.
> > 
> > Signed-off-by: Sheng Yang 
> 
> Merge this with patch 8 - it does not make sense to add a bunch
> of users of the field msix_mmio_base but init it in the next patch.

I just meant to make things easier for the reviewer; it seems I failed. :)
> 
> > ---
> > 
> >  include/linux/kvm.h  |7 +++
> >  include/linux/kvm_host.h |2 +
> >  virt/kvm/assigned-dev.c  |  131 ++
> >  3 files changed, 140 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > index a699ec9..0a7bd34 100644
> > --- a/include/linux/kvm.h
> > +++ b/include/linux/kvm.h
> > @@ -798,4 +798,11 @@ struct kvm_assigned_msix_entry {
> > 
> > __u16 padding[2];
> >  
> >  };
> > 
> > +struct kvm_assigned_msix_mmio {
> > +   __u32 assigned_dev_id;
> 
> I think avi commented - there's padding here.
> 
> > +   __u64 base_addr;
> > +   __u32 flags;
> > +   __u32 reserved[2];
> > +};
> > +
> > 
> >  #endif /* __LINUX_KVM_H */
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 81a6284..b67082f 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -465,6 +465,8 @@ struct kvm_assigned_dev_kernel {
> > 
> > struct pci_dev *dev;
> > struct kvm *kvm;
> > spinlock_t assigned_dev_lock;
> > 
> > +   u64 msix_mmio_base;
> > +   struct kvm_io_device msix_mmio_dev;
> > 
> >  };
> >  
> >  struct kvm_irq_mask_notifier {
> > 
> > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > index bf96ea7..5d2adc4 100644
> > --- a/virt/kvm/assigned-dev.c
> > +++ b/virt/kvm/assigned-dev.c
> > 
> > @@ -739,6 +739,137 @@ msix_entry_out:
> > return r;
> >  
> >  }
> > 
> > +
> > +static bool msix_mmio_in_range(struct kvm_assigned_dev_kernel *adev,
> > + gpa_t addr, int len, int *idx)
> > +{
> > +   int i;
> > +
> > +   if (!(adev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> > +   return false;
> > +   BUG_ON(adev->msix_mmio_base == 0);
> > +   for (i = 0; i < adev->entries_nr; i++) {
> > +   u64 start, end;
> > +   start = adev->msix_mmio_base +
> > +   adev->guest_msix_entries[i].entry * PCI_MSIX_ENTRY_SIZE;
> > +   end = start + PCI_MSIX_ENTRY_SIZE;
> > +   if (addr >= start && addr + len <= end) {
> > +   *idx = i;
> > +   return true;
> > +   }
> > +   }
> 
> We really should not need guest_msix_entries at all:
> if we are emulating MSIX in kernel anyway, let us just
> emulate it there. Doing half setup from qemu
> and half from kvm will just create problems.
> 
> If you do it all in kernel, you will simply need a single
> range check to see whether this is mask write.

I will explain it in a separate mail. Please comment on that as well.

--
regards
Yang, Sheng
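
For illustration only, the "single range check" suggested above might look roughly like the sketch below. It assumes the kernel knows the guest address of the whole MSI-X table and the number of entries. The helper name, its parameters, and the macro names are hypothetical and local to this sketch. The 16-byte entry size and the Vector Control dword at offset 12 are the standard MSI-X table layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE        16u  /* bytes per MSI-X table entry */
#define MSIX_ENTRY_VECTOR_CTRL 12u  /* offset of the Vector Control dword */

/*
 * Hypothetical helper: returns true if a len-byte access at addr is a dword
 * access to the Vector Control field of one of nr_entries table entries
 * starting at table_base. On success the entry index is stored in *idx.
 */
static bool msix_is_vector_ctrl_access(uint64_t table_base, uint32_t nr_entries,
                                       uint64_t addr, int len, uint32_t *idx)
{
    uint64_t offset;

    if (len != 4 || addr < table_base)
        return false;

    offset = addr - table_base;
    if (offset >= (uint64_t)nr_entries * MSIX_ENTRY_SIZE)
        return false;   /* outside the table: this is the single range check */

    if (offset % MSIX_ENTRY_SIZE != MSIX_ENTRY_VECTOR_CTRL)
        return false;   /* address or data dword, not the mask bit */

    *idx = (uint32_t)(offset / MSIX_ENTRY_SIZE);
    return true;
}
```

With a check like this the kernel would not need a per-entry loop over guest_msix_entries. Whether that fits the split of responsibilities between QEmu and the kernel is the open question in this thread.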

> 
> > +   return false;
> > +}
> > +
> > +static int msix_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
> > +  void *val)
> > +{
> > +   struct kvm_assigned_dev_kernel *adev =
> > +   container_of(this, struct kvm_assigned_dev_kernel,
> > +msix_mmio_dev);
> > +   int idx, r = 0;
> > +   u32 entry[4];
> > +   struct kvm_kernel_irq_routing_entry *e;
> > +
> > +   mutex_lock(&adev->kvm->lock);
> > +   if (!msix_mmio_in_range(adev, addr, len, &idx)) {
> > +   r = -EOPNOTSUPP;
> > +   goto out;
> > +   }
> > +   if ((addr & 0x3) || len != 4) {
> > +   printk(KERN_WARNING
> > +   "KVM: Unaligned reading for device MSI-X MMIO! "
> > +   "addr 0x%llx, len %d\n", addr, len);
> > +   r = -EOPNOTSUPP;
> > +   goto out;
> > +   }
> > +
> > +   e = kvm_get_irq_routing_entry(adev->kvm,
> > +   adev->guest_msix_entries[idx].vector);
> > +   if (!e || e->type != KVM_IRQ_ROUTING_MSI) {
> > +   printk(KERN_WARNING "KVM: Wrong MSI-X routing entry! "
> > +   "addr 0x%llx, len %d\n", addr, len);
> > +   r = -EOPNOTSUPP;
> > +   goto out;
> > +   }
> > +   entry[0] = e->msi.address_lo;
> > +   entry[1] = e->msi.address_hi;
> > +   entry[2] = e->msi.data;
> > +   entry[3] = !!(adev->guest_msix_entries[idx].flags &
> > +   KVM_ASSIGNED_MSIX_MASK);
> > +   memcpy(val, &entry[addr % PCI_MSIX_ENTRY_SIZE / 4], len);
> > +
> > +out:
> > +   mutex_unlock(&adev->kvm->lock);
> > +   return r;
> > +}
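
The dword selection at the end of msix_mmio_read() can be tried in isolation with a small userspace sketch. The struct below is a hypothetical stand-in for the routing entry plus the mask flag, and the function takes an offset into the table rather than a guest physical address. Neither is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16u  /* bytes per MSI-X table entry */

/* Hypothetical stand-in for the MSI routing entry and the mask flag. */
struct msix_entry_state {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
    bool masked;
};

/*
 * Emulate an aligned dword read from one MSI-X table entry.
 * Returns 0 and stores the dword in *val, or -1 for an unaligned or
 * non-dword access, which the patch rejects with -EOPNOTSUPP.
 */
static int msix_entry_read(const struct msix_entry_state *e,
                           uint64_t offset, int len, uint32_t *val)
{
    uint32_t entry[4];

    if ((offset & 0x3) || len != 4)
        return -1;

    entry[0] = e->address_lo;           /* Message Address */
    entry[1] = e->address_hi;           /* Message Upper Address */
    entry[2] = e->data;                 /* Message Data */
    entry[3] = e->masked ? 1u : 0u;     /* Vector Control, bit 0 = mask */

    *val = entry[(offset % MSIX_ENTRY_SIZE) / 4];
    return 0;
}
```

Reading offset 12 of any entry returns only the mask bit, which is what a guest needs when it reads back the table to flush a mask write.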
> > +
> > +static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
> > + const void *val)
> > +{
> > +   struct kvm_assigned_dev_kernel *adev =
> > +   container_of(this, struct kvm_assigned_dev_kernel,
> > +msi

Re: [PATCH 08/10] MCE: Relay UCR MCE to guest

2010-10-21 Thread Paolo Bonzini

On 10/21/2010 12:03 AM, Anthony Liguori wrote:


>> The timeout of qemu_kvm_eat_signal is always zero.
>
> So then qemu_kvm_eat_signal purely polls and it will happily keep
> polling as long as there is a signal pending.
>
> So what's the point of doing a sigtimedwait() and dropping qemu_mutex?

I agree that keeping the qemu_mutex makes sense if you remove the
timeout argument (which I even have a patch for, as part of my Win32
iothread series).  As long as there is the theoretical possibility of
suspending the process, qemu_kvm_eat_signal should drop the mutex.

> Why not just check sigpending in a loop?


Because sigtimedwait eats the signal, unlike sigpending (and sigsuspend).
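
The difference can be seen in a small userspace sketch. It is not code from qemu; the helper names are hypothetical, and it assumes a single-threaded POSIX program where the signal is blocked so that it stays pending.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Block signo so that it stays pending instead of being delivered. */
static void block_signal(int signo)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, signo);
    sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Report whether signo is pending. This does not consume the signal. */
static bool signal_is_pending(int signo)
{
    sigset_t pending;

    sigemptyset(&pending);
    sigpending(&pending);
    return sigismember(&pending, signo) == 1;
}

/*
 * Poll with a zero timeout. If signo is pending, sigtimedwait() dequeues
 * it and returns its number; otherwise it fails immediately.
 */
static bool eat_signal(int signo)
{
    sigset_t set;
    struct timespec zero = { 0, 0 };

    sigemptyset(&set);
    sigaddset(&set, signo);
    return sigtimedwait(&set, NULL, &zero) == signo;
}
```

After block_signal(SIGUSR1) and raise(SIGUSR1), signal_is_pending(SIGUSR1) keeps returning true however often it is called, so a loop on sigpending alone would never terminate. A single eat_signal(SIGUSR1) removes the signal from the pending set.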
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8][v2] MSI-X mask emulation support for assigned device

2010-10-21 Thread Sheng Yang
On Wednesday 20 October 2010 17:51:01 Avi Kivity wrote:
>   On 10/20/2010 10:26 AM, Sheng Yang wrote:
> > Here is v2.
> > 
> > Changelog:
> > 
> > v1->v2
> > 
> > The major change from v1 is I've added the in-kernel MSI-X mask emulation
> > support, as well as adding shortcuts for reading MSI-X table.
> > 
> > I've taken Michael's advice to use mask/unmask directly, but unsure about
> > exporting irq_to_desc() for module...
> > 
> > Also add flush_work() according to Marcelo's comments.
> 
> Any performance numbers?  What are the affected guests?  just RHEL 4, or
> any others?

At least the current RHEL5 series would be affected. I have done a simple
benchmark on a RHEL5u5 guest with 512MB of memory and 1 CPU. The device is one
10G NIC with SR-IOV. One VF was assigned to the guest to communicate with the
PF in the host. Three iperf threads were used in the guest to push the CPU
utilization to 100%. In this condition, the bandwidth of the QEmu method is
about 20% lower than that of the in-kernel one (~7.5G vs ~9G). The interrupt
rate in this condition is about 20k/sec.

The reason is that the 2.6.18 kernel uses mask_msi in ack() for the MSI chip,
which causes a significant number of mask bit operations when the interrupt
rate is high.

We have also reproduced the issue in some large-scale benchmarks on guests
with a newer kernel like 2.6.30 on Xen, under a very high interrupt rate, due
to some interrupt rate limitation mechanism in the kernel.
> 
> Alex, Michael, how would you do this with vfio?

--
regards
Yang, Sheng


Re: [PATCH] KVM Test: add -w parameter in nc command in kvm_utils.py

2010-10-21 Thread Michael Goldish
On 10/21/2010 03:57 AM, Feng Yang wrote:
> 
> - "Michael Goldish"  wrote:
> 
>> From: "Michael Goldish" 
>> To: "Feng Yang" 
>> Cc: autot...@test.kernel.org, kvm@vger.kernel.org
>> Sent: Wednesday, October 20, 2010 6:48:42 PM GMT +08:00 Beijing / Chongqing 
>> / Hong Kong / Urumqi
>> Subject: Re: [PATCH] KVM Test: add -w parameter in nc command in kvm_utils.py
>>
>> On 10/20/2010 12:18 PM, Feng Yang wrote:
>>>
>>> - "Michael Goldish"  wrote:
>>>
>>>> From: "Michael Goldish"
>>>> To: "Feng Yang"
>>>> Cc: autot...@test.kernel.org, kvm@vger.kernel.org
>>>> Sent: Wednesday, October 20, 2010 5:11:32 PM GMT +08:00 Beijing / Chongqing / Hong Kong / Urumqi
>>>> Subject: Re: [PATCH] KVM Test: add -w parameter in nc command in kvm_utils.py
>>>>
>>>> On 10/20/2010 08:55 AM, Feng Yang wrote:
>>>>> If a connection and stdin are idle for more than timeout seconds,
>>>>> then the connection is silently closed. Without this parameter,
>>>>> nc will listen forever for a connection. This may cause our test
>>>>> to hang. redmine issue:
>>>>> http://redmine.englab.nay.redhat.com/issues/show/6947
>>>>> Adding the -w parameter should fix this issue.
>>>>>
>>>>> Signed-off-by: Feng Yang
>>>>> ---
>>>>>  client/tests/kvm/kvm_utils.py |2 +-
>>>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
>>>>> index a51c857..526d38d 100644
>>>>> --- a/client/tests/kvm/kvm_utils.py
>>>>> +++ b/client/tests/kvm/kvm_utils.py
>>>>> @@ -733,7 +733,7 @@ def remote_login(client, host, port, username, password, prompt, linesep="\n",
>>>>>  elif client == "telnet":
>>>>>  cmd = "telnet -l %s %s %s" % (username, host, port)
>>>>>  elif client == "nc":
>>>>> -cmd = "nc %s %s" % (host, port)
>>>>> +cmd = "nc %s %s -w %s" % (host, port, timeout)
>>>>>  else:
>>>>>  logging.error("Unknown remote shell client: %s" % client)
>>>>>  return
>>>>
>>>> I don't understand how remote_login() can stall here.
>>>> kvm_utils._remote_login() doesn't rely on nc's self-termination.  If no
>>>> shell prompt is available after 10 seconds, nc is forcefully terminated
>>>> and _remote_login() fails.  If it somehow stalls this might indicate a
>>>> bug somewhere.
>>> It is really rarely reproduced. I only meet this issue when the guest
>>> panics or core dumps. nc stalls. I do not know why it is not terminated
>>> after 10s.
>>>
>>> I think the -w parameter is helpful in this situation.
>>> If it does not cause other issues, we'd better add -w to the nc command.
>>>
>>> Thanks for your comment. What do you think?
>>>
>>> Feng Yang
>>
>> Adding -w 10 may cause trouble.  If a good functional session is idle
>> for more than 10 seconds during a test, it will be closed.  For example,
>> if you run a command that takes more than 10 seconds to complete, and
>> produces no output while it runs, the session will be closed.
> It seems the session will also be closed by our current code in this situation.
> If there is no output for 10s, _remote_login() will return False. Then the
> session will be closed in remote_login().

If the login process takes more than 10s, yes, the session will be
closed.  However, if we use -w 10, the session will be closed whenever
it becomes idle, not just during login.  So if we login successfully and
then run a test (e.g. an autoit script), after 10s of no output the test
will be terminated.

> I have run some cases, it works well.  But I think we'd better first post 
> this code, in case it really cause trouble.
> 
> I will do more test on it.
> 
>>
>> Also, I doubt it'll solve the problem you've experienced.  As it is, the
>> code should (and usually does) properly handle the case of nc not
>> terminating.
>>
>> If you manage to reproduce this again please save the log so I can have
>> a look at it.
> 
> When I meet this issue again, I will send the log to you. Thanks for your help!
> 
> Feng Yang
> 
> 
> 


Re: [PATCH 5/5] qemu-kvm: device assignment: Enable in-kernel MSI-X mask support

2010-10-21 Thread Sheng Yang
On Thursday 21 October 2010 06:36:56 Michael S. Tsirkin wrote:
> On Wed, Oct 20, 2010 at 04:29:55PM +0800, Sheng Yang wrote:
> > Signed-off-by: Sheng Yang 
> > ---
> > 
> >  hw/device-assignment.c |   13 +
> >  1 files changed, 13 insertions(+), 0 deletions(-)
> > 
> > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > index d1a6282..aa3358e 100644
> > --- a/hw/device-assignment.c
> > +++ b/hw/device-assignment.c
> > @@ -269,6 +269,9 @@ static void assigned_dev_iomem_map(PCIDevice
> > *pci_dev, int region_num,
> > 
> >  AssignedDevRegion *region = &r_dev->v_addrs[region_num];
> >  PCIRegion *real_region = &r_dev->real_device.regions[region_num];
> >  int ret = 0;
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +struct kvm_assigned_msix_mmio msix_mmio;
> > +#endif
> > 
> >  DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08"
> >  FMT_PCIBUS " region_num=%d \n",
> >  
> >e_phys, region->u.r_virtbase, type, e_size, region_num);
> > 
> > @@ -287,6 +290,16 @@ static void assigned_dev_iomem_map(PCIDevice
> > *pci_dev, int region_num,
> > 
> >  cpu_register_physical_memory(e_phys + offset,
> >  
> >  TARGET_PAGE_SIZE, r_dev->mmio_index);
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +   memset(&msix_mmio, 0, sizeof(struct kvm_assigned_msix_mmio));
> > +   msix_mmio.assigned_dev_id = calc_assigned_dev_id(r_dev->h_segnr,
> > +   r_dev->h_busnr, r_dev->h_devfn);
> > +   msix_mmio.base_addr = e_phys + offset;
> > +   if (kvm_assign_reg_msix_mmio(kvm_context, &msix_mmio))
> > +fprintf(stderr, "fail to register in-kernel msix_mmio!\n");
> > +/* We can still continue because the MMIO accessing can fall
> > + * back to QEmu */
> 
> So let's not print scary messages ...

OK...

--
regards
Yang, Sheng

> 
> > +#endif
> > 
> >  }
> >  
> >  }


Re: [PATCH 0/8][v2] MSI-X mask emulation support for assigned device

2010-10-21 Thread Sheng Yang
On Thursday 21 October 2010 03:02:24 Marcelo Tosatti wrote:
> On Wed, Oct 20, 2010 at 04:26:24PM +0800, Sheng Yang wrote:
> > Here is v2.
> > 
> > Changelog:
> > 
> > v1->v2
> > 
> > The major change from v1 is I've added the in-kernel MSI-X mask emulation
> > support, as well as adding shortcuts for reading MSI-X table.
> > 
> > I've taken Michael's advice to use mask/unmask directly, but unsure about
> > exporting irq_to_desc() for module...
> > 
> > Also add flush_work() according to Marcelo's comments.
> > 
> > Sheng Yang (8):
> >   PCI: MSI: Move MSI-X entry definition to pci_regs.h
> >   irq: Export irq_to_desc() to modules
> >   KVM: x86: Enable ENABLE_CAP capability for x86
> >   KVM: Move struct kvm_io_device to kvm_host.h
> >   KVM: Add kvm_get_irq_routing_entry() func
> >   KVM: assigned dev: Preparation for mask support in userspace
> >   KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing
> >   KVM: Emulation MSI-X mask bits for assigned devices
> 
> Why is the current scheme, without MSI-X per-vector mask support,
> functional at all? Luck?

Well, I believe we were lucky... We just ignored the operation in the past.

I raised this issue when Michael began to work on MSI-X support in QEmu long
ago, but then I was busy with other things. Now that Eddie wants to add
MSI-X in-kernel acceleration, we are back to it...

And about the "flush_work()" you commented on: I still think that even for a
native device, it's possible that an interrupt is delivered a short time
after the OS writes to the mask bit, if the device has already sent the
message out on the bus (I am just guessing, I haven't observed it)... The
spec doesn't say that completing the write to the mask bit also guarantees
that all messages already on the bus have been delivered. So I think letting
the work finish a little late would also be fine.

--
regards
Yang, Sheng