Re: [PATCH 8/8] KVM: Emulation MSI-X mask bits for assigned devices
On Thursday 21 October 2010 16:39:07 Michael S. Tsirkin wrote:
> On Thu, Oct 21, 2010 at 04:30:02PM +0800, Sheng Yang wrote:
> > On Wednesday 20 October 2010 16:26:32 Sheng Yang wrote:
> > > This patch enables per-vector masking for assigned devices using MSI-X.
> >
> > The basic division of responsibilities between the kernel and QEmu is:
> >
> > 1. Because QEmu owns the irq routing table, changes to the table should
> > still go to QEmu, as we did in msix_mmio_write().
> >
> > 2. Everything else can be done in the kernel, for performance. Here we
> > cover reading (the entry converted from the routing table, plus the mask
> > bit state of enabled MSI-X entries) and writing the mask bit of enabled
> > MSI-X entries. Originally we only handled the mask bit in the kernel, but
> > we later found that the Linux kernel reads the MSI-X MMIO just after every
> > write to a mask bit, in order to flush the write. So we handle reading the
> > MSI data/addr as well.
> >
> > 3. Mask bit accesses for disabled entries go to QEmu, because they may
> > result in disabling/enabling MSI-X. Explained later.
> >
> > 4. Only QEmu has knowledge of the PCI configuration space, so it is QEmu
> > that decides whether to enable/disable MSI-X for the device.
>
> Config space yes, but it's a simple global yes/no after all.
>
> > 5. There is a distinction between enabled and disabled entries of the
> > MSI-X table.
>
> That's my point. There's no such thing as 'enabled entries' in the spec.
> There are only masked and unmasked entries.
>
> The current interface deals with gsi numbers, so qemu had to work around
> this. The hack used there is removing the gsi for a masked vector which
> has 0 address and data. It works because this is what linux and windows
> guests happen to do, but it is out of spec: the vector/data value for a
> masked entry has no meaning.

Well, I just realized something unnatural about the entry containing 0 data/address. So I checked the spec again, and found that the mask bit should be set after reset.
So after fixing this, I think an unmasked 0 address/data entry shouldn't be there anymore.

> Since you are building a new interface, you can design it without
> constraints...

A constraint is pci_enable_msix(). We have to use it to allocate an irq for each entry, as well as to program the entry into the real hardware. pci_enable_msix() is only a yes/no choice: we can't add newly enabled entries after pci_enable_msix(), and through the kernel API we can only enable/disable/mask/unmask one IRQ, not one entry in the MSI-X table. And we still have to allocate a new IRQ for a new entry. When the guest unmasks a "disabled entry", we have to disable and re-enable MSI-X in order to use the new entry. That's why the "enabled/disabled entry" concept exists. So even if the guest only unmasks one entry, it is a totally different piece of work for KVM underneath. This logic won't change no matter where the MMIO handling is. And in fact I don't like this kind of trickery in the kernel...

> > The entries we have used for pci_enable_msix() (not necessarily in
> > sequence) are already enabled; the others are disabled. When the device's
> > MSI-X is enabled and the guest wants to enable a disabled entry, we go
> > back to QEmu, because this vector doesn't exist in the routing table.
> > Also, pci_enable_msix() in the kernel doesn't allow us to enable vectors
> > one by one, only all at once. So we have to disable MSI-X first, then
> > enable it with the new set of entries, which contains the new vector the
> > guest wants to use. This situation only happens while the device is being
> > initialized. After that, the kernel can know and handle the mask bit of
> > the enabled entries.
> >
> > I've also considered handling all MMIO operations in the kernel, and
> > changing irq routing in the kernel directly. But as long as irq routing
> > is owned by QEmu, I think it's better to leave it to QEmu...
>
> Yes, this is my suggestion, except we don't need no routing :)
> To inject MSI you just need address/data pairs.
> Look at kvm_set_msi: given address/data you can just inject
> the interrupt. No need for table lookups.

You still need to look up the data/address pair in the guest MSI-X table. The routing table used here is just a replacement for that table, because we can construct the entry according to the routing table. There are two choices: use the routing table, or create a new MSI-X table. Still, the key question is who owns the routing/MSI-X table. If the kernel owns it, it would be straightforward to intercept all the MMIO in the kernel; but if it's QEmu, we still need to go back to QEmu for it.

> > Notice that the mask/unmask bits must be handled together, either in the
> > kernel or in userspace. If the kernel handled an enabled vector's mask
> > bit directly, it would be out of sync with QEmu's records. That doesn't
> > matter as long as QEmu doesn't access the related record. And the only
> > place QEmu wants to consult an enabled entry's mask bit state is when
> > writing to the MSI addr/data: the write should be discarded if the entry
> > is unmasked. This checking has alrea
buildbot failure in qemu-kvm on disable_kvm_i386_out_of_tree
The Buildbot has detected a new failure of disable_kvm_i386_out_of_tree on qemu-kvm.

Full details are available at:
 http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_out_of_tree/builds/549

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist:

BUILD FAILED: failed compile

sincerely,
 -The Buildbot

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in qemu-kvm on disable_kvm_x86_64_out_of_tree
The Buildbot has detected a new failure of disable_kvm_x86_64_out_of_tree on qemu-kvm.

Full details are available at:
 http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_out_of_tree/builds/549

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist:

BUILD FAILED: failed compile

sincerely,
 -The Buildbot
buildbot failure in qemu-kvm on disable_kvm_i386_debian_5_0
The Buildbot has detected a new failure of disable_kvm_i386_debian_5_0 on qemu-kvm.

Full details are available at:
 http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_debian_5_0/builds/601

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist:

BUILD FAILED: failed compile

sincerely,
 -The Buildbot
buildbot failure in qemu-kvm on disable_kvm_x86_64_debian_5_0
The Buildbot has detected a new failure of disable_kvm_x86_64_debian_5_0 on qemu-kvm.

Full details are available at:
 http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_debian_5_0/builds/600

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist:

BUILD FAILED: failed compile

sincerely,
 -The Buildbot
Re: [PATCH 2/2] KVM: pre-allocate one more dirty bitmap to avoid vmalloc() in x86's kvm_vm_ioctl_get_dirty_log()
On Thu, Oct 21, 2010 at 05:40:33PM +0900, Takuya Yoshikawa wrote:
> Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap by
> vmalloc() which will be used in the next logging, and this has been causing
> bad effects on VGA and live migration: vmalloc() consumes extra system
> time, triggers tlb flushes, etc.
>
> This patch resolves this issue by pre-allocating one more bitmap and
> switching between the two bitmaps during dirty logging.
>
> Performance improvement:
> I measured performance for the case of VGA updates by trace-cmd.
> The result was 1.5 times faster than the original.
>
> In the case of live migration, the improvement ratio depends on the
> workload and the guest memory size.
>
> Note:
> This does not change other architectures' logic, but the allocation size
> becomes twice as large. This will increase the actual memory consumption
> only when the new size changes the number of pages allocated by vmalloc().
>
> Signed-off-by: Takuya Yoshikawa
> Signed-off-by: Fernando Luis Vazquez Cao

Looks good to me.
Re: VM with two interfaces
On Thu, Oct 21, 2010 at 9:23 AM, Nirmal Guhan wrote:
> Hi,
>
> I am trying to create a VM with two interfaces using qemu-kvm (Fedora 12
> is the host and the guest) and running into an issue. Given below is the
> command:
>
> qemu-kvm -net nic,macaddr=$macaddress,model=pcnet -net
> tap,script=/etc/qemu-ifup -net nic,model=pcnet -net
> tap,script=/etc/qemu-ifup -m 1024 -hda ./vdisk.img -kernel
> ./bzImage-1019 -append "ip=x.y.z.w:a.b.c.d:p.q.r.s:a.b.c.d
> ip=x.y.z.u:a.b.c.d:p.q.r.s:a.b.c.d root=/dev/nfs rw
> nfsroot=x.y.z.v:/blahblahblah"
>
> On boot, both eth0 and eth1 come up, but the vm tries to send dhcp and
> rarp requests instead of using the command line IP addresses. DHCP
> would fail in my case.
>
> With just one interface, dhcp is not attempted and the nfs mount of root
> works fine.
>
> Any clue on what could be wrong here?
>
> Thanks,
> Nirmal

Can someone help please? Hard pressed on time... sorry

-Nirmal
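Two things worth checking with this kind of setup. With the legacy -net syntax, each nic/tap pair has to be tied together with an explicit vlan= id; otherwise all four -net options land on the default vlan 0 and both NICs see both taps. And the kernel's ip= option takes a device field, so each interface can be pinned and autoconfiguration (DHCP/RARP) disabled per interface. An untested sketch along those lines, reusing the placeholders from the original command:

```sh
qemu-kvm \
  -net nic,vlan=0,macaddr=$macaddress,model=pcnet \
  -net tap,vlan=0,script=/etc/qemu-ifup \
  -net nic,vlan=1,model=pcnet \
  -net tap,vlan=1,script=/etc/qemu-ifup \
  -m 1024 -hda ./vdisk.img -kernel ./bzImage-1019 \
  -append "ip=x.y.z.w:a.b.c.d:p.q.r.s:a.b.c.d::eth0:none ip=x.y.z.u:a.b.c.d:p.q.r.s:a.b.c.d::eth1:none root=/dev/nfs rw nfsroot=x.y.z.v:/blahblahblah"
```

The ip= fields here follow the kernel's documented order (client:server:gw:netmask:hostname:device:autoconf); `none` in the autoconf slot is what suppresses the DHCP/RARP attempts.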
Re: TSC host kernel message with kvm.git as host kernel and qemu-kvm.git as userspace
On 10/21/2010 09:49 AM, Lucas Meneghel Rodrigues wrote:
> Hi folks, I was doing some work at one of our host machines, the one that
> runs upstream (Fedora 14, kvm.git as the kernel and qemu-kvm.git as
> userspace), and I noticed the kernel message:
>
> kvm: unreliable cycle conversion on adjustable rate TSC
> kvm: SMP vm created on host with unstable TSC: guest TSC will not be reliable

Yes, the message is a warning; without TSC trapping, SMP VMs will have an unreliable TSC. In this case, you are advised to use kvmclock or another non-TSC clocksource in the guest.

Zach
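For a Linux guest, switching to kvm-clock can be done through the standard clocksource sysfs interface (assuming the guest kernel has paravirt clock support; run these inside the guest):

```sh
# List the clocksources the guest kernel detected
cat /sys/devices/system/clocksource/clocksource0/available_clocksource

# If kvm-clock is listed, make it the active clocksource
echo kvm-clock > /sys/devices/system/clocksource/clocksource0/current_clocksource
```

The same choice can be made persistent with the clocksource= kernel command line parameter.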
2 hosts sharing iSCSI
Hi. I want to build two KVM servers sharing an iSCSI storage backend. The goal is that if one server fails, the other starts all the guests.

I've been searching for documents about best practices with KVM and iSCSI. I already read http://pve.proxmox.com/wiki/Storage_Model as advised, but I'd appreciate it if someone could point me to some specifics about iSCSI. I am not sure if I should build DRBD on top of it, or if I should use something like Heartbeat. Or maybe the iSCSI vendor should provide me with some software to do this?

Any hint would be appreciated, thank you very much.
TSC host kernel message with kvm.git as host kernel and qemu-kvm.git as userspace
Hi folks, I was doing some work at one of our host machines, the one that runs upstream (Fedora 14, kvm.git as the kernel and qemu-kvm.git as userspace), and I noticed the kernel message:

kvm: unreliable cycle conversion on adjustable rate TSC
kvm: SMP vm created on host with unstable TSC: guest TSC will not be reliable

The guest being tested in this case was RHEL 5.5 i386. Command line of the install test:

/usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor unix:'/tmp/monitor-humanmonitor1-20101021-160403-PrKT',server,nowait -qmp unix:'/tmp/monitor-qmpmonitor1-20101021-160403-PrKT',server,nowait -serial unix:'/tmp/serial-20101021-160403-PrKT',server,nowait -drive file='/tmp/kvm_autotest_root/images/rhel5-32.qcow2',index=0,if=ide,cache=none -device rtl8139,mac=9a:47:46:1a:e6:b9,netdev=idAcPzhW -netdev user,id=idAcPzhW,tftp='/usr/local/autotest/tests/kvm/images/rhel55-32/tftpboot',bootfile='/pxelinux.0',hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:12323 -m 1024 -smp 2 -drive file='/tmp/kvm_autotest_root/isos/linux/RHEL-5.5-i386-DVD.iso',media=cdrom,index=2 -drive file='/tmp/kvm_autotest_root/images/rhel55-32/ks.iso',media=cdrom,index=1 -vnc :0 -boot d -boot cn

I was oriented by Marcelo to report this to the kvm list and copy Zach on the e-mail. There, done :)

Lucas
Re: [Qemu-devel] KVM call minutes for Oct 19
On 10/21/2010 06:25 PM, Anthony Liguori wrote:

Hi Andrew,

On 10/21/2010 10:43 AM, Andrew Beekhof wrote:

In that case we've done a bad job of the wiki. Windows and other distributions are a key part of the Matahari vision.

Matahari is two things:
- an architecture, and
- an implementation of the most common API sets

Each set of APIs (i.e. host, network, services) is an independent daemon/agent which attaches to a common QMF broker (more on that later). While some of these might be platform specific (packaging would be one likely candidate), the intention is to be distro/platform agnostic wherever possible. Take netcf, for example: instead of re-inventing the wheel, we wrote the windows port for netcf.

So what's this about QMF, you ask? Again, rather than invent our own message protocol, we're leveraging an existing standard that supports windows and linux, and is fast, reliable and secure. It's also pluggable and discoverable, so simply starting a new agent that connects to the matahari broker makes its API available. Any QMF client/console can also interrogate the guest to see what agents and API calls are available.

Even better, there's going to be a virtio-serial transport. So we can access the same agents in the same way with or without host-to-guest networking. This was a key requirement for us because of EC2-like cloud scenarios where we don't have access to the physical host.

I did get this much, and I think I'm doing a poor job explaining myself. I think Matahari is tackling the same space that many other frameworks are. For instance, everything you say above is (supposed to be) true for something like OpenWBEM, Pegasus, etc.

I think the scope is also different. Our focus is on satisfying concrete needs rather than nebulous all-encompassing goals. The advantage I see in Matahari is that 1) it can take advantage of virtio-serial, 2) it's significantly lighter than CIM, and 3) it's community driven.
So there's no doubt in my mind that if you need a way to inventory physical and virtual systems, something like Matahari becomes a very appealing option to do that. But that's not the problem space I'm trying to tackle.

Neither are we, really. I came to Matahari from clustering (I wrote Pacemaker), so service management is my original area of interest. But for the sake of argument, assume inventory was our sole focus. There is, by design, a place in the architecture for third-party agents. Just because an agent wasn't built from matahari.git doesn't mean it can't make use of "our" bus.

An example of the problem I'm trying to tackle is guest reboot.

Reboot was one of the first things we implemented :-)

On x86, there is no ACPI interface for requesting a guest reboot. Other platforms do provide this, and we usually try to emulate platform interfaces whenever possible. In order to implement host-initiated reboot (aka virDomainReboot) we need a mechanism, initiated by the hypervisor, to execute the reboot command in the guest. This is not the same thing as a remote system management interface. This is something that should be dead simple and trivially portable.

I think there's a symbiotic relationship that a QEMU-based guest agent could play with a Matahari-based agent. But my initial impression is that the types of problems we're trying to tackle are fundamentally different from the problem that Matahari is trying to solve.

Honestly, we're interested in whatever the teams that want to work with us are interested in. Our primary mission is to consolidate common functionality and avoid N implementations of essentially the same thing. And I suspect that many people would be interested in having what you're working on exposed remotely.

So if you'd like to do this agent as part of Matahari, great. But if you want to keep it separate and just want to leverage the design, that's also not a problem.
Or if the core functionality was in a library, we'd happily write the glue to make it accessible via QMF and dbus. There's really a number of levels on which we could work together, if you're interested.
[PATCH 6/7] kvm: writeback SMP TSCs on migration only
commit 6389c45441269baa2873e6feafebd17105ddeaf6
Author: Jan Kiszka
Date:   Mon Mar 1 18:17:26 2010 +0100

    qemu-kvm: Cleanup/fix TSC and PV clock writeback

Signed-off-by: Marcelo Tosatti
---
 target-i386/kvm.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 06474d6..e2f7e2e 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -817,7 +817,15 @@ static int kvm_put_msrs(CPUState *env, int level)
         kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
 #endif
     if (level == KVM_PUT_FULL_STATE) {
-        kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
+        /*
+         * KVM is yet unable to synchronize TSC values of multiple VCPUs on
+         * writeback. Until this is fixed, we only write the offset to SMP
+         * guests after migration, desynchronizing the VCPUs, but avoiding
+         * huge jump-backs that would occur without any writeback at all.
+         */
+        if (smp_cpus == 1 || env->tsc != 0) {
+            kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
+        }
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
--
1.7.2.1
[PATCH 7/7] kvm: save/restore x86-64 MSRs on x86-64 kernels
Signed-off-by: Marcelo Tosatti
---
 target-i386/kvm.c |   30 ++++++++++++++++--------
 1 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index e2f7e2e..ae0a034 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include

 #include

@@ -53,6 +54,8 @@
 #define BUS_MCEERR_AO 5
 #endif

+static int lm_capable_kernel;
+
 #ifdef KVM_CAP_EXT_CPUID

 static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
@@ -523,6 +526,11 @@ int kvm_arch_init(KVMState *s, int smp_cpus)
 {
     int ret;
+    struct utsname utsname;
+
+    uname(&utsname);
+    lm_capable_kernel = strcmp(utsname.machine, "x86_64") == 0;
+
     /* create vm86 tss.  KVM uses vm86 mode to emulate 16-bit code
      * directly. In order to use vm86 mode, a TSS is needed. Since this
      * must be part of guest physical memory, we need to allocate it. Older
@@ -810,11 +818,12 @@ static int kvm_put_msrs(CPUState *env, int level)
     if (kvm_has_msr_hsave_pa(env))
         kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
 #ifdef TARGET_X86_64
-    /* FIXME if lm capable */
-    kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
-    kvm_msr_entry_set(&msrs[n++], MSR_KERNELGSBASE, env->kernelgsbase);
-    kvm_msr_entry_set(&msrs[n++], MSR_FMASK, env->fmask);
-    kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
+    if (lm_capable_kernel) {
+        kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
+        kvm_msr_entry_set(&msrs[n++], MSR_KERNELGSBASE, env->kernelgsbase);
+        kvm_msr_entry_set(&msrs[n++], MSR_FMASK, env->fmask);
+        kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
+    }
 #endif
     if (level == KVM_PUT_FULL_STATE) {
         /*
@@ -1046,11 +1055,12 @@ static int kvm_get_msrs(CPUState *env)
     msrs[n++].index = MSR_VM_HSAVE_PA;
     msrs[n++].index = MSR_IA32_TSC;
 #ifdef TARGET_X86_64
-    /* FIXME lm_capable_kernel */
-    msrs[n++].index = MSR_CSTAR;
-    msrs[n++].index = MSR_KERNELGSBASE;
-    msrs[n++].index = MSR_FMASK;
-    msrs[n++].index = MSR_LSTAR;
+    if (lm_capable_kernel) {
+        msrs[n++].index = MSR_CSTAR;
+        msrs[n++].index = MSR_KERNELGSBASE;
+        msrs[n++].index = MSR_FMASK;
+        msrs[n++].index = MSR_LSTAR;
+    }
 #endif
     msrs[n++].index = MSR_KVM_SYSTEM_TIME;
     msrs[n++].index = MSR_KVM_WALL_CLOCK;
--
1.7.2.1
[PATCH 3/7] Fix build on !KVM_CAP_MCE
From: Hidetoshi Seto

This patch removes the following warnings:

target-i386/kvm.c: In function 'kvm_put_msrs':
target-i386/kvm.c:782: error: unused variable 'i'
target-i386/kvm.c: In function 'kvm_get_msrs':
target-i386/kvm.c:1083: error: label at end of compound statement

Signed-off-by: Hidetoshi Seto
Signed-off-by: Marcelo Tosatti
---
 target-i386/kvm.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 9144f74..587ee19 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -783,7 +783,7 @@ static int kvm_put_msrs(CPUState *env, int level)
         struct kvm_msr_entry entries[100];
     } msr_data;
     struct kvm_msr_entry *msrs = msr_data.entries;
-    int i, n = 0;
+    int n = 0;

     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_CS, env->sysenter_cs);
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
@@ -805,6 +805,7 @@ static int kvm_put_msrs(CPUState *env, int level)
     }
 #ifdef KVM_CAP_MCE
     if (env->mcg_cap) {
+        int i;
         if (level == KVM_PUT_RESET_STATE)
             kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
         else if (level == KVM_PUT_FULL_STATE) {
@@ -1089,9 +1090,9 @@ static int kvm_get_msrs(CPUState *env)
             if (msrs[i].index >= MSR_MC0_CTL &&
                 msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 4) {
                 env->mce_banks[msrs[i].index - MSR_MC0_CTL] = msrs[i].data;
-                break;
             }
 #endif
+            break;
         }
     }
--
1.7.2.1
[PATCH 0/7] [PULL] qemu-kvm.git uq/master queue
The following changes since commit 633aa0acfe2c4d3e56acfe28c912796bf54de6d3:

  Fix pci hotplug to generate level triggered interrupt. (2010-10-20 17:23:28 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

Hidetoshi Seto (3):
      x86, mce: ignore SRAO only when MCG_SER_P is available
      x86, mce: broadcast mce depending on the cpu version
      Fix build on !KVM_CAP_MCE

Marcelo Tosatti (4):
      kvm: add save/restore of MSR_VM_HSAVE_PA
      kvm: factor out kvm_has_msr_star
      kvm: writeback SMP TSCs on migration only
      kvm: save/restore x86-64 MSRs on x86-64 kernels

 target-i386/kvm.c |  132 ++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 99 insertions(+), 33 deletions(-)
[PATCH 4/7] kvm: add save/restore of MSR_VM_HSAVE_PA
commit 2bba4446746add456ceeb0e8359a43032a2ea333
Author: Alexander Graf
Date:   Thu Dec 18 15:38:32 2008 +0100

    Enable nested SVM support in userspace

Signed-off-by: Marcelo Tosatti
---
 target-i386/kvm.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 587ee19..e6c9a1d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -790,6 +790,7 @@ static int kvm_put_msrs(CPUState *env, int level)
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
     if (kvm_has_msr_star(env))
         kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
+    kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
 #ifdef TARGET_X86_64
     /* FIXME if lm capable */
     kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
@@ -1015,6 +1016,7 @@ static int kvm_get_msrs(CPUState *env)
     msrs[n++].index = MSR_IA32_SYSENTER_EIP;
     if (kvm_has_msr_star(env))
         msrs[n++].index = MSR_STAR;
+    msrs[n++].index = MSR_VM_HSAVE_PA;
     msrs[n++].index = MSR_IA32_TSC;
 #ifdef TARGET_X86_64
     /* FIXME lm_capable_kernel */
@@ -1071,6 +1073,9 @@ static int kvm_get_msrs(CPUState *env)
         case MSR_IA32_TSC:
             env->tsc = msrs[i].data;
             break;
+        case MSR_VM_HSAVE_PA:
+            env->vm_hsave = msrs[i].data;
+            break;
         case MSR_KVM_SYSTEM_TIME:
             env->system_time_msr = msrs[i].data;
             break;
--
1.7.2.1
[PATCH 1/7] x86, mce: ignore SRAO only when MCG_SER_P is available
From: Hidetoshi Seto

And restructure this block to call kvm_mce_in_exception() only when it is
required.

Signed-off-by: Hidetoshi Seto
Signed-off-by: Marcelo Tosatti
---
 target-i386/kvm.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 512d533..b813953 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -239,12 +239,16 @@ static void kvm_do_inject_x86_mce(void *_data)
     struct kvm_x86_mce_data *data = _data;
     int r;

-    /* If there is an MCE excpetion being processed, ignore this SRAO MCE */
-    r = kvm_mce_in_exception(data->env);
-    if (r == -1)
-        fprintf(stderr, "Failed to get MCE status\n");
-    else if (r && !(data->mce->status & MCI_STATUS_AR))
-        return;
+    /* If there is an MCE exception being processed, ignore this SRAO MCE */
+    if ((data->env->mcg_cap & MCG_SER_P) &&
+        !(data->mce->status & MCI_STATUS_AR)) {
+        r = kvm_mce_in_exception(data->env);
+        if (r == -1) {
+            fprintf(stderr, "Failed to get MCE status\n");
+        } else if (r) {
+            return;
+        }
+    }

     r = kvm_set_mce(data->env, data->mce);
     if (r < 0) {
--
1.7.2.1
[PATCH 2/7] x86, mce: broadcast mce depending on the cpu version
From: Hidetoshi Seto

There is no reason why an SRAO event received by the main thread is the
only one being broadcast.

According to the x86 ASDM vol.3A 15.10.4.1, the MCE signal is broadcast
on processor version 06H_EH or later.

This change is required to handle SRAR in smp guests.

Signed-off-by: Hidetoshi Seto
Signed-off-by: Marcelo Tosatti
---
 target-i386/kvm.c |   29 ++++++++++++++++++++++-------
 1 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index b813953..9144f74 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1636,6 +1636,28 @@ static void hardware_memory_error(void)
     exit(1);
 }

+#ifdef KVM_CAP_MCE
+static void kvm_mce_broadcast_rest(CPUState *env)
+{
+    CPUState *cenv;
+    int family, model, cpuver = env->cpuid_version;
+
+    family = (cpuver >> 8) & 0xf;
+    model = ((cpuver >> 12) & 0xf0) + ((cpuver >> 4) & 0xf);
+
+    /* Broadcast MCA signal for processor version 06H_EH and above */
+    if ((family == 6 && model >= 14) || family > 6) {
+        for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
+            if (cenv == env) {
+                continue;
+            }
+            kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
+                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1);
+        }
+    }
+}
+#endif
+
 int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
 {
 #if defined(KVM_CAP_MCE)
@@ -1693,6 +1715,7 @@ int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
             fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
             abort();
         }
+        kvm_mce_broadcast_rest(env);
     } else
 #endif
     {
@@ -1715,7 +1738,6 @@ int kvm_on_sigbus(int code, void *addr)
     void *vaddr;
     ram_addr_t ram_addr;
     target_phys_addr_t paddr;
-    CPUState *cenv;

     /* Hope we are lucky for AO MCE */
     vaddr = addr;
@@ -1731,10 +1753,7 @@ int kvm_on_sigbus(int code, void *addr)
         kvm_inject_x86_mce(first_cpu, 9, status,
                            MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
                            (MCM_ADDR_PHYS << 6) | 0xc, 1);
-        for (cenv = first_cpu->next_cpu; cenv != NULL; cenv = cenv->next_cpu) {
-            kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
-                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1);
-        }
+        kvm_mce_broadcast_rest(first_cpu);
     } else
 #endif
     {
--
1.7.2.1
[PATCH 5/7] kvm: factor out kvm_has_msr_star
And add kvm_has_msr_hsave_pa(), to avoid warnings on older kernels without
support.

Signed-off-by: Marcelo Tosatti
---
 target-i386/kvm.c |   41 +++++++++++++++++++++++++++++------------
 1 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index e6c9a1d..06474d6 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -438,23 +438,26 @@ void kvm_arch_reset_vcpu(CPUState *env)
     }
 }

-static int kvm_has_msr_star(CPUState *env)
+int has_msr_star;
+int has_msr_hsave_pa;
+
+static void kvm_supported_msrs(CPUState *env)
 {
-    static int has_msr_star;
+    static int kvm_supported_msrs;
     int ret;

     /* first time */
-    if (has_msr_star == 0) {
+    if (kvm_supported_msrs == 0) {
         struct kvm_msr_list msr_list, *kvm_msr_list;

-        has_msr_star = -1;
+        kvm_supported_msrs = -1;

         /* Obtain MSR list from KVM.  These are the MSRs that we must
          * save/restore */
         msr_list.nmsrs = 0;
         ret = kvm_ioctl(env->kvm_state, KVM_GET_MSR_INDEX_LIST, &msr_list);
         if (ret < 0 && ret != -E2BIG) {
-            return 0;
+            return;
         }
         /* Old kernel modules had a bug and could write beyond the
          * provided memory. Allocate at least a safe amount of 1K. */
@@ -470,7 +473,11 @@ static int kvm_has_msr_star(CPUState *env)
             for (i = 0; i < kvm_msr_list->nmsrs; i++) {
                 if (kvm_msr_list->indices[i] == MSR_STAR) {
                     has_msr_star = 1;
-                    break;
+                    continue;
+                }
+                if (kvm_msr_list->indices[i] == MSR_VM_HSAVE_PA) {
+                    has_msr_hsave_pa = 1;
+                    continue;
                 }
             }
         }
@@ -478,9 +485,19 @@ static int kvm_has_msr_star(CPUState *env)
         free(kvm_msr_list);
     }

-    if (has_msr_star == 1)
-        return 1;
-    return 0;
+    return;
+}
+
+static int kvm_has_msr_hsave_pa(CPUState *env)
+{
+    kvm_supported_msrs(env);
+    return has_msr_hsave_pa;
+}
+
+static int kvm_has_msr_star(CPUState *env)
+{
+    kvm_supported_msrs(env);
+    return has_msr_star;
 }

 static int kvm_init_identity_map_page(KVMState *s)
@@ -790,7 +807,8 @@ static int kvm_put_msrs(CPUState *env, int level)
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
     if (kvm_has_msr_star(env))
         kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
-    kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
+    if (kvm_has_msr_hsave_pa(env))
+        kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
 #ifdef TARGET_X86_64
     /* FIXME if lm capable */
     kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
@@ -1016,7 +1034,8 @@ static int kvm_get_msrs(CPUState *env)
     msrs[n++].index = MSR_IA32_SYSENTER_EIP;
     if (kvm_has_msr_star(env))
         msrs[n++].index = MSR_STAR;
-    msrs[n++].index = MSR_VM_HSAVE_PA;
+    if (kvm_has_msr_hsave_pa(env))
+        msrs[n++].index = MSR_VM_HSAVE_PA;
     msrs[n++].index = MSR_IA32_TSC;
 #ifdef TARGET_X86_64
     /* FIXME lm_capable_kernel */
--
1.7.2.1
Re: [PATCH] KVM: Move KVM context switch into own function
On Wed, Oct 20, 2010 at 05:56:17PM +0200, Andi Kleen wrote: > From: Andi Kleen > > gcc 4.5 with some special options is able to duplicate the VMX > context switch asm in vmx_vcpu_run(). This results in a compile error > because the inline asm sequence uses an on local label. The non local > label is needed because other code wants to set up the return address. > > This patch moves the asm code into an own function and marks > that explicitely noinline to avoid this problem. > > Better would be probably to just move it into an .S file. > > The diff looks worse than the change really is, it's all just > code movement and no logic change. > > Signed-off-by: Andi Kleen Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: Add missing inline tag to kvm_read_and_reset_pf_reason
On Wed, Oct 20, 2010 at 06:34:54PM +0200, Jan Kiszka wrote: > From: Jan Kiszka > > The header may otherwise generate build warnings about an unused > kvm_read_and_reset_pf_reason if it is included without CONFIG_KVM_GUEST > enabled. > > Signed-off-by: Jan Kiszka > --- > arch/x86/include/asm/kvm_para.h |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) Applied, thanks.
Re: [Qemu-devel] KVM call minutes for Oct 19
* Anthony Liguori (anth...@codemonkey.ws) wrote: > So there's no doubt in my mind that if you need a way to inventory > physical and virtual systems, something like Matahari becomes a very > appealing option to do that. > > But that's not the problem space I'm trying to tackle. > > An example of the problem I'm trying to tackle is guest reboot. Matahari already has shutdown and reboot methods. Inventory, reboot, filesystem freeze, cut'n paste, etc.. all are communicating between host and guest. Main point is to consolidate effort to keep from having some sprawl of agents (which agent do I install to do reboot?). thanks, -chris
Re: [Qemu-devel] KVM call minutes for Oct 19
Hi Andrew, On 10/21/2010 10:43 AM, Andrew Beekhof wrote: In that case we've done a bad job of the wiki. Windows and other distributions are a key part of the Matahari vision. Matahari is two things - an architecture, and - an implementation of the most common API sets Each set of APIs (ie. host, network, services) is an independent daemon/agent which attaches to a common QMF broker (more on that later). While some of these might be platform specific (packaging would be one likely candidate), the intention is to be agnostic distro/platform wherever possible. Take netcf for example: instead of re-inventing the wheel we wrote the windows port for netcf. So what's this about QMF you ask? Again, rather than invent our own message protocol we're leveraging an existing standard that supports windows and linux, is fast, reliable and secure. It's also pluggable and discoverable - so simply starting a new agent that connects to the matahari broker makes its API available. Any QMF client/console can also interrogate the guest to see what agents and API calls are available. Even better, there's going to be a virtio-serial transport. So we can access the same agents in the same way with or without host-to-guest networking. This was a key requirement for us because of EC2-like cloud scenarios where we don't have access to the physical host. I did get this much and I think I'm doing a poor job explaining myself. I think Matahari is tackling the same space that many other frameworks are. For instance, everything you say above is (supposed to be) true for something like OpenWBEM, Pegasus, etc. The advantage I see in Matahari is that 1) it can take advantage of virtio-serial 2) it's significantly lighter than CIM 3) it's community driven. So there's no doubt in my mind that if you need a way to inventory physical and virtual systems, something like Matahari becomes a very appealing option to do that. But that's not the problem space I'm trying to tackle.
An example of the problem I'm trying to tackle is guest reboot. On x86, there is no ACPI interface for requesting a guest reboot. Other platforms do provide this and we usually try to emulate platform interfaces whenever possible. In order to implement host-initiated reboot (aka virDomainReboot) we need to have a mechanism to execute the reboot command in the guest that's initiated by the hypervisor. This is not the same thing as a remote systems management interface. This is something that should be dead-simple and trivially portable. I think there's a symbiotic relationship that a QEMU-based guest agent could play with a Matahari-based agent. But my initial impression is that the types of problems we're trying to tackle are fundamentally different from the problems Matahari is tackling, even if it superficially appears like there's an overlap. For instance, communicating window locations to enable "coherence" mode in the QEMU GUI is something that makes no sense as a network management interface. This is something that only makes sense between QEMU and the guest agent. Regards, Anthony Liguori That's probably enough for the moment, I'd better go make dinner :-) It exposes interfaces for manipulation of RPM packages, relies on netcf, etc. FYI netcf is not Fedora specific. There is a Win32 backend for it too. It does need porting to other Linux distros, but that's simply an internal implementation issue. The goal of netcf is to be the libvirt of network config mgmt - a portable API for all OS network config tasks. Further, Matahari itself is also being ported to Win32 and can be ported to other Linux distros too. Yeah, I'm aware of the goals of netcf but that hasn't materialized a port to other distros. Let me be clear, I don't think this is a problem for libvirt, NetworkManager, or even Matahari. But for a QEMU guest agent where we terminate the APIs within QEMU itself, I do think it creates a pretty nasty portability barrier.
There's nothing wrong with this if the goal of Matahari is to provide a robust agent for Fedora-based Linux distributions but I don't think it meets the requirements of a QEMU guest agent. I don't think we can overly optimize for one Linux distribution either so a mentality of letting other platforms contribute their own support probably won't work. That is not the goal of Matahari. It is intended to be generically applicable to *all* guest OS. Obviously in areas where every distro does different things, it will need porting for each different impl. You have to start somewhere and it started with Fedora. This is all true of any guest agent solution. There are two approaches that could be taken for a guest agent. You could provide very low level interfaces (read a file, execute a command, read a registry key). This makes for a very portable guest agent at the cost of complexity in interacting with the agent. The agent doesn't ever really need to change much; the client (QEMU) needs to handle many different types of guests, and add new functionality based on the supported primitives.
VM with two interfaces
Hi, Am trying to create a VM using qemu-kvm with two interfaces (fedora12 is the host and vm) and running into an issue. Given below is the command: qemu-kvm -net nic,macaddr=$macaddress,model=pcnet -net tap,script=/etc/qemu-ifup -net nic,model=pcnet -net tap,script=/etc/qemu-ifup -m 1024 -hda ./vdisk.img -kernel ./bzImage-1019 -append "ip=x.y.z.w:a.b.c.d:p.q.r.s:a.b.c.d ip=x.y.z.u:a.b.c.d:p.q.r.s:a.b.c.d root=/dev/nfs rw nfsroot=x.y.z.v:/blahblahblah" On boot, both eth0 and eth1 come up but the vm tries to send dhcp and rarp requests instead of using the command line IP addresses. DHCP would fail in my case. With just one interface, dhcp is not attempted and nfs mount of root works fine. Any clue on what could be wrong here? Thanks, Nirmal
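One plausible cause, offered as a sketch rather than a confirmed diagnosis: with the legacy -net syntax, every -net option without an explicit vlan= lands on QEMU's default VLAN 0, so both NICs and both taps end up connected to one internal hub; and the kernel's nfsroot autoconfiguration effectively honors a single ip= option when no device name is given, leaving the other interface to fall back to DHCP/RARP. Pairing each NIC with its tap might look like this (placeholders kept from the original command):

```shell
# Hypothetical sketch: give each nic/tap pair its own VLAN so the two
# guest interfaces are not bridged together inside QEMU, and pass a
# single ip= option for the NFS-root interface.
qemu-kvm \
  -net nic,vlan=0,macaddr=$macaddress,model=pcnet \
  -net tap,vlan=0,script=/etc/qemu-ifup \
  -net nic,vlan=1,model=pcnet \
  -net tap,vlan=1,script=/etc/qemu-ifup \
  -m 1024 -hda ./vdisk.img -kernel ./bzImage-1019 \
  -append "ip=x.y.z.w:a.b.c.d:p.q.r.s:a.b.c.d root=/dev/nfs rw nfsroot=x.y.z.v:/blahblahblah"
```

The second interface would then be configured from inside the guest (or via an ip= option that names the device explicitly) rather than via a second bare ip= option.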
Re: [PATCH] x86, mce: broadcast mce depending on the cpu version
On Thu, Oct 21, 2010 at 05:47:06PM +0900, Hidetoshi Seto wrote: > There is no reason why an SRAO event received by the main thread > should be the only one that is broadcast. > > According to the x86 ASDM vol.3A 15.10.4.1, > MCE signal is broadcast on processor version 06H_EH or later. > > This change is required to handle SRAR in smp guests. > > Signed-off-by: Hidetoshi Seto > --- > target-i386/kvm.c | 29 - > 1 files changed, 24 insertions(+), 5 deletions(-) Applied all to uq/master, thanks.
Re: [PATCHv2] Add support for async page fault to qemu
On Thu, Oct 21, 2010 at 05:08:45PM +0200, Gleb Natapov wrote: > Add save/restore of MSR for migration and cpuid bit. > > Signed-off-by: Gleb Natapov > --- > v1->v2 > - use vmstate subsection to migrate new msr. > > diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c > index 59aacd0..145c863 100644 > --- a/qemu-kvm-x86.c > +++ b/qemu-kvm-x86.c > @@ -678,6 +678,9 @@ static int get_msr_entry(struct kvm_msr_entry *entry, > CPUState *env) > env->mcg_ctl = entry->data; > break; > #endif > +case MSR_KVM_ASYNC_PF_EN: > +env->async_pf_en_msr = entry->data; > +break; > default: > #ifdef KVM_CAP_MCE > if (entry->index >= MSR_MC0_CTL && > @@ -967,6 +970,7 @@ void kvm_arch_load_regs(CPUState *env, int level) > } > kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, > env->system_time_msr); > kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, > env->wall_clock_msr); > +kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, > env->async_pf_en_msr); > } > #ifdef KVM_CAP_MCE > if (env->mcg_cap) { > @@ -1186,6 +1190,7 @@ void kvm_arch_save_regs(CPUState *env) > #endif > msrs[n++].index = MSR_KVM_SYSTEM_TIME; > msrs[n++].index = MSR_KVM_WALL_CLOCK; > +msrs[n++].index = MSR_KVM_ASYNC_PF_EN; > > #ifdef KVM_CAP_MCE > if (env->mcg_cap) { > diff --git a/target-i386/cpu.h b/target-i386/cpu.h > index 8b6efed..6d1d6a0 100644 > --- a/target-i386/cpu.h > +++ b/target-i386/cpu.h > @@ -669,6 +669,7 @@ typedef struct CPUX86State { > #endif > uint64_t system_time_msr; > uint64_t wall_clock_msr; > +uint64_t async_pf_en_msr; > > uint64_t tsc; > > diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c > index d63fdcb..0ee1f88 100644 > --- a/target-i386/cpuid.c > +++ b/target-i386/cpuid.c > @@ -73,7 +73,7 @@ static const char *ext3_feature_name[] = { > }; > > static const char *kvm_feature_name[] = { > -"kvmclock", "kvm_nopiodelay", "kvm_mmu", NULL, NULL, NULL, NULL, NULL, > +"kvmclock", "kvm_nopiodelay", "kvm_mmu", NULL, "kvm_asyncpf", NULL, > NULL, NULL, > NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, > NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, > NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, > diff --git a/target-i386/kvm.c b/target-i386/kvm.c > index f4fc063..0eb1e90 100644 > --- a/target-i386/kvm.c > +++ b/target-i386/kvm.c > @@ -151,6 +151,9 @@ struct kvm_para_features { > #ifdef KVM_CAP_PV_MMU > { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP }, > #endif > +#ifdef KVM_CAP_ASYNC_PF > +{ KVM_CAP_ASYNC_PF, KVM_FEATURE_ASYNC_PF }, > +#endif > { -1, -1 } > }; > > @@ -672,6 +675,7 @@ static int kvm_put_msrs(CPUState *env, int level) > kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, >env->system_time_msr); > kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, > env->wall_clock_msr); > +kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, > env->async_pf_en_msr); > } > > msr_data.info.nmsrs = n; > @@ -880,6 +884,7 @@ static int kvm_get_msrs(CPUState *env) > #endif > msrs[n++].index = MSR_KVM_SYSTEM_TIME; > msrs[n++].index = MSR_KVM_WALL_CLOCK; > +msrs[n++].index = MSR_KVM_ASYNC_PF_EN; > > msr_data.info.nmsrs = n; > ret = kvm_vcpu_ioctl(env, KVM_GET_MSRS, &msr_data); > @@ -926,6 +931,9 @@ static int kvm_get_msrs(CPUState *env) > case MSR_VM_HSAVE_PA: > env->vm_hsave = msrs[i].data; > break; > + case MSR_KVM_ASYNC_PF_EN: > +env->async_pf_en_msr = msrs[i].data; > +break; > } I think this is going to break the build if MSR_KVM_ASYNC_PF_EN is not defined. Please regenerate against uq/master.
Re: [Qemu-devel] KVM call minutes for Oct 19
On 10/21/2010 03:32 PM, Anthony Liguori wrote: On 10/21/2010 08:18 AM, Daniel P. Berrange wrote: On Thu, Oct 21, 2010 at 08:09:44AM -0500, Anthony Liguori wrote: Hi Andrew, On 10/21/2010 05:22 AM, Andrew Beekhof wrote: Hello from the Matahari tech-lead... Is there any documentation on the capabilities provided by the guest agent Anthony is creating? Perhaps we can combine efforts. Mike should be posting today or tomorrow. Also happy to provide more information on Matahari if anyone is interested. I'd really like to hear more about Matahari's long term vision. For a QEMU guest agent, we need something that is very portable. The interfaces it provides need to be reasonably guest agnostic and we need to support a wide range of guests including Windows, Linux, *BSD, etc. From the little bit I've read about Matahari, it seems to be pretty specific and pretty oriented towards Fedora-like distributions. In that case we've done a bad job of the wiki. Windows and other distributions are a key part of the Matahari vision. Matahari is two things - an architecture, and - an implementation of the most common API sets Each set of APIs (ie. host, network, services) is an independent daemon/agent which attaches to a common QMF broker (more on that later). While some of these might be platform specific (packaging would be one likely candidate), the intention is to be agnostic distro/platform wherever possible. Take netcf for example: instead of re-inventing the wheel we wrote the windows port for netcf. So what's this about QMF you ask? Again, rather than invent our own message protocol we're leveraging an existing standard that supports windows and linux, is fast, reliable and secure. It's also pluggable and discoverable - so simply starting a new agent that connects to the matahari broker makes its API available. Any QMF client/console can also interrogate the guest to see what agents and API calls are available. Even better, there's going to be a virtio-serial transport.
So we can access the same agents in the same way with or without host-to-guest networking. This was a key requirement for us because of EC2-like cloud scenarios where we don't have access to the physical host. That's probably enough for the moment, I'd better go make dinner :-) It exposes interfaces for manipulation of RPM packages, relies on netcf, etc. FYI netcf is not Fedora specific. There is a Win32 backend for it too. It does need porting to other Linux distros, but that's simply an internal implementation issue. The goal of netcf is to be the libvirt of network config mgmt - a portable API for all OS network config tasks. Further, Matahari itself is also being ported to Win32 and can be ported to other Linux distros too. Yeah, I'm aware of the goals of netcf but that hasn't materialized a port to other distros. Let me be clear, I don't think this is a problem for libvirt, NetworkManager, or even Matahari. But for a QEMU guest agent where we terminate the APIs within QEMU itself, I do think it creates a pretty nasty portability barrier. There's nothing wrong with this if the goal of Matahari is to provide a robust agent for Fedora-based Linux distributions but I don't think it meets the requirements of a QEMU guest agent. I don't think we can overly optimize for one Linux distribution either so a mentality of letting other platforms contribute their own support probably won't work. That is not the goal of Matahari. It is intended to be generically applicable to *all* guest OS. Obviously in areas where every distro does different things, it will need porting for each different impl. You have to start somewhere and it started with Fedora. This is all true of any guest agent solution. There are two approaches that could be taken for a guest agent. You could provide very low level interfaces (read a file, execute a command, read a registry key). This makes for a very portable guest agent at the cost of complexity in interacting with the agent.
The agent doesn't ever really need to change much; the client (QEMU) needs to handle many different types of guests, and add new functionality based on the supported primitives. Another approach is to put the complexity in the agent and simplify the management interface. For systems management applications, this is probably the right approach. For virtualization, I think this is a bad approach. Very specifically, netcf only really needs to read and write configuration files and potentially run a command. Instead of linking against netcf in the guest, we should link against netcf in QEMU so that we don't have to constantly change the guest agent. Regards, Anthony Liguori Regards, Daniel
[PATCHv2] Add support for async page fault to qemu
Add save/restore of MSR for migration and cpuid bit. Signed-off-by: Gleb Natapov --- v1->v2 - use vmstate subsection to migrate new msr. diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index 59aacd0..145c863 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -678,6 +678,9 @@ static int get_msr_entry(struct kvm_msr_entry *entry, CPUState *env) env->mcg_ctl = entry->data; break; #endif +case MSR_KVM_ASYNC_PF_EN: +env->async_pf_en_msr = entry->data; +break; default: #ifdef KVM_CAP_MCE if (entry->index >= MSR_MC0_CTL && @@ -967,6 +970,7 @@ void kvm_arch_load_regs(CPUState *env, int level) } kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); +kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, env->async_pf_en_msr); } #ifdef KVM_CAP_MCE if (env->mcg_cap) { @@ -1186,6 +1190,7 @@ void kvm_arch_save_regs(CPUState *env) #endif msrs[n++].index = MSR_KVM_SYSTEM_TIME; msrs[n++].index = MSR_KVM_WALL_CLOCK; +msrs[n++].index = MSR_KVM_ASYNC_PF_EN; #ifdef KVM_CAP_MCE if (env->mcg_cap) { diff --git a/target-i386/cpu.h b/target-i386/cpu.h index 8b6efed..6d1d6a0 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -669,6 +669,7 @@ typedef struct CPUX86State { #endif uint64_t system_time_msr; uint64_t wall_clock_msr; +uint64_t async_pf_en_msr; uint64_t tsc; diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c index d63fdcb..0ee1f88 100644 --- a/target-i386/cpuid.c +++ b/target-i386/cpuid.c @@ -73,7 +73,7 @@ static const char *ext3_feature_name[] = { }; static const char *kvm_feature_name[] = { -"kvmclock", "kvm_nopiodelay", "kvm_mmu", NULL, NULL, NULL, NULL, NULL, +"kvmclock", "kvm_nopiodelay", "kvm_mmu", NULL, "kvm_asyncpf", NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, diff --git a/target-i386/kvm.c b/target-i386/kvm.c index f4fc063..0eb1e90 100644 --- a/target-i386/kvm.c 
+++ b/target-i386/kvm.c @@ -151,6 +151,9 @@ struct kvm_para_features { #ifdef KVM_CAP_PV_MMU { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP }, #endif +#ifdef KVM_CAP_ASYNC_PF +{ KVM_CAP_ASYNC_PF, KVM_FEATURE_ASYNC_PF }, +#endif { -1, -1 } }; @@ -672,6 +675,7 @@ static int kvm_put_msrs(CPUState *env, int level) kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); +kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, env->async_pf_en_msr); } msr_data.info.nmsrs = n; @@ -880,6 +884,7 @@ static int kvm_get_msrs(CPUState *env) #endif msrs[n++].index = MSR_KVM_SYSTEM_TIME; msrs[n++].index = MSR_KVM_WALL_CLOCK; +msrs[n++].index = MSR_KVM_ASYNC_PF_EN; msr_data.info.nmsrs = n; ret = kvm_vcpu_ioctl(env, KVM_GET_MSRS, &msr_data); @@ -926,6 +931,9 @@ static int kvm_get_msrs(CPUState *env) case MSR_VM_HSAVE_PA: env->vm_hsave = msrs[i].data; break; + case MSR_KVM_ASYNC_PF_EN: +env->async_pf_en_msr = msrs[i].data; +break; } } diff --git a/target-i386/machine.c b/target-i386/machine.c index 4398801..bf14067 100644 --- a/target-i386/machine.c +++ b/target-i386/machine.c @@ -373,6 +373,24 @@ static int cpu_post_load(void *opaque, int version_id) return 0; } +static bool async_pf_msr_needed(void *opaque) +{ +CPUState *cpu = opaque; + +return cpu->async_pf_en_msr != 0; +} + +static const VMStateDescription vmstate_async_pf_msr = { +.name = "cpu/async_pf_msr", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_UINT64(async_pf_en_msr, CPUState), +VMSTATE_END_OF_LIST() +} +}; + static const VMStateDescription vmstate_cpu = { .name = "cpu", .version_id = CPU_SAVE_VERSION, @@ -476,6 +494,14 @@ static const VMStateDescription vmstate_cpu = { VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12), VMSTATE_END_OF_LIST() /* The above list is not sorted /wrt version numbers, watch out! 
*/ @@ -476,6 +494,14 @@ static const VMStateDescription vmstate_cpu = { VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12), VMSTATE_END_OF_LIST() /* The above list is not sorted /wrt version numbers, watch out! */ +}, +.subsections = (VMStateSubsection []) { +{ +.vmsd = &vmstate_async_pf_msr, +.needed = async_pf_msr_needed, +} , { +/* empty */ +} } }; -- Gleb.
Re: [PATCH v2 02/22] bitops: rename generic little-endian bitops functions
On Thursday 21 October 2010, Akinobu Mita wrote: > As a preparation for providing little-endian bitops for all architectures, > this removes the generic_ prefix from little-endian bitops function names > in asm-generic/bitops/le.h. > > s/generic_find_next_le_bit/find_next_le_bit/ > s/generic_find_next_zero_le_bit/find_next_zero_le_bit/ > s/generic_find_first_zero_le_bit/find_first_zero_le_bit/ > s/generic___test_and_set_le_bit/__test_and_set_le_bit/ > s/generic___test_and_clear_le_bit/__test_and_clear_le_bit/ > s/generic_test_le_bit/test_le_bit/ > s/generic___set_le_bit/__set_le_bit/ > s/generic___clear_le_bit/__clear_le_bit/ > s/generic_test_and_set_le_bit/test_and_set_le_bit/ > s/generic_test_and_clear_le_bit/test_and_clear_le_bit/ > > Signed-off-by: Akinobu Mita > Cc: Hans-Christian Egtvedt > Cc: Geert Uytterhoeven > Cc: Roman Zippel > Cc: Andreas Schwab > Cc: linux-m...@lists.linux-m68k.org > Cc: Greg Ungerer > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: linuxppc-...@lists.ozlabs.org > Cc: Andy Grover > Cc: rds-de...@oss.oracle.com > Cc: "David S. Miller" > Cc: net...@vger.kernel.org > Cc: Avi Kivity > Cc: Marcelo Tosatti > Cc: kvm@vger.kernel.org Acked-by: Arnd Bergmann
[PATCH v2 02/22] bitops: rename generic little-endian bitops functions
As a preparation for providing little-endian bitops for all architectures, this removes the generic_ prefix from little-endian bitops function names in asm-generic/bitops/le.h. s/generic_find_next_le_bit/find_next_le_bit/ s/generic_find_next_zero_le_bit/find_next_zero_le_bit/ s/generic_find_first_zero_le_bit/find_first_zero_le_bit/ s/generic___test_and_set_le_bit/__test_and_set_le_bit/ s/generic___test_and_clear_le_bit/__test_and_clear_le_bit/ s/generic_test_le_bit/test_le_bit/ s/generic___set_le_bit/__set_le_bit/ s/generic___clear_le_bit/__clear_le_bit/ s/generic_test_and_set_le_bit/test_and_set_le_bit/ s/generic_test_and_clear_le_bit/test_and_clear_le_bit/ Signed-off-by: Akinobu Mita Cc: Hans-Christian Egtvedt Cc: Geert Uytterhoeven Cc: Roman Zippel Cc: Andreas Schwab Cc: linux-m...@lists.linux-m68k.org Cc: Greg Ungerer Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: linuxppc-...@lists.ozlabs.org Cc: Andy Grover Cc: rds-de...@oss.oracle.com Cc: "David S. Miller" Cc: net...@vger.kernel.org Cc: Avi Kivity Cc: Marcelo Tosatti Cc: kvm@vger.kernel.org --- No change from previous submission arch/avr32/kernel/avr32_ksyms.c |4 ++-- arch/avr32/lib/findbit.S |4 ++-- arch/m68k/include/asm/bitops_mm.h|8 arch/m68k/include/asm/bitops_no.h|2 +- arch/powerpc/include/asm/bitops.h| 11 ++- include/asm-generic/bitops/ext2-non-atomic.h | 12 ++-- include/asm-generic/bitops/le.h | 26 +- include/asm-generic/bitops/minix-le.h| 10 +- lib/find_next_bit.c |9 - net/rds/cong.c |6 +++--- virt/kvm/kvm_main.c |2 +- 11 files changed, 47 insertions(+), 47 deletions(-) diff --git a/arch/avr32/kernel/avr32_ksyms.c b/arch/avr32/kernel/avr32_ksyms.c index 11e310c..c63b943 100644 --- a/arch/avr32/kernel/avr32_ksyms.c +++ b/arch/avr32/kernel/avr32_ksyms.c @@ -58,8 +58,8 @@ EXPORT_SYMBOL(find_first_zero_bit); EXPORT_SYMBOL(find_next_zero_bit); EXPORT_SYMBOL(find_first_bit); EXPORT_SYMBOL(find_next_bit); -EXPORT_SYMBOL(generic_find_next_le_bit); -EXPORT_SYMBOL(generic_find_next_zero_le_bit); 
+EXPORT_SYMBOL(find_next_le_bit); +EXPORT_SYMBOL(find_next_zero_le_bit); /* I/O primitives (lib/io-*.S) */ EXPORT_SYMBOL(__raw_readsb); diff --git a/arch/avr32/lib/findbit.S b/arch/avr32/lib/findbit.S index 997b33b..6880d85 100644 --- a/arch/avr32/lib/findbit.S +++ b/arch/avr32/lib/findbit.S @@ -123,7 +123,7 @@ ENTRY(find_next_bit) brgt1b retal r11 -ENTRY(generic_find_next_le_bit) +ENTRY(find_next_le_bit) lsr r8, r10, 5 sub r9, r11, r10 retle r11 @@ -153,7 +153,7 @@ ENTRY(generic_find_next_le_bit) brgt1b retal r11 -ENTRY(generic_find_next_zero_le_bit) +ENTRY(find_next_zero_le_bit) lsr r8, r10, 5 sub r9, r11, r10 retle r11 diff --git a/arch/m68k/include/asm/bitops_mm.h b/arch/m68k/include/asm/bitops_mm.h index b4ecdaa..f1010ab 100644 --- a/arch/m68k/include/asm/bitops_mm.h +++ b/arch/m68k/include/asm/bitops_mm.h @@ -366,9 +366,9 @@ static inline int minix_test_bit(int nr, const void *vaddr) #define ext2_clear_bit(nr, addr) __test_and_clear_bit((nr) ^ 24, (unsigned long *)(addr)) #define ext2_clear_bit_atomic(lock, nr, addr) test_and_clear_bit((nr) ^ 24, (unsigned long *)(addr)) #define ext2_find_next_zero_bit(addr, size, offset) \ - generic_find_next_zero_le_bit((unsigned long *)addr, size, offset) + find_next_zero_le_bit((unsigned long *)addr, size, offset) #define ext2_find_next_bit(addr, size, offset) \ - generic_find_next_le_bit((unsigned long *)addr, size, offset) + find_next_le_bit((unsigned long *)addr, size, offset) static inline int ext2_test_bit(int nr, const void *vaddr) { @@ -398,7 +398,7 @@ static inline int ext2_find_first_zero_bit(const void *vaddr, unsigned size) return (p - addr) * 32 + res; } -static inline unsigned long generic_find_next_zero_le_bit(const unsigned long *addr, +static inline unsigned long find_next_zero_le_bit(const unsigned long *addr, unsigned long size, unsigned long offset) { const unsigned long *p = addr + (offset >> 5); @@ -440,7 +440,7 @@ static inline int ext2_find_first_bit(const void *vaddr, unsigned size) return (p - 
addr) * 32 + res; } -static inline unsigned long generic_find_next_le_bit(const unsigned long *addr, +static inline unsigned long find_next_le_bit(const unsigned long *addr, unsigned long size, unsigned long offset) { const unsigned long *p = addr + (offset >> 5); diff --git a/arch/m68k/include/asm/bitops_no.h b/arch/m68k/include/asm/bitops_no.h index 9d3cbe5..292e1ce 100644 --- a/arch/m68k/include/asm/bitops_no.h +++ b/arch/m68k/include/asm/bitops_no.h @@ -325,7 +32
[PATCH v2 09/22] kvm: stop including asm-generic/bitops/le.h
No need to include asm-generic/bitops/le.h as all architectures provide little-endian bit operations now. Signed-off-by: Akinobu Mita Cc: Avi Kivity Cc: Marcelo Tosatti Cc: kvm@vger.kernel.org --- No change from previous submission virt/kvm/kvm_main.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 2d9927c..e5d190f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -52,7 +52,6 @@ #include #include #include -#include #include "coalesced_mmio.h" -- 1.7.1.231.gd0b16
Re: [Qemu-devel] KVM call minutes for Oct 19
On 10/21/2010 08:18 AM, Daniel P. Berrange wrote: On Thu, Oct 21, 2010 at 08:09:44AM -0500, Anthony Liguori wrote: Hi Andrew, On 10/21/2010 05:22 AM, Andrew Beekhof wrote: Hello from the Matahari tech-lead... Is there any documentation on the capabilities provided by the guest agent Anthony is creating? Perhaps we can combine efforts. Mike should be posting today or tomorrow. Also happy to provide more information on Matahari if anyone is interested. I'd really like to hear more about Matahari's long term vision. For a QEMU guest agent, we need something that is very portable. The interfaces it provides need to be reasonably guest agnostic and we need to support a wide range of guests including Windows, Linux, *BSD, etc. From the little bit I've read about Matahari, it seems to be pretty specific and pretty oriented towards Fedora-like distributions. It exposes interfaces for manipulation of RPM packages, relies on netcf, etc. FYI netcf is not Fedora specific. There is a Win32 backend for it too. It does need porting to other Linux distros, but that's simply an internal implementation issue. The goal of netcf is to be the libvirt of network config mgmt - a portable API for all OS network config tasks. Further, Matahari itself is also being ported to Win32 and can be ported to other Linux distros too. Yeah, I'm aware of the goals of netcf but that hasn't materialized a port to other distros. Let me be clear, I don't think this is a problem for libvirt, NetworkManager, or even Matahari. But for a QEMU guest agent where we terminate the APIs within QEMU itself, I do think it creates a pretty nasty portability barrier. There's nothing wrong with this if the goal of Matahari is to provide a robust agent for Fedora-based Linux distributions but I don't think it meets the requirements of a QEMU guest agent. I don't think we can overly optimize for one Linux distribution either so a mentality of letting other platforms contribute their own support probably won't work. 
That is not the goal of Matahari. It is intended to be generically applicable to *all* guest OS. Obviously in areas where every distro does different things, it will need porting for each different impl. You have to start somewhere and it started with Fedora. This is all true of any guest agent solution. There are two approaches that could be taken for a guest agent. You could provide very low level interfaces (read a file, execute a command, read a registry key). This makes for a very portable guest agent at the cost of complexity in interacting with the agent. The agent doesn't ever really need to change much; the client (QEMU) needs to handle many different types of guests, and add new functionality based on the supported primitives. Another approach is to put the complexity in the agent and simplify the management interface. For systems management applications, this is probably the right approach. For virtualization, I think this is a bad approach. Very specifically, netcf only really needs to read and write configuration files and potentially run a command. Instead of linking against netcf in the guest, we should link against netcf in QEMU so that we don't have to constantly change the guest agent. Regards, Anthony Liguori Regards, Daniel
Re: [Qemu-devel] KVM call minutes for Oct 19
On Thu, Oct 21, 2010 at 08:09:44AM -0500, Anthony Liguori wrote: > Hi Andrew, > > On 10/21/2010 05:22 AM, Andrew Beekhof wrote: > > > >Hello from the Matahari tech-lead... > >Is there any documentation on the capabilities provided guest agent > >Anthony is creating? Perhaps we can combine efforts. > > Mike should be posting today or tomorrow. > > >Also happy to provide more information on Matahari if anyone is > >interested. > > I'd really like to hear more about Matahari's long term vision. > > For a QEMU guest agent, we need something that is very portable. The > interfaces it provides need to be reasonably guest agnostic and we need > to support a wide range of guests including Windows, Linux, *BSD, etc. > > From the little bit I've read about Matahari, it seems to be pretty > specific and pretty oriented towards Fedora-like distributions. It > exposes interfaces for manipulation of RPM packages, relies on netcf, etc. FYI netcf is not Fedora specific. There is a Win32 backend for it too. It does need porting to other Linux distros, but that's simply an internal implementation issue. The goal of netcf is to be the libvirt of network config mgmt - a portable API for all OS network config tasks. Further, Matahari itself is also being ported to Win32 and can be ported to other Linux distros too. > There's nothing wrong with this if the goal of Matahari is to provide a > robust agent for Fedora-based Linux distributions but I don't think it > meets the requirements of a QEMU guest agent. > > I don't think we can overly optimize for one Linux distribution either > so a mentality of letting other platforms contribute their own support > probably won't work. That is not the goal of Matahari. It is intended to be generically applicable to *all* guest OS. Obviously in areas where every distro does different things, then it will need porting for each different impl. You have to start somewhere and it started with Fedora. This is all is true of any guest agent solution. 
Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: [Qemu-devel] KVM call minutes for Oct 19
Hi Andrew, On 10/21/2010 05:22 AM, Andrew Beekhof wrote: Hello from the Matahari tech-lead... Is there any documentation on the capabilities provided guest agent Anthony is creating? Perhaps we can combine efforts. Mike should be posting today or tomorrow. Also happy to provide more information on Matahari if anyone is interested. I'd really like to hear more about Matahari's long term vision. For a QEMU guest agent, we need something that is very portable. The interfaces it provides need to be reasonably guest agnostic and we need to support a wide range of guests including Windows, Linux, *BSD, etc. From the little bit I've read about Matahari, it seems to be pretty specific and pretty oriented towards Fedora-like distributions. It exposes interfaces for manipulation of RPM packages, relies on netcf, etc. There's nothing wrong with this if the goal of Matahari is to provide a robust agent for Fedora-based Linux distributions but I don't think it meets the requirements of a QEMU guest agent. I don't think we can overly optimize for one Linux distribution either so a mentality of letting other platforms contribute their own support probably won't work. Regards, Anthony Liguori -- Andrew
Re: [Qemu-devel] KVM call minutes for Oct 19
On 10/21/2010 03:02 PM, Anthony Liguori wrote: On 10/21/2010 02:45 AM, Paolo Bonzini wrote: On 10/21/2010 03:14 AM, Alexander Graf wrote: I agree that some agent code for basic stuff like live snapshot sync with the filesystem is small enough and worth to host within qemu. Maybe we do need more than one project? No, please. That's exactly what I don't want to see. The libvirt/qemu/virt-man split is killing us already. How is this going to become with 20 driver packs for the guest? Agreed. Not relying on Mata Hari and reinventing a dbus/WMI interface would be yet another case of QEMU NIH. I think we're about 10 steps ahead of where we should be right now. The first step is just identifying what interfaces we need in a guest agent. So far, I think we can get away with a very small number of interfaces (mainly read/write files, execute command). The same argument also works on the backend BTW, it can be virtio serial but also a Xen pvconsole and that wheel should not be reinvented either. The guest agent should be a pluggable architecture, and QEMU can provide plugins for sync, spice, "info balloon" and everything else it needs. virtio-serial is essentially our plugin interface. The core QEMU agent would use org.qemu.guest-agent and a spice agent could use org.spice-space.guest-agent. The QEMU agent should have an interface that terminates in QEMU itself. I agree that for something with semi-transaction-oriented actions like the fs-freeze on live snapshot, we need a small, self-confined code base. Regards, Anthony Liguori Paolo
Re: [Qemu-devel] KVM call minutes for Oct 19
On 10/21/2010 02:45 AM, Paolo Bonzini wrote: On 10/21/2010 03:14 AM, Alexander Graf wrote: I agree that some agent code for basic stuff like live snapshot sync with the filesystem is small enough and worth to host within qemu. Maybe we do need more than one project? No, please. That's exactly what I don't want to see. The libvirt/qemu/virt-man split is killing us already. How is this going to become with 20 driver packs for the guest? Agreed. Not relying on Mata Hari and reinventing a dbus/WMI interface would be yet another case of QEMU NIH. I think we're about 10 steps ahead of where we should be right now. The first step is just identifying what interfaces we need in a guest agent. So far, I think we can get away with a very small number of interfaces (mainly read/write files, execute command). The same argument also works on the backend BTW, it can be virtio serial but also a Xen pvconsole and that wheel should not be reinvented either. The guest agent should be a pluggable architecture, and QEMU can provide plugins for sync, spice, "info balloon" and everything else it needs. virtio-serial is essentially our plugin interface. The core QEMU agent would use org.qemu.guest-agent and a spice agent could use org.spice-space.guest-agent. The QEMU agent should have an interface that terminates in QEMU itself. Regards, Anthony Liguori Paolo
[ANNOUNCE] kvm-kmod-2.6.36
Kernel 2.6.36 is baked, and so is the corresponding kvm-kmod now. A few fixes went in since -rc6, see below. This release also marks the end of life for the kvm-kmod-2.6.35 series - although there are a few unreleased kvm-kmod fixes in git. But the very same patches can be obtained via .36 now, and there is currently no 2.6.35.8 with KVM updates in sight. KVM changes since kvm-kmod-2.6.36-rc6: - x86: Fix fs/gs reload oops with invalid ldt - x86: Move TSC reset out of vmcb_init - x86: Fix SVM VMCB reset kvm-kmod changes: - x86: make sure kvm_get_desc_base() doesn't sign extend - Improve accuracy of tboot_enabled wrapping
Re: [PATCH 7/8] KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing
On Thu, Oct 21, 2010 at 11:47:18AM +0200, Avi Kivity wrote: > On 10/21/2010 11:24 AM, Michael S. Tsirkin wrote: > >On Thu, Oct 21, 2010 at 11:27:30AM +0200, Avi Kivity wrote: > >> On 10/21/2010 08:46 AM, Sheng Yang wrote: > >> >> > +r = -EOPNOTSUPP; > >> >> > >> >> If the guest assigned the device to another guest, it allows the > >> nested > >> >> guest to kill the non-nested guest. Need to exit in a graceful > >> fashion. > >> > > >> >Don't understand... It wouldn't result in kill but return to > >> QEmu/userspace. > >> > >> What would qemu do on EOPNOTSUPP? It has no way of knowing that > >> this was triggered by an unsupported msix access. What can it do? > >> > >> Best to just ignore the write. > >> > >> If you're worried about debugging, we can have a trace_kvm_discard() > >> tracepoint that logs the address and a type enum field that explains > >> why an access was discarded. > > > >The issue is that the same page is used for mask and entry programming. > > Yeah. For that use the normal mmio exit_reason. I was referring to > misaligned writes. Yes, I think we can just drop them if we like. Might be a good idea to stick a trace point there for debugging. > I'm not happy with partial emulation, but I'm not happy either with > yet another interface to communicate the decoded MSI BAR writes to > userspace. Shall we just reprogram the irq routing table? Well we don't need to touch the routing table at all. We have the MSI mask, address and data in kernel. That is enough to interrupt the guest. For vhost-net, we would need an interface that maps an irqfd to a vector # (not gsi), or alternatively map gsi to a pair device/vector number. This is easy to implement. For VFIO, we have a problem if we try to work especially without interrupt remapping as we can't just program all entries for all devices. So solution could be some combination of requiring interrupt remapping, looking at addresses and looking at the pending bit. New vectors are added/removed rarely. 
So maybe we can get away with 1. interface to read MSIX table from userspace (good for debugging anyway) 2. an eventfd to signal on MSIX table writes > > -- > error compiling committee.c: too many arguments to function
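Michael's second suggestion, an eventfd signalled on MSI-X table writes, would follow the same pattern userspace already uses for irqfd/ioeventfd: the kernel bumps a counter, userspace drains it and re-scans the table. A minimal userspace sketch of that flow; `msix_table_written()` stands in for the kernel-side notification and is purely hypothetical:

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Stand-in for the kernel side: in a real implementation KVM, not this
 * helper, would signal the eventfd when the guest writes the MSI-X table. */
static void msix_table_written(int efd)
{
    uint64_t one = 1;
    ssize_t r = write(efd, &one, sizeof(one));
    (void)r;
}

/* Userspace side: drain the counter to learn how many table writes have
 * happened since the last read; returns 0 when none are pending
 * (assuming the eventfd was created with EFD_NONBLOCK). */
static uint64_t drain_msix_events(int efd)
{
    uint64_t count;
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return 0;
    return count;
}
```

On wakeup, QEMU would then use suggestion 1 (the table-read interface) to pick up the new mask and vector state.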
Re: [Qemu-devel] KVM call minutes for Oct 19
On 10/21/2010 12:22 PM, Andrew Beekhof wrote: On 10/20/2010 02:16 PM, Dor Laor wrote: On 10/20/2010 10:21 AM, Alexander Graf wrote: On 19.10.2010, at 17:14, Chris Wright wrote: Guest Agent - have one coming RSN (poke Anthony for details) Would there be a chance to have a single agent for everyone, so that we actually form a Qemu agent instead of a dozen individual ones? I'm mainly thinking Spice here. More important than the number of instances is the usage of common framework. Here is the link to the Matahari project: https://fedorahosted.org/matahari/wiki/API Hello from the Matahari tech-lead... Is there any documentation on the capabilities provided guest agent Anthony is creating? Perhaps we can combine efforts. http://repo.or.cz/w/qemu/mdroth.git/tree He said that they'll publish the info in a week. Also happy to provide more information on Matahari if anyone is interested. Go ahead and supply it on upstream kvm-devel / qemu-devel list -- Andrew
Re: [Qemu-devel] KVM call minutes for Oct 19
On 10/20/2010 02:16 PM, Dor Laor wrote: On 10/20/2010 10:21 AM, Alexander Graf wrote: On 19.10.2010, at 17:14, Chris Wright wrote: Guest Agent - have one coming RSN (poke Anthony for details) Would there be a chance to have a single agent for everyone, so that we actually form a Qemu agent instead of a dozen individual ones? I'm mainly thinking Spice here. More important than the number of instances is the usage of common framework. Here is the link to the Matahari project: https://fedorahosted.org/matahari/wiki/API Hello from the Matahari tech-lead... Is there any documentation on the capabilities provided guest agent Anthony is creating? Perhaps we can combine efforts. Also happy to provide more information on Matahari if anyone is interested. -- Andrew
[PATCH 2/4] KVM: SVM: Move svm->host_gs_base into a separate structure
More members will join it soon. Signed-off-by: Avi Kivity --- arch/x86/kvm/svm.c | 8 +++++--- 1 files changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 9d703e2..451590e 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -124,7 +124,9 @@ struct vcpu_svm { u64 next_rip; u64 host_user_msrs[NR_HOST_SAVE_USER_MSRS]; - u64 host_gs_base; + struct { + u64 gs_base; + } host; u32 *msrpm; @@ -1353,14 +1355,14 @@ static void svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg) static void load_host_msrs(struct kvm_vcpu *vcpu) { #ifdef CONFIG_X86_64 - wrmsrl(MSR_GS_BASE, to_svm(vcpu)->host_gs_base); + wrmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base); #endif } static void save_host_msrs(struct kvm_vcpu *vcpu) { #ifdef CONFIG_X86_64 - rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host_gs_base); + rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base); #endif } -- 1.7.3.1
[PATCH 1/4] KVM: SVM: Move guest register save out of interrupts disabled section
Saving guest registers is just a memory copy, and does not need to be in the critical section. Move outside the critical section to improve latency a bit. Signed-off-by: Avi Kivity --- arch/x86/kvm/svm.c | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 2e57450..9d703e2 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3412,11 +3412,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) #endif ); - vcpu->arch.cr2 = svm->vmcb->save.cr2; - vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax; - vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp; - vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip; - load_host_msrs(vcpu); kvm_load_ldt(ldt_selector); loadsegment(fs, fs_selector); @@ -3433,6 +3428,11 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) stgi(); + vcpu->arch.cr2 = svm->vmcb->save.cr2; + vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax; + vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp; + vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip; + sync_cr8_to_lapic(vcpu); svm->next_rip = 0; -- 1.7.3.1
[PATCH 0/4] Lightweight svm vmload/vmsave (almost)
This patchset moves svm towards a lightweight vmload/vmsave path. It was hindered by CVE-2010-3698 which was discovered during its development, and by the lack of per-cpu IDT in Linux, which makes it more or less useless. However, even so it's a slight improvement, and merging it will reduce the work needed when we do have per-cpu IDT. Avi Kivity (4): KVM: SVM: Move guest register save out of interrupts disabled section KVM: SVM: Move svm->host_gs_base into a separate structure KVM: SVM: Move fs/gs/ldt save/restore to heavyweight exit path KVM: SVM: Fold save_host_msrs() and load_host_msrs() into their callers arch/x86/kvm/svm.c | 61 +++- 1 files changed, 27 insertions(+), 34 deletions(-) -- 1.7.3.1
[PATCH 3/4] KVM: SVM: Move fs/gs/ldt save/restore to heavyweight exit path
ldt is never used in the kernel context; same goes for fs (x86_64) and gs (i386). So save/restore them in the heavyweight exit path instead of the lightweight path. By itself, this doesn't buy us much, but it paves the way for moving vmload and vmsave to the heavyweight exit path, since they modify the same registers. Signed-off-by: Avi Kivity --- arch/x86/kvm/svm.c | 34 +++--- 1 files changed, 19 insertions(+), 15 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 451590e..ec392d6 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -125,6 +125,9 @@ struct vcpu_svm { u64 host_user_msrs[NR_HOST_SAVE_USER_MSRS]; struct { + u16 fs; + u16 gs; + u16 ldt; u64 gs_base; } host; @@ -184,6 +187,9 @@ static int nested_svm_vmexit(struct vcpu_svm *svm); static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, bool has_error_code, u32 error_code); +static void save_host_msrs(struct kvm_vcpu *vcpu); +static void load_host_msrs(struct kvm_vcpu *vcpu); + static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu) { return container_of(vcpu, struct vcpu_svm, vcpu); @@ -996,6 +1002,11 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) svm->asid_generation = 0; } + save_host_msrs(vcpu); + savesegment(fs, svm->host.fs); + savesegment(gs, svm->host.gs); + svm->host.ldt = kvm_read_ldt(); + for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++) rdmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]); } @@ -1006,6 +1017,14 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu) int i; ++vcpu->stat.host_state_reload; + kvm_load_ldt(svm->host.ldt); + loadsegment(fs, svm->host.fs); +#ifdef CONFIG_X86_64 + load_gs_index(svm->host.gs); + wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gs); +#else + loadsegment(gs, gs_selector); +#endif for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++) wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]); } @@ -3314,9 +,6 @@ static void svm_cancel_injection(struct kvm_vcpu *vcpu) static void svm_vcpu_run(struct kvm_vcpu *vcpu) { 
struct vcpu_svm *svm = to_svm(vcpu); - u16 fs_selector; - u16 gs_selector; - u16 ldt_selector; svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX]; svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP]; @@ -,10 +3349,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) sync_lapic_to_cr8(vcpu); - save_host_msrs(vcpu); - savesegment(fs, fs_selector); - savesegment(gs, gs_selector); - ldt_selector = kvm_read_ldt(); svm->vmcb->save.cr2 = vcpu->arch.cr2; clgi(); @@ -3415,14 +3427,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) ); load_host_msrs(vcpu); - kvm_load_ldt(ldt_selector); - loadsegment(fs, fs_selector); -#ifdef CONFIG_X86_64 - load_gs_index(gs_selector); - wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gs); -#else - loadsegment(gs, gs_selector); -#endif reload_tss(vcpu); -- 1.7.3.1
[PATCH 4/4] KVM: SVM: Fold save_host_msrs() and load_host_msrs() into their callers
This abstraction only serves to obfuscate. Remove. Signed-off-by: Avi Kivity --- arch/x86/kvm/svm.c | 25 ++++++------------------- 1 files changed, 6 insertions(+), 19 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index ec392d6..fad4038 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -187,9 +187,6 @@ static int nested_svm_vmexit(struct vcpu_svm *svm); static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, bool has_error_code, u32 error_code); -static void save_host_msrs(struct kvm_vcpu *vcpu); -static void load_host_msrs(struct kvm_vcpu *vcpu); - static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu) { return container_of(vcpu, struct vcpu_svm, vcpu); @@ -1002,7 +999,9 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) svm->asid_generation = 0; } - save_host_msrs(vcpu); +#ifdef CONFIG_X86_64 + rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base); +#endif savesegment(fs, svm->host.fs); savesegment(gs, svm->host.gs); svm->host.ldt = kvm_read_ldt(); @@ -1371,20 +1370,6 @@ static void svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg) update_db_intercept(vcpu); } -static void load_host_msrs(struct kvm_vcpu *vcpu) -{ -#ifdef CONFIG_X86_64 - wrmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base); -#endif -} - -static void save_host_msrs(struct kvm_vcpu *vcpu) -{ -#ifdef CONFIG_X86_64 - rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base); -#endif -} - static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd) { if (sd->next_asid > sd->max_asid) { @@ -3426,7 +3411,9 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) #endif ); - load_host_msrs(vcpu); +#ifdef CONFIG_X86_64 + wrmsrl(MSR_GS_BASE, svm->host.gs_base); +#endif reload_tss(vcpu); -- 1.7.3.1
Re: [SeaBIOS] [PATCH] mark irq9 active high in DSDT
On 10/21/2010 04:00 AM, Kevin O'Connor wrote: On Wed, Oct 20, 2010 at 11:34:41AM +0200, Gleb Natapov wrote: > In PIIX4 SCI (irq9) is active high. Seabios marks it so in interrupt > override table, but some OSes (FreeBSD) require the same information to > be present in DSDT too. Make it so. > > Signed-off-by: Gleb Natapov Thanks. How do we manage the stable series wrt this issue? qemu-kvm-0.12.5 has a regression within the stable series that this patch fixes. qemu 0.12.5 does not, but only because it does not emulate polarity in the I/O APIC correctly. There are several paths we could take: - do nothing, bug is fixed in mainline - release a seabios 0.x.1 for qemu 0.13.1 with this patch - same, plus seabios 0.y.1 for qemu 0.12.6 with this patch - skip qemu (which is not truly affected), patch qemu-kvm's copy of seabios for both 0.12.z and 0.13.z The third option is the most "correct" from a release engineering point of view, but involves more work for everyone. The fourth is quick pain relief but is a little forky. -- error compiling committee.c: too many arguments to function
Re: [PATCH 7/8] KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing
On 10/21/2010 11:24 AM, Michael S. Tsirkin wrote: On Thu, Oct 21, 2010 at 11:27:30AM +0200, Avi Kivity wrote: > On 10/21/2010 08:46 AM, Sheng Yang wrote: > >> > + r = -EOPNOTSUPP; > >> > >> If the guest assigned the device to another guest, it allows the nested > >> guest to kill the non-nested guest. Need to exit in a graceful fashion. > > > >Don't understand... It wouldn't result in kill but return to QEmu/userspace. > > What would qemu do on EOPNOTSUPP? It has no way of knowing that > this was triggered by an unsupported msix access. What can it do? > > Best to just ignore the write. > > If you're worried about debugging, we can have a trace_kvm_discard() > tracepoint that logs the address and a type enum field that explains > why an access was discarded. The issue is that the same page is used for mask and entry programming. Yeah. For that use the normal mmio exit_reason. I was referring to misaligned writes. I'm not happy with partial emulation, but I'm not happy either with yet another interface to communicate the decoded MSI BAR writes to userspace. Shall we just reprogram the irq routing table? -- error compiling committee.c: too many arguments to function
Re: [PATCH 7/8] KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing
On Thu, Oct 21, 2010 at 11:27:30AM +0200, Avi Kivity wrote: > On 10/21/2010 08:46 AM, Sheng Yang wrote: > >> > + r = -EOPNOTSUPP; > >> > >> If the guest assigned the device to another guest, it allows the nested > >> guest to kill the non-nested guest. Need to exit in a graceful fashion. > > > >Don't understand... It wouldn't result in kill but return to QEmu/userspace. > > What would qemu do on EOPNOTSUPP? It has no way of knowing that > this was triggered by an unsupported msix access. What can it do? > > Best to just ignore the write. > > If you're worried about debugging, we can have a trace_kvm_discard() > tracepoint that logs the address and a type enum field that explains > why an access was discarded. The issue is that the same page is used for mask and entry programming. > >> > + if (addr % PCI_MSIX_ENTRY_SIZE != PCI_MSIX_ENTRY_VECTOR_CTRL) { > >> > + /* Only allow entry modification when entry was masked > >> */ > >> > + if (!entry_masked) { > >> > + printk(KERN_WARNING > >> > + "KVM: guest try to write unmasked MSI-X > >> entry. " > >> > + "addr 0x%llx, len %d, val 0x%lx\n", > >> > + addr, len, new_val); > >> > + r = 0; > >> > >> What does the spec says about this situation? > > > >As Michael pointed out. The spec said the result is "undefined" indeed. > > Ok. Then we should silently discard the write instead of allowing > the guest to flood host dmesg. > > >> > >> > + goto out; > >> > + } > >> > + if (new_val& ~1ul) { > >> > >> Is there a #define for this bit? > > > >Sorry I didn't find it. mask_msi_irq() also use the number 1... Maybe we can > >add > >one. > > Yes please. > > > -- > error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing
On 10/21/2010 08:46 AM, Sheng Yang wrote: > > + r = -EOPNOTSUPP; > > If the guest assigned the device to another guest, it allows the nested > guest to kill the non-nested guest. Need to exit in a graceful fashion. Don't understand... It wouldn't result in kill but return to QEmu/userspace. What would qemu do on EOPNOTSUPP? It has no way of knowing that this was triggered by an unsupported msix access. What can it do? Best to just ignore the write. If you're worried about debugging, we can have a trace_kvm_discard() tracepoint that logs the address and a type enum field that explains why an access was discarded. > > + if (addr % PCI_MSIX_ENTRY_SIZE != PCI_MSIX_ENTRY_VECTOR_CTRL) { > > + /* Only allow entry modification when entry was masked */ > > + if (!entry_masked) { > > + printk(KERN_WARNING > > + "KVM: guest try to write unmasked MSI-X entry. " > > + "addr 0x%llx, len %d, val 0x%lx\n", > > + addr, len, new_val); > > + r = 0; > > What does the spec says about this situation? As Michael pointed out. The spec said the result is "undefined" indeed. Ok. Then we should silently discard the write instead of allowing the guest to flood host dmesg. > > > + goto out; > > + } > > + if (new_val& ~1ul) { > > Is there a #define for this bit? Sorry I didn't find it. mask_msi_irq() also use the number 1... Maybe we can add one. Yes please. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
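For the missing #define discussed above: the MSI-X mask lives in bit 0 of the per-entry Vector Control dword, with all other bits reserved, so a single named constant would replace the bare 1 in both mask_msi_irq() and the new assigned-device path. A sketch of what that could look like; the constant's name is a suggestion, not an existing kernel symbol at the time of this thread:

```c
#include <stdint.h>

/* Bit 0 of the per-entry Vector Control dword is the mask bit
 * (PCI spec, MSI-X); the remaining bits are reserved. */
#define PCI_MSIX_ENTRY_CTRL_MASKBIT 1

static uint32_t msix_mask_entry(uint32_t vector_ctrl)
{
    return vector_ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT;
}

static uint32_t msix_unmask_entry(uint32_t vector_ctrl)
{
    return vector_ctrl & ~(uint32_t)PCI_MSIX_ENTRY_CTRL_MASKBIT;
}

/* The patch's "new_val & ~1ul" reserved-bits check, with the
 * magic number named. */
static int msix_ctrl_has_reserved_bits(uint32_t new_val)
{
    return (new_val & ~(uint32_t)PCI_MSIX_ENTRY_CTRL_MASKBIT) != 0;
}
```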
Re: [PATCH RFC] KVM: use kmalloc() for small dirty bitmaps
(2010/10/21 18:04), Avi Kivity wrote: On 10/21/2010 10:45 AM, Takuya Yoshikawa wrote: Well, 4K single buffer is 128MB worth of RAM. This won't help live migration much, but will help vga dirty logging, which is active at all times and uses much smaller memory slots. So I think it's worthwhile. Thanks, the patch I sent now might be better. Which one do you prefer? Well, the two combined :) Let's apply the double-buffer first and follow with kmalloc() conversion. OK, thanks :) I'll send the kmalloc() patch later! Takuya
Re: [PATCH RFC] KVM: use kmalloc() for small dirty bitmaps
On 10/21/2010 10:45 AM, Takuya Yoshikawa wrote: Well, 4K single buffer is 128MB worth of RAM. This won't help live migration much, but will help vga dirty logging, which is active at all times and uses much smaller memory slots. So I think it's worthwhile. Thanks, the patch I sent now might be better. Which one do you prefer? Well, the two combined :) Let's apply the double-buffer first and follow with kmalloc() conversion. -- error compiling committee.c: too many arguments to function
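Avi's "4K single buffer is 128MB worth of RAM" follows from one bit per guest page: 4096 bytes × 8 bits × 4 KiB/page = 128 MiB. A userspace model of that size arithmetic and of the allocator choice under discussion (the real code would use kmalloc()/vmalloc(); the one-page cutoff here is an assumption for illustration):

```c
#include <limits.h>

#define GUEST_PAGE_SIZE 4096UL  /* assumed 4 KiB guest pages */

/* Bytes of dirty bitmap needed for a memslot of npages guest pages,
 * rounded up to whole unsigned longs as kernel bitmaps are. */
static unsigned long dirty_bitmap_bytes(unsigned long npages)
{
    unsigned long bits_per_long = sizeof(unsigned long) * CHAR_BIT;
    unsigned long longs = (npages + bits_per_long - 1) / bits_per_long;
    return longs * sizeof(unsigned long);
}

/* Model of the proposed split: bitmaps of up to one page come from
 * the slab (kmalloc), larger ones from vmalloc. */
static int fits_kmalloc(unsigned long npages)
{
    return dirty_bitmap_bytes(npages) <= GUEST_PAGE_SIZE;
}
```

With this, a 128 MiB slot (32768 pages) needs exactly a 4096-byte bitmap and stays on the kmalloc() side, while a 256 MiB slot does not.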
[PATCH] x86, mce: broadcast mce depending on the cpu version
There is no reason why an SRAO event received by the main thread should be the only one that is broadcast. According to the x86 SDM Vol. 3A, Section 15.10.4.1, the MCE signal is broadcast on processor version 06H_EH or later. This change is required to handle SRAR in smp guests. Signed-off-by: Hidetoshi Seto --- target-i386/kvm.c | 29 ++++++++++++++++++++++++----- 1 files changed, 24 insertions(+), 5 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index a0d0603..00bb083 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -1637,6 +1637,28 @@ static void hardware_memory_error(void) exit(1); } +#ifdef KVM_CAP_MCE +static void kvm_mce_broadcast_rest(CPUState *env) +{ +CPUState *cenv; +int family, model, cpuver = env->cpuid_version; + +family = (cpuver >> 8) & 0xf; +model = ((cpuver >> 12) & 0xf0) + ((cpuver >> 4) & 0xf); + +/* Broadcast MCA signal for processor version 06H_EH and above */ +if ((family == 6 && model >= 14) || family > 6) { +for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) { +if (cenv == env) { +continue; +} +kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC, + MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1); +} +} +} +#endif + int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr) { #if defined(KVM_CAP_MCE) @@ -1694,6 +1716,7 @@ int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr) fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno)); abort(); } +kvm_mce_broadcast_rest(env); } else #endif { @@ -1716,7 +1739,6 @@ int kvm_on_sigbus(int code, void *addr) void *vaddr; ram_addr_t ram_addr; target_phys_addr_t paddr; -CPUState *cenv; /* Hope we are lucky for AO MCE */ vaddr = addr; @@ -1732,10 +1754,7 @@ int kvm_on_sigbus(int code, void *addr) kvm_inject_x86_mce(first_cpu, 9, status, MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr, (MCM_ADDR_PHYS << 6) | 0xc, 1); -for (cenv = first_cpu->next_cpu; cenv != NULL; cenv = cenv->next_cpu) { -kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC, - MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1); -}
+kvm_mce_broadcast_rest(first_cpu); } else #endif { -- 1.6.5.2
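The family/model decode in kvm_mce_broadcast_rest() follows the CPUID leaf 1 version layout: base family in bits 11:8, base model in bits 7:4, extended model in bits 19:16 (which becomes the high nibble of the model). A standalone version of that arithmetic and the broadcast predicate, as a userspace illustration of the patch's logic:

```c
#include <stdint.h>

static int cpuid_family(uint32_t cpuver)
{
    return (cpuver >> 8) & 0xf;  /* base family, bits 11:8 */
}

static int cpuid_model(uint32_t cpuver)
{
    /* extended model (bits 19:16) forms the high nibble; note that
     * (cpuver >> 12) & 0xf0 is the same as ((cpuver >> 16) & 0xf) << 4 */
    return ((cpuver >> 12) & 0xf0) + ((cpuver >> 4) & 0xf);
}

/* Broadcast MCE for family 6, model >= 14, and all later families,
 * matching the check in the patch. */
static int mce_is_broadcast(uint32_t cpuver)
{
    int family = cpuid_family(cpuver);
    int model = cpuid_model(cpuver);
    return (family == 6 && model >= 14) || family > 6;
}
```

For example, a Sandy Bridge signature 0x000206A7 decodes to family 6, model 0x2A (42), so MCEs are broadcast there; an old P6 signature like 0x00000665 (family 6, model 6) is not.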
[PATCH] x86, mce: ignore SRAO only when MCG_SER_P is available
And restructure this block to call kvm_mce_in_exception() only when it is required.

Signed-off-by: Hidetoshi Seto
---
 target-i386/kvm.c | 16 ++--
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 786eeeb..a0d0603 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -239,12 +239,16 @@ static void kvm_do_inject_x86_mce(void *_data)
     struct kvm_x86_mce_data *data = _data;
     int r;

-    /* If there is an MCE excpetion being processed, ignore this SRAO MCE */
-    r = kvm_mce_in_exception(data->env);
-    if (r == -1)
-        fprintf(stderr, "Failed to get MCE status\n");
-    else if (r && !(data->mce->status & MCI_STATUS_AR))
-        return;
+    /* If there is an MCE exception being processed, ignore this SRAO MCE */
+    if ((data->env->mcg_cap & MCG_SER_P) &&
+        !(data->mce->status & MCI_STATUS_AR)) {
+        r = kvm_mce_in_exception(data->env);
+        if (r == -1) {
+            fprintf(stderr, "Failed to get MCE status\n");
+        } else if (r) {
+            return;
+        }
+    }

     r = kvm_set_mce(data->env, data->mce);
     if (r < 0) {
--
1.6.5.2
Re: [PATCH 8/8] KVM: Emulation MSI-X mask bits for assigned devices
On Thu, Oct 21, 2010 at 04:30:02PM +0800, Sheng Yang wrote: > On Wednesday 20 October 2010 16:26:32 Sheng Yang wrote: > > This patch enables per-vector masking for assigned devices using MSI-X. > > The basic idea of the kernel's and QEmu's responsibilities is: > > 1. Because QEmu owns the irq routing table, changes to the table should still go to QEmu, like we did in msix_mmio_write(). > > 2. The other things can be done in the kernel, for performance. Here we covered reading (the converted entry from the routing table and the mask bit state of enabled MSI-X entries), and writing the mask bit for enabled MSI-X entries. Originally we only had the mask bit handled in the kernel, but later we found that the Linux kernel reads the MSI-X mmio just after every write to a mask bit, in order to flush the write. So we added reading MSI data/addr as well. > > 3. Accesses to a disabled entry's mask bit would go to QEmu, because they may result in disabling/enabling MSI-X. Explained later. > > 4. Only QEmu has knowledge of the PCI configuration space, so it's up to QEmu to decide whether to enable/disable MSI-X for a device. Config space yes, but it's a simple global yes/no after all. > 5. There is a distinction between enabled entries and disabled entries of the MSI-X table. That's my point. There's no such thing as 'enabled entries' in the spec. There are only masked and unmasked entries. The current interface deals with gsi numbers so qemu had to work around this. The hack used there is removing the gsi for a masked vector which has 0 address and data. It works because this is what linux and windows guests happen to do, but it is out of spec: the vector/data value for a masked entry has no meaning. Since you are building a new interface, you can design it without constraints... > The entries we had used for pci_enable_msix() (not necessarily in sequence) are already enabled, the others are disabled.
When the device's MSI-X is enabled and the guest wants to enable a disabled entry, we would go back to QEmu, because this vector didn't exist in the routing table. Also, pci_enable_msix() in the kernel doesn't allow us to enable vectors one by one, only all at once. So we have to disable MSI-X first, then enable it with the new entries, which contain the new vector the guest wants to use. This situation only happens when the device is being initialized. After that, the kernel can know and handle the mask bit of the enabled entry. > > I've also considered handling all MMIO operations in the kernel, and changing the irq routing in the kernel directly. But as long as the irq routing is owned by QEmu, I think it's better to leave it to it... Yes, this is my suggestion, except we don't need no routing :) To inject MSI you just need address/data pairs. Look at kvm_set_msi: given address/data you can just inject the interrupt. No need for table lookups. > Notice the mask/unmask bits must be handled together, either in the kernel or in userspace. Because if the kernel handled an enabled vector's mask bit directly, it would be out of sync with QEmu's records. That doesn't matter as long as QEmu doesn't access the related record. And the only place QEmu wants to consult its enabled entries' mask bit state is when writing to MSI addr/data. The write should be discarded if the entry is unmasked. This check has already been done by the kernel in this patchset, so we are fine here. > > If we want to access the enabled entries' mask bits in the future, we can directly access the device's MMIO. We really must implement this for correctness, btw. If you do not pass reads to the device, messages intended for the masked entry might still be in flight. > That's the reason why I have followed Michael's advice to use mask/unmask directly. > Hope this makes the patches more clear. I meant to add comments for this changeset, but missed it later.
> > -- > regards > Yang, Sheng > > > > > Signed-off-by: Sheng Yang > > --- > > Documentation/kvm/api.txt | 22 > > arch/x86/kvm/x86.c|6 > > include/linux/kvm.h |8 +- > > virt/kvm/assigned-dev.c | 60 > > + 4 files changed, 95 > > insertions(+), 1 deletions(-) > > > > diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt > > index d82d637..f324a50 100644 > > --- a/Documentation/kvm/api.txt > > +++ b/Documentation/kvm/api.txt > > @@ -1087,6 +1087,28 @@ of 4 instructions that make up a hypercall. > > If any additional field gets added to this structure later on, a bit for > > that additional piece of information will be set in the flags bitmap. > > > > +4.47 KVM_ASSIGN_REG_MSIX_MMIO > > + > > +Capability: KVM_CAP_DEVICE_MSIX_MASK > > +Architectures: x86 > > +Type: vm ioctl > > +Parameters: struct kvm_assigned_msix_mmio (in) > > +Returns: 0 on success, !0 on error > > + > > +struct kvm_assigned_msix_mmio { > > + /* Assigned device's ID */ > > + __u32 assigned_dev_id; > >
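To illustrate the point above that an address/data pair is self-describing: on x86 the MSI message encodes its own routing, so an injector like kvm_set_msi() needs no table lookup. A sketch of the relevant field extraction follows; this is the architected layout rather than code from the patchset, and the helper names are mine:

```c
#include <assert.h>
#include <stdint.h>

/* x86 MSI layout: address bits 19:12 carry the destination APIC ID;
 * data bits 7:0 carry the vector and bits 10:8 the delivery mode.
 * Given these two words, an interrupt can be injected directly. */
static uint8_t msi_dest_id(uint32_t address_lo)
{
    return (address_lo >> 12) & 0xff;
}

static uint8_t msi_vector(uint32_t data)
{
    return data & 0xff;
}

static uint8_t msi_delivery_mode(uint32_t data)
{
    return (data >> 8) & 0x7;
}
```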
Re: [PATCH RFC] KVM: use kmalloc() for small dirty bitmaps
(2010/10/21 17:38), Avi Kivity wrote: On 10/21/2010 03:51 AM, Takuya Yoshikawa wrote: (2010/10/20 18:34), Takuya Yoshikawa wrote: ** NOT TESTED WELL YET ** Currently, we are using vmalloc() for all dirty bitmaps even though they are small enough, say 256 bytes or less. So we use kmalloc() if the dirty bitmap size is less than or equal to PAGE_SIZE. Ah, I forgot about the plan to do double buffering. Making the bitmap twice as large may reduce the benefit of this patch. Well, a 4K single buffer is 128MB worth of RAM. This won't help live migration much, but will help vga dirty logging, which is active at all times and uses much smaller memory slots. So I think it's worthwhile. Thanks, the patch I sent now might be better. Which one do you prefer?
[PATCH 2/2] KVM: pre-allocate one more dirty bitmap to avoid vmalloc() in x86's kvm_vm_ioctl_get_dirty_log()
Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap by vmalloc() which will be used in the next logging, and this has been having a bad effect on VGA and live migration: vmalloc() consumes extra system time, triggers TLB flushes, etc. This patch resolves this issue by pre-allocating one more bitmap and switching between two bitmaps during dirty logging. Performance improvement: I measured performance for the VGA update case with trace-cmd. - Without this patch | kvm_vm_ioctl_get_dirty_log() { |mutex_lock() { 0.195 us | _cond_resched(); 0.683 us |} 0.207 us |_raw_spin_lock(); |kvm_mmu_slot_remove_write_access() { ... ... 2.916 us |} |vmalloc() { ... ... + 43.731 us |} 0.222 us |memset(); |T.1632() { | __kmalloc() { ... ... 2.870 us | } 3.257 us |} |synchronize_srcu_expedited() { ... ... ! 143.147 us |} 0.480 us |kfree(); |copy_to_user() { 0.196 us | _cond_resched(); 0.635 us |} |vfree() { ... ... + 12.103 us | } + 12.508 us |} 0.218 us |mutex_unlock(); ! 211.323 us | } - With this patch | kvm_vm_ioctl_get_dirty_log() { |mutex_lock() { 0.199 us | _cond_resched(); 0.703 us |} 0.222 us |_raw_spin_lock(); |kvm_mmu_slot_remove_write_access() { ... ... 2.179 us |} 0.225 us |memset(); |T.1634() { | __kmalloc() { ... ... 2.367 us | } 2.791 us |} |synchronize_srcu_expedited() { ... ... ! 125.299 us |} 0.263 us |kfree(); |copy_to_user() { 0.196 us | _cond_resched(); 0.647 us |} 0.214 us |mutex_unlock(); ! 135.223 us | } So the result was 1.5 times faster than the original. In the case of live migration, the improvement ratio depends on the workload and the guest memory size. Note: This does not change other architectures' logic, but the allocation size doubles. This will increase the actual memory consumption only when the new size changes the number of pages allocated by vmalloc().
Signed-off-by: Takuya Yoshikawa Signed-off-by: Fernando Luis Vazquez Cao --- arch/x86/kvm/x86.c | 16 +--- include/linux/kvm_host.h |1 + virt/kvm/kvm_main.c | 11 +-- 3 files changed, 15 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f3f86b2..c4d2e0b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3171,18 +3171,15 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, kvm_mmu_slot_remove_write_access(kvm, log->slot); spin_unlock(&kvm->mmu_lock); - r = -ENOMEM; - dirty_bitmap = vmalloc(n); - if (!dirty_bitmap) - goto out; + dirty_bitmap = memslot->dirty_bitmap_head; + if (memslot->dirty_bitmap == dirty_bitmap) + dirty_bitmap += n / sizeof(long); memset(dirty_bitmap, 0, n); r = -ENOMEM; slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); - if (!slots) { - vfree(dirty_bitmap); + if (!slots) goto out; - } memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); slots->memslots[log->slot].dirty_bitmap = dirty_bitmap; @@ -3193,11 +3190,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, kfree(old_slots); r = -EFAULT; - if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) { - vfree(dirty_bitmap); + if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) goto out; - } - vfree(dirty_bitmap); } else { r = -EFAULT; if (clear_user(log->dirty_bitmap, n)) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 0b89d00..7c956d8 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -119,6 +119,7 @@ struct kvm_memory_slot { unsigned long flags; unsigned long *rmap; unsigned long *dirty_bitmap; + unsigned long *dirty_bitmap_head; struct { unsigned long rmap_pde; int write_count; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 62ae13f..b15d1eb 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -447,8 +447,9 @@ static void kvm_destroy_
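The bitmap switch in the x86 hunk above is terse; as a standalone sketch (names are mine, not from the patch): the allocation holds two bitmaps of n bytes back to back, and the ioctl hands out whichever half is not currently active.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the inactive half of a doubled bitmap allocation: 'head' points
 * at the start of the allocation, 'active' is the half currently in
 * use, and n is the size of one bitmap in bytes. This mirrors the
 * dirty_bitmap/dirty_bitmap_head switch in the patch. */
static unsigned long *inactive_bitmap(unsigned long *head,
                                      unsigned long *active, size_t n)
{
    unsigned long *next = head;

    if (active == head)
        next = head + n / sizeof(unsigned long);
    return next;
}
```

Each call to the ioctl clears the inactive half, publishes it as the new active bitmap, and copies the old one out to userspace, so no allocation happens on the hot path.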
Re: [PATCH RFC] KVM: use kmalloc() for small dirty bitmaps
On 10/21/2010 03:51 AM, Takuya Yoshikawa wrote: (2010/10/20 18:34), Takuya Yoshikawa wrote: ** NOT TESTED WELL YET ** Currently, we are using vmalloc() for all dirty bitmaps even though they are small enough, say 256 bytes or less. So we use kmalloc() if dirty bitmap size is less than or equal to PAGE_SIZE. Ah, I forgot about the plan to do double buffering. Making the size of bitmap twice may reduce the benefit of this patch. Well, 4K single buffer is 128MB worth of RAM. This won't help live migration much, but will help vga dirty logging, which is active at all times and uses much smaller memory slots. So I think it's worthwhile. -- error compiling committee.c: too many arguments to function
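Avi's 128MB figure follows directly from the bitmap granularity of one bit per 4KB page, so a page-sized (4096-byte) bitmap covers 4096 * 8 * 4096 bytes. A quick sketch of the arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Guest memory covered by a dirty bitmap: each byte tracks 8 pages
 * (one bit per page), and each page is 4KB. */
static uint64_t bitmap_coverage(uint64_t bitmap_bytes)
{
    const uint64_t page_size = 4096;

    return bitmap_bytes * 8 * page_size;
}
```

The same arithmetic gives 8MB for the 256-byte bitmaps mentioned earlier in the thread, which is roughly the scale of a VGA framebuffer slot.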
[PATCH 1/2] KVM: introduce wrapper function for creating/destroying dirty bitmaps
This makes it easy to change the way of allocating/freeing dirty bitmaps. Signed-off-by: Takuya Yoshikawa Signed-off-by: Fernando Luis Vazquez Cao --- virt/kvm/kvm_main.c | 30 +++--- 1 files changed, 23 insertions(+), 7 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 1aeeb7f..62ae13f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -442,6 +442,15 @@ out_err_nodisable: return ERR_PTR(r); } +static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot) +{ + if (!memslot->dirty_bitmap) + return; + + vfree(memslot->dirty_bitmap); + memslot->dirty_bitmap = NULL; +} + /* * Free any memory in @free but not in @dont. */ @@ -454,7 +463,7 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free, vfree(free->rmap); if (!dont || free->dirty_bitmap != dont->dirty_bitmap) - vfree(free->dirty_bitmap); + kvm_destroy_dirty_bitmap(free); for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) { @@ -465,7 +474,6 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free, } free->npages = 0; - free->dirty_bitmap = NULL; free->rmap = NULL; } @@ -527,6 +535,18 @@ static int kvm_vm_release(struct inode *inode, struct file *filp) return 0; } +static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot) +{ + unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(memslot); + + memslot->dirty_bitmap = vmalloc(dirty_bytes); + if (!memslot->dirty_bitmap) + return -ENOMEM; + + memset(memslot->dirty_bitmap, 0, dirty_bytes); + return 0; +} + /* * Allocate some memory and give it an address in the guest physical address * space. 
@@ -661,12 +681,8 @@ skip_lpage: /* Allocate page dirty bitmap if needed */ if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) { - unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(&new); - - new.dirty_bitmap = vmalloc(dirty_bytes); - if (!new.dirty_bitmap) + if (kvm_create_dirty_bitmap(&new) < 0) goto out_free; - memset(new.dirty_bitmap, 0, dirty_bytes); /* destroy any largepage mappings for dirty tracking */ if (old.npages) flush_shadow = 1; -- 1.7.0.4
Re: [PATCH 8/8] KVM: Emulation MSI-X mask bits for assigned devices
On Wednesday 20 October 2010 16:26:32 Sheng Yang wrote: > This patch enables per-vector masking for assigned devices using MSI-X. The basic idea of the kernel's and QEmu's responsibilities is: 1. Because QEmu owns the irq routing table, changes to the table should still go to QEmu, like we did in msix_mmio_write(). 2. The other things can be done in the kernel, for performance. Here we covered reading (the converted entry from the routing table and the mask bit state of enabled MSI-X entries), and writing the mask bit for enabled MSI-X entries. Originally we only had the mask bit handled in the kernel, but later we found that the Linux kernel reads the MSI-X mmio just after every write to a mask bit, in order to flush the write. So we added reading MSI data/addr as well. 3. Accesses to a disabled entry's mask bit would go to QEmu, because they may result in disabling/enabling MSI-X. Explained later. 4. Only QEmu has knowledge of the PCI configuration space, so it's up to QEmu to decide whether to enable/disable MSI-X for a device. 5. There is a distinction between enabled entries and disabled entries of the MSI-X table. The entries we had used for pci_enable_msix() (not necessarily in sequence) are already enabled, the others are disabled. When the device's MSI-X is enabled and the guest wants to enable a disabled entry, we would go back to QEmu, because this vector didn't exist in the routing table. Also, pci_enable_msix() in the kernel doesn't allow us to enable vectors one by one, only all at once. So we have to disable MSI-X first, then enable it with the new entries, which contain the new vector the guest wants to use. This situation only happens when the device is being initialized. After that, the kernel can know and handle the mask bit of the enabled entry. I've also considered handling all MMIO operations in the kernel, and changing the irq routing in the kernel directly. But as long as the irq routing is owned by QEmu, I think it's better to leave it to it... Notice the mask/unmask bits must be handled together, either in the kernel or in userspace.
Because if the kernel handled an enabled vector's mask bit directly, it would be out of sync with QEmu's records. That doesn't matter as long as QEmu doesn't access the related record. And the only place QEmu wants to consult its enabled entries' mask bit state is when writing to MSI addr/data. The write should be discarded if the entry is unmasked. This check has already been done by the kernel in this patchset, so we are fine here. If we want to access the enabled entries' mask bits in the future, we can directly access the device's MMIO. That's the reason why I have followed Michael's advice to use mask/unmask directly. Hope this makes the patches more clear. I meant to add comments for this changeset, but missed it later. -- regards Yang, Sheng > > Signed-off-by: Sheng Yang > --- > Documentation/kvm/api.txt | 22 > arch/x86/kvm/x86.c|6 > include/linux/kvm.h |8 +- > virt/kvm/assigned-dev.c | 60 > + 4 files changed, 95 > insertions(+), 1 deletions(-) > > diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt > index d82d637..f324a50 100644 > --- a/Documentation/kvm/api.txt > +++ b/Documentation/kvm/api.txt > @@ -1087,6 +1087,28 @@ of 4 instructions that make up a hypercall. > If any additional field gets added to this structure later on, a bit for > that additional piece of information will be set in the flags bitmap. > > +4.47 KVM_ASSIGN_REG_MSIX_MMIO > + > +Capability: KVM_CAP_DEVICE_MSIX_MASK > +Architectures: x86 > +Type: vm ioctl > +Parameters: struct kvm_assigned_msix_mmio (in) > +Returns: 0 on success, !0 on error > + > +struct kvm_assigned_msix_mmio { > + /* Assigned device's ID */ > + __u32 assigned_dev_id; > + /* MSI-X table MMIO address */ > + __u64 base_addr; > + /* Must be 0 */ > + __u32 flags; > + /* Must be 0, reserved for future use */ > + __u64 reserved; > +}; > + > +This ioctl would enable in-kernel MSI-X emulation, which would handle > MSI-X +mask bit in the kernel. > + > 5.
The kvm_run structure > > Application code obtains a pointer to the kvm_run structure by > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index fc62546..ba07a2f 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -1927,6 +1927,8 @@ int kvm_dev_ioctl_check_extension(long ext) > case KVM_CAP_X86_ROBUST_SINGLESTEP: > case KVM_CAP_XSAVE: > case KVM_CAP_ENABLE_CAP: > + case KVM_CAP_DEVICE_MSIX_EXT: > + case KVM_CAP_DEVICE_MSIX_MASK: > r = 1; > break; > case KVM_CAP_COALESCED_MMIO: > @@ -2717,6 +2719,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu > *vcpu, return -EINVAL; > > switch (cap->cap) { > + case KVM_CAP_DEVICE_MSIX_EXT: > + vcpu->kvm->arch.msix_flags_enabled = true; > + r = 0; > + break; > default: > r = -EINVAL; >
Re: [PATCH 0/8][v2] MSI-X mask emulation support for assigned device
On Thu, Oct 21, 2010 at 03:10:19PM +0800, Sheng Yang wrote: > On Thursday 21 October 2010 03:02:24 Marcelo Tosatti wrote: > > On Wed, Oct 20, 2010 at 04:26:24PM +0800, Sheng Yang wrote: > > > Here is v2. > > > > > > Changelog: > > > > > > v1->v2 > > > > > > The major change from v1 is I've added the in-kernel MSI-X mask emulation > > > support, as well as adding shortcuts for reading the MSI-X table. > > > > > > I've taken Michael's advice to use mask/unmask directly, but am unsure about > > > exporting irq_to_desc() for modules... > > > > > > Also add flush_work() according to Marcelo's comments. > > > > > > Sheng Yang (8): > > > PCI: MSI: Move MSI-X entry definition to pci_regs.h > > > irq: Export irq_to_desc() to modules > > > KVM: x86: Enable ENABLE_CAP capability for x86 > > > KVM: Move struct kvm_io_device to kvm_host.h > > > KVM: Add kvm_get_irq_routing_entry() func > > > KVM: assigned dev: Preparation for mask support in userspace > > > KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing > > > KVM: Emulation MSI-X mask bits for assigned devices > > > > Why is the current scheme, without msix per-vector mask support, functional at all? Luck? > > Well, I believe we are lucky... We just ignored the operation in the past. > > I had raised this issue when Michael began to work on MSI-X support in QEmu long ago, but then I was busy on some other things. Now that Eddie wants to add MSI-X in-kernel acceleration, we have come back to it... > > And about the "flush_work()" you commented on, I still think even for a native device, it's possible that a short time after the OS writes the mask bit, the interrupt may be delivered, if the device has already sent the message out on the bus (I just guess, haven't observed)... The spec doesn't say that completing the mask bit write also gets all messages on the bus delivered. So I think leaving the work until a little later would also be fine. Yes ... but e.g.
a bus read afterwards would have to flush the write up the bus... > -- > regards > Yang, Sheng
[PATCH] Fix build on !KVM_CAP_MCE
This patch removes the following warnings:

target-i386/kvm.c: In function 'kvm_put_msrs':
target-i386/kvm.c:782: error: unused variable 'i'
target-i386/kvm.c: In function 'kvm_get_msrs':
target-i386/kvm.c:1083: error: label at end of compound statement

Signed-off-by: Hidetoshi Seto
---
 target-i386/kvm.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 512d533..786eeeb 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -779,7 +779,7 @@ static int kvm_put_msrs(CPUState *env, int level)
         struct kvm_msr_entry entries[100];
     } msr_data;
     struct kvm_msr_entry *msrs = msr_data.entries;
-    int i, n = 0;
+    int n = 0;

     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_CS, env->sysenter_cs);
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
@@ -801,6 +801,7 @@ static int kvm_put_msrs(CPUState *env, int level)
     }
 #ifdef KVM_CAP_MCE
     if (env->mcg_cap) {
+        int i;
         if (level == KVM_PUT_RESET_STATE)
             kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
         else if (level == KVM_PUT_FULL_STATE) {
@@ -1085,9 +1086,9 @@ static int kvm_get_msrs(CPUState *env)
             if (msrs[i].index >= MSR_MC0_CTL &&
                 msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 4) {
                 env->mce_banks[msrs[i].index - MSR_MC0_CTL] = msrs[i].data;
-                break;
             }
 #endif
+            break;
         }
     }
--
1.6.5.2
Re: [GIT PULL net-2.6] vhost-net: access_ok fix
From: "Michael S. Tsirkin" Date: Tue, 19 Oct 2010 16:59:01 +0200 > David, > Not sure if it's too late for 2.6.36 - in case it's not, the following tree > includes a last minute bugfix for vhost-net, found by code inspection. > It is on top of net-2.6. > Thanks! > > The following changes since commit b0057c51db66c5f0f38059f242c57d61c4741d89: > > tg3: restore rx_dropped accounting (2010-10-11 16:06:24 -0700) > > are available in the git repository at: > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net Even though it's too late, I've pulled this.
Re: [Qemu-devel] KVM call minutes for Oct 19
On 10/21/2010 03:14 AM, Alexander Graf wrote: I agree that some agent code for basic stuff like live snapshot sync with the filesystem is small enough and worth hosting within qemu. Maybe we do need more than one project? No, please. That's exactly what I don't want to see. The libvirt/qemu/virt-man split is killing us already. How is this going to turn out with 20 driver packs for the guest? Agreed. Not relying on Matahari and reinventing a dbus/WMI interface would be yet another case of QEMU NIH. The same argument also works on the backend BTW; it can be virtio serial but also a Xen pvconsole, and that wheel should not be reinvented either. The guest agent should be a pluggable architecture, and QEMU can provide plugins for sync, spice, "info balloon" and everything else it needs. Paolo
Re: [PATCH 7/8] KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing
On Thursday 21 October 2010 06:35:11 Michael S. Tsirkin wrote: > On Wed, Oct 20, 2010 at 04:26:31PM +0800, Sheng Yang wrote: > > It would work with KVM_CAP_DEVICE_MSIX_MASK, which we would enable in > > the last patch. > > > > Signed-off-by: Sheng Yang > > Merge this with patch 8 - it does not make sense to add a bunch > of users of the field msix_mmio_base but init it in the next patch. I just meant to make it easier for reviewers; it seems I failed. :) > > > --- > > > > include/linux/kvm.h |7 +++ > > include/linux/kvm_host.h |2 + > > virt/kvm/assigned-dev.c | 131 > > ++ 3 files changed, 140 > > insertions(+), 0 deletions(-) > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > index a699ec9..0a7bd34 100644 > > --- a/include/linux/kvm.h > > +++ b/include/linux/kvm.h > > @@ -798,4 +798,11 @@ struct kvm_assigned_msix_entry { > > > > __u16 padding[2]; > > > > }; > > > > +struct kvm_assigned_msix_mmio { > > + __u32 assigned_dev_id; > > I think avi commented - there's padding here.
> > + __u64 base_addr; > > + __u32 flags; > > + __u32 reserved[2]; > > +}; > > + > > #endif /* __LINUX_KVM_H */ > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index 81a6284..b67082f 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -465,6 +465,8 @@ struct kvm_assigned_dev_kernel { > > > > struct pci_dev *dev; > > struct kvm *kvm; > > spinlock_t assigned_dev_lock; > > > > + u64 msix_mmio_base; > > + struct kvm_io_device msix_mmio_dev; > > > > }; > > > > struct kvm_irq_mask_notifier { > > > > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c > > index bf96ea7..5d2adc4 100644 > > --- a/virt/kvm/assigned-dev.c > > +++ b/virt/kvm/assigned-dev.c > > > > @@ -739,6 +739,137 @@ msix_entry_out: > > return r; > > > > } > > > > + > > +static bool msix_mmio_in_range(struct kvm_assigned_dev_kernel *adev, > > + gpa_t addr, int len, int *idx) > > +{ > > + int i; > > + > > + if (!(adev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX)) > > + return false; > > + BUG_ON(adev->msix_mmio_base == 0); > > + for (i = 0; i < adev->entries_nr; i++) { > > + u64 start, end; > > + start = adev->msix_mmio_base + > > + adev->guest_msix_entries[i].entry * PCI_MSIX_ENTRY_SIZE; > > + end = start + PCI_MSIX_ENTRY_SIZE; > > + if (addr >= start && addr + len <= end) { > > + *idx = i; > > + return true; > > + } > > + } > > We really should not need guest_msix_entries at all: > if we are emulating MSIX in kernel anyway, let us just > emulate it there. Doing half setup from qemu > and half from kvm will just create problems. > > If you do it all in kernel, you will simply need a single > range check to see whether this is a mask write. I will explain it in a separate mail. Please comment there as well.
-- regards Yang, Sheng > > > + return false; > > +} > > + > > +static int msix_mmio_read(struct kvm_io_device *this, gpa_t addr, int > > len, +void *val) > > +{ > > + struct kvm_assigned_dev_kernel *adev = > > + container_of(this, struct kvm_assigned_dev_kernel, > > +msix_mmio_dev); > > + int idx, r = 0; > > + u32 entry[4]; > > + struct kvm_kernel_irq_routing_entry *e; > > + > > + mutex_lock(&adev->kvm->lock); > > + if (!msix_mmio_in_range(adev, addr, len, &idx)) { > > + r = -EOPNOTSUPP; > > + goto out; > > + } > > + if ((addr & 0x3) || len != 4) { > > + printk(KERN_WARNING > > + "KVM: Unaligned reading for device MSI-X MMIO! " > > + "addr 0x%llx, len %d\n", addr, len); > > + r = -EOPNOTSUPP; > > + goto out; > > + } > > + > > + e = kvm_get_irq_routing_entry(adev->kvm, > > + adev->guest_msix_entries[idx].vector); > > + if (!e || e->type != KVM_IRQ_ROUTING_MSI) { > > + printk(KERN_WARNING "KVM: Wrong MSI-X routing entry! " > > + "addr 0x%llx, len %d\n", addr, len); > > + r = -EOPNOTSUPP; > > + goto out; > > + } > > + entry[0] = e->msi.address_lo; > > + entry[1] = e->msi.address_hi; > > + entry[2] = e->msi.data; > > + entry[3] = !!(adev->guest_msix_entries[idx].flags & > > + KVM_ASSIGNED_MSIX_MASK); > > + memcpy(val, &entry[addr % PCI_MSIX_ENTRY_SIZE / 4], len); > > + > > +out: > > + mutex_unlock(&adev->kvm->lock); > > + return r; > > +} > > + > > +static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int > > len, + const void *val) > > +{ > > + struct kvm_assigned_dev_kernel *adev = > > + container_of(this, struct kvm_assigned_dev_kernel, > > +msi
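The interval test in msix_mmio_in_range() above can be exercised on its own. Here is a standalone sketch (the function name is mine; PCI_MSIX_ENTRY_SIZE is 16 bytes per the MSI-X spec) showing how an MMIO access maps to a table entry, and how an access straddling two entries is rejected:

```c
#include <assert.h>
#include <stdint.h>

#define PCI_MSIX_ENTRY_SIZE 16

/* Return the index of the MSI-X table entry that fully contains the
 * access [addr, addr + len), or -1 if no single entry does — the same
 * containment test msix_mmio_in_range() performs per enabled entry. */
static int msix_entry_for_access(uint64_t base, int nr_entries,
                                 uint64_t addr, int len)
{
    int i;

    for (i = 0; i < nr_entries; i++) {
        uint64_t start = base + (uint64_t)i * PCI_MSIX_ENTRY_SIZE;
        uint64_t end = start + PCI_MSIX_ENTRY_SIZE;

        if (addr >= start && addr + len <= end)
            return i;
    }
    return -1;
}
```

The kernel-side handler then returns -EOPNOTSUPP for accesses that match no enabled entry, punting them back to QEmu, which is how disabled-entry accesses reach userspace.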
Re: [PATCH 08/10] MCE: Relay UCR MCE to guest
On 10/21/2010 12:03 AM, Anthony Liguori wrote: The timeout of qemu_kvm_eat_signal is always zero. So then qemu_kvm_eat_signal purely polls and it will happily keep polling as long as there is a signal pending. So what's the point of doing a sigtimedwait() and dropping qemu_mutex? I agree that keeping the qemu_mutex makes sense if you remove the timeout argument (which I even have a patch for, as part of my Win32 iothread series). Until there is the theoretical possibility of suspending the process, qemu_kvm_eat_signal should drop the mutex. Why not just check sigpending in a loop? Because sigtimedwait eats the signal, unlike sigpending (and sigsuspend).
Re: [PATCH 0/8][v2] MSI-X mask emulation support for assigned device
On Wednesday 20 October 2010 17:51:01 Avi Kivity wrote: > On 10/20/2010 10:26 AM, Sheng Yang wrote: > > Here is v2. > > > > Changelog: > > > > v1->v2 > > > > The major change from v1 is I've added the in-kernel MSI-X mask emulation > > support, as well as adding shortcuts for reading the MSI-X table. > > > > I've taken Michael's advice to use mask/unmask directly, but am unsure about > > exporting irq_to_desc() for modules... > > > > Also add flush_work() according to Marcelo's comments. > > Any performance numbers? What are the affected guests? just RHEL 4, or > any others? At least the current RHEL5 series would be affected. I have done a simple benchmark on a RHEL5u5 guest with 512m memory and 1 cpu. The device is a 10G NIC with SR-IOV. One VF was assigned to the guest to communicate with the PF in the host. Three iperf threads were used in the guest to push CPU utilization to 100%. In this condition, the QEmu method's bandwidth is about 20% lower than the in-kernel one's (~7.5G vs ~9G). The interrupt rate in this condition is about 20k/sec. The reason is that the 2.6.18 kernel used mask_msi in ack() for the MSI chip, which caused significant mask bit operations when the interrupt rate is high. We have also reproduced the issue in some large scale benchmarks on guests with newer kernels like 2.6.30 on Xen, under very high interrupt rates, due to an interrupt rate limitation mechanism in the kernel. > > Alex, Michael, how would you do this with vfio? -- regards Yang, Sheng
Re: [PATCH] KVM Test: add -w parameter in nc command in kvm_utils.py
On 10/21/2010 03:57 AM, Feng Yang wrote: > > - "Michael Goldish" wrote: > >> From: "Michael Goldish" >> To: "Feng Yang" >> Cc: autot...@test.kernel.org, kvm@vger.kernel.org >> Sent: Wednesday, October 20, 2010 6:48:42 PM GMT +08:00 Beijing / Chongqing / Hong Kong / Urumqi >> Subject: Re: [PATCH] KVM Test: add -w parameter in nc command in kvm_utils.py >> >> On 10/20/2010 12:18 PM, Feng Yang wrote: >>> >>> - "Michael Goldish" wrote: >>> From: "Michael Goldish" To: "Feng Yang" Cc: autot...@test.kernel.org, kvm@vger.kernel.org Sent: Wednesday, October 20, 2010 5:11:32 PM GMT +08:00 Beijing / Chongqing / Hong Kong / Urumqi Subject: Re: [PATCH] KVM Test: add -w parameter in nc command in kvm_utils.py On 10/20/2010 08:55 AM, Feng Yang wrote: > If a connection and stdin are idle for more than timeout seconds, > then the connection is silently closed. Without this parameter, > nc will listen forever for a connection. This may cause our test > to hang. redmine issue: > http://redmine.englab.nay.redhat.com/issues/show/6947 > Adding the -w parameter should fix this issue. > > Signed-off-by: Feng Yang > --- > client/tests/kvm/kvm_utils.py |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py > index a51c857..526d38d 100644 > --- a/client/tests/kvm/kvm_utils.py > +++ b/client/tests/kvm/kvm_utils.py > @@ -733,7 +733,7 @@ def remote_login(client, host, port, username, password, prompt, linesep="\n", > elif client == "telnet": > cmd = "telnet -l %s %s %s" % (username, host, port) > elif client == "nc": > -cmd = "nc %s %s" % (host, port) > +cmd = "nc %s %s -w %s" % (host, port, timeout) > else: > logging.error("Unknown remote shell client: %s" % client) > return I don't understand how remote_login() can stall here. kvm_utils._remote_login() doesn't rely on nc's self-termination. If no shell prompt is available after 10 seconds, nc is forcefully terminated and _remote_login() fails.
If it somehow stalls this might >> indicate a bug somewhere. >>> It is really rarely reproduce. Only meet this issue when guest panic >> or core dump. >>> nc stall. Do not know why it is not terminated after 10s. >>> >>> I think -w parameter is helpful in this situation. >>> If it does not cause other issue, we'd better add -w in nc >> command. >>> >>> Thanks for your command. What do you think? >>> >>> Feng Yang >> >> Adding -w 10 may cause trouble. If a good functional session is idle >> for more than 10 seconds during a test, it will be closed. For >> example, >> if you run a command that takes more than 10 seconds to complete, and >> produces no output while it runs, the session will be closed. > Seems the session also will be closed in our current code in this situation. > If no output for 10s, _remote_login() will return False. Then session will be > closed in remote_login(). If the login process takes more than 10s, yes, the session will be closed. However, if we use -w 10, the session will be closed whenever it becomes idle, not just during login. So if we login successfully and then run a test (e.g. an autoit script), after 10s of no output the test will be terminated. > I have run some cases, it works well. But I think we'd better first post > this code, in case it really cause trouble. > > I will do more test on it. > >> >> Also, I doubt it'll solve the problem you've experienced. As it is, >> the >> code should (and usually does) properly handle the case of nc not >> terminating. >> >> If you manage to reproduce this again please save the log so I can >> have >> a look at it. > > When meet this issue again, I will send log to you. Thanks for your help! 
> > Feng Yang > > > >> -- >> To unsubscribe from this list: send the line "unsubscribe kvm" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
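The command-selection logic debated above can be sketched as a small standalone function. It mirrors the quoted patch's `elif` chain and the proposed `-w` fix; the function name `build_login_command` and the ssh branch are illustrative assumptions, not the exact kvm_utils.py code.

```python
import logging

def build_login_command(client, host, port, username, timeout=10):
    """Return the shell command used to open a remote session.

    Simplified sketch of the command construction in
    kvm_utils.remote_login(); the ssh branch is an assumption here.
    """
    if client == "ssh":
        return "ssh -p %s %s@%s" % (port, username, host)
    elif client == "telnet":
        return "telnet -l %s %s %s" % (username, host, port)
    elif client == "nc":
        # -w makes nc give up after `timeout` idle seconds instead of
        # listening forever -- the fix proposed in the patch. As noted in
        # the thread, the same timeout also closes sessions that go idle
        # *after* login, so it is a trade-off.
        return "nc %s %s -w %s" % (host, port, timeout)
    else:
        logging.error("Unknown remote shell client: %s", client)
        return None
```

The trade-off Michael describes is visible here: the `-w` value applies to the whole life of the connection, not only to the login phase.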
Re: [PATCH 5/5] qemu-kvm: device assignment: Enable in-kernel MSI-X mask support
On Thursday 21 October 2010 06:36:56 Michael S. Tsirkin wrote:
> On Wed, Oct 20, 2010 at 04:29:55PM +0800, Sheng Yang wrote:
>> Signed-off-by: Sheng Yang
>> ---
>>  hw/device-assignment.c |   13 +
>>  1 files changed, 13 insertions(+), 0 deletions(-)
>>
>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
>> index d1a6282..aa3358e 100644
>> --- a/hw/device-assignment.c
>> +++ b/hw/device-assignment.c
>> @@ -269,6 +269,9 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
>>      AssignedDevRegion *region = &r_dev->v_addrs[region_num];
>>      PCIRegion *real_region = &r_dev->real_device.regions[region_num];
>>      int ret = 0;
>> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
>> +    struct kvm_assigned_msix_mmio msix_mmio;
>> +#endif
>>
>>      DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08" FMT_PCIBUS " region_num=%d \n",
>>            e_phys, region->u.r_virtbase, type, e_size, region_num);
>>
>> @@ -287,6 +290,16 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
>>              cpu_register_physical_memory(e_phys + offset,
>>                      TARGET_PAGE_SIZE, r_dev->mmio_index);
>> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
>> +            memset(&msix_mmio, 0, sizeof(struct kvm_assigned_msix_mmio));
>> +            msix_mmio.assigned_dev_id = calc_assigned_dev_id(r_dev->h_segnr,
>> +                    r_dev->h_busnr, r_dev->h_devfn);
>> +            msix_mmio.base_addr = e_phys + offset;
>> +            if (kvm_assign_reg_msix_mmio(kvm_context, &msix_mmio))
>> +                fprintf(stderr, "fail to register in-kernel msix_mmio!\n");
>> +            /* We can still continue because the MMIO accessing can fall
>> +             * back to QEmu */
>> +#endif
>>          }
>>      }
>
> So let's not print scary messages ...

OK...

--
regards
Yang, Sheng
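The patch above builds an `assigned_dev_id` from the host segment/bus/devfn via `calc_assigned_dev_id()`. A hypothetical model of such a packing (the exact bit layout qemu-kvm uses is an assumption here, chosen only to illustrate how one integer can carry the full host PCI address):

```python
def pack_dev_id(segnr, busnr, devfn):
    """Pack a host PCI address into one id (assumed layout: seg<<16 | bus<<8 | devfn)."""
    assert 0 <= busnr < 0x100 and 0 <= devfn < 0x100
    return (segnr << 16) | (busnr << 8) | devfn

def unpack_dev_id(dev_id):
    """Recover (segment, bus, devfn) from the packed id."""
    return (dev_id >> 16) & 0xffff, (dev_id >> 8) & 0xff, dev_id & 0xff
```

The round-trip property (pack then unpack returns the original triple) is what lets both kernel and userspace identify the same assigned device from the single id field.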
Re: [PATCH 0/8][v2] MSI-X mask emulation support for assigned device
On Thursday 21 October 2010 03:02:24 Marcelo Tosatti wrote:
> On Wed, Oct 20, 2010 at 04:26:24PM +0800, Sheng Yang wrote:
>> Here is v2.
>>
>> Changelog:
>>
>> v1->v2
>>
>> The major change from v1 is that I've added in-kernel MSI-X mask
>> emulation support, as well as shortcuts for reading the MSI-X table.
>>
>> I've taken Michael's advice to use mask/unmask directly, but I am unsure
>> about exporting irq_to_desc() to modules...
>>
>> Also added flush_work() according to Marcelo's comments.
>>
>> Sheng Yang (8):
>>   PCI: MSI: Move MSI-X entry definition to pci_regs.h
>>   irq: Export irq_to_desc() to modules
>>   KVM: x86: Enable ENABLE_CAP capability for x86
>>   KVM: Move struct kvm_io_device to kvm_host.h
>>   KVM: Add kvm_get_irq_routing_entry() func
>>   KVM: assigned dev: Preparation for mask support in userspace
>>   KVM: assigned dev: Introduce io_device for MSI-X MMIO accessing
>>   KVM: Emulation MSI-X mask bits for assigned devices
>
> Why is the current scheme, without MSI-X per-vector mask support,
> functional at all? Luck?

Well, I believe we are lucky... We just ignored the operation in the past. I raised this issue when Michael began working on MSI-X support in QEmu long ago, but then I was busy with some other things. Now that Eddie wants to add MSI-X in-kernel acceleration, we have come back to it...

And about the flush_work() you commented on: I still think that even for a native device, an interrupt may be delivered shortly after the OS writes to the mask bit, if the device has already sent the message out on the bus (I am just guessing; I haven't observed this)... The spec doesn't say that completing the write to the mask bit also guarantees that all messages already on the bus have been delivered. So I think doing the work a little later should also be fine.
--
regards
Yang, Sheng
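The per-vector mask bit the series emulates can be modeled in a few lines. Per the PCI spec, each MSI-X table entry is 16 bytes (message address low/high, message data, vector control), bit 0 of the vector control dword is the per-vector mask bit, and the bit is set coming out of reset. This is only an illustrative model of those semantics, not KVM code:

```python
MSIX_ENTRY_SIZE = 16              # bytes per MSI-X table entry
MSIX_ENTRY_VECTOR_CTRL = 12       # byte offset of vector control in an entry
MSIX_ENTRY_CTRL_MASKBIT = 0x1     # bit 0: per-vector mask

class MsixEntry:
    """Toy model of one MSI-X table entry's mask semantics."""

    def __init__(self):
        self.addr = 0
        self.data = 0
        # Per the spec, the per-vector mask bit is set after reset.
        self.vector_ctrl = MSIX_ENTRY_CTRL_MASKBIT

    def masked(self):
        return bool(self.vector_ctrl & MSIX_ENTRY_CTRL_MASKBIT)

    def write_ctrl(self, value):
        # A guest write to the vector control dword; the device must not
        # deliver the vector's interrupt while the mask bit is set.
        self.vector_ctrl = value & 0xffffffff
```

This also shows why the series handles reads in the kernel too: as the head of the thread notes, guests typically read the MSI-X MMIO right after writing the mask bit to flush the write, so a kernel-side fast path for writes alone would still bounce to userspace on every access.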