Re: [RFC][ PATCH 3/3] vhost-net: Add mergeable RX buffer support to vhost-net
On Sun, Mar 07, 2010 at 06:06:51PM -0800, David Stevens wrote: Michael S. Tsirkin m...@redhat.com wrote on 03/07/2010 08:26:33 AM: On Tue, Mar 02, 2010 at 05:20:34PM -0700, David Stevens wrote: This patch glues them all together and makes sure we notify whenever we don't have enough buffers to receive a max-sized packet, and adds the feature bit. Signed-off-by: David L Stevens dlstev...@us.ibm.com Maybe split this up? I can. I was looking mostly at size (and this is the smallest of the bunch). But the feature requires all of them together, of course. This last one is just everything left over from the other two. @@ -110,6 +90,7 @@ size_t len, total_len = 0; int err, wmem; struct socket *sock = rcu_dereference(vq-private_data); + I tend not to add empty lines if line below it is already short. This leaves no blank line between the declarations and the start of code. It's habit for me-- not sure of kernel coding standards address that or not, but I don't think I've seen it anywhere else. if (!sock) return; @@ -166,11 +147,11 @@ /* Skip header. TODO: support TSO. */ msg.msg_iovlen = out; head.iov_len = len = iov_length(vq-iov, out); + I tend not to add empty lines if line below it is a comment. I added this to separate the logical skip header block from the next, unrelated piece. Not important to me, though. /* Sanity check */ if (!len) { vq_err(vq, Unexpected header len for TX: - %zd expected %zd\n, - len, vq-guest_hlen); + %zd expected %zd\n, len, vq-guest_hlen); break; } /* TODO: Check specific error and bomb out unless ENOBUFS? */ /* TODO: Should check and handle checksum. */ + if (vhost_has_feature(net-dev, VIRTIO_NET_F_MRG_RXBUF)) { + struct virtio_net_hdr_mrg_rxbuf *vhdr = + (struct virtio_net_hdr_mrg_rxbuf *) + vq-iov[0].iov_base; + /* add num_bufs */ + vq-iov[0].iov_len = vq-guest_hlen; + vhdr-num_buffers = headcount; I don't understand this. iov_base is a userspace pointer, isn't it. How can you assign values to it like that? 
Rusty also commented earlier that it's not a good idea to assume specific layout, such as first chunk being large enough to include virtio_net_hdr_mrg_rxbuf. I think we need to use memcpy to/from iovec etc. I guess you mean put_user() or copy_to_user(); yes, I suppose it could be paged since we read it. The code doesn't assume that it'll fit so much as arranged for it to fit. We allocate guest_hlen bytes in the buffer, but set the iovec to the (smaller) sock_hlen; do the read, then this code adds back the 2 bytes in the middle that we didn't read into (where num_buffers goes). But the allocator does require that guest_hlen will fit in a single buffer (and reports error if it doesn't). The alternative is significantly more complicated, I'm not sure why. Can't we just call memcpy_from_iovec and then read the structure as usual? and only fails if the guest doesn't give us at least the buffer size the guest header requires (a truly lame guest). I'm not sure it's worth a lot of complexity in vhost to support the guest giving us 12 byte buffers; those guests don't exist now and maybe they never should? /* This actually signals the guest, using eventfd. */ void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq) { __u16 flags = 0; + I tend not to add empty lines if a line above it is already short. Again, separating declarations from code-- never seen different in any other kernel code. if (get_user(flags, vq-avail-flags)) { vq_err(vq, Failed to get flags); return; @@ -1125,7 +1140,7 @@ /* If they don't want an interrupt, don't signal, unless empty. */ if ((flags VRING_AVAIL_F_NO_INTERRUPT) - (vq-avail_idx != vq-last_avail_idx || + (vhost_available(vq) vq-maxheadcount || I don't understand this change. It seems to make code not match the comments. It redefines empty. Without mergeable buffers, we can empty the ring down to nothing before we require notification. 
With mergeable buffers, if the packet requires, say, 3 buffers, and we have only 2 left, we are empty and require notification and new buffers to read anything. In both cases, we notify when we can't read another packet
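The redefinition of "empty" discussed above can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions, not the vhost code: `struct ring_state`, `ring_available()` and `ring_is_empty()` are hypothetical names, and `max_headcount` stands in for the number of buffers a max-sized packet may need (1 when mergeable buffers are off).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring indices; NOT the real vhost_virtqueue layout. */
struct ring_state {
    uint16_t avail_idx;      /* next slot the guest will publish */
    uint16_t last_avail_idx; /* next slot the host will consume */
};

/* Buffers the guest has published that the host has not consumed yet.
 * The 16-bit subtraction handles index wrap-around. */
uint16_t ring_available(const struct ring_state *r)
{
    return (uint16_t)(r->avail_idx - r->last_avail_idx);
}

/* "Empty" means: not enough buffers left to receive one more packet.
 * With max_headcount == 1 this is the classic avail == last_avail test;
 * with mergeable buffers a packet may need several heads, so the ring
 * counts as empty earlier. */
bool ring_is_empty(const struct ring_state *r, uint16_t max_headcount)
{
    assert(max_headcount > 0);
    return ring_available(r) < max_headcount;
}
```

With two buffers left, `ring_is_empty(&r, 1)` is false while `ring_is_empty(&r, 3)` is true, which is the 3-buffer-packet case from the message.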
Re: kvm-kmod-2.6.33 (or 2.6.32) messes up pages on guest exit
2010/3/8 Jan Kiszka jan.kis...@siemens.com:
> Henrik Holst wrote:
>> Hi, I'm running a few Debian Lenny host machines with kernel 2.6.26; in production we use kvm-kmod-2.6.31.5 without any problems. Today I tested a change to kvm-kmod-2.6.33 and everything went just fine up to the moment when a guest exited; when it did, the kernel started to log thousands of rows about page errors on the host. modprobe -r kvm-intel (and kvm) and modprobe of the 2.6.31.5 version made the problems go away again.
>>
>> Could it be that 2.6.26 is a little too old a kernel to run as host for the newer kvm?
>
> Maybe. I'm only testing against 2.6.27 as oldest host; down to 2.6.24 is solely build-tested. Maybe the missing MMU notifiers in <= 2.6.26 cause trouble, though this used to work before.
>
> Can't promise that I find the time to look into this (such old kernels are out of official scope). If I managed to, I would try to bisect over kvm-kmod-2.6.32 to find what import from kvm.git or what kvm-kmod wrapping brought us the breakage. But maybe someone else finds the time, setup support would be provided...

I figured as much, I'll try to bisect when I get the time.

/Henrik Holst

--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC] Moving dirty bitmaps to userspace - Double buffering approach
Hi,

I would like to hear your comments about the following plan:

Moving dirty bitmaps to userspace - Double buffering approach

I would especially be glad to hear some advice about how to keep compatibility.

Thanks in advance,
Takuya

---

Overview:

Last time, I submitted a patch "make get dirty log ioctl return the first dirty page's position"

http://www.spinics.net/lists/kvm/msg29724.html

and got some new, better ideas from Avi. As a result, I agreed to try to eliminate the bitmap allocation done in the x86 KVM every time we execute get dirty log, by using a double buffering approach.

Here is my plan:

- move the dirty bitmap allocation to userspace

We allocate bitmaps in userspace and register them by ioctl. Once a bitmap is registered, we do not touch it from userspace and let the kernel modify it directly until we switch to the next bitmap. We use double buffering at this switch point: userspace gives the kernel a new bitmap by ioctl and the kernel atomically switches to the new one. After this switch succeeds, we can read the old bitmap freely in userspace and free it if we want; needless to say, we can also reuse it at the next switch.

- implementation details

Although it may be possible to touch the bitmap from the kernel side without doing kmap, I think kmapping the bitmap is better. So we may use the following functions, paying enough attention to preemption control.

  - get_user_pages()
  - kmap_atomic()

- compatibility issues

What I am facing now are the compatibility issues. We have to support both userspace and kernel side bitmap allocations to let the current qemu and KVM work properly.

1. From the kernel side, we have to take care of bitmap allocations done in both kvm_vm_ioctl_set_memory_region() and kvm_vm_ioctl_get_dirty_log().
2. From the userspace side, we have to check the new API's availability and determine which way to use, e.g. by using the check extension ioctl.

The most problematic is 1, the kernel side.
We need some way, such as a flag, to know which side allocated the current bitmap. In the case of set memory region, we have to decide whether to allocate a bitmap, and if not, a bitmap has to be registered later through another API: set memory region is not restricted to dirty logging and needs more care than get dirty log.

Are there any good ways to solve this kind of problem?
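The switch point of the plan can be illustrated with a small userspace-only sketch in C11. It is not the proposed KVM interface: `struct dirty_log`, `mark_dirty()`, `switch_bitmap()` and `test_dirty()` are hypothetical names, the "kernel" writer is just another function in the same process, and the fixed `BITMAP_WORDS` size is an assumption made for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define BITMAP_WORDS  4
#define BITS_PER_WORD (8 * sizeof(unsigned long))

/* Hypothetical stand-in for the per-slot state: only the pointer to the
 * bitmap currently being written is shared between the two sides. */
struct dirty_log {
    _Atomic(unsigned long *) active;
};

/* Writer side (the kernel in the plan): set the bit for one page in
 * whichever bitmap is active right now. */
void mark_dirty(struct dirty_log *log, size_t page)
{
    unsigned long *bm = atomic_load(&log->active);

    bm[page / BITS_PER_WORD] |= 1UL << (page % BITS_PER_WORD);
}

/* Reader side: hand over a cleared bitmap and get the old one back in a
 * single atomic exchange. A real implementation would also have to wait
 * for writers that fetched the old pointer just before the swap. */
unsigned long *switch_bitmap(struct dirty_log *log, unsigned long *fresh)
{
    assert(fresh != NULL);
    memset(fresh, 0, BITMAP_WORDS * sizeof(unsigned long));
    return atomic_exchange(&log->active, fresh);
}

/* Check one page in a bitmap that is no longer active. */
int test_dirty(const unsigned long *bm, size_t page)
{
    return (bm[page / BITS_PER_WORD] >> (page % BITS_PER_WORD)) & 1UL;
}
```

After `switch_bitmap()` returns, the caller owns the old bitmap and can scan, free, or reuse it at the next switch, matching the double buffering described in the plan.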
Re: linux-aio usable?
On 03/08/2010 03:46 AM, Bernhard Schmidt wrote:
> Hi, sorry for this pretty generic question, I did not find any real pros and cons on the net anywhere, but I might just have missed them.
>
> In a pure x86_64 environment (~2.6.32 vanilla kernel, 0.12.3 qemu-kvm), is enabling linux-aio in KVM a good idea?

Yes.

> What are the advantages/disadvantages?

It's faster.

> Are there any potential pitfalls?

It won't work well unless running on a block device (partition or LVM).

> The reason I'm asking is that there has been some traffic on the list about it, so it seems to be something people want to get working. qemu-kvm in Ubuntu Lucid is currently not compiled with that option. I've made a local version with aio and it seems to work fine (and performs a bit better at first glance). Is there any reason one should not compile that feature by default?

Not to my knowledge.

> Does it do anything if not explicitly run with aio=native?

IIUC, no.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter
On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> This patch changes the tdp_enabled flag from its global meaning to the mmu-context. This is necessary for Nested SVM with emulation of Nested Paging where we need an extra MMU context to shadow the Nested Nested Page Table.
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ec891a2..e7bef19 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -254,6 +254,7 @@ struct kvm_mmu {
>      int root_level;
>      int shadow_root_level;
>      union kvm_mmu_page_role base_role;
> +    bool tdp_enabled;

This needs a different name, since the old one is still around. Perhaps we could call it parent_mmu and make it a kvm_mmu pointer.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 11/18] KVM: MMU: Add infrastructure for two-level page walker
On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> This patch introduces a mmu-callback to translate gpa addresses in the walk_addr code. This is later used to translate l2_gpa addresses into l1_gpa addresses.
>
> Signed-off-by: Joerg Roedel joerg.roe...@amd.com
> ---
>  arch/x86/include/asm/kvm_host.h |    1 +
>  arch/x86/kvm/mmu.c              |    7 +++
>  arch/x86/kvm/paging_tmpl.h      |   19 +++
>  include/linux/kvm_host.h        |    5 +
>  4 files changed, 32 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c0b5576..76c8b5f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -250,6 +250,7 @@ struct kvm_mmu {
>      void (*free)(struct kvm_vcpu *vcpu);
>      gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
>                          u32 *error);
> +    gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error);
>      void (*prefetch_page)(struct kvm_vcpu *vcpu,
>                            struct kvm_mmu_page *page);
>      int (*sync_page)(struct kvm_vcpu *vcpu,

I think placing this here means we will miss a few translations, namely when we do a physical access (say, reading PDPTEs or similar). We need to do this on the level of kvm_read_guest() so we capture physical accesses:

kvm_read_guest_virt
  -> walk_addr
    -> kvm_read_guest_tdp
      -> kvm_read_guest_virt
        -> walk_addr
          -> kvm_read_guest_tdp
            -> kvm_read_guest

Of course, not all accesses will use kvm_read_guest_tdp; for example kvmclock accesses should still go untranslated.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 18/18] KVM: X86: Add KVM_CAP_SVM_CPUID_FIXED
On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> This capability shows userspace that it can trust the values of cpuid[0x800A] that it gets from the kernel. Old behavior was to just return the host cpuid values, which is broken because all additional svm-features need support in the svm emulation code.

I think we can simply fix the bug and push the fix to the various stable queues.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> Hi,
>
> here are the patches that implement nested paging support for nested svm. They are somewhat intrusive to the soft-mmu so I post them as RFC in the first round to get feedback about the general direction of the changes. Nevertheless I am proud to report that with these patches the famous kernel-compile benchmark runs only 4% slower in the l2 guest than in the l1 guest when l2 is single-processor. With SMP guests the situation is very different. The more vcpus the guest has, the larger the performance drop from l1 to l2.
>
> Anyway, this post is to get feedback about the overall concept of these patches. Please review and give feedback :-)
>
> Thanks,
> Joerg
>
> Diffstat:
>
>  arch/x86/include/asm/kvm_host.h |   21 ++
>  arch/x86/kvm/mmu.c              |  152 ++-
>  arch/x86/kvm/mmu.h              |    2 +
>  arch/x86/kvm/paging_tmpl.h      |   81 ++---
>  arch/x86/kvm/svm.c              |  126 +++-
>  arch/x86/kvm/vmx.c              |    9 +++
>  arch/x86/kvm/x86.c              |   19 +-
>  include/linux/kvm.h             |    1 +
>  include/linux/kvm_host.h        |    5 ++
>  9 files changed, 354 insertions(+), 62 deletions(-)

Okay, this looks excellent overall, it's nice to see how well this fits with the existing mmu infrastructure (only ~300 lines added).

The performance results are impressive.

--
error compiling committee.c: too many arguments to function
Re: linux-aio usable?
On Mon, Mar 08, 2010 at 11:10:29AM +0200, Avi Kivity wrote:
>> Are there any potential pitfalls?
>
> It won't work well unless running on a block device (partition or LVM).

What does "work well" mean in this context? Potential dataloss?

>> Is there any reason one should not compile that feature by default?
>
> Not to my knowledge.

Thanks, I've filed a bug with Ubuntu to get it enabled.

Bernhard
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On 08.03.2010 at 02:45, Jamie Lokier ja...@shareable.org wrote:
> Paul Brook wrote:
>>> Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository.
>>
>> No. All new devices should be fully qdev based. I suspect you've also ignored a load of coherency issues, especially when not using KVM. As soon as you have shared memory in more than one host thread/process you have to worry about memory barriers.
>
> Yes. Guest-observable behaviour is likely to be quite different on different hosts, especially between x86 and non-x86 hosts, which is not good at all for emulation.
>
> Memory barriers performed by the guest would help, but would not remove the fact that behaviour would vary between different host types if a guest doesn't call them. I.e. you could accidentally have some guests working fine for years on x86 hosts, which gain subtle memory corruption as soon as you run them on a different host. This is acceptable when recompiling code for different architectures, but it's asking for trouble with binary guest images which aren't supposed to depend on host architecture.
>
> However, coherence could be made host-type-independent by the host mapping and unmapping pages, so that each page is only mapped into one guest (or guest CPU) at a time. Just like some clustering filesystems do to maintain coherence.

Or we could put in some code that tells the guest the host shm architecture and only accept x86 on x86 for now. If anyone cares for other combinations, they're free to implement them.

Seriously, we're looking at an interface designed for kvm here. Let's please keep it as simple and fast as possible for the actual use case, not some theoretically possible ones.
Alex
Re: linux-aio usable?
On 03/08/2010 11:48 AM, Bernhard Schmidt wrote:
> On Mon, Mar 08, 2010 at 11:10:29AM +0200, Avi Kivity wrote:
>>> Are there any potential pitfalls?
>>
>> It won't work well unless running on a block device (partition or LVM).
>
> What does "work well" mean in this context? Potential dataloss?

No, it becomes synchronous (=extra slow).

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On 03/08/2010 12:53 AM, Paul Brook wrote:
>> Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository.
>
> No. All new devices should be fully qdev based. I suspect you've also ignored a load of coherency issues, especially when not using KVM. As soon as you have shared memory in more than one host thread/process you have to worry about memory barriers.

Shouldn't it be sufficient to require the guest to issue barriers (and to ensure tcg honours the barriers, if someone wants this with tcg)?

--
error compiling committee.c: too many arguments to function
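What "the guest issues barriers" means for code sharing such a region can be sketched with C11 atomics. This is a generic release/acquire example under stated assumptions, not anything from the ivshmem patch: `struct shm_mailbox`, `mailbox_publish()` and `mailbox_try_read()` are hypothetical names, and the struct is assumed to live in memory both sides can see.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of a tiny mailbox placed in shared memory. */
struct shm_mailbox {
    uint32_t    payload; /* plain data */
    atomic_uint ready;   /* set only after payload is written */
};

/* Producer: write the data, then publish the flag with release
 * semantics so the payload store cannot be reordered after it. */
void mailbox_publish(struct shm_mailbox *m, uint32_t value)
{
    assert(m != NULL);
    m->payload = value;
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Consumer: the acquire load pairs with the release store above, so a
 * reader that sees ready == 1 also sees the payload written before it.
 * Returns 1 and fills *out if a message was available, 0 otherwise. */
int mailbox_try_read(struct shm_mailbox *m, uint32_t *out)
{
    assert(m != NULL && out != NULL);
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return 0;
    *out = m->payload;
    return 1;
}
```

On x86 both operations compile to ordinary loads and stores, which is why code that omits them can appear to work there and still break on a host with a weaker memory model, the concern raised elsewhere in this thread.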
Re: [PATCH] Support adding a file to qemu's ram allocation
On 03/06/2010 01:52 AM, Cam Macdonell wrote:
> This avoids the need of using qemu_ram_alloc and mmap with MAP_FIXED to map a host file into guest RAM. This function mmaps the opened file anywhere and adds the memory to the ram blocks.
>
> Usage is
>
> qemu_add_file_to_ram(fd, size, MAP_SHARED);

A traditional name would be qemu_ram_mmap() as a counterpart to qemu_ram_alloc(). Would be nice to accept an offset.

--
error compiling committee.c: too many arguments to function
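The suggestion to accept an offset maps directly onto the last argument of POSIX mmap(). The helper below is a minimal sketch of that idea only: `map_guest_file()` is a hypothetical name, it does not touch qemu's ram block list, and the read/write protection is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `size` bytes of an already opened file, starting at `offset`,
 * wherever the kernel chooses (no MAP_FIXED). `flags` is passed through
 * so the caller can pick MAP_SHARED or MAP_PRIVATE.
 * Returns the mapping, or NULL on failure. */
void *map_guest_file(int fd, size_t size, off_t offset, int flags)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    assert(size > 0);

    /* mmap() requires the file offset to be a multiple of the page size. */
    if (page <= 0 || offset < 0 || offset % page != 0)
        return NULL;

    p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, fd, offset);
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would use it along the lines of `map_guest_file(fd, size, 0, MAP_SHARED)` and release the region with `munmap()`.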
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
Alexander Graf wrote:
> Or we could put in some code that tells the guest the host shm architecture and only accept x86 on x86 for now. If anyone cares for other combinations, they're free to implement them.
>
> Seriously, we're looking at an interface designed for kvm here. Let's please keep it as simple and fast as possible for the actual use case, not some theoretically possible ones.

The concern is that a perfectly working guest image running on kvm, the guest being some OS or app that uses this facility (_not_ a kvm-only guest driver), is later run on qemu on a different host, and then mostly works except for some silent data corruption.

That is not a theoretical scenario.

Well, the bit with this driver is theoretical, obviously :-) But not the bit about moving to a different host.

-- Jamie
Re: [PATCH] Inter-VM shared memory PCI device
On 03/06/2010 01:52 AM, Cam Macdonell wrote:
> Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository.
>
> This device now creates a qemu character device and sends 1-byte messages to trigger interrupts. Writes are triggered by writing to the Doorbell register on the shared memory PCI device. The lower 8 bits of the value written to this register are sent as the 1-byte message so different meanings of interrupts can be supported.
>
> Interrupts are supported between multiple VMs by using a shared memory server
>
> -ivshmem <size in MB>,[unix:<path>][file]
>
> Interrupts can also be used between host and guest by implementing a listener on the host that talks to the shared memory server. The shared memory server passes file descriptors for the shared memory object and eventfds (our interrupt mechanism) to the respective qemu instances.

Can you provide a spec that describes the device? This would be useful for maintaining the code, writing guest drivers, and as a framework for review.

--
error compiling committee.c: too many arguments to function
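The doorbell behaviour described in the patch text (lower 8 bits of the written value sent as a 1-byte message) can be sketched as follows. This is an illustration only, not the device code: `doorbell_message()` and `ring_doorbell()` are hypothetical names, and a plain file descriptor stands in for the qemu character device.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Reduce a value written to the Doorbell register to the 1-byte message:
 * only the lower 8 bits are kept. */
uint8_t doorbell_message(uint32_t reg_value)
{
    return (uint8_t)(reg_value & 0xffu);
}

/* Send the message over `fd` (a socket or pipe in this sketch).
 * Returns 0 on success, -1 if the byte could not be written. */
int ring_doorbell(int fd, uint32_t reg_value)
{
    uint8_t msg = doorbell_message(reg_value);
    ssize_t n;

    assert(fd >= 0);
    n = write(fd, &msg, sizeof(msg));
    return n == (ssize_t)sizeof(msg) ? 0 : -1;
}
```

Because only one byte travels, a guest writing 0x1234 and one writing 0x34 trigger the same interrupt meaning on the receiving side.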
Corrupted filesystem, possible after livemigration with iSCSI storagebackend.
In our KVM system we have two iSCSI backends (master/slave configuration) with failover and two KVM hosts supporting live migration. The iSCSI volumes are shared by the host as a block device in KVM, and the volumes are available on both frontends.

After a reboot, one of the KVM guests was not able to start again due to file system corruption. We use XFS and have trouble understanding what caused the corruption. We have ruled out the iSCSI backend, as both the master and slave data were consistent at the time.

Has anyone else had similar problems?

What is the recommended way to share an iSCSI drive between the two host machines?

Should XFS be ok as a file system for live migration? I'm not able to find any documentation stating that a clustered file system (GFS2 etc.) is recommended. Are there any concurrent writes on the two host machines during a live migration?

<disk type='block' device='disk'>
  <driver name='qemu'/>
  <source dev='/dev/disk/by-path/ip-ip:3260-iscsi-test2-lun-0'/>
  <target dev='sda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' unit='0'/>
</disk>

# virsh version
Compiled against library: libvir 0.7.6
Using library: libvir 0.7.6
Using API: QEMU 0.7.6
Running hypervisor: QEMU 0.11.0

# uname -a
Linux vm01 2.6.32-bpo.2-amd64 #1 SMP Fri Feb 12 16:50:27 UTC 2010 x86_64 GNU/Linux

Regards
Espen
Re: KVM PMU virtualization
On 03/01/2010 07:17 PM, Peter Zijlstra wrote:
>> 2. For every emulated performance counter the guest activates kvm allocates a perf_event and configures it for the guest (we may allow kvm to specify the counter index, the guest would be able to use rdpmc unintercepted then). Event filtering is also done in this step.
>
> rdpmc can never be used unintercepted, for perf might be multiplexing the actual hw.

How often is rdpmc used? If it is invoked on high frequency software-only events (like context switches), then this may be a performance issue. If it is only issued on perf interrupts, we may be able to live with it (since we already took an exit for the interrupt).

--
error compiling committee.c: too many arguments to function
extended vga modes?
After updating the qemu-kvm Debian package to 0.12 we've got a bug report about missing video modes which were present in previous versions. Big thanks to the original reporter, Bjørn Mork, who found what the issue is. See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572991 for the bug report in question, together with the resolution.

In short, when vgabios was dropped from qemu-kvm (for whatever yet unknown reason), all local changes to it were dropped too, including this patch: http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commitdiff;h=ebfac597cf

Note that this patch is present in the upstream 0.6c version of vgabios, but not in 0.6b+ which is currently used in qemu.

Should vgabios in qemu include that patch?

See also: https://bugzilla.redhat.com/show_bug.cgi?id=501545

Thanks!

/mjt
Re: extended vga modes?
On 03/08/2010 12:20 PM, Michael Tokarev wrote:
> After updating qemu-kvm Debian package to 0.12 we've a bugreport about missing video modes which were present in previous versions. Big thanks to the original reporter, Bjørn Mork, who found what the issue is. See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572991 for the bugreport in question, together with the resolution.
>
> In short, when vgabios were dropped from qemu-kvm (for whatever yet unknown reason),

What do you mean? qemu-kvm still carries a local vgabios (see kvm/vgabios in qemu-kvm.git).

> all local changes to it were dropped too, including this patch: http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commitdiff;h=ebfac597cf
>
> Note that this patch is present in upstream 0.6c version of vgabios, but not in 0.6b+ which is currently used in qemu.
>
> Should vgabios in qemu include that patch?
>
> See also: https://bugzilla.redhat.com/show_bug.cgi?id=501545

Looks like Fedora was using the upstream vgabios for a while, not the version in qemu-kvm.git.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
Jamie Lokier wrote:
> Alexander Graf wrote:
>> Or we could put in some code that tells the guest the host shm architecture and only accept x86 on x86 for now. If anyone cares for other combinations, they're free to implement them.
>>
>> Seriously, we're looking at an interface designed for kvm here. Let's please keep it as simple and fast as possible for the actual use case, not some theoretically possible ones.
>
> The concern is that a perfectly working guest image running on kvm, the guest being some OS or app that uses this facility (_not_ a kvm-only guest driver), is later run on qemu on a different host, and then mostly works except for some silent data corruption.
>
> That is not a theoretical scenario.
>
> Well, the bit with this driver is theoretical, obviously :-) But not the bit about moving to a different host.

I agree. Hence there should be a safety check so people can't corrupt their data silently.

Alex
Re: extended vga modes?
Avi Kivity wrote:
[]
>> In short, when vgabios were dropped from qemu-kvm (for whatever yet unknown reason),
>
> What do you mean? qemu-kvm still carries a local vgabios (see kvm/vgabios in qemu-kvm.git).

Oh my. So we all overlooked it. I asked you several times about the bios sources; in 0.12 seabios was supposed to be in roms/seabios (which is still empty in the release), and I thought vgabios should be in roms/vgabios (which is empty too), and concluded it was dropped from the qemu-kvm tarball.

But you're right, and I mistakenly took the vgabios sources from upstream qemu when building the Debian package, instead of using the good old sources from kvm/vgabios.

What a mess!... :(

And it looks like it's time to remove at least parts of this mess, don't you think? How about pushing the vgabios changes to qemu and moving it to the same place where it is in qemu? Does it make sense?

There was another patch mentioned recently, when I asked for the origin of the bios sources a few days ago, which probably should be applied as well...

[]
>> Should vgabios in qemu include that patch?
>>
>> See also: https://bugzilla.redhat.com/show_bug.cgi?id=501545
>
> Looks like Fedora was using the upstream vgabios for a while, not the version in qemu-kvm.git.

Yes, they are still using upstream but they also applied the patches missing in there.

Thank you for the prompt response. Now I think all confusion is cleared.

/mjt
Re: extended vga modes?
On 03/08/2010 01:07 PM, Michael Tokarev wrote:
> Avi Kivity wrote:
> []
>>> In short, when vgabios were dropped from qemu-kvm (for whatever yet unknown reason),
>>
>> What do you mean? qemu-kvm still carries a local vgabios (see kvm/vgabios in qemu-kvm.git).
>
> Oh my. So we all overlooked it. I asked you several times about the bios sources, in 0.12 seabios were supposed to be in roms/seabios (which is still empty in the release), and I thought vgabios should be in roms/vgabios (which is empty too), and concluded it were dropped from qemu-kvm tarball.
>
> But you're right, and I by mistake take vgabios sources from upstream qemu when building Debian package, instead of using the old'good sources from kvm/vgabios.
>
> What a mess!... :(
>
> And it looks like that it's time to remove at least parts of this mess, don't you think? How about pushing the vgabios changes to qemu and moving it to the same place where it is in qemu? Does it make sense?

We can't push the changes to qemu since qemu.git doesn't have a vgabios fork. We might push the changes upstream. Best of all if the seabios thing repeats itself with vgabios so we have maintainable and maintained vga firmware.

--
error compiling committee.c: too many arguments to function
request: please merge docs for -netdev in stable
hi,

with version 0.12.x there is a new -netdev option, but the docs cannot be found anywhere. It seems that this commit

http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=96560cb34c3183a4fb1769e4eff4d860a24579a8

is only applied to unstable but not stable. Is it possible to merge this to stable?

Thanks
xming
Re: request: please merge docs for -netdev in stable
Copying qemu-devel

On 03/08/2010 01:11 PM, xming wrote:
> hi,
>
> with version 0.12.x there is a new -netdev option, but the docs cannot be found anywhere. It seems that this commit
>
> http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=96560cb34c3183a4fb1769e4eff4d860a24579a8
>
> is only applied to the unstable but not stable, is it possible to merge this to stable?
>
> Thanks
> xming

--
error compiling committee.c: too many arguments to function
Re: [PATCH v1 3/3] Let host NIC driver to DMA to guest user space.
On Sat, Mar 06, 2010 at 05:38:38PM +0800, xiaohui@intel.com wrote: From: Xin Xiaohui xiaohui@intel.com The patch let host NIC driver to receive user space skb, then the driver has chance to directly DMA to guest user space buffers thru single ethX interface. Signed-off-by: Xin Xiaohui xiaohui@intel.com Signed-off-by: Zhao Yu yzha...@gmail.com Sigend-off-by: Jeff Dike jd...@c2.user-mode-linux.org I have a feeling I commented on some of the below issues already. Do you plan to send a version with comments addressed? --- include/linux/netdevice.h | 76 ++- include/linux/skbuff.h| 30 +++-- net/core/dev.c| 32 ++ net/core/skbuff.c | 79 + 4 files changed, 205 insertions(+), 12 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 94958c1..97bf12c 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -485,6 +485,17 @@ struct netdev_queue { unsigned long tx_dropped; } cacheline_aligned_in_smp; +#if defined(CONFIG_VHOST_PASSTHRU) || defined(CONFIG_VHOST_PASSTHRU_MODULE) +struct mpassthru_port{ + int hdr_len; + int data_len; + int npages; + unsignedflags; + struct socket *sock; + struct skb_user_page*(*ctor)(struct mpassthru_port *, + struct sk_buff *, int); +}; +#endif /* * This structure defines the management hooks for network devices. 
@@ -636,6 +647,10 @@ struct net_device_ops {
         int (*ndo_fcoe_ddp_done)(struct net_device *dev, u16 xid);
 #endif
+#if defined(CONFIG_VHOST_PASSTHRU) || defined(CONFIG_VHOST_PASSTHRU_MODULE)
+        int (*ndo_mp_port_prep)(struct net_device *dev,
+                        struct mpassthru_port *port);
+#endif
 };
 /*
@@ -891,7 +906,8 @@ struct net_device
         struct macvlan_port *macvlan_port;
         /* GARP */
         struct garp_port *garp_port;
-
+        /* mpassthru */
+        struct mpassthru_port *mp_port;
         /* class/net/name entry */
         struct device dev;
         /* space for optional statistics and wireless sysfs groups */
@@ -2013,6 +2029,62 @@ static inline u32 dev_ethtool_get_flags(struct net_device *dev)
                 return 0;
         return dev->ethtool_ops->get_flags(dev);
 }
-#endif /* __KERNEL__ */
+#if defined(CONFIG_VHOST_PASSTHRU) || defined(CONFIG_VHOST_PASSTHRU_MODULE)
+static inline int netdev_mp_port_prep(struct net_device *dev,
+                struct mpassthru_port *port)
+{

This function lacks documentation.

+        int rc;
+        int npages, data_len;
+        const struct net_device_ops *ops = dev->netdev_ops;
+
+        /* needed by packet split */
+        if (ops->ndo_mp_port_prep) {
+                rc = ops->ndo_mp_port_prep(dev, port);
+                if (rc)
+                        return rc;
+        } else { /* should be temp */
+                port->hdr_len = 128;
+                port->data_len = 2048;
+                port->npages = 1;

where do the numbers come from?
+        }
+
+        if (port->hdr_len <= 0)
+                goto err;
+
+        npages = port->npages;
+        data_len = port->data_len;
+        if (npages <= 0 || npages > MAX_SKB_FRAGS ||
+                        (data_len < PAGE_SIZE * (npages - 1) ||
+                         data_len > PAGE_SIZE * npages))
+                goto err;
+
+        return 0;
+err:
+        dev_warn(&dev->dev, "invalid page constructor parameters\n");
+
+        return -EINVAL;
+}
+
+static inline int netdev_mp_port_attach(struct net_device *dev,
+                struct mpassthru_port *port)
+{
+        if (rcu_dereference(dev->mp_port))
+                return -EBUSY;
+
+        rcu_assign_pointer(dev->mp_port, port);
+
+        return 0;
+}
+
+static inline void netdev_mp_port_detach(struct net_device *dev)
+{
+        if (!rcu_dereference(dev->mp_port))
+                return;
+
+        rcu_assign_pointer(dev->mp_port, NULL);
+        synchronize_rcu();
+}

The above looks wrong: rcu_dereference should be called under the RCU read side, rcu_assign_pointer usually should not, and synchronize_rcu definitely should not. As I suggested already, these functions are better open-coded; RCU is tricky as is without hiding it in inline helpers.

+#endif /* CONFIG_VHOST_PASSTHRU */
+#endif /* __KERNEL__ */
 #endif /* _LINUX_NETDEVICE_H */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index df7b23a..e59fa57 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -209,6 +209,13 @@ struct skb_shared_info {
         void * destructor_arg;
 };
+struct skb_user_page {
+        u8 *start;
+        int size;
+        struct skb_frag_struct *frags;
Re: [PATCH v1 1/3] A device for zero-copy based on KVM virtio-net.
On Sat, Mar 06, 2010 at 05:38:36PM +0800, xiaohui@intel.com wrote: From: Xin Xiaohui xiaohui@intel.com Add a device to utilize the vhost-net backend driver for copy-less data transfer between guest FE and host NIC. It pins the guest user space to the host memory and provides proto_ops as sendmsg/recvmsg to vhost-net. Signed-off-by: Xin Xiaohui xiaohui@intel.com Signed-off-by: Zhao Yu yzha...@gmail.com Sigend-off-by: Jeff Dike jd...@c2.user-mode-linux.org I think some of the comments below are repeated. Do you plan addressing them? --- drivers/vhost/Kconfig |5 + drivers/vhost/Makefile|2 + drivers/vhost/mpassthru.c | 1202 + include/linux/mpassthru.h | 29 ++ I'm not sure it's wise to limit the device to vhost even if that's the only mode that you are going to support in the first version. How about locating the char device under drivers/net/? 4 files changed, 1238 insertions(+), 0 deletions(-) create mode 100644 drivers/vhost/mpassthru.c create mode 100644 include/linux/mpassthru.h diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 9f409f4..ee32a3b 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -9,3 +9,8 @@ config VHOST_NET To compile this driver as a module, choose M here: the module will be called vhost_net. +config VHOST_PASSTHRU + tristate Zerocopy network driver (EXPERIMENTAL) + depends on VHOST_NET + ---help--- + zerocopy network I/O support diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile index 72dd020..3f79c79 100644 --- a/drivers/vhost/Makefile +++ b/drivers/vhost/Makefile @@ -1,2 +1,4 @@ obj-$(CONFIG_VHOST_NET) += vhost_net.o vhost_net-y := vhost.o net.o + +obj-$(CONFIG_VHOST_PASSTHRU) += mpassthru.o diff --git a/drivers/vhost/mpassthru.c b/drivers/vhost/mpassthru.c new file mode 100644 index 000..744d6cd --- /dev/null +++ b/drivers/vhost/mpassthru.c @@ -0,0 +1,1202 @@ +/* + * MPASSTHRU - Mediate passthrough device. 
+ * Copyright (C) 2009 ZhaoYu, XinXiaohui, Dike, Jeffery G + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#define DRV_NAMEmpassthru +#define DRV_DESCRIPTION Mediate passthru device driver +#define DRV_COPYRIGHT (C) 2009 ZhaoYu, XinXiaohui, Dike, Jeffery G + +#include linux/module.h +#include linux/errno.h +#include linux/kernel.h +#include linux/major.h +#include linux/slab.h +#include linux/smp_lock.h +#include linux/poll.h +#include linux/fcntl.h +#include linux/init.h +#include linux/skbuff.h +#include linux/netdevice.h +#include linux/etherdevice.h +#include linux/miscdevice.h +#include linux/ethtool.h +#include linux/rtnetlink.h +#include linux/if.h +#include linux/if_arp.h +#include linux/if_ether.h +#include linux/crc32.h +#include linux/nsproxy.h +#include linux/uaccess.h +#include linux/virtio_net.h +#include linux/mpassthru.h +#include net/net_namespace.h +#include net/netns/generic.h +#include net/rtnetlink.h +#include net/sock.h + +#include asm/system.h + +#include vhost.h + +/* Uncomment to enable debugging */ +/* #define MPASSTHRU_DEBUG 1 */ + +#ifdef MPASSTHRU_DEBUG +static int debug; + +#define DBG if (mp-debug) printk +#define DBG1 if (debug == 2) printk +#else +#define DBG(a...) +#define DBG1(a...) +#endif + +#define COPY_THRESHOLD (L1_CACHE_BYTES * 4) +#define COPY_HDR_LEN (L1_CACHE_BYTES 64 ? 
64 : L1_CACHE_BYTES) + +struct frag { + u16 offset; + u16 size; +}; + +struct page_ctor { + struct list_headreadq; + int w_len; + int r_len; + spinlock_t read_lock; + atomic_trefcnt; + struct kmem_cache *cache; + struct net_device *dev; + struct mpassthru_port port; + void*sendctrl; + void*recvctrl; +}; + +struct page_info { + struct list_headlist; + int header; + /* indicate the actual length of bytes + * send/recv in the user space buffers + */ + int total; + int offset; + struct page *pages[MAX_SKB_FRAGS+1]; + struct skb_frag_struct frag[MAX_SKB_FRAGS+1]; + struct sk_buff *skb; + struct page_ctor*ctor; + + /* The pointer relayed to skb,
Re: [RFC] Moving dirty bitmaps to userspace - Double buffering approach
On 03/08/2010 10:22 AM, Takuya Yoshikawa wrote: Hi, I would like to hear your comments about the following plan: Moving dirty bitmaps to userspace - Double buffering approach especially I would be glad if I can hear some advice about how to keep the compatibility. Thanks in advance, Takuya --- Overview: Last time, I submitted a patch make get dirty log ioctl return the first dirty page's position http://www.spinics.net/lists/kvm/msg29724.html and got some new better ideas from Avi. As a result, I agreed to try to eliminate the bitmap allocation done in the x86 KVM every time when we execute get dirty log by using double buffering approach. [...] Although it may be possible to touch the bitmap from the kernel side without doing kmap, I think kmapping the bitmap is better. So we may use the following functions paying enough attention to the preemption control. - get_user_pages() - kmap_atomic() Although direct access is more difficult (you need to implement put_user_bit() or similar) I think it is worthwhile. get_user_pages_fast() is fast, but nowhere near as fast as put_user_bit() (or set_bit_user()), which can be just two instructions in the fast path. - compatibility issues What I am facing now are the compatibility issues. We have to support both the userspace and kernel side bitmap allocations to let the current qemu and KVM work properly. 1. From the kernel side, we have to care bitmap allocations done in both the kvm_vm_ioctl_set_memory_region() and kvm_vm_ioctl_get_dirty_log(). One way to handle this is to call do_mmap() from the kernel, so that now the bitmap really lives in user space. This is a bit ugly but I think acceptable. We already do this for KVM_SET_MEMORY_REGION (which was replaced by KVM_SET_USER_MEMORY_REGION, which moved allocation to userspace). 2. From the userspace side, we have to check the new api's availability and determine which way we use, e.g. by using check extension ioctl. The most problematic is 1, kernel side. 
We have to be able to know by which way current bitmap allocation is being done using flags or something. In the case of set memory region, we have to judge whether we allocate a bitmap, and if not we have to register a bitmap later by another api: set memory region is not restricted to the dirty log issues and need more care than get dirty log. Are there any good ways to solve this kind of problems? I believe that do_mmap() will simplify this. -- error compiling committee.c: too many arguments to function
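The double-buffering idea from this thread can be modelled in plain userspace C. This is only a sketch under assumptions: `struct dirty_log`, `dirty_log_mark()`, `dirty_log_switch()` and the slot size are hypothetical names and values, not KVM interfaces, and real kernel code would have to write into a user-mapped bitmap (for example with the proposed set_bit_user()) instead of a plain array.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGES  256  /* hypothetical memory slot size, in pages */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((SKETCH_PAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Two bitmaps: one collects new dirty bits, the other is handed to the reader. */
struct dirty_log {
    unsigned long buf[2][BITMAP_WORDS];
    int active;            /* index of the bitmap currently being written */
};

static void dirty_log_init(struct dirty_log *log)
{
    memset(log, 0, sizeof(*log));
}

/* Writer side: mark one page dirty in the active bitmap. */
static void dirty_log_mark(struct dirty_log *log, size_t page)
{
    log->buf[log->active][page / BITS_PER_WORD] |= 1UL << (page % BITS_PER_WORD);
}

/*
 * Reader side ("get dirty log"): clear the idle bitmap, make it active,
 * and return the bitmap that was collecting bits until now.
 * Nothing is allocated on this path.
 */
static const unsigned long *dirty_log_switch(struct dirty_log *log)
{
    int old = log->active;
    int next = 1 - old;

    memset(log->buf[next], 0, sizeof(log->buf[next]));
    log->active = next;
    return log->buf[old];
}

/* Helper for readers: test one bit in a returned snapshot. */
static int dirty_log_test(const unsigned long *bitmap, size_t page)
{
    return (int)((bitmap[page / BITS_PER_WORD] >> (page % BITS_PER_WORD)) & 1UL);
}
```

The snapshot returned by `dirty_log_switch()` stays valid only until the next switch, which is the usual price of double buffering; a concurrent writer would additionally need atomic bit operations, which this single-threaded model leaves out.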
Re: linux-aio usable?
On Mon, Mar 08, 2010 at 11:10:29AM +0200, Avi Kivity wrote: Are there any potential pitfalls? It won't work well unless running on a block device (partition or LVM). It will actually work well on pre-allocated filesystem images, at least on XFS and NFS. The real pitfall is that cache=none is required for kernel support, as it only supports O_DIRECT. Is there any reason one should not compile that feature by default? It's compiled by default if libaio and its development headers are found. Does it do anything if not explicitly run with aio=native? No.
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On 03/08/2010 12:53 AM, Paul Brook wrote: Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. No. All new devices should be fully qdev based. I suspect you've also ignored a load of coherency issues, especially when not using KVM. As soon as you have shared memory in more than one host thread/process you have to worry about memory barriers. Shouldn't it be sufficient to require the guest to issue barriers (and to ensure tcg honours the barriers, if someone wants this with tcg)?. In a cross environment that becomes extremely hairy. For example the x86 architecture effectively has an implicit write barrier before every store, and an implicit read barrier before every load. Paul
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
However, coherence could be made host-type-independent by the host mapping and unmapping pages, so that each page is only mapped into one guest (or guest CPU) at a time. Just like some clustering filesystems do to maintain coherence. You're assuming that a TLB flush implies a write barrier, and a TLB miss implies a read barrier. I'd be surprised if this were true in general. Paul
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On 03/08/2010 03:03 PM, Paul Brook wrote: On 03/08/2010 12:53 AM, Paul Brook wrote: Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. No. All new devices should be fully qdev based. I suspect you've also ignored a load of coherency issues, especially when not using KVM. As soon as you have shared memory in more than one host thread/process you have to worry about memory barriers. Shouldn't it be sufficient to require the guest to issue barriers (and to ensure tcg honours the barriers, if someone wants this with tcg)?. In a cross environment that becomes extremely hairy. For example the x86 architecture effectively has an implicit write barrier before every store, and an implicit read barrier before every load. Ah yes. For cross tcg environments you can map the memory using mmio callbacks instead of directly, and issue the appropriate barriers there. -- error compiling committee.c: too many arguments to function
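To make the barrier requirement concrete, here is a minimal sketch of message passing through shared memory using C11 atomics. It is not taken from the patch: `struct shm_mailbox`, `mailbox_publish()` and `mailbox_consume()` are hypothetical names, and the sketch assumes both sides map the same region and that `atomic_uint` is lock-free on the host.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout of a mailbox placed in the shared region. */
struct shm_mailbox {
    uint32_t payload;      /* plain data written by the producer */
    atomic_uint ready;     /* flag that publishes the payload */
};

/*
 * Producer: write the data first, then set the flag with release
 * ordering so the payload store cannot be reordered after the flag store.
 */
static void mailbox_publish(struct shm_mailbox *mb, uint32_t value)
{
    mb->payload = value;
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/*
 * Consumer: read the flag with acquire ordering; the payload is only
 * guaranteed to be visible once the flag has been observed as set.
 * Returns 1 and fills *out on success, 0 if nothing was published.
 */
static int mailbox_consume(struct shm_mailbox *mb, uint32_t *out)
{
    if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
        return 0;
    *out = mb->payload;
    /* Release ordering keeps the payload read before the flag reset. */
    atomic_store_explicit(&mb->ready, 0, memory_order_release);
    return 1;
}
```

On x86 the release and acquire operations compile to ordinary stores and loads, which matches the remark about implicit barriers; on weaker architectures the compiler emits the required fence instructions.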
Re: KVM Guest mmap.c bug
On 03/02/2010 10:25 PM, BRUNO CESAR RIBAS wrote: Hi, I run a bunch of virtual servers using KVM, and I hit a mmap.c bug on the guest machine. The virtual machines are desktop servers for Thin Clients. My host is running a 2.6.33 kernel and has 32GB of RAM, Opteron with amd-v. The guest is running 2.6.27.45 (tried 2.6.31.12, 2.6.32.9, 2.6.33), some guests are using 10GB, 4GB or 20GB of RAM. My qemu-kvm version is 0.12.3. All guests are using NFSROOT as the ROOT FS and virtio as the network driver. I run the guest with: kvm -cpu kvm64 -smp 4 -vnc :101 -daemonize -name ${NOME} -localtime -m $RAM -net nic,macaddr=$VLAN0,model=virtio,vlan=0 -net tap,vlan=0,ifname=${NOME}0\ -net nic,macaddr=$VLAN121,model=virtio,vlan=121 -net tap,vlan=121,ifname=${NOME}121\ -net nic,macaddr=$VLAN112,model=virtio,vlan=112 -net tap,vlan=112,ifname=${NOME}112\ -kernel /root/vmlinuz-2.6.27.45-amd64-aufs-guest \ -append root=/dev/nfs rw ip=dhcp nfsroot=$5 init=/sbin/boot.sh I have a machine running an identical kernel (without virtio stuff) for a dedicated machine (as it does not have amd-v) and it stays up for days and even months. But when running a guest machine with qemu-kvm I get some bug message and lots of processes in D state, and I can't 'ps aux' or look inside /proc and /sys without losing my shell (it hangs). In `console` I get the following message, repeated for different processors, different Pids and different mmap.c lines (line 486 appears too). [ cut here ] kernel BUG at mm/mmap.c:869!
invalid opcode: [1] SMP CPU 2 Pid: 31334, comm: nautilus Not tainted 2.6.27.45-amd64-aufs-guest-00267 #2 RIP: 0010:[8027b2e1] [8027b2e1] find_mergeable_ano f1/0x200 RSP: :8804d933fb38 EFLAGS: 00010283 RAX: 8804cb44b9a8 RBX: 8804cb44b978 RCX: 8804fe6d3088 RDX: f4803000 RSI: 8804fe6d3088 RDI: 88049fa56138 RBP: 88049fa56138 R08: 8804d933e000 R09: R10: R11: R12: 00100073 R13: 00100073 R14: f4803000 R15: 806ce6c0 FS: () GS:88051cc7d440(0063) knlGS:f41 CS: 0010 DS: 002b ES: 002b CR0: 8005003b CR2: f4803000 CR3: 0004a7d39000 CR4: 06a0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process nautilus (pid: 31334, threadinfo 8804d933e000, task 880 ) Stack: 8052e62d 88049fa5 88051a5aac40 80280382 8804cb41b790 880498919018 88049f8dad20 3000 802770aa Call Trace: [8052e62d] ? _spin_lock_irq+0xd/0x10 [80280382] ? anon_vma_prepare+0x52/0x100 [802770aa] ? handle_mm_fault+0x65a/0x900 [802de6d8] ? proc_alloc_inode+0x58/0x90 [8052e545] ? __down_read+0x85/0xbc [80223331] ? do_page_fault+0x2a1/0xab0 [803d6899] ? vsnprintf+0x4d9/0x750 [8029d7a1] ? do_lookup+0x81/0x240 [8027265d] ? zone_statistics+0x7d/0x80 [8052ea3a] ? error_exit+0x0/0x70 [803d706d] ? copy_user_generic_string+0x2d/0x40 [802e35ec] ? proc_file_read+0x12c/0x2e0 [802e34c0] ? proc_file_read+0x0/0x2e0 [802dec1a] ? proc_reg_read+0x8a/0xe0 [80295995] ? vfs_read+0xb5/0x160 [80295b2e] ? sys_read+0x4e/0x90 [80227004] ? ia32_sysret+0x0/0x5 Code: 29 d0 48 c1 e8 0c 48 01 f8 48 3b 83 88 00 00 00 0f 85 5b fe ff ff 78 e9 c5 fe ff ff 0f 1f 00 31 f6 31 db e9 a9 fe ff ff0f 0b eb fe 66 1f 84 00 00 00 00 00 48 83 ec 08 48 8b RIP [8027b2e1] find_mergeable_anon_vma+0x1f1/0x200 RSP8804d933fb38 ---[ end trace e5ca25224cd7d1d4 ]--- Does anyone has a sugestion? Where to look? What else should I trace? It looks unrelated to kvm, though of course random memory corruption cannot be ruled out. Is npt enabled on the host (cat /sys/module/kvm_amd/parameters/npt)? Andrea, any idea? 
-- error compiling committee.c: too many arguments to function
Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work
On 03/05/2010 06:50 PM, Alexander Graf wrote: We have wrappers to do for example gpr read/write accesses with, because the contents of registers could be either in the PACA or in the VCPU struct. There's nothing that says we have to have the guest vcpu loaded when using these wrappers though, so let's introduce a flag that tells us whether we're inside a vcpu_load context. On x86 we always access registers within vcpu_load() context. That simplifies things. Does this not apply here? Even so, sometimes guest registers are present on the cpu, and sometimes in shadow variables (for example, msrs might be loaded or not). The approach here is to always unload and access the variable data. See for example vmx_set_msr() calling vmx_load_host_state() before accessing msrs. Seems like this could reduce the if () tree? -- error compiling committee.c: too many arguments to function
Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line
On 03/05/2010 06:50 PM, Alexander Graf wrote: Userspace can tell us that it wants to trigger an interrupt. But so far it can't tell us that it wants to stop triggering one. So let's interpret the parameter to the ioctl that we have anyways to tell us if we want to raise or lower the interrupt line. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm.h |3 +++ arch/powerpc/include/asm/kvm_ppc.h |2 ++ arch/powerpc/kvm/book3s.c |6 ++ arch/powerpc/kvm/powerpc.c |5 - 4 files changed, 15 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h index 19bae31..6c5547d 100644 --- a/arch/powerpc/include/asm/kvm.h +++ b/arch/powerpc/include/asm/kvm.h @@ -84,4 +84,7 @@ struct kvm_guest_debug_arch { #define KVM_REG_QPR 0x0040 #define KVM_REG_FQPR 0x0060 +#define KVM_INTERRUPT_SET -1U +#define KVM_INTERRUPT_UNSET -2U Funny choice of numbers. How does userspace know they exist? Can you use KVM_IRQ_LINE? -- error compiling committee.c: too many arguments to function
Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work
Avi Kivity wrote: On 03/05/2010 06:50 PM, Alexander Graf wrote: We have wrappers to do for example gpr read/write accesses with, because the contents of registers could be either in the PACA or in the VCPU struct. There's nothing that says we have to have the guest vcpu loaded when using these wrappers though, so let's introduce a flag that tells us whether we're inside a vcpu_load context. On x86 we always access registers within vcpu_load() context. That simplifies things. Does this not apply here? Even so, sometimes guest registers are present on the cpu, and sometimes in shadow variables (for example, msrs might be loaded or not). The approach here is to always unload and access the variable data. See for example vmx_set_msr() calling vmx_load_host_state() before accessing msrs. Seems like this could reduce the if () tree? Well - it would probably render this particular patch void. In fact, I think it is already useless thanks to the other always do vcpu_load patch. As far as the already existing if goes, we can't really get rid of that. I want to be fast in the instruction emulation. Copying around the registers won't help there. Alex
Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line
Avi Kivity wrote: On 03/05/2010 06:50 PM, Alexander Graf wrote: Userspace can tell us that it wants to trigger an interrupt. But so far it can't tell us that it wants to stop triggering one. So let's interpret the parameter to the ioctl that we have anyways to tell us if we want to raise or lower the interrupt line. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm.h |3 +++ arch/powerpc/include/asm/kvm_ppc.h |2 ++ arch/powerpc/kvm/book3s.c |6 ++ arch/powerpc/kvm/powerpc.c |5 - 4 files changed, 15 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h index 19bae31..6c5547d 100644 --- a/arch/powerpc/include/asm/kvm.h +++ b/arch/powerpc/include/asm/kvm.h @@ -84,4 +84,7 @@ struct kvm_guest_debug_arch { #define KVM_REG_QPR 0x0040 #define KVM_REG_FQPR 0x0060 +#define KVM_INTERRUPT_SET -1U +#define KVM_INTERRUPT_UNSET -2U Funny choice of numbers. Qemu currently does explicitly set -1U and is the only user. How does userspace know they exist? #ifdef KVM_INTERRUPT_SET? MOL is the only user of this so far. And that won't work without the hypervisor call anyways. Can you use KVM_IRQ_LINE? I'd rather like to keep that around for when we get an in-kernel-mpic, which is what we probably ultimately want for qemu. Alex
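As an illustration of what "interpret the parameter to the ioctl" could look like, here is a small userspace model. Only the two values (-1U and -2U) come from the patch; `struct sketch_irq_vcpu`, `sketch_set_irq_line()` and the choice to reject every other value with -EINVAL are assumptions made for this sketch, and the real handler may treat other values differently.

```c
#include <assert.h>
#include <errno.h>

/* Values quoted from the patch under discussion. */
#define SKETCH_INTERRUPT_SET   (-1U)
#define SKETCH_INTERRUPT_UNSET (-2U)

/* Hypothetical stand-in for the per-vcpu external interrupt state. */
struct sketch_irq_vcpu {
    int ext_irq_pending;
};

/* Interpret the ioctl parameter: raise or lower the line, reject the rest. */
static int sketch_set_irq_line(struct sketch_irq_vcpu *vcpu, unsigned int param)
{
    switch (param) {
    case SKETCH_INTERRUPT_SET:
        vcpu->ext_irq_pending = 1;
        return 0;
    case SKETCH_INTERRUPT_UNSET:
        vcpu->ext_irq_pending = 0;
        return 0;
    default:
        /* Assumption: unknown values are an error in this model. */
        return -EINVAL;
    }
}
```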
Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work
On 03/08/2010 03:44 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/05/2010 06:50 PM, Alexander Graf wrote: We have wrappers to do for example gpr read/write accesses with, because the contents of registers could be either in the PACA or in the VCPU struct. There's nothing that says we have to have the guest vcpu loaded when using these wrappers though, so let's introduce a flag that tells us whether we're inside a vcpu_load context. On x86 we always access registers within vcpu_load() context. That simplifies things. Does this not apply here? Even so, sometimes guest registers are present on the cpu, and sometimes in shadow variables (for example, msrs might be loaded or not). The approach here is to always unload and access the variable data. See for example vmx_set_msr() calling vmx_load_host_state() before accessing msrs. Seems like this could reduce the if () tree? Well - it would probably render this particular patch void. In fact, I think it is already useless thanks to the other always do vcpu_load patch. As far as the already existing if goes, we can't really get rid of that. I want to be fast in the instruction emulation. Copying around the registers won't help there. So do it the other way around. Always load the registers (of course, do nothing if already loaded) and then access them in just one way. I assume during emulation the registers will always be loaded? -- error compiling committee.c: too many arguments to function
Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu
Avi Kivity wrote: On 03/05/2010 06:50 PM, Alexander Graf wrote: } diff --git a/include/linux/kvm.h b/include/linux/kvm.h index ce28767..c7ed3cb 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -400,6 +400,12 @@ struct kvm_ioeventfd { __u8 pad[36]; }; +/* for KVM_ENABLE_CAP */ +struct kvm_enable_cap { +/* in */ +__u32 cap; Reserve space here. Add a flags field and check it for zeros. Flags? How about something like u64 args[4]? That way the capability enabling code could decide what to do with the arguments. We don't always only need flags, I suppose? Alex
Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line
On 03/08/2010 03:48 PM, Alexander Graf wrote: How does userspace know they exist? #ifdef KVM_INTERRUPT_SET? MOL is the only user of this so far. And that won't work without the hypervisor call anyways. We generally compile on one machine, and run on another. Can you use KVM_IRQ_LINE? I'd rather like to keep that around for when we get an in-kernel-mpic, which is what we probably ultimately want for qemu. Yes. -- error compiling committee.c: too many arguments to function
Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu
On 03/08/2010 03:51 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/05/2010 06:50 PM, Alexander Graf wrote: } diff --git a/include/linux/kvm.h b/include/linux/kvm.h index ce28767..c7ed3cb 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -400,6 +400,12 @@ struct kvm_ioeventfd { __u8 pad[36]; }; +/* for KVM_ENABLE_CAP */ +struct kvm_enable_cap { +/* in */ +__u32 cap; Reserve space here. Add a flags field and check it for zeros. Flags? How about something like u64 args[4] That way the capability enabling code could decide what to do with the arguments. We don't always only need flags I suppose?. If you interpret these as bit flags anyway, that would be redundant. -- error compiling committee.c: too many arguments to function
Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work
Avi Kivity wrote: On 03/08/2010 03:44 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/05/2010 06:50 PM, Alexander Graf wrote: We have wrappers to do for example gpr read/write accesses with, because the contents of registers could be either in the PACA or in the VCPU struct. There's nothing that says we have to have the guest vcpu loaded when using these wrappers though, so let's introduce a flag that tells us whether we're inside a vcpu_load context. On x86 we always access registers within vcpu_load() context. That simplifies things. Does this not apply here? Even so, sometimes guest registers are present on the cpu, and sometimes in shadow variables (for example, msrs might be loaded or not). The approach here is to always unload and access the variable data. See for example vmx_set_msr() calling vmx_load_host_state() before accessing msrs. Seems like this could reduce the if () tree? Well - it would probably render this particular patch void. In fact, I think it is already useless thanks to the other always do vcpu_load patch. As far as the already existing if goes, we can't really get rid of that. I want to be fast in the instruction emulation. Copying around the registers won't help there. So do it the other way around. Always load the registers (of course, do nothing if already loaded) and then access them in just one way. I assume during emulation the registers will always be loaded? During emulation we're always in VCPU_RUN, so the vcpu is loaded. Do you mean something like: read_register(num) { vcpu_load(); read register from PACA(num); vcpu_put(); } ? Does vcpu_load incur overhead when it doesn't need to do anything? Alex
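The "always load, then access in just one way" suggestion can be modelled in userspace C as follows. Everything here is hypothetical: `struct sketch_reg_vcpu`, the `sketch_percpu_gpr` array (standing in for the PACA) and the function names are illustration only, and the real vcpu_load()/vcpu_put() do considerably more than copy registers.

```c
#include <assert.h>

#define SKETCH_NUM_GPRS 32

/* Hypothetical model: guest registers are either saved in the vcpu
 * struct or live in a per-cpu area (standing in for the PACA). */
struct sketch_reg_vcpu {
    unsigned long saved_gpr[SKETCH_NUM_GPRS];
    int loaded;            /* nonzero while the per-cpu copy is live */
    int load_count;        /* counts real loads, for illustration */
};

static unsigned long sketch_percpu_gpr[SKETCH_NUM_GPRS];

/* Load the registers into the per-cpu area; a no-op if already loaded. */
static void sketch_vcpu_load(struct sketch_reg_vcpu *vcpu)
{
    int i;

    if (vcpu->loaded)
        return;
    for (i = 0; i < SKETCH_NUM_GPRS; i++)
        sketch_percpu_gpr[i] = vcpu->saved_gpr[i];
    vcpu->loaded = 1;
    vcpu->load_count++;
}

/* Write the per-cpu copy back and mark the vcpu as unloaded. */
static void sketch_vcpu_put(struct sketch_reg_vcpu *vcpu)
{
    int i;

    if (!vcpu->loaded)
        return;
    for (i = 0; i < SKETCH_NUM_GPRS; i++)
        vcpu->saved_gpr[i] = sketch_percpu_gpr[i];
    vcpu->loaded = 0;
}

/* Single access path: make sure the registers are loaded, then access. */
static unsigned long sketch_read_gpr(struct sketch_reg_vcpu *vcpu, int num)
{
    sketch_vcpu_load(vcpu);
    return sketch_percpu_gpr[num];
}

static void sketch_write_gpr(struct sketch_reg_vcpu *vcpu, int num,
                             unsigned long val)
{
    sketch_vcpu_load(vcpu);
    sketch_percpu_gpr[num] = val;
}
```

In this model the cost of an access when the registers are already loaded is a single flag test, which is the fast path the emulation code would care about.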
Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line
Avi Kivity wrote: On 03/08/2010 03:48 PM, Alexander Graf wrote: How does userspace know they exist? #ifdef KVM_INTERRUPT_SET? MOL is the only user of this so far. And that won't work without the hypervisor call anyways. We generally compile on one machine, and run on another. So? Then IRQ unsetting doesn't work. Without this series you won't get much further than booting the kernel anyways because XER is broken, TLB flushes are broken and FPU loading is broken. So not being able to unset an IRQ line is the least of your problems :). Alex
Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu
Avi Kivity wrote: On 03/08/2010 03:51 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/05/2010 06:50 PM, Alexander Graf wrote: } diff --git a/include/linux/kvm.h b/include/linux/kvm.h index ce28767..c7ed3cb 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -400,6 +400,12 @@ struct kvm_ioeventfd { __u8 pad[36]; }; +/* for KVM_ENABLE_CAP */ +struct kvm_enable_cap { +/* in */ +__u32 cap; Reserve space here. Add a flags field and check it for zeros. Flags? How about something like u64 args[4] That way the capability enabling code could decide what to do with the arguments. We don't always only need flags I suppose?. If you interpret these as bit flags anyway, that would be redundant. I think I just don't understand what you're trying to say with flags. For the OSI enabling we don't need any flags. For later additions we don't know what we'll need. Alex
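The two suggestions in this sub-thread (a flags field that must be zero, and a generic argument array) are compatible. The following userspace sketch shows one way they might be combined; `struct sketch_enable_cap`, `SKETCH_CAP_OSI` and `sketch_enable_cap_handler()` are hypothetical and are not the interface the patch defines.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical layout combining both suggestions from the thread. */
struct sketch_enable_cap {
    uint32_t cap;          /* in: capability to enable */
    uint32_t flags;        /* in: reserved, must be zero for now */
    uint64_t args[4];      /* in: meaning is defined per capability */
};

#define SKETCH_CAP_OSI 1   /* hypothetical capability number */

/*
 * Reject nonzero flags instead of ignoring them: that way a future
 * kernel can give the bits a meaning without old kernels silently
 * accepting requests they do not understand.
 */
static int sketch_enable_cap_handler(const struct sketch_enable_cap *req,
                                     int *osi_enabled)
{
    if (req->flags)
        return -EINVAL;

    switch (req->cap) {
    case SKETCH_CAP_OSI:
        *osi_enabled = 1;  /* this capability takes no arguments */
        return 0;
    default:
        return -EINVAL;
    }
}
```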
Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line
On 03/08/2010 03:55 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/08/2010 03:48 PM, Alexander Graf wrote: How does userspace know they exist? #ifdef KVM_INTERRUPT_SET? MOL is the only user of this so far. And that won't work without the hypervisor call anyways. We generally compile on one machine, and run on another. So? Then IRQ unsetting doesn't work. Without this series you won't get much further than booting the kernel anyways because XER is broken, TLB flushes are broken and FPU loading is broken. So not being able to unset an IRQ line is the least of your problems :). There's a difference between an error message telling you to upgrade to a kernel with KVM_CAP_BLAH and a failure. It's the difference between a bug report and silence. -- error compiling committee.c: too many arguments to function
Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line
Avi Kivity wrote: On 03/08/2010 03:55 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/08/2010 03:48 PM, Alexander Graf wrote: How does userspace know they exist? #ifdef KVM_INTERRUPT_SET? MOL is the only user of this so far. And that won't work without the hypervisor call anyways. We generally compile on one machine, and run on another. So? Then IRQ unsetting doesn't work. Without this series you won't get much further than booting the kernel anyways because XER is broken, TLB flushes are broken and FPU loading is broken. So not being able to unset an IRQ line is the least of your problems :). There's a difference between an error message telling you to upgrade to a kernel with KVM_CAP_BLAH and a failure. It's the difference between a bug report and silence. I see. So we can check for KVM_CAP_PPC_OSI and know that it's in the same patch series, also making KVM_INTERRUPT_XXX work, right? Or do you really want to have 500 capabilities for every single patch? Alex
Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu
On 03/08/2010 03:56 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/08/2010 03:51 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/05/2010 06:50 PM, Alexander Graf wrote: } diff --git a/include/linux/kvm.h b/include/linux/kvm.h index ce28767..c7ed3cb 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -400,6 +400,12 @@ struct kvm_ioeventfd { __u8 pad[36]; }; +/* for KVM_ENABLE_CAP */ +struct kvm_enable_cap { +/* in */ +__u32 cap; Reserve space here. Add a flags field and check it for zeros. Flags? How about something like u64 args[4] That way the capability enabling code could decide what to do with the arguments. We don't always only need flags I suppose?. If you interpret these as bit flags anyway, that would be redundant. I think I just don't understand what you're trying to say with flags. For the OSI enabling we don't need any flags. For later additions we don't know what we'll need. When we have reserved fields which are later used for something new, the kernel needs a way to know if the reserved fields are known or not by userspace. One way to do this is to assume a value of zero means the field is unknown to usespace so ignore it. Another is to require userspace to set a bit in an already-known flags field, and only act on the new field if its bit was set. This has the advantage that the old kernel checks for unknown flags and errors out, improving forwards and backwards compatibility. I thought -cap was already a bit field, so this isn't necessary, but if it isn't, then a flags field is helpful. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work
On 03/08/2010 03:53 PM, Alexander Graf wrote: So do it the other way around. Always load the registers (of course, do nothing if already loaded) and then access them in just one way. I assume during emulation the registers will always be loaded? During emulation we're always in VCPU_RUN, so the vcpu is loaded. Do you mean something like: read_register(num) { vcpu_load(); read register from PACA(num); vcpu_put(); } ? Does vcpu_load incur overhead when it doesn't need to do anything? If the vcpu is always loaded, this would be redundant, no? The situation is that a piece of data is in one of two places. Instead of checking and loading it from either, force it to the place where it normally is, and load it from there. So instead of if (x) y = p1; else y = p2; in a zillion places, just do force_to_p2(); // the common case anyway y = p2; which results in cleaner code. Assuming that you have a common case of course. -- error compiling committee.c: too many arguments to function
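The force-then-access pattern described above can be sketched as follows. This is only an illustration under stated assumptions: struct toy_vcpu and the toy_* helpers are hypothetical names, not KVM code, and the two fields stand in for the two places the data may live.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a vcpu: one register that lives either in the
 * common-case location ("hw") or in a saved copy ("shadow"). */
struct toy_vcpu {
    bool     loaded;   /* true while the value lives in hw */
    uint64_t hw;       /* common-case location (p2 in the mail) */
    uint64_t shadow;   /* location used while unloaded (p1 in the mail) */
};

/* Idempotent: moves the value to the common-case location if needed,
 * and does nothing when it is already there. */
void toy_force_loaded(struct toy_vcpu *v)
{
    if (!v->loaded) {
        v->hw = v->shadow;
        v->loaded = true;
    }
}

/* Single access path: callers never branch on where the data is. */
uint64_t toy_read_reg(struct toy_vcpu *v)
{
    toy_force_loaded(v);
    return v->hw;
}

void toy_write_reg(struct toy_vcpu *v, uint64_t val)
{
    toy_force_loaded(v);
    v->hw = val;
}
```

The branch still exists, but it sits in one helper instead of every call site, and it is a no-op in the common case.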
Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line
On 03/08/2010 04:01 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/08/2010 03:55 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/08/2010 03:48 PM, Alexander Graf wrote: How does userspace know they exist? #ifdef KVM_INTERRUPT_SET? MOL is the only user of this so far. And that won't work without the hypervisor call anyways. We generally compile on one machine, and run on another. So? Then IRQ unsetting doesn't work. Without this series you won't get much further than booting the kernel anyways because XER is broken, TLB flushes are broken and FPU loading is broken. So not being able to unset an IRQ line is the least of your problems :). There's a difference between an error message telling you to upgrade to a kernel with KVM_CAP_BLAH and a failure. It's the difference between a bug report and silence. I see. So we can check for KVM_CAP_PPC_OSI and know that it's in the same patch series, also making KVM_INTERRUPT_XXX work, right? Or do you really want to have 500 capabilities for every single patch? Having individual capabilities makes backporting a lot easier (otherwise you have to backport the whole thing). If the changes are logically separate, I prefer 500 separate capabilities. However, for a platform bringup, it's okay to have just one capability, assuming none of the changes are applicable to other platforms. -- error compiling committee.c: too many arguments to function
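To illustrate the point about a runtime check producing an error message rather than a silent failure, here is a minimal sketch. The names cap_probe_fn, require_cap and stub_probe are hypothetical; with real KVM the probe would presumably wrap ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap), which is stubbed out here so the example does not need /dev/kvm.

```c
#include <stdio.h>

/* Hypothetical probe signature: returns > 0 if the running kernel supports
 * capability `cap`, 0 otherwise. A real implementation would issue the
 * KVM_CHECK_EXTENSION ioctl; that part is an assumption, not shown here. */
typedef int (*cap_probe_fn)(int cap);

/* Returns 0 if the capability is present. Otherwise prints a hint telling
 * the user what is missing and returns -1, so the failure is not silent. */
int require_cap(cap_probe_fn probe, int cap, const char *name)
{
    if (probe(cap) > 0)
        return 0;
    fprintf(stderr, "kernel lacks %s; please upgrade the host kernel\n", name);
    return -1;
}

/* Stub probe for illustration: pretends only capability number 1 exists. */
int stub_probe(int cap)
{
    return cap == 1;
}
```

Because the check runs on the machine that executes the binary, it still works when the program was compiled against newer headers than the running kernel provides, which an #ifdef cannot detect.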
Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation
Avi Kivity wrote: On 03/06/2010 03:53 PM, Stefan Bader wrote: i Avi, we currently try to integrate this patch for an update into a 2.6.32 based system (amongst other kvm updates). But as soon as this patch gets added kvm will die on startup in kvm_leave_lazy_mmu. This has been documented here: https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823 I have placed the backports of your patches, which are currently in linux-next and marked for stable here: git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm I have tested the failure with a version that got only the following patches in: KVM: x86 emulator: Add Virtual-8086 mode of emulation KVM: x86 emulator: fix memory access during x86 emulation KVM: x86 emulator: Check IOPL level during io instruction emulation KVM: x86 emulator: Fix popf emulation KVM: x86 emulator: Check CPL level during privilege instruction emulation and also with a version that takes all stable patches up to the bad one: KVM: VMX: Trap and invalid MWAIT/MONITOR instruction KVM: x86 emulator: Add group8 instruction decoding KVM: x86 emulator: Add group9 instruction decoding KVM: x86 emulator: Add Virtual-8086 mode of emulation KVM: x86 emulator: fix memory access during x86 emulation But as soon as the fix for memory access gets added, the bug will occur. Would you have an idea what might be causing this? Does the same guest, using the same qemu-kvm, work on kvm.git or upstream? The test was done with a kvm user-space package based on 0.12.3 (which seems to be the current upstream version). I try to do a test on the git version. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu
Avi Kivity wrote: On 03/08/2010 03:56 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/08/2010 03:51 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/05/2010 06:50 PM, Alexander Graf wrote: } diff --git a/include/linux/kvm.h b/include/linux/kvm.h index ce28767..c7ed3cb 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -400,6 +400,12 @@ struct kvm_ioeventfd { __u8 pad[36]; }; +/* for KVM_ENABLE_CAP */ +struct kvm_enable_cap { +/* in */ +__u32 cap; Reserve space here. Add a flags field and check it for zeros. Flags? How about something like u64 args[4] That way the capability enabling code could decide what to do with the arguments. We don't always only need flags I suppose?. If you interpret these as bit flags anyway, that would be redundant. I think I just don't understand what you're trying to say with flags. For the OSI enabling we don't need any flags. For later additions we don't know what we'll need. When we have reserved fields which are later used for something new, the kernel needs a way to know if the reserved fields are known or not by userspace. One way to do this is to assume a value of zero means the field is unknown to usespace so ignore it. Another is to require userspace to set a bit in an already-known flags field, and only act on the new field if its bit was set. This has the advantage that the old kernel checks for unknown flags and errors out, improving forwards and backwards compatibility. I thought -cap was already a bit field, so this isn't necessary, but if it isn't, then a flags field is helpful. - cap is the capability number. So you want something like: struct kvm_enable_cap { __u32 cap; __u32 flags; __u64 args[4]; __u8 pad[64]; }; And then check for flags == 0 in the ioctl handler? Flags could later on define if the padding changed to a different position, adding new fields in between args and pad? 
Alex
Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation
On 03/08/2010 04:10 PM, Stefan Bader wrote: Avi Kivity wrote: On 03/06/2010 03:53 PM, Stefan Bader wrote: i Avi, we currently try to integrate this patch for an update into a 2.6.32 based system (amongst other kvm updates). But as soon as this patch gets added kvm will die on startup in kvm_leave_lazy_mmu. This has been documented here: https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823 I have placed the backports of your patches, which are currently in linux-next and marked for stable here: git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm I have tested the failure with a version that got only the following patches in: KVM: x86 emulator: Add Virtual-8086 mode of emulation KVM: x86 emulator: fix memory access during x86 emulation KVM: x86 emulator: Check IOPL level during io instruction emulation KVM: x86 emulator: Fix popf emulation KVM: x86 emulator: Check CPL level during privilege instruction emulation and also with a version that takes all stable patches up to the bad one: KVM: VMX: Trap and invalid MWAIT/MONITOR instruction KVM: x86 emulator: Add group8 instruction decoding KVM: x86 emulator: Add group9 instruction decoding KVM: x86 emulator: Add Virtual-8086 mode of emulation KVM: x86 emulator: fix memory access during x86 emulation But as soon as the fix for memory access gets added, the bug will occur. Would you have an idea what might be causing this? Does the same guest, using the same qemu-kvm, work on kvm.git or upstream? The test was done with a kvm user-space package based on 0.12.3 (which seems to be the current upstream version). I try to do a test on the git version. I meant keep the same userspace without change, and try it on a Linus kernel or kvm.git master (http://git.kernel.org/?p=virt/kvm/kvm.git;a=summary). 
-- error compiling committee.c: too many arguments to function
Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work
Avi Kivity wrote: On 03/08/2010 03:53 PM, Alexander Graf wrote: So do it the other way around. Always load the registers (of course, do nothing if already loaded) and then access them in just one way. I assume during emulation the registers will always be loaded? During emulation we're always in VCPU_RUN, so the vcpu is loaded. Do you mean something like: read_register(num) { vcpu_load(); read register from PACA(num); vcpu_put(); } ? Does vcpu_load incur overhead when it doesn't need to do anything? If the vcpu is always loaded, this would be redundant, no? The situation is that a piece of data is in one of two places. Instead of checking and loading it from either, force it to the place where it normally is, and load it from there. So instead of if (x) y = p1; else y = p2; in a zillion places, just do force_to_p2(); // the common case anyway y = p2; which results in cleaner code. Assuming that you have a common case of course. We're looking at two different ifs here. 1) GPR inside the PACA or not (volatile vs non-volatile). This is constant. Volatile registers go to the PACA; non-volatiles go to the vcpu struct. 2) GPR actually loaded in the PACA. When we're in vcpu_load context the registers are in the PACA; when not, they're in the vcpu struct. If you have a really easy and fast way to ensure that we're always inside a vcpu_load context, all is great. I could probably even just put in a BUG_ON(not in vcpu_load context) and make the callers safe. But some check needs to be done. Alex
Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu
On 03/08/2010 04:10 PM, Alexander Graf wrote: When we have reserved fields which are later used for something new, the kernel needs a way to know if the reserved fields are known or not by userspace. One way to do this is to assume a value of zero means the field is unknown to userspace so ignore it. Another is to require userspace to set a bit in an already-known flags field, and only act on the new field if its bit was set. This has the advantage that the old kernel checks for unknown flags and errors out, improving forwards and backwards compatibility. I thought ->cap was already a bit field, so this isn't necessary, but if it isn't, then a flags field is helpful. ->cap is the capability number. So you want something like: struct kvm_enable_cap { __u32 cap; __u32 flags; __u64 args[4]; __u8 pad[64]; }; And then check for flags == 0 in the ioctl handler? Flags could later on define if the padding changed to a different position, adding new fields in between args and pad? Exactly, we do so in several places. Can be useful if, for example, some new capability comes with a resource count value. What's this thing anyway? like cpuid bits for x86? -- error compiling committee.c: too many arguments to function
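The flags check agreed on above might look roughly like this. It is a userspace-compilable sketch, not the patch itself: the struct layout is copied from the proposal in the thread, while the _sketch names, SKETCH_KNOWN_FLAGS and the capability number are assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Layout as proposed in the thread, using stdint types in place of the
 * kernel's __u32/__u64/__u8 so it builds outside the kernel. */
struct kvm_enable_cap_sketch {
    uint32_t cap;
    uint32_t flags;
    uint64_t args[4];
    uint8_t  pad[64];
};

#define SKETCH_KNOWN_FLAGS 0u   /* hypothetical: no flag bits defined yet */
#define SKETCH_CAP_OSI     1u   /* hypothetical capability number */

/* Rejects any flag bit this "kernel" does not know, so a newer userspace
 * that sets a future flag gets an error instead of being silently ignored. */
int enable_cap_sketch(const struct kvm_enable_cap_sketch *req)
{
    if (req->flags & ~SKETCH_KNOWN_FLAGS)
        return -EINVAL;

    switch (req->cap) {
    case SKETCH_CAP_OSI:
        return 0;           /* the feature would be enabled here */
    default:
        return -EINVAL;     /* unknown capability number */
    }
}
```

When a later version assigns meaning to part of the padding, it would add a bit to the known-flags mask and only read the new field when that bit is set.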
Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work
On 03/08/2010 04:14 PM, Alexander Graf wrote: We're looking at two different ifs here. 1) GPR inside the PACA or not (volatile vs non-volatile). This is constant. Volatile registers go to the PACA; non-volatiles go to the vcpu struct. Okay - so no if (). 2) GPR actually loaded in the PACA. When we're in vcpu_load context the registers are in the PACA; when not, they're in the vcpu struct. If you have a really easy and fast way to ensure that we're always inside a vcpu_load context, all is great. I could probably even just put in a BUG_ON(not in vcpu_load context) and make the callers safe. But some check needs to be done. x86 assumes in vcpu_load() context (without even a BUG_ON()). KVM_GET_REGS and friends are responsible for this. -- error compiling committee.c: too many arguments to function
Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation
Avi Kivity wrote: On 03/08/2010 04:10 PM, Stefan Bader wrote: Avi Kivity wrote: On 03/06/2010 03:53 PM, Stefan Bader wrote: i Avi, we currently try to integrate this patch for an update into a 2.6.32 based system (amongst other kvm updates). But as soon as this patch gets added kvm will die on startup in kvm_leave_lazy_mmu. This has been documented here: https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823 I have placed the backports of your patches, which are currently in linux-next and marked for stable here: git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm I have tested the failure with a version that got only the following patches in: KVM: x86 emulator: Add Virtual-8086 mode of emulation KVM: x86 emulator: fix memory access during x86 emulation KVM: x86 emulator: Check IOPL level during io instruction emulation KVM: x86 emulator: Fix popf emulation KVM: x86 emulator: Check CPL level during privilege instruction emulation and also with a version that takes all stable patches up to the bad one: KVM: VMX: Trap and invalid MWAIT/MONITOR instruction KVM: x86 emulator: Add group8 instruction decoding KVM: x86 emulator: Add group9 instruction decoding KVM: x86 emulator: Add Virtual-8086 mode of emulation KVM: x86 emulator: fix memory access during x86 emulation But as soon as the fix for memory access gets added, the bug will occur. Would you have an idea what might be causing this? Does the same guest, using the same qemu-kvm, work on kvm.git or upstream? The test was done with a kvm user-space package based on 0.12.3 (which seems to be the current upstream version). I try to do a test on the git version. I meant keep the same userspace without change, and try it on a Linus kernel or kvm.git master (http://git.kernel.org/?p=virt/kvm/kvm.git;a=summary). Ok, sorry I misunderstood that. As I see Linus just pulled your patches in, I will get that compiled and tested. 
Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu
Avi Kivity wrote: On 03/08/2010 04:10 PM, Alexander Graf wrote: When we have reserved fields which are later used for something new, the kernel needs a way to know if the reserved fields are known or not by userspace. One way to do this is to assume a value of zero means the field is unknown to usespace so ignore it. Another is to require userspace to set a bit in an already-known flags field, and only act on the new field if its bit was set. This has the advantage that the old kernel checks for unknown flags and errors out, improving forwards and backwards compatibility. I thought -cap was already a bit field, so this isn't necessary, but if it isn't, then a flags field is helpful. - cap is the capability number. So you want something like: struct kvm_enable_cap { __u32 cap; __u32 flags; __u64 args[4]; __u8 pad[64]; }; And then check for flags == 0 in the ioctl handler? Flags could later on define if the padding changed to a different position, adding new fields in between args and pad? Exactly, we do so in several places. Can be useful if, for example, some new capability comes with a resource count value. What's this thing anyway? like cpuid bits for x86? What thing? This ioctl or the OSI call? The ioctl is a way to enable a feature on a per-vcpu basis. MOL overlays the syscall interface with a hypercall interface, so a normal OS syscall magically becomes a hypercall when magic constants get passed in r3 and r4. Because for obvious reasons we don't want to enable that when not using MOL, I figured I'd go in and have userspace decide if it wants to get a hypercall exit or not. Qemu couldn't really do anything with it after all. And while at it, I figured let's better make the interface generic. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work
Avi Kivity wrote: On 03/08/2010 04:14 PM, Alexander Graf wrote: We're looking at two different ifs here. 1) GPR inside the PACA or not (volatile vs non-volatile). This is constant. Volatile registers go to the PACA; non-volatiles go to the vcpu struct. Okay - so no if (). Eh. r[0 - 12] are volatile, r[13 - 31] are non-volatile. So if we want a common gpr access function we need an if. And we need one, because the opcodes just use register numbers and don't care where they are. 2) GPR actually loaded in the PACA. When we're in vcpu_load context the registers are in the PACA; when not, they're in the vcpu struct. If you have a really easy and fast way to ensure that we're always inside a vcpu_load context, all is great. I could probably even just put in a BUG_ON(not in vcpu_load context) and make the callers safe. But some check needs to be done. x86 assumes in vcpu_load() context (without even a BUG_ON()). KVM_GET_REGS and friends are responsible for this. Oh, interesting. Just drop this patch then :). Alex
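A common GPR accessor of the kind described above could be sketched like this. The split at r13 comes from the mail; struct toy_paca, struct toy_vcpu_regs and the toy_* functions are hypothetical stand-ins, and the sketch assumes the caller is already inside vcpu_load context and passes a register number below 32.

```c
#include <stdint.h>

#define NUM_GPRS          32
#define FIRST_NONVOLATILE 13   /* r0..r12 volatile, r13..r31 non-volatile */

/* Hypothetical stand-ins for the two storage locations. */
struct toy_paca      { uint64_t volatile_gpr[FIRST_NONVOLATILE]; };
struct toy_vcpu_regs { uint64_t gpr[NUM_GPRS]; };

/* The one remaining if: opcodes only carry a register number, so the
 * accessor picks the storage location from that number. */
uint64_t toy_get_gpr(const struct toy_paca *paca,
                     const struct toy_vcpu_regs *vcpu, unsigned int num)
{
    if (num < FIRST_NONVOLATILE)
        return paca->volatile_gpr[num];
    return vcpu->gpr[num];
}

void toy_set_gpr(struct toy_paca *paca, struct toy_vcpu_regs *vcpu,
                 unsigned int num, uint64_t val)
{
    if (num < FIRST_NONVOLATILE)
        paca->volatile_gpr[num] = val;
    else
        vcpu->gpr[num] = val;
}
```

The second check discussed in the thread (whether the vcpu is loaded at all) is deliberately absent here, matching the conclusion that callers are responsible for it.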
Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu
On 03/08/2010 04:18 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/08/2010 04:10 PM, Alexander Graf wrote: When we have reserved fields which are later used for something new, the kernel needs a way to know if the reserved fields are known or not by userspace. One way to do this is to assume a value of zero means the field is unknown to usespace so ignore it. Another is to require userspace to set a bit in an already-known flags field, and only act on the new field if its bit was set. This has the advantage that the old kernel checks for unknown flags and errors out, improving forwards and backwards compatibility. I thought -cap was already a bit field, so this isn't necessary, but if it isn't, then a flags field is helpful. - cap is the capability number. So you want something like: struct kvm_enable_cap { __u32 cap; __u32 flags; __u64 args[4]; __u8 pad[64]; }; And then check for flags == 0 in the ioctl handler? Flags could later on define if the padding changed to a different position, adding new fields in between args and pad? Exactly, we do so in several places. Can be useful if, for example, some new capability comes with a resource count value. What's this thing anyway? like cpuid bits for x86? What thing? This ioctl or the OSI call? The ioctl is a way to enable a feature on a per-vcpu basis. MOL overlays the syscall interface with a hypercall interface, so a normal OS syscall magically becomes a hypercall when magic constants get passed in r3 and r4. Because for obvious reasons we don't want to enable that when not using MOL, I figured I'd go in and have userspace decide if it wants to get a hypercall exit or not. Qemu couldn't really do anything with it after all. And while at it, I figured let's better make the interface generic. That's reasonable. Thanks. 
-- error compiling committee.c: too many arguments to function
Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work
On 03/08/2010 04:20 PM, Alexander Graf wrote: Avi Kivity wrote: On 03/08/2010 04:14 PM, Alexander Graf wrote: We're looking at two different ifs here. 1) GPR inside the PACA or not (volatile vs non-volatile). This is constant. Volatile registers go to the PACA; non-volatiles go to the vcpu struct. Okay - so no if (). Eh. r[0 - 12] are volatile, r[13 - 31] are non-volatile. So if we want a common gpr access function we need an if. And we need one, because the opcodes just use register numbers and don't care where they are. I see - we have something similar on x86 (where vmx keeps rsp/rip in a register and lets us save everything else manually). -- error compiling committee.c: too many arguments to function
Re: linux-aio usable?
On Mon, Mar 8, 2010 at 3:48 AM, Avi Kivity a...@redhat.com wrote: On 03/08/2010 11:48 AM, Bernhard Schmidt wrote: On Mon, Mar 08, 2010 at 11:10:29AM +0200, Avi Kivity wrote: Are there any potential pitfalls? It won't work well unless running on a block device (partition or LVM). What does work well mean in this context? Potential dataloss? No, it becomes synchronous (=extra slow). But for this to happen, the user would have had to consciously enter into the situation by creating/using a non block device, non-pre-allocated backing disk AND specify the aio=native option, correct? :-Dustin
Re: linux-aio usable?
On 03/08/2010 04:25 PM, Dustin Kirkland wrote: On Mon, Mar 8, 2010 at 3:48 AM, Avi Kivity a...@redhat.com wrote: On 03/08/2010 11:48 AM, Bernhard Schmidt wrote: On Mon, Mar 08, 2010 at 11:10:29AM +0200, Avi Kivity wrote: Are there any potential pitfalls? It won't work well unless running on a block device (partition or LVM). What does work well mean in this context? Potential dataloss? No, it becomes synchronous (=extra slow). But for this to happen, the user would have had to consciously enter into the situation by creating/using a non block device, non-pre-allocated backing disk AND specify the aio=native option, correct? I thought there was some autodetection involved, but perhaps I just imagined it. -- error compiling committee.c: too many arguments to function
Re: KVM Guest mmap.c bug
On Mon, Mar 08, 2010 at 03:32:19PM +0200, Avi Kivity wrote: It looks unrelated to kvm, though of course random memory corruption cannot be ruled out. Is npt enabled on the host (cat /sys/module/kvm_amd/parameters/npt)? Andrea, any idea? Basically find_vma(vma->vm_mm, vma->vm_start) doesn't return vma, despite vma being the one with the smallest vm_end for which the comparison vma->vm_start < vma->vm_end is true (the next vma is null and the prev will have vma->vm_start == prev->vm_end, not <). The bug check looks right; it doesn't seem to be a false positive, and it indicates that the vma rbtree is memory-corrupted somehow. So yes, fiddling with npt on and off sounds like a good start: if it's a bug in shadow paging, it's unlikely the exact same bug materializes both with and without npt. If the crash happens with npt on and off, then maybe it's not hypervisor related. Could also be bad RAM if it only happens on a single host and all other hosts are fine with the same binary guest/host kernels (an rbtree walk might stress the memory bus more than other operations). That said, vm_next being null (and if it's null, the vm_next pointer likely has no RAM bitflip) is a bit weird and not a common scenario, and this page fault seems to be triggered by a procfs copy_user call, which is non-standard, so maybe this is a guest bug. It would be interesting to know what the vm_start address is; at the end there are the stack, vdso and vsyscall areas.
Re: linux-aio usable?
On 03/08/2010 08:26 AM, Avi Kivity wrote: On 03/08/2010 04:25 PM, Dustin Kirkland wrote: On Mon, Mar 8, 2010 at 3:48 AM, Avi Kivity a...@redhat.com wrote: On 03/08/2010 11:48 AM, Bernhard Schmidt wrote: On Mon, Mar 08, 2010 at 11:10:29AM +0200, Avi Kivity wrote: Are there any potential pitfalls? It won't work well unless running on a block device (partition or LVM). What does work well mean in this context? Potential dataloss? No, it becomes synchronous (=extra slow). But for this to happen, the user would have had to consciously enter into the situation by creating/using a non block device, non-pre-allocated backing disk AND specify the aio=native option, correct? I thought there was some autodetection involved, but perhaps I just imagined it. There's no autodetection. linux-aio support in the kernel downgrades to synchronous IO if the underlying storage does not support linux-aio. There is no indication to userspace that this has happened. If this happens, besides having a slow guest, the guest VCPU will be starved during the I/O requests, potentially resulting in things like soft lockups and time drift. Generally speaking, linux-aio will work well under the following circumstances: - cache=off is specified - the underlying file system is XFS or you are using a block device. We cannot detect this reliably though, so it's really up to the user to decide whether to use it. We're working on improving the linux-aio kernel interface to eliminate this detectability problem, after which we can enable it in a more automatic fashion. Regards, Anthony Liguori
Re: linux-aio usable?
On 03/08/2010 06:28 PM, Anthony Liguori wrote: I thought there was some autodetection involved, but perhaps I just imagined it. There's no autodetection. linux-aio support in the kernel downgrades to synchronous IO if the underlying storage does not support linux-aio. There is no indication to userspace that this has happened. If this happens, besides having a slow guest, the guest VCPU will be starved during the I/O requests, potentially resulting in things like soft lockups and time drift. Generally speaking, linux-aio will work well under the following circumstances: - cache=off is specified - the underlying file system is XFS or you are using a block device. We cannot detect this reliably though, so it's really up to the user to decide whether to use it. We're working on improving the linux-aio kernel interface to eliminate this detectability problem, after which we can enable it in a more automatic fashion. Well, the common case of cache=none on a block device certainly can be autodetected. -- error compiling committee.c: too many arguments to function
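For the block-device case mentioned above, detection could be as simple as a stat() call. This is a minimal sketch, not qemu code: is_block_device and should_use_native_aio are hypothetical helper names, and the policy covers only the "cache=none on a block device" case, not files on XFS.

```c
#include <sys/stat.h>

/* Returns 1 if path names a block device, 0 if it exists but is something
 * else (regular file, directory, ...), and -1 if stat() fails. */
int is_block_device(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return S_ISBLK(st.st_mode) ? 1 : 0;
}

/* Hypothetical policy helper: choose native AIO only when the cache mode
 * is none and the backing store is a block device; otherwise fall back. */
int should_use_native_aio(const char *path, int cache_is_none)
{
    return cache_is_none && is_block_device(path) == 1;
}
```

An image file on an arbitrary filesystem returns 0 here, which errs on the side of not enabling linux-aio where it might silently become synchronous.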
Re: raw disks no longer work in latest kvm (kvm-88 was fine)
On 03/07/2010 10:21 AM, Avi Kivity wrote:
> On 03/07/2010 12:00 PM, Christoph Hellwig wrote:
>>> I can only guess that the info collected so far is not sufficient to understand what's going on: apart from "I/O error writing block NNN" we do not have anything at all. So it's impossible to know where the problem is.
>>
>> Actually it is, and the bug has been fixed long ago in:
>>
>>   commit e2a305fb13ff0f5cf6ff80aaa90a5ed5954c
>>   Author: Christoph Hellwig h...@lst.de
>>   Date: Tue Jan 26 14:49:08 2010 +0100
>>
>>       block: avoid creating too large iovecs in multiwrite_merge
>>
>> I've asked for it to be added to the -stable series but that hasn't happened so far.
>
> Anthony, this looks critical.

It's in stable now. Sounds like a good time to do a 0.12.4.

Regards,

Anthony Liguori
[PATCH] KVM test: Exposing boot and reboot timeouts in config files
Some guests may take longer to boot/reboot in some hosts, so let's expose the boot and reboot timeouts in the tests config file. Also, print the timeouts on the debug messages.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/kvm_test_utils.py     |   13 +++--
 client/tests/kvm/tests/boot.py         |   10 ++
 client/tests/kvm/tests_base.cfg.sample |    2 ++
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/client/tests/kvm/kvm_test_utils.py b/client/tests/kvm/kvm_test_utils.py
index 7d96d6e..564ff35 100644
--- a/client/tests/kvm/kvm_test_utils.py
+++ b/client/tests/kvm/kvm_test_utils.py
@@ -53,7 +53,7 @@ def wait_for_login(vm, nic_index=0, timeout=240, start=0, step=2):
     @param timeout: Time to wait before giving up.
     @return: A shell session object.
     """
-    logging.info("Trying to log into guest '%s'..." % vm.name)
+    logging.info("Trying to log into guest '%s', timeout %ds", vm.name, timeout)
     session = kvm_utils.wait_for(lambda: vm.remote_login(nic_index=nic_index),
                                  timeout, start, step)
     if not session:
@@ -80,16 +80,16 @@ def reboot(vm, session, method="shell", sleep_before_reset=10, nic_index=0,
     if method == "shell":
         # Send a reboot command to the guest's shell
         session.sendline(vm.get_params().get("reboot_command"))
-        logging.info("Reboot command sent; waiting for guest to go down...")
+        logging.info("Reboot command sent. Waiting for guest to go down")
     elif method == "system_reset":
         # Sleep for a while before sending the command
         time.sleep(sleep_before_reset)
         # Send a system_reset monitor command
         vm.send_monitor_cmd("system_reset")
-        logging.info("system_reset monitor command sent; waiting for guest to "
-                     "go down...")
+        logging.info("Monitor command system_reset sent. Waiting for guest to "
+                     "go down")
     else:
-        logging.error("Unknown reboot method: %s" % method)
+        logging.error("Unknown reboot method: %s", method)
 
     # Wait for the session to become unresponsive and close it
     if not kvm_utils.wait_for(lambda: not session.is_responsive(timeout=30),
@@ -98,7 +98,8 @@ def reboot(vm, session, method="shell", sleep_before_reset=10, nic_index=0,
         session.close()
 
     # Try logging into the guest until timeout expires
-    logging.info("Guest is down; waiting for it to go up again...")
+    logging.info("Guest is down. Waiting for it to go up again, timeout %ds",
+                 timeout)
     session = kvm_utils.wait_for(lambda: vm.remote_login(nic_index=nic_index),
                                  timeout, 0, 2)
     if not session:
diff --git a/client/tests/kvm/tests/boot.py b/client/tests/kvm/tests/boot.py
index cd1f1d4..9b3f392 100644
--- a/client/tests/kvm/tests/boot.py
+++ b/client/tests/kvm/tests/boot.py
@@ -16,7 +16,9 @@ def run_boot(test, params, env):
     @param env: Dictionary with test environment.
     """
     vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
-    session = kvm_test_utils.wait_for_login(vm)
+    session = kvm_test_utils.wait_for_login(vm, 0,
+                                         float(params.get("boot_timeout", 240)),
+                                         0, 2)
 
     try:
         if not params.get("reboot_method"):
@@ -24,9 +26,9 @@ def run_boot(test, params, env):
 
         # Reboot the VM
         session = kvm_test_utils.reboot(vm, session,
-                                        params.get("reboot_method"),
-                                        float(params.get("sleep_before_reset",
-                                                         10)))
+                                    params.get("reboot_method"),
+                                    float(params.get("sleep_before_reset", 10)),
+                                    0, float(params.get("reboot_timeout", 240)))
     finally:
         session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 040d0c3..340b0c0 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -75,11 +75,13 @@ variants:
         type = boot
         restart_vm = yes
         kill_vm_on_error = yes
+        boot_timeout = 240
 
     - reboot:       install setup unattended_install
         type = boot
         reboot_method = shell
         kill_vm_on_error = yes
+        reboot_timeout = 240
 
     - migrate:      install setup unattended_install
         type = migration
--
1.6.6.1
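The pattern this patch relies on, reading an optional timeout from the test parameters and falling back to 240 seconds, can be tried outside autotest with a plain dict. The sketch below is an illustration only: `get_timeout` is a hypothetical helper (the patch inlines the `float(params.get(...))` call), and the sample `params` dict and guest name are made up.

```python
import logging


def get_timeout(params, key, default=240):
    """Return a timeout in seconds from a test params mapping.

    Config values arrive as strings, so convert with float() and fall
    back to the default when the key is absent. Hypothetical helper,
    not part of the patch itself.
    """
    return float(params.get(key, default))


params = {"boot_timeout": "600"}  # made-up values, as parsed from a config
boot_timeout = get_timeout(params, "boot_timeout")
reboot_timeout = get_timeout(params, "reboot_timeout")

# Passing the arguments separately (as the patch now does) lets the
# logging module format the message only if it is actually emitted.
logging.info("Trying to log into guest '%s', timeout %ds", "vm1", boot_timeout)
```

Keeping the default at 240 means existing config files behave exactly as before unless a variant overrides the key.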
Re: [Autotest] [PATCH] KVM test: Exposing boot and reboot timeouts in config files
On Mon, Mar 8, 2010 at 10:58 PM, Lucas Meneghel Rodrigues l...@redhat.com wrote:
> Some guests may take longer to boot/reboot in some hosts, so let's expose the boot and reboot timeouts in the tests config file. Also, print the timeouts on the debug messages.

Fine. It seems we missed it during the major development cycle. We faced this situation when we had the kvm_autotest git tree. The patch that I sent was merged. This patch makes perfect sense for cases like stress boot, slow machines, highly loaded machines etc.

> Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
> ---
>  client/tests/kvm/kvm_test_utils.py     |   13 +++--
>  client/tests/kvm/tests/boot.py         |   10 ++
>  client/tests/kvm/tests_base.cfg.sample |    2 ++
>  3 files changed, 15 insertions(+), 10 deletions(-)
>
> [...]
Re: [PATCH] Inter-VM shared memory PCI device
On Mon, Mar 8, 2010 at 2:56 AM, Avi Kivity a...@redhat.com wrote:
> On 03/06/2010 01:52 AM, Cam Macdonell wrote:
>> Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guests by communicating over a unix domain socket. This patch applies to the qemu-kvm repository.
>>
>> This device now creates a qemu character device and sends 1-byte messages to trigger interrupts. Interrupts are triggered by writing to the Doorbell register on the shared memory PCI device. The lower 8-bits of the value written to this register are sent as the 1-byte message so different meanings of interrupts can be supported.
>>
>> Interrupts are supported between multiple VMs by using a shared memory server
>>
>>     -ivshmem <size in MB>,[unix:<path>][file]
>>
>> Interrupts can also be used between host and guest by implementing a listener on the host that talks to the shared memory server. The shared memory server passes file descriptors for the shared memory object and eventfds (our interrupt mechanism) to the respective qemu instances.
>
> Can you provide a spec that describes the device? This would be useful for maintaining the code, writing guest drivers, and as a framework for review.

I'm not sure if you want the Qemu command-line part as part of the spec here, but I've included it for completeness.

Device Specification for Inter-VM shared memory device
------------------------------------------------------

Qemu Command-line
-----------------

The command-line for inter-vm shared memory is as follows

    -ivshmem <size>,[unix:]<name>

The size argument specifies the size of the shared memory object. The second option specifies either a unix domain socket (when using the unix: prefix) or a name for the shared memory object.

If a unix domain socket is specified, the guest will receive the shared object from the shared memory server listening on that socket and will support interrupts with the other guests using that server. Each server only serves one memory object.

If a name is specified on the command line (without 'unix:'), then the guest will open the POSIX shared memory object with that name (in /dev/shm) and the specified size. The guest will NOT support interrupts but the shared memory object can be shared between multiple guests.

The Inter-VM Shared Memory PCI device
-------------------------------------

BARs

The device supports two BARs. BAR0 is a 256-byte MMIO region to support registers and BAR1 is used to map the shared memory object from the host. The size of BAR1 is specified on the command-line and must be a power of 2.

Registers

BAR0 currently supports 5 registers of 16-bits each. Registers are used for synchronization between guests sharing the same memory object when interrupts are supported (this requires using the shared memory server).

When using interrupts, VMs communicate with a shared memory server that passes the shared memory object file descriptor using SCM_RIGHTS. The server assigns each VM an ID number and sends this ID number to the Qemu process along with a series of eventfd file descriptors, one per guest using the shared memory server. These eventfds will be used to send interrupts between guests. Each guest listens on the eventfd corresponding to its ID and may use the others for sending interrupts to other guests.

    enum ivshmem_registers {
        IntrMask = 0,
        IntrStatus = 2,
        Doorbell = 4,
        IVPosition = 6,
        IVLiveList = 8
    };

The first two registers are the interrupt mask and status registers. Interrupts are triggered when a message is received on the guest's eventfd from another VM. Writing to the 'Doorbell' register is how synchronization messages are sent to other VMs.

The IVPosition register is read-only and reports the guest's ID number. The IVLiveList register is also read-only and reports a bit vector of currently live VM IDs.

The Doorbell register is 16-bits, but is treated as two 8-bit values. The upper 8-bits are used for the destination VM ID. The lower 8-bits are the value which will be written to the destination VM and what the destination guest's status register will be set to when the interrupt is triggered in the destination guest. A value of 255 in the upper 8-bits will trigger a broadcast where the message will be sent to all other guests.

Cheers,
Cam

> --
> error compiling committee.c: too many arguments to function
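The Doorbell layout described in the spec above (destination VM ID in the upper byte, message value in the lower byte, 255 as broadcast) can be sketched as two small helpers. This is an illustration in Python, not driver or QEMU code: the function names and the Python constant names are made up here, while the offsets and the bit layout are taken from the spec text.

```python
# Register offsets within BAR0, as listed in the spec's enum.
INTR_MASK, INTR_STATUS, DOORBELL, IV_POSITION, IV_LIVE_LIST = 0, 2, 4, 6, 8

BROADCAST_ID = 255  # upper-byte value that addresses all other guests


def encode_doorbell(dest_vm_id, value):
    """Build the 16-bit Doorbell value: destination ID in the upper
    byte, message value in the lower byte."""
    if not 0 <= dest_vm_id <= 0xFF or not 0 <= value <= 0xFF:
        raise ValueError("both fields must fit in 8 bits")
    return (dest_vm_id << 8) | value


def decode_doorbell(reg):
    """Split a 16-bit Doorbell value into (dest_vm_id, value)."""
    return (reg >> 8) & 0xFF, reg & 0xFF
```

A guest driver would write the encoded value at offset `DOORBELL` in BAR0; using `BROADCAST_ID` as the destination sends the same value to every other guest.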
[PATCH 03/15] KVM: PPC: Make DSISR 32 bits wide
DSISR is only defined as 32 bits wide. So let's reflect that in the structs too.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h   |    2 +-
 arch/powerpc/include/asm/kvm_host.h     |    2 +-
 arch/powerpc/kvm/book3s_64_interrupts.S |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 14d0262..9f5a992 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -84,8 +84,8 @@ struct kvmppc_vcpu_book3s {
 	u64 hid[6];
 	u64 gqr[8];
 	int slb_nr;
+	u32 dsisr;
 	u64 sdr1;
-	u64 dsisr;
 	u64 hior;
 	u64 msr_mask;
 	u64 vsid_first;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 119deb4..0ebda67 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -260,7 +260,7 @@ struct kvm_vcpu_arch {
 
 	u32 last_inst;
 #ifdef CONFIG_PPC64
-	ulong fault_dsisr;
+	u32 fault_dsisr;
 #endif
 	ulong fault_dear;
 	ulong fault_esr;
diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
index c1584d0..faca876 100644
--- a/arch/powerpc/kvm/book3s_64_interrupts.S
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -171,7 +171,7 @@ kvmppc_handler_highmem:
 	std	r3, VCPU_PC(r7)
 	std	r4, VCPU_SHADOW_SRR1(r7)
 	std	r5, VCPU_FAULT_DEAR(r7)
-	std	r6, VCPU_FAULT_DSISR(r7)
+	stw	r6, VCPU_FAULT_DSISR(r7)
 
 	ld	r5, VCPU_HFLAGS(r7)
 	rldicl.	r5, r5, 0, 63		/* CR = ((r5 & 1) == 0) */
--
1.6.0.2
[PATCH 05/15] KVM: PPC: Split instruction reading out
The current check_ext function reads the instruction and then does the checking. Let's split the reading out so we can reuse it for different functions.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s.c |   24 
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 9e0bc47..400ae0a 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -650,26 +650,34 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
 	kvmppc_recalc_shadow_msr(vcpu);
 }
 
-static int kvmppc_check_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr)
+static int kvmppc_read_inst(struct kvm_vcpu *vcpu)
 {
 	ulong srr0 = vcpu->arch.pc;
 	int ret;
 
-	/* Need to do paired single emulation? */
-	if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE))
-		return EMULATE_DONE;
-
-	/* Read out the instruction */
 	ret = kvmppc_ld(vcpu, &srr0, sizeof(u32), &vcpu->arch.last_inst, false);
 	if (ret == -ENOENT) {
 		vcpu->arch.msr = kvmppc_set_field(vcpu->arch.msr, 33, 33, 1);
 		vcpu->arch.msr = kvmppc_set_field(vcpu->arch.msr, 34, 36, 0);
 		vcpu->arch.msr = kvmppc_set_field(vcpu->arch.msr, 42, 47, 0);
 		kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_INST_STORAGE);
-	} else if(ret == EMULATE_DONE) {
+		return EMULATE_AGAIN;
+	}
+
+	return EMULATE_DONE;
+}
+
+static int kvmppc_check_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr)
+{
+
+	/* Need to do paired single emulation? */
+	if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE))
+		return EMULATE_DONE;
+
+	/* Read out the instruction */
+	if (kvmppc_read_inst(vcpu) == EMULATE_DONE)
 		/* Need to emulate */
 		return EMULATE_FAIL;
-	}
 
 	return EMULATE_AGAIN;
 }
--
1.6.0.2
[PATCH 00/15] KVM: PPC: MOL bringup patches
Mac-on-Linux has always lacked PPC64 host support. This is going to change now!

This patchset contains minor patches to enable MOL, but is mostly about bug fixes that came out of running Mac OS X. With this set and a pretty small patch to MOL I have 10.4.11 running as a guest on a 970MP host.

I'll send the MOL patches to the respective ML in the next days.

v1 -> v2:

  - Add documentation for EXIT_OSI and ENABLE_CAP
  - Add flags to enable_cap
  - Add build fix for !CONFIG_VSX
  - Remove in-paca register check

Alexander Graf (15):
  KVM: PPC: Ensure split mode works
  KVM: PPC: Allow userspace to unset the IRQ line
  KVM: PPC: Make DSISR 32 bits wide
  KVM: PPC: Book3S_32 guest MMU fixes
  KVM: PPC: Split instruction reading out
  KVM: PPC: Don't reload FPU with invalid values
  KVM: PPC: Load VCPU for register fetching
  KVM: PPC: Implement mfsr emulation
  KVM: PPC: Implement BAT reads
  KVM: PPC: Make XER load 32 bit
  KVM: PPC: Implement emulation for lbzux and lhax
  KVM: PPC: Implement alignment interrupt
  KVM: Add support for enabling capabilities per-vcpu
  KVM: PPC: Add OSI hypercall interface
  KVM: PPC: Make build work without CONFIG_VSX/ALTIVEC

 Documentation/kvm/api.txt               |   28 +++
 arch/powerpc/include/asm/kvm.h          |    3 +
 arch/powerpc/include/asm/kvm_book3s.h   |   18 +++-
 arch/powerpc/include/asm/kvm_host.h     |    4 +-
 arch/powerpc/include/asm/kvm_ppc.h      |    2 +
 arch/powerpc/kvm/book3s.c               |  130 ++-
 arch/powerpc/kvm/book3s_32_mmu.c        |   30 ++--
 arch/powerpc/kvm/book3s_64_emulate.c    |   88 +
 arch/powerpc/kvm/book3s_64_interrupts.S |    2 +-
 arch/powerpc/kvm/book3s_64_slb.S        |    2 +-
 arch/powerpc/kvm/emulate.c              |   20 +
 arch/powerpc/kvm/powerpc.c              |   43 ++-
 include/linux/kvm.h                     |   17 
 13 files changed, 335 insertions(+), 52 deletions(-)
[PATCH 01/15] KVM: PPC: Ensure split mode works
On PowerPC we can go into MMU Split Mode. That means that either data relocation is on but instruction relocation is off or vice versa.

That mode didn't work properly, as we weren't always flushing entries when going into a new split mode, potentially mapping different code or data than we're supposed to.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h |    9 +++---
 arch/powerpc/kvm/book3s.c             |   46 +---
 2 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index e6ea974..14d0262 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -99,10 +99,11 @@ struct kvmppc_vcpu_book3s {
 #define CONTEXT_GUEST		1
 #define CONTEXT_GUEST_END	2
 
-#define VSID_REAL	0xfff0
-#define VSID_REAL_DR	0xffe0
-#define VSID_REAL_IR	0xffd0
-#define VSID_BAT	0xffc0
+#define VSID_REAL_DR	0x7ff0
+#define VSID_REAL_IR	0x7fe0
+#define VSID_SPLIT_MASK	0x7fe0
+#define VSID_REAL	0x7fc0
+#define VSID_BAT	0x7fb0
 #define VSID_PR		0x8000
 
 extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 ea, u64 ea_mask);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 94c229d..c2ffb91 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -133,6 +133,14 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
 
 	if (((vcpu->arch.msr & (MSR_IR|MSR_DR)) != (old_msr & (MSR_IR|MSR_DR))) ||
 	    (vcpu->arch.msr & MSR_PR) != (old_msr & MSR_PR)) {
+		bool dr = (vcpu->arch.msr & MSR_DR) ? true : false;
+		bool ir = (vcpu->arch.msr & MSR_IR) ? true : false;
+
+		/* Flush split mode PTEs */
+		if (dr != ir)
+			kvmppc_mmu_pte_vflush(vcpu, VSID_SPLIT_MASK,
+					      VSID_SPLIT_MASK);
+
 		kvmppc_mmu_flush_segments(vcpu);
 		kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc);
 	}
@@ -395,15 +403,7 @@ static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data,
 	} else {
 		pte->eaddr = eaddr;
 		pte->raddr = eaddr & 0x;
-		pte->vpage = eaddr >> 12;
-		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
-		case 0:
-			pte->vpage |= VSID_REAL;
-		case MSR_DR:
-			pte->vpage |= VSID_REAL_DR;
-		case MSR_IR:
-			pte->vpage |= VSID_REAL_IR;
-		}
+		pte->vpage = VSID_REAL | eaddr >> 12;
 		pte->may_read = true;
 		pte->may_write = true;
 		pte->may_execute = true;
@@ -512,12 +512,10 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	int page_found = 0;
 	struct kvmppc_pte pte;
 	bool is_mmio = false;
+	bool dr = (vcpu->arch.msr & MSR_DR) ? true : false;
+	bool ir = (vcpu->arch.msr & MSR_IR) ? true : false;
 
-	if ( vec == BOOK3S_INTERRUPT_DATA_STORAGE ) {
-		relocated = (vcpu->arch.msr & MSR_DR);
-	} else {
-		relocated = (vcpu->arch.msr & MSR_IR);
-	}
+	relocated = data ? dr : ir;
 
 	/* Resolve real address if translation turned on */
 	if (relocated) {
@@ -529,14 +527,18 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		pte.raddr = eaddr & 0x;
 		pte.eaddr = eaddr;
 		pte.vpage = eaddr >> 12;
-		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
-		case 0:
-			pte.vpage |= VSID_REAL;
-		case MSR_DR:
-			pte.vpage |= VSID_REAL_DR;
-		case MSR_IR:
-			pte.vpage |= VSID_REAL_IR;
-		}
+	}
+
+	switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+	case 0:
+		pte.vpage |= VSID_REAL;
+		break;
+	case MSR_DR:
+		pte.vpage |= VSID_REAL_DR;
+		break;
+	case MSR_IR:
+		pte.vpage |= VSID_REAL_IR;
+		break;
 	}
 
 	if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
--
1.6.0.2
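The relocation-mode selection this patch fixes can be modelled in a few lines. The sketch below is a Python illustration, not kernel code: the function names are made up, the VSID flags are returned as names only (their numeric values are truncated in this archive), and the `MSR_IR`/`MSR_DR` bit positions are the standard PowerPC ones rather than something stated in the mail.

```python
MSR_IR = 1 << 5  # instruction relocation (standard PowerPC MSR bit)
MSR_DR = 1 << 4  # data relocation (standard PowerPC MSR bit)


def is_split_mode(msr):
    """Split mode: exactly one of data/instruction relocation is on."""
    dr = bool(msr & MSR_DR)
    ir = bool(msr & MSR_IR)
    return dr != ir


def real_mode_vsid_flag(msr):
    """Name of the VSID flag selected for the current relocation mode.

    Mirrors the fixed switch statement: exactly one case applies. The
    pre-patch C code had no break statements, so case 0 fell through
    and ORed in all three flags.
    """
    relocation = msr & (MSR_DR | MSR_IR)
    if relocation == 0:
        return "VSID_REAL"
    if relocation == MSR_DR:
        return "VSID_REAL_DR"
    if relocation == MSR_IR:
        return "VSID_REAL_IR"
    return None  # both on: no case in the switch, no flag is added
```

In the patch, `kvmppc_set_msr()` flushes the split-mode PTEs whenever the condition modelled by `is_split_mode()` holds after an MSR change.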
[PATCH 08/15] KVM: PPC: Implement mfsr emulation
We already emulate the mfsrin instruction, which passes the SR number in a register value. But we lacked support for mfsr, which encodes the SR number in the opcode.

So let's implement it.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_emulate.c |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_emulate.c b/arch/powerpc/kvm/book3s_64_emulate.c
index c989214..8d7a78d 100644
--- a/arch/powerpc/kvm/book3s_64_emulate.c
+++ b/arch/powerpc/kvm/book3s_64_emulate.c
@@ -35,6 +35,7 @@
 #define OP_31_XOP_SLBMTE	402
 #define OP_31_XOP_SLBIE		434
 #define OP_31_XOP_SLBIA		498
+#define OP_31_XOP_MFSR		595
 #define OP_31_XOP_MFSRIN	659
 #define OP_31_XOP_SLBMFEV	851
 #define OP_31_XOP_EIOIO		854
@@ -90,6 +91,18 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		case OP_31_XOP_MTMSR:
 			kvmppc_set_msr(vcpu, kvmppc_get_gpr(vcpu, get_rs(inst)));
 			break;
+		case OP_31_XOP_MFSR:
+		{
+			int srnum;
+
+			srnum = kvmppc_get_field(inst, 12 + 32, 15 + 32);
+			if (vcpu->arch.mmu.mfsrin) {
+				u32 sr;
+				sr = vcpu->arch.mmu.mfsrin(vcpu, srnum);
+				kvmppc_set_gpr(vcpu, get_rt(inst), sr);
+			}
+			break;
+		}
 		case OP_31_XOP_MFSRIN:
 		{
 			int srnum;
--
1.6.0.2
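To make the field extraction in this patch concrete, here is a small decoder for the mfsr encoding. It is a Python sketch rather than kernel code: `decode_mfsr` is a made-up name, and the field positions (primary opcode in bits 0..5, RT in 6..10, SR in 12..15, extended opcode in 21..30, counting from the most significant bit of the 32-bit word) are taken from the PowerPC instruction format as an assumption. The `+ 32` in the patch's `kvmppc_get_field(inst, 12 + 32, 15 + 32)` appears to reflect that helper numbering bits within a 64-bit word.

```python
OP_31_XOP_MFSR = 595


def decode_mfsr(inst):
    """Return (rt, srnum) for a 32-bit mfsr instruction word, else None."""
    primary = (inst >> 26) & 0x3F   # bits 0..5
    xop = (inst >> 1) & 0x3FF       # bits 21..30
    if primary != 31 or xop != OP_31_XOP_MFSR:
        return None
    rt = (inst >> 21) & 0x1F        # bits 6..10
    srnum = (inst >> 16) & 0xF      # bits 12..15
    return rt, srnum


# Hand-assembled "mfsr r5, 9" for illustration.
inst = (31 << 26) | (5 << 21) | (9 << 16) | (OP_31_XOP_MFSR << 1)
```

Once the SR number is known, the patch reuses the existing `mfsrin` callback, so the two instructions differ only in where the number comes from.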
[PATCH 07/15] KVM: PPC: Load VCPU for register fetching
When trying to read or store vcpu register data, we should also make sure the vcpu is actually loaded, so we're 100% sure we get the correct values.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s.c |    8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 029e1be..585dc91 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -955,6 +955,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
 	int i;
 
+	vcpu_load(vcpu);
+
 	regs->pc = vcpu->arch.pc;
 	regs->cr = kvmppc_get_cr(vcpu);
 	regs->ctr = vcpu->arch.ctr;
@@ -975,6 +977,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
 		regs->gpr[i] = kvmppc_get_gpr(vcpu, i);
 
+	vcpu_put(vcpu);
+
 	return 0;
 }
 
@@ -982,6 +986,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
 	int i;
 
+	vcpu_load(vcpu);
+
 	vcpu->arch.pc = regs->pc;
 	kvmppc_set_cr(vcpu, regs->cr);
 	vcpu->arch.ctr = regs->ctr;
@@ -1001,6 +1007,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
 		kvmppc_set_gpr(vcpu, i, regs->gpr[i]);
 
+	vcpu_put(vcpu);
+
 	return 0;
 }
 
--
1.6.0.2
[PATCH 10/15] KVM: PPC: Make XER load 32 bit
We have a 32 bit value in the PACA to store XER in. We also do an stw when storing XER in there. But then we load it with ld, completely screwing it up on every entry.

Welcome to the Big Endian world.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_slb.S |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
index 35b7627..0919679 100644
--- a/arch/powerpc/kvm/book3s_64_slb.S
+++ b/arch/powerpc/kvm/book3s_64_slb.S
@@ -145,7 +145,7 @@ slb_do_enter:
 	lwz	r11, (PACA_KVM_CR)(r13)
 	mtcr	r11
 
-	ld	r11, (PACA_KVM_XER)(r13)
+	lwz	r11, (PACA_KVM_XER)(r13)
 	mtxer	r11
 
 	ld	r11, (PACA_KVM_R11)(r13)
--
1.6.0.2
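Why a 64-bit load of a 32-bit big-endian field goes wrong can be shown with Python's `struct` module. This is only a model of the memory layout, not the PACA itself: the XER value and the contents of the neighbouring four bytes are made up for the illustration.

```python
import struct

# Two adjacent fields in a big-endian memory image: a 32-bit XER slot
# followed by four bytes that belong to whatever field comes next.
xer = 0x20000000
neighbour = 0xDEADBEEF
mem = struct.pack(">II", xer, neighbour)

# ld reads 8 bytes: XER lands in the upper half of the register and
# the neighbouring field fills the lower half.
loaded_with_ld = struct.unpack_from(">Q", mem, 0)[0]

# lwz reads 4 bytes and zero-extends: this is the value that was stored.
loaded_with_lwz = struct.unpack_from(">I", mem, 0)[0]
```

On a little-endian layout the low 32 bits of the wide load would still hold the stored value, which is presumably why the commit message singles out big endian.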
[PATCH 06/15] KVM: PPC: Don't reload FPU with invalid values
When the guest activates the FPU, we load it up. That's fine when it wasn't activated before on the host, but if it was we end up reloading FPU values from last time the FPU was deactivated on the host without writing the proper values back to the vcpu struct.

This patch checks if the FPU is enabled already and if so just doesn't bother activating it, making FPU operations survive guest context switches.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s.c |    5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 400ae0a..029e1be 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -701,6 +701,11 @@ static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
 		return RESUME_GUEST;
 	}
 
+	/* We already own the ext */
+	if (vcpu->arch.guest_owned_ext & msr) {
+		return RESUME_GUEST;
+	}
+
 #ifdef DEBUG_EXT
 	printk(KERN_INFO "Loading up ext 0x%lx\n", msr);
 #endif
--
1.6.0.2
[PATCH 04/15] KVM: PPC: Book3S_32 guest MMU fixes
This patch makes the VSID of mapped pages always reflect all special cases we have, like split mode.

It also changes the tlbie mask to 0x0000 according to the spec. The mask we used before was incorrect.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h |    1 +
 arch/powerpc/kvm/book3s_32_mmu.c      |   30 +++---
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 9f5a992..b47b2f5 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -44,6 +44,7 @@ struct kvmppc_sr {
 	bool Ks;
 	bool Kp;
 	bool nx;
+	bool valid;
 };
 
 struct kvmppc_bat {
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 1483a9b..7071e22 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -57,6 +57,8 @@ static inline bool check_debug_ip(struct kvm_vcpu *vcpu)
 
 static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
 					  struct kvmppc_pte *pte, bool data);
+static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
+					     u64 *vsid);
 
 static struct kvmppc_sr *find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s, gva_t eaddr)
 {
@@ -66,13 +68,14 @@ static struct kvmppc_sr *find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s, gva_t e
 static u64 kvmppc_mmu_book3s_32_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
 					 bool data)
 {
-	struct kvmppc_sr *sre = find_sr(to_book3s(vcpu), eaddr);
+	u64 vsid;
 	struct kvmppc_pte pte;
 
 	if (!kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, &pte, data))
 		return pte.vpage;
 
-	return (((u64)eaddr >> 12) & 0x) | (((u64)sre->vsid) << 16);
+	kvmppc_mmu_book3s_32_esid_to_vsid(vcpu, eaddr >> SID_SHIFT, &vsid);
+	return (((u64)eaddr >> 12) & 0x) | (vsid << 16);
 }
 
 static void kvmppc_mmu_book3s_32_reset_msr(struct kvm_vcpu *vcpu)
@@ -142,8 +145,13 @@ static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
 				    bat->bepi_mask);
 		}
 		if ((eaddr & bat->bepi_mask) == bat->bepi) {
+			u64 vsid;
+			kvmppc_mmu_book3s_32_esid_to_vsid(vcpu,
+				eaddr >> SID_SHIFT, &vsid);
+			vsid <<= 16;
+			pte->vpage = (((u64)eaddr >> 12) & 0x) | vsid;
+
 			pte->raddr = bat->brpn | (eaddr & ~bat->bepi_mask);
-			pte->vpage = (eaddr >> 12) | VSID_BAT;
 			pte->may_read = bat->pp;
 			pte->may_write = bat->pp > 1;
 			pte->may_execute = true;
@@ -302,6 +310,7 @@ static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu *vcpu, u32 srnum,
 	/* And then put in the new SR */
 	sre->raw = value;
 	sre->vsid = (value & 0x0fff);
+	sre->valid = (value & 0x8000) ? false : true;
 	sre->Ks = (value & 0x4000) ? true : false;
 	sre->Kp = (value & 0x2000) ? true : false;
 	sre->nx = (value & 0x1000) ? true : false;
@@ -312,7 +321,7 @@ static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu *vcpu, u32 srnum,
 
 static void kvmppc_mmu_book3s_32_tlbie(struct kvm_vcpu *vcpu, ulong ea, bool large)
 {
-	kvmppc_mmu_pte_flush(vcpu, ea, ~0xFFFULL);
+	kvmppc_mmu_pte_flush(vcpu, ea, 0x0000);
 }
 
 static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
@@ -333,15 +342,22 @@ static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
 		break;
 	case MSR_DR|MSR_IR:
 	{
-		ulong ea;
-		ea = esid << SID_SHIFT;
-		*vsid = find_sr(to_book3s(vcpu), ea)->vsid;
+		ulong ea = esid << SID_SHIFT;
+		struct kvmppc_sr *sr = find_sr(to_book3s(vcpu), ea);
+
+		if (!sr->valid)
+			return -1;
+
+		*vsid = sr->vsid;
 		break;
 	}
 	default:
 		BUG();
 	}
 
+	if (vcpu->arch.msr & MSR_PR)
+		*vsid |= VSID_PR;
+
 	return 0;
 }
 
--
1.6.0.2
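The segment-register fields that `kvmppc_mmu_book3s_32_mtsrin` extracts, including the new `valid` flag, can be sketched as a decoder. This is a Python illustration under stated assumptions: the bit masks below follow the 32-bit PowerPC segment register layout (T, Ks, Kp, N in the top four bits, VSID in the low 24 bits) and are not copied from the patch, whose hex constants are truncated in this archive and whose VSID mask may be wider. The `SegmentRegister` tuple and `decode_sr` are made-up names.

```python
from collections import namedtuple

SegmentRegister = namedtuple("SegmentRegister", "valid ks kp nx vsid")


def decode_sr(value):
    """Decode a 32-bit segment register value into its fields.

    Masks are an assumption based on the architecture's layout, not
    taken from the (truncated) constants in the mail.
    """
    return SegmentRegister(
        valid=not (value & 0x80000000),  # top bit set: not an ordinary segment
        ks=bool(value & 0x40000000),
        kp=bool(value & 0x20000000),
        nx=bool(value & 0x10000000),
        vsid=value & 0x00FFFFFF,
    )
```

With the `valid` flag recorded, the patched `esid_to_vsid` can return an error for such segments instead of handing back a VSID that was never meant to be used for translation.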
[PATCH 11/15] KVM: PPC: Implement emulation for lbzux and lhax
We get MMIOs with the weirdest instructions. But every time we do, we need to improve our emulator to implement them.

So let's do that - this time it's the turn of lbzux and lhax.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/emulate.c |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 2410ec2..dbb5d68 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -38,10 +38,12 @@
 #define OP_31_XOP_LBZX      87
 #define OP_31_XOP_STWX      151
 #define OP_31_XOP_STBX      215
+#define OP_31_XOP_LBZUX     119
 #define OP_31_XOP_STBUX     247
 #define OP_31_XOP_LHZX      279
 #define OP_31_XOP_LHZUX     311
 #define OP_31_XOP_MFSPR     339
+#define OP_31_XOP_LHAX      343
 #define OP_31_XOP_STHX      407
 #define OP_31_XOP_STHUX     439
 #define OP_31_XOP_MTSPR     467
@@ -173,6 +175,19 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 			emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
 			break;
 
+		case OP_31_XOP_LBZUX:
+			rt = get_rt(inst);
+			ra = get_ra(inst);
+			rb = get_rb(inst);
+
+			ea = kvmppc_get_gpr(vcpu, rb);
+			if (ra)
+				ea += kvmppc_get_gpr(vcpu, ra);
+
+			emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
+			kvmppc_set_gpr(vcpu, ra, ea);
+			break;
+
 		case OP_31_XOP_STWX:
 			rs = get_rs(inst);
 			emulated = kvmppc_handle_store(run, vcpu,
@@ -202,6 +217,11 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 			kvmppc_set_gpr(vcpu, rs, ea);
 			break;
 
+		case OP_31_XOP_LHAX:
+			rt = get_rt(inst);
+			emulated = kvmppc_handle_loads(run, vcpu, rt, 2, 1);
+			break;
+
 		case OP_31_XOP_LHZX:
 			rt = get_rt(inst);
 			emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1);
--
1.6.0.2
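The two behaviours added here, the indexed effective address used by the update form (lbzux) and the sign extension implied by lhax (which the patch routes through `kvmppc_handle_loads`), can be sketched in a few lines. This is a Python model with made-up helper names and register contents, not the kernel's emulation path.

```python
def indexed_ea(gpr, ra, rb):
    """Effective address as computed in the lbzux case of the patch:
    start from gpr[rb] and add gpr[ra] unless ra is register 0."""
    ea = gpr[rb]
    if ra:
        ea += gpr[ra]
    return ea


def sign_extend_halfword(value):
    """Interpret the low 16 bits as a signed halfword, as an algebraic
    (sign-extending) halfword load does."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


# Made-up register file: r3 holds a base address, r4 an index.
gpr = [0] * 32
gpr[3], gpr[4] = 0x1000, 0x20
```

For the update form, the patch additionally writes the computed address back into `ra` after issuing the load, which is what distinguishes lbzux from lbzx.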
[PATCH 02/15] KVM: PPC: Allow userspace to unset the IRQ line
Userspace can tell us that it wants to trigger an interrupt. But so far it
can't tell us that it wants to stop triggering one.

So let's interpret the parameter to the ioctl that we have anyways to tell us
if we want to raise or lower the interrupt line.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm.h     |    3 +++
 arch/powerpc/include/asm/kvm_ppc.h |    2 ++
 arch/powerpc/kvm/book3s.c          |    6 ++++++
 arch/powerpc/kvm/powerpc.c         |    5 ++++-
 4 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
index 19bae31..6c5547d 100644
--- a/arch/powerpc/include/asm/kvm.h
+++ b/arch/powerpc/include/asm/kvm.h
@@ -84,4 +84,7 @@ struct kvm_guest_debug_arch {
 #define KVM_REG_QPR		0x0040
 #define KVM_REG_FQPR		0x0060
 
+#define KVM_INTERRUPT_SET	-1U
+#define KVM_INTERRUPT_UNSET	-2U
+
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index c7fcdd7..6a2464e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -92,6 +92,8 @@ extern void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
                                        struct kvm_interrupt *irq);
+extern void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu,
+                                         struct kvm_interrupt *irq);
 
 extern int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
                                   unsigned int op, int *advance);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c2ffb91..9e0bc47 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -230,6 +230,12 @@ void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
 	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL);
 }
 
+void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu,
+                                  struct kvm_interrupt *irq)
+{
+	kvmppc_book3s_dequeue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL);
+}
+
 int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
 {
 	int deliver = 1;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 5a8eb95..a28a512 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -449,7 +449,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
 {
-	kvmppc_core_queue_external(vcpu, irq);
+	if (irq->irq == KVM_INTERRUPT_UNSET)
+		kvmppc_core_dequeue_external(vcpu, irq);
+	else
+		kvmppc_core_queue_external(vcpu, irq);
 
 	if (waitqueue_active(&vcpu->wq)) {
 		wake_up_interruptible(&vcpu->wq);
-- 
1.6.0.2
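The behaviour the patch describes boils down to a two-way dispatch on the irq field. A small model of that dispatch, assuming nothing beyond the two constants the patch defines, could look like the following; `struct irq_request`, `struct vcpu_model` and `model_ioctl_interrupt` are hypothetical stand-ins, not the KVM types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined by the patch in asm/kvm.h. */
#define KVM_INTERRUPT_SET	-1U
#define KVM_INTERRUPT_UNSET	-2U

/* Hypothetical stand-ins for struct kvm_interrupt and the vcpu state. */
struct irq_request { uint32_t irq; };
struct vcpu_model  { bool external_pending; };

/*
 * Mirrors the branch added to kvm_vcpu_ioctl_interrupt():
 * UNSET lowers the line, any other value raises it.
 */
static void model_ioctl_interrupt(struct vcpu_model *v,
				  const struct irq_request *req)
{
	assert(v != NULL && req != NULL);
	if (req->irq == KVM_INTERRUPT_UNSET)
		v->external_pending = false;
	else
		v->external_pending = true;
}
```

Note that in this model, as in the patch, any value other than KVM_INTERRUPT_UNSET raises the line, so callers that pass an arbitrary irq number keep the old behaviour.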
[PATCH 09/15] KVM: PPC: Implement BAT reads
BATs can't only be written to, you can also read them out! So let's implement
emulation for reading BAT values again.

While at it, I also made BAT setting flush the segment cache, so we're
absolutely sure there's no MMU state left when writing BATs.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_emulate.c |   35 +++++++++++++++++++++++++++++++++++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_emulate.c b/arch/powerpc/kvm/book3s_64_emulate.c
index 8d7a78d..39d5003 100644
--- a/arch/powerpc/kvm/book3s_64_emulate.c
+++ b/arch/powerpc/kvm/book3s_64_emulate.c
@@ -239,6 +239,34 @@ void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat, bool upper,
 	}
 }
 
+static u32 kvmppc_read_bat(struct kvm_vcpu *vcpu, int sprn)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_bat *bat;
+
+	switch (sprn) {
+	case SPRN_IBAT0U ... SPRN_IBAT3L:
+		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
+		break;
+	case SPRN_IBAT4U ... SPRN_IBAT7L:
+		bat = &vcpu_book3s->ibat[4 + ((sprn - SPRN_IBAT4U) / 2)];
+		break;
+	case SPRN_DBAT0U ... SPRN_DBAT3L:
+		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
+		break;
+	case SPRN_DBAT4U ... SPRN_DBAT7L:
+		bat = &vcpu_book3s->dbat[4 + ((sprn - SPRN_DBAT4U) / 2)];
+		break;
+	default:
+		BUG();
+	}
+
+	if (sprn % 2)
+		return bat->raw >> 32;
+	else
+		return bat->raw;
+}
+
 static void kvmppc_write_bat(struct kvm_vcpu *vcpu, int sprn, u32 val)
 {
 	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
@@ -290,6 +318,7 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs)
 		/* BAT writes happen so rarely that we're ok to flush
 		 * everything here */
 		kvmppc_mmu_pte_flush(vcpu, 0, 0);
+		kvmppc_mmu_flush_segments(vcpu);
 		break;
 	case SPRN_HID0:
 		to_book3s(vcpu)->hid[0] = spr_val;
@@ -373,6 +402,12 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
 	int emulated = EMULATE_DONE;
 
 	switch (sprn) {
+	case SPRN_IBAT0U ... SPRN_IBAT3L:
+	case SPRN_IBAT4U ... SPRN_IBAT7L:
+	case SPRN_DBAT0U ... SPRN_DBAT3L:
+	case SPRN_DBAT4U ... SPRN_DBAT7L:
+		kvmppc_set_gpr(vcpu, rt, kvmppc_read_bat(vcpu, sprn));
+		break;
 	case SPRN_SDR1:
 		kvmppc_set_gpr(vcpu, rt, to_book3s(vcpu)->sdr1);
 		break;
-- 
1.6.0.2
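The index arithmetic in kvmppc_read_bat() relies on each BAT occupying two consecutive SPR numbers, upper half first. A rough standalone sketch of that mapping follows; `struct bat_slot` and `bat_slot_from` are hypothetical, and the SPR numbers used in the usage note are assumptions recalled from asm/reg.h rather than something this patch states.

```c
#include <assert.h>

/* Hypothetical result type: which BAT in the array, and which 32-bit half. */
struct bat_slot {
	int index;	/* position in the ibat[] or dbat[] array */
	int is_lower;	/* 0 = upper (xBATnU), 1 = lower (xBATnL) */
};

/*
 * Map an SPR number inside one 8-register BAT range to an array slot.
 * range_base is the first SPR of the range (e.g. SPRN_IBAT0U) and
 * index_base is 0 for BATs 0-3 or 4 for BATs 4-7.
 */
static struct bat_slot bat_slot_from(int sprn, int range_base, int index_base)
{
	struct bat_slot s;

	assert(sprn >= range_base && sprn < range_base + 8);
	s.index = index_base + (sprn - range_base) / 2;
	s.is_lower = (sprn - range_base) % 2;
	return s;
}
```

Assuming SPRN_IBAT0U is 0x210, `bat_slot_from(0x213, 0x210, 0)` would describe the lower half of IBAT1. Because every range base is even under that assumption, `(sprn - range_base) % 2` and the patch's plain `sprn % 2` select the same half.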
[PATCH 15/15] KVM: PPC: Make build work without CONFIG_VSX/ALTIVEC
The FPU/Altivec/VSX enablement also brought access to some structure elements
that are only defined when the respective config options are enabled.

Unfortunately I forgot to check for the config options at some places, so
let's do that now.

Unbreaks the build when CONFIG_VSX is not set.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index e752a59..00e9684 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -608,7 +608,9 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
 {
 	struct thread_struct *t = &current->thread;
 	u64 *vcpu_fpr = vcpu->arch.fpr;
+#ifdef CONFIG_VSX
 	u64 *vcpu_vsx = vcpu->arch.vsr;
+#endif
 	u64 *thread_fpr = (u64*)t->fpr;
 	int i;
 
@@ -688,7 +690,9 @@ static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
 {
 	struct thread_struct *t = &current->thread;
 	u64 *vcpu_fpr = vcpu->arch.fpr;
+#ifdef CONFIG_VSX
 	u64 *vcpu_vsx = vcpu->arch.vsr;
+#endif
 	u64 *thread_fpr = (u64*)t->fpr;
 	int i;
 
@@ -1218,8 +1222,12 @@ int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
 	int ret;
 	struct thread_struct ext_bkp;
+#ifdef CONFIG_ALTIVEC
 	bool save_vec = current->thread.used_vr;
+#endif
+#ifdef CONFIG_VSX
 	bool save_vsx = current->thread.used_vsr;
+#endif
 	ulong ext_msr;
 
 	/* No need to go into the guest when all we do is going out */
-- 
1.6.0.2
[PATCH 13/15] KVM: Add support for enabling capabilities per-vcpu
Sometimes we don't want all capabilities to be available to all our vcpus.
One example for that is the OSI interface, implemented in the next patch.

In order to have a generic mechanism in how to enable capabilities
individually, this patch introduces a new ioctl that can be used for this
purpose. That way features we don't want in all guests or userspace
configurations can just not be enabled and we're good.

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 -> v2:

  - Add flags to enable_cap
  - Update documentation for kvm_enable_cap
---
 Documentation/kvm/api.txt  |   15 +++++++++++++++
 arch/powerpc/kvm/powerpc.c |   26 ++++++++++++++++++++++++++
 include/linux/kvm.h        |   11 +++++++++++
 3 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index d170cb4..6a19ab6 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -749,6 +749,21 @@ Writes debug registers into the vcpu.
 See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
 yet and must be cleared on entry.
 
+4.34 KVM_ENABLE_CAP
+
+Capability: basic
+Architectures: all
+Type: vcpu ioctl
+Parameters: struct kvm_enable_cap (in)
+Returns: 0 on success; -1 on error
+
+Not all extensions are enabled by default. Using this ioctl the application
+can enable an extension, making it available to the guest.
+
+On systems that do not support this ioctl, it always fails. On systems that
+do support it, it only works for extensions that are supported for enablement.
+As of writing this the only enablement enabled extenion is KVM_CAP_PPC_OSI.
+
 5. The kvm_run structure
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a28a512..8bd8204 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -462,6 +462,23 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
 	return 0;
 }
 
+static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
+				     struct kvm_enable_cap *cap)
+{
+	int r;
+
+	if (cap->flags)
+		return -EINVAL;
+
+	switch (cap->cap) {
+	default:
+		r = -EINVAL;
+		break;
+	}
+
+	return r;
+}
+
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
                                     struct kvm_mp_state *mp_state)
 {
@@ -490,6 +507,15 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = kvm_vcpu_ioctl_interrupt(vcpu, &irq);
 		break;
 	}
+	case KVM_ENABLE_CAP:
+	{
+		struct kvm_enable_cap cap;
+		r = -EFAULT;
+		if (copy_from_user(&cap, argp, sizeof(cap)))
+			goto out;
+		r = kvm_vcpu_ioctl_enable_cap(vcpu, &cap);
+		break;
+	}
 	default:
 		r = -EINVAL;
 	}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ce28767..a18ac92 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -400,6 +400,15 @@ struct kvm_ioeventfd {
 	__u8  pad[36];
 };
 
+/* for KVM_ENABLE_CAP */
+struct kvm_enable_cap {
+	/* in */
+	__u32 cap;
+	__u32 flags;
+	__u64 args[4];
+	__u8  pad[64];
+};
+
 #define KVMIO 0xAE
 
 /*
@@ -696,6 +705,8 @@ struct kvm_clock_data {
 /* Available with KVM_CAP_DEBUGREGS */
 #define KVM_GET_DEBUGREGS         _IOR(KVMIO,  0xa1, struct kvm_debugregs)
 #define KVM_SET_DEBUGREGS         _IOW(KVMIO,  0xa2, struct kvm_debugregs)
+/* No need for CAP, because then it just always fails */
+#define KVM_ENABLE_CAP            _IOW(KVMIO,  0xa3, struct kvm_enable_cap)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 
-- 
1.6.0.2
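To illustrate the validation order the new ioctl handler follows (reject non-zero flags first, then dispatch on the capability number), here is a minimal userspace model. Everything in it is a sketch: `struct enable_cap_model` only mirrors the layout added to include/linux/kvm.h, and `MODEL_CAP_EXAMPLE`, `struct vcpu_caps` and `model_enable_cap` are hypothetical names, since this patch itself does not yet wire up any capability.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the struct kvm_enable_cap layout added by this patch. */
struct enable_cap_model {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t  pad[64];
};

/* Hypothetical capability number, for illustration only. */
#define MODEL_CAP_EXAMPLE 1000

/* Hypothetical per-vcpu state tracking what has been switched on. */
struct vcpu_caps { int example_enabled; };

/* Same shape as kvm_vcpu_ioctl_enable_cap(): flags must be zero. */
static int model_enable_cap(struct vcpu_caps *v,
			    const struct enable_cap_model *cap)
{
	assert(v != NULL && cap != NULL);
	if (cap->flags)
		return -EINVAL;

	switch (cap->cap) {
	case MODEL_CAP_EXAMPLE:
		v->example_enabled = 1;
		return 0;
	default:
		return -EINVAL;
	}
}
```

Rejecting unknown flags up front is what leaves room to give them a meaning later without silently changing behaviour for existing callers.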
[PATCH 14/15] KVM: PPC: Add OSI hypercall interface
MOL uses its own hypercall interface to call back into userspace when the
guest wants to do something.

So let's implement that as an exit reason, specify it with a CAP and only
really use it when userspace wants us to.

The only user of it so far is MOL.

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 -> v2:

  - Add documentation for OSI exit struct
---
 Documentation/kvm/api.txt             |   13 +++++++++++++
 arch/powerpc/include/asm/kvm_book3s.h |    5 +++++
 arch/powerpc/include/asm/kvm_host.h   |    2 ++
 arch/powerpc/kvm/book3s.c             |   24 ++++++++++++++++++------
 arch/powerpc/kvm/powerpc.c            |   12 ++++++++++++
 include/linux/kvm.h                   |    6 ++++++
 6 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 6a19ab6..b2129e8 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -932,6 +932,19 @@ s390 specific.
 
 powerpc specific.
 
+		/* KVM_EXIT_OSI */
+		struct {
+			__u64 gprs[32];
+		} osi;
+
+MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
+hypercalls and exit with this exit struct that contains all the guest gprs.
+
+If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
+Userspace can now handle the hypercall and when it's done modify the gprs as
+necessary. Upon guest entry all guest GPRs will then be replaced by the values
+in this struct.
+
 		/* Fix the size of the union. */
 		char padding[256];
 	};
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 1a169f3..54929cd 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -147,6 +147,11 @@ static inline ulong dsisr(void)
 
 extern void kvm_return_point(void);
 
+/* Magic register values loaded into r3 and r4 before the 'sc' assembly
+ * instruction for the OSI hypercalls */
+#define OSI_SC_MAGIC_R3			0x113724FA
+#define OSI_SC_MAGIC_R4			0x77810F9B
+
 #define INS_DCBZ			0x7c0007ec
 
 #endif /* __ASM_KVM_BOOK3S_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 0ebda67..486f1ca 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -273,6 +273,8 @@ struct kvm_vcpu_arch {
 	u8 mmio_sign_extend;
 	u8 dcr_needed;
 	u8 dcr_is_write;
+	u8 osi_needed;
+	u8 osi_enabled;
 
 	u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
 
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 6b8b5ed..e752a59 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -871,12 +871,24 @@ program_interrupt:
 		break;
 	}
 	case BOOK3S_INTERRUPT_SYSCALL:
-#ifdef EXIT_DEBUG
-		printk(KERN_INFO "Syscall Nr %d\n", (int)kvmppc_get_gpr(vcpu, 0));
-#endif
-		vcpu->stat.syscall_exits++;
-		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
-		r = RESUME_GUEST;
+		// XXX make user settable
+		if (vcpu->arch.osi_enabled &&
+		    (((u32)kvmppc_get_gpr(vcpu, 3)) == OSI_SC_MAGIC_R3) &&
+		    (((u32)kvmppc_get_gpr(vcpu, 4)) == OSI_SC_MAGIC_R4)) {
+			u64 *gprs = run->osi.gprs;
+			int i;
+
+			run->exit_reason = KVM_EXIT_OSI;
+			for (i = 0; i < 32; i++)
+				gprs[i] = kvmppc_get_gpr(vcpu, i);
+			vcpu->arch.osi_needed = 1;
+			r = RESUME_HOST_NV;
+
+		} else {
+			vcpu->stat.syscall_exits++;
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			r = RESUME_GUEST;
+		}
 		break;
 	case BOOK3S_INTERRUPT_FP_UNAVAIL:
 	case BOOK3S_INTERRUPT_ALTIVEC:
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 8bd8204..035bad4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -148,6 +148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	switch (ext) {
 	case KVM_CAP_PPC_SEGSTATE:
 	case KVM_CAP_PPC_PAIRED_SINGLES:
+	case KVM_CAP_PPC_OSI:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -429,6 +430,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (!vcpu->arch.dcr_is_write)
 			kvmppc_complete_dcr_load(vcpu, run);
 		vcpu->arch.dcr_needed = 0;
+	} else if (vcpu->arch.osi_needed) {
+		u64 *gprs = run->osi.gprs;
+		int i;
+
+		for (i = 0; i < 32; i++)
+			kvmppc_set_gpr(vcpu, i, gprs[i]);
+
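The round trip described in the api.txt hunk (detect the magic values on 'sc', hand all 32 GPRs to userspace, take them back on re-entry) can be modelled in a few lines. This is only a sketch under stated assumptions: `struct osi_vcpu_model`, `is_osi_call`, `osi_exit` and `osi_reenter` are hypothetical names, and only the two magic constants are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Magic values from the patch (asm/kvm_book3s.h). */
#define OSI_SC_MAGIC_R3 0x113724FA
#define OSI_SC_MAGIC_R4 0x77810F9B

/* Hypothetical vcpu model; NOT struct kvm_vcpu. */
struct osi_vcpu_model {
	uint64_t gpr[32];
	bool osi_enabled;
};

/* An 'sc' is an OSI call only if enabled and the low 32 bits of r3/r4 match. */
static bool is_osi_call(const struct osi_vcpu_model *v)
{
	assert(v != NULL);
	return v->osi_enabled &&
	       (uint32_t)v->gpr[3] == OSI_SC_MAGIC_R3 &&
	       (uint32_t)v->gpr[4] == OSI_SC_MAGIC_R4;
}

/* Exit path: copy every GPR out for userspace. Returns false for a plain syscall. */
static bool osi_exit(const struct osi_vcpu_model *v, uint64_t out[32])
{
	int i;

	if (!is_osi_call(v))
		return false;
	for (i = 0; i < 32; i++)
		out[i] = v->gpr[i];
	return true;
}

/* Re-entry path: replace every GPR with what userspace left in the struct. */
static void osi_reenter(struct osi_vcpu_model *v, const uint64_t in[32])
{
	int i;

	for (i = 0; i < 32; i++)
		v->gpr[i] = in[i];
}
```

The casts to 32 bits mirror the `(u32)` casts in the patch, so stale upper halves in r3 or r4 on a 64-bit host do not defeat the magic check.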
[PATCH 12/15] KVM: PPC: Implement alignment interrupt
Mac OS X has some applications - namely the Finder - that require alignment
interrupts to work properly. So we need to implement them.

But the spec for 970 and 750 also looks different. While 750 requires the
DSISR fields to reflect some instruction bits, the 970 declares this as an
optional feature. So we need to reconstruct DSISR manually.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h |    1 +
 arch/powerpc/kvm/book3s.c             |    9 +++++++++
 arch/powerpc/kvm/book3s_64_emulate.c  |   40 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index b47b2f5..1a169f3 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -131,6 +131,7 @@ extern void kvmppc_rmcall(ulong srr0, ulong srr1);
 extern void kvmppc_load_up_fpu(void);
 extern void kvmppc_load_up_altivec(void);
 extern void kvmppc_load_up_vsx(void);
+extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
 
 static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 585dc91..6b8b5ed 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -905,6 +905,15 @@ program_interrupt:
 		}
 		break;
 	}
+	case BOOK3S_INTERRUPT_ALIGNMENT:
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		if (kvmppc_read_inst(vcpu) == EMULATE_DONE) {
+			to_book3s(vcpu)->dsisr = kvmppc_alignment_dsisr(vcpu,
+				vcpu->arch.last_inst);
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+		}
+		r = RESUME_GUEST;
+		break;
 	case BOOK3S_INTERRUPT_MACHINE_CHECK:
 	case BOOK3S_INTERRUPT_TRACE:
 		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
diff --git a/arch/powerpc/kvm/book3s_64_emulate.c b/arch/powerpc/kvm/book3s_64_emulate.c
index 39d5003..c401dd4 100644
--- a/arch/powerpc/kvm/book3s_64_emulate.c
+++ b/arch/powerpc/kvm/book3s_64_emulate.c
@@ -44,6 +44,8 @@
 /* DCBZ is actually 1014, but we patch it to 1010 so we get a trap */
 #define OP_31_XOP_DCBZ		1010
 
+#define OP_LFS			48
+
 #define SPRN_GQR0		912
 #define SPRN_GQR1		913
 #define SPRN_GQR2		914
@@ -474,3 +476,41 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
 	return emulated;
 }
 
+u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst)
+{
+	u32 dsisr = 0;
+
+	/*
+	 * This is what the spec says about DSISR bits (not mentioned = 0):
+	 *
+	 * 12:13		[DS]	Set to bits 30:31
+	 * 15:16		[X]	Set to bits 29:30
+	 * 17			[X]	Set to bit 25
+	 *			[D/DS]	Set to bit 5
+	 * 18:21		[X]	Set to bits 21:24
+	 *			[D/DS]	Set to bits 1:4
+	 * 22:26			Set to bits 6:10 (RT/RS/FRT/FRS)
+	 * 27:31			Set to bits 11:15 (RA)
+	 */
+
+	switch (get_op(inst)) {
+	/* D-form */
+	case OP_LFS:
+		dsisr |= (inst >> 12) & 0x4000;	/* bit 17 */
+		dsisr |= (inst >> 17) & 0x3c00; /* bits 18:21 */
+		break;
+	/* X-form */
+	case 31:
+		dsisr |= (inst << 14) & 0x18000; /* bits 15:16 */
+		dsisr |= (inst << 8)  & 0x04000; /* bit 17 */
+		dsisr |= (inst << 3)  & 0x03c00; /* bits 18:21 */
+		break;
+	default:
+		printk(KERN_INFO "KVM: Unaligned instruction 0x%x\n", inst);
+		break;
+	}
+
+	dsisr |= (inst >> 16) & 0x03ff; /* bits 22:31 */
+
+	return dsisr;
+}
-- 
1.6.0.2
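The bit shuffling in kvmppc_alignment_dsisr() is easier to check with a concrete instruction word. Below is a standalone restatement of the same shifts and masks, with `model_alignment_dsisr` and `encode_d_form` as hypothetical helper names; the primary opcode is assumed to be the top six bits of the instruction, which is what get_op() is expected to return.

```c
#include <assert.h>
#include <stdint.h>

#define OP_LFS 48

/* Hypothetical helper: build a D-form instruction word (opcode, RT, RA, D). */
static uint32_t encode_d_form(uint32_t op, uint32_t rt, uint32_t ra, uint16_t d)
{
	assert(op < 64 && rt < 32 && ra < 32);
	return (op << 26) | (rt << 21) | (ra << 16) | d;
}

/* Same shifts and masks as the patch, minus the printk in the default case. */
static uint32_t model_alignment_dsisr(uint32_t inst)
{
	uint32_t dsisr = 0;

	switch (inst >> 26) {		/* primary opcode */
	case OP_LFS:			/* D-form */
		dsisr |= (inst >> 12) & 0x4000;		/* bit 17 */
		dsisr |= (inst >> 17) & 0x3c00;		/* bits 18:21 */
		break;
	case 31:			/* X-form */
		dsisr |= (inst << 14) & 0x18000;	/* bits 15:16 */
		dsisr |= (inst << 8)  & 0x04000;	/* bit 17 */
		dsisr |= (inst << 3)  & 0x03c00;	/* bits 18:21 */
		break;
	default:
		break;
	}

	dsisr |= (inst >> 16) & 0x03ff;	/* bits 22:31: RT/RS and RA */
	return dsisr;
}
```

For example, `encode_d_form(OP_LFS, 1, 2, 0)` gives the word 0xC0220000, for which the function returns 0x2022: 0x2000 from the opcode bits and 0x22 from RT=1, RA=2.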
Re: [PATCH 00/15] KVM: PPC: MOL bringup patches
Alexander Graf wrote:
> Mac-on-Linux has always lacked PPC64 host support. This is going to
> change now!
>
> This patchset contains minor patches to enable MOL, but is mostly about
> bug fixes that came out of running Mac OS X. With this set and a pretty
> small patch to MOL I have 10.4.11 running as a guest on a 970MP host.
>
> I'll send the MOL patches to the respective ML in the next days.

The patches for MOL are integrated in their SVN already. Forgot to change
the description.

Alex
Re: [Qemu-devel] [PATCH] Fix SIGFPE for vnc display of width/height = 1
Chris Webb ch...@arachsys.com writes:

> During boot, the screen gets resized to height 1 and a mouse click at this
> point will cause a division by zero when calculating the absolute pointer
> position from the pixel (x, y). Return a click in the middle of the screen
> instead in this case.

I think this probably ought to be a candidate for 0.12-stable too. We're
seeing these crashes for real from time-to-time so it's not just a
theoretical problem.

Cheers,

Chris.
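The failure mode and the fix described above can be sketched as follows. This is not the QEMU code: `abs_pointer_coord` is a hypothetical helper, and the 0..0x7FFF target range with 0x4000 as its midpoint is an assumption about how the absolute pointer position is scaled.

```c
#include <assert.h>

/*
 * Scale a pixel coordinate to an absolute pointer range of 0..0x7FFF
 * (assumed range). With a 1-pixel extent the divisor (extent - 1) would
 * be zero, so report the middle of the range instead.
 */
static int abs_pointer_coord(int pos, int extent)
{
	assert(pos >= 0 && extent >= 1);
	if (extent > 1)
		return pos * 0x7FFF / (extent - 1);
	return 0x4000;
}
```

A caller would apply it once per axis, for instance `abs_pointer_coord(x, width)` and `abs_pointer_coord(y, height)`, so a display that is temporarily 1 pixel high only affects the y result.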
Re: KVM PMU virtualization
On 02/26/2010 04:42 PM, Peter Zijlstra wrote:
> Also, intel debugstore things requires a host linear address,

It requires a linear address, not a host linear address. Of course, it
might not like the linear address mappings changing under its feet. If it
has a private tlb, then this won't work.

> again, not something a vcpu can easily provide (although that might be
> worked around with an msr trap, but that still limits you to 1 page data
> sizes, not a limitation all software will respect).

If you're willing to pin pages, you can map the guest's buffer. That won't
work if BTS can happen in parallel with a #VMEXIT, or if there are
interactions with npt/ept. Will have to ask the vendors.

> All that said, what we really want is for Intel+AMD to come up with
> proper hw PMU virtualization support that makes it easy to rotate the
> full PMU in and out for a guest. Then this whole discussion will become
> a non issue.
>
> As it stands there simply are a number of PMU features that defy being
> virtualized, simply because the virt stuff doesn't do system topology.
> So even if they were to support a virtualized pmu, it would likely be a
> different beast than the native hardware is, and it will be several
> hardware models in the future, coming up with a paravirt interface and
> getting !linux hosts to adapt and !linux guests to use is probably as
> 'easy'.

!linux hosts are someone else's problem, but how would we get !linux guests
to use a soft pmu?

The only way I see that happening is if a soft pmu is standardized across
hypervisors, which is unfortunately unlikely.

-- 
error compiling committee.c: too many arguments to function
Re: linux-aio usable?
Avi Kivity wrote:
> On 03/08/2010 03:46 AM, Bernhard Schmidt wrote:
>> Hi,
>>
>> sorry for this pretty generic question, I did not find any real pros and
>> cons on the net anywhere, but I might just have missed them.
>>
>> In a pure x86_64 environment (~2.6.32 vanilla kernel, 0.12.3 qemu-kvm),
>> is enabling linux-aio in KVM a good idea?
>
> Yes.

Apparently that does not quite work. I just re-compiled kvm with
--enable-linux-aio (actually I just installed libaio-dev on debian and
qemu-kvm's configure picked it up automatically), and tried a guest.
But any I/O fails.

  kvm-0.12.3 ... -drive file=/dev/sda10,if=virtio,cache=none,aio=native

(/dev/sda10 is a (spare) partition on my hard drive I use for testing).

Here's the resulting dmesg in the guest (2.6.32):

 vdb:
 end_request: I/O error, dev vdb, sector 0
 Buffer I/O error on device vdb, logical block 0
 Buffer I/O error on device vdb, logical block 1
 Buffer I/O error on device vdb, logical block 2
 Buffer I/O error on device vdb, logical block 3
 Buffer I/O error on device vdb, logical block 4
 Buffer I/O error on device vdb, logical block 5
 Buffer I/O error on device vdb, logical block 6
 Buffer I/O error on device vdb, logical block 7
 end_request: I/O error, dev vdb, sector 0
 Buffer I/O error on device vdb, logical block 0
 Buffer I/O error on device vdb, logical block 1
 end_request: I/O error, dev vdb, sector 0
 unable to read partition table

And any I/O - be it reads or writes - fails.

I see some aio_submit() etc are happening in strace, but no errors.
Unfortunately my strace does not decode io_*() routines.

 # fgrep io_ trc
 ...
 1227  io_submit(4152147968, 1, {...}) = 1
 1226  io_getevents(-142819328, 1, 128, {...}{0, 0}) = 1
 1227  io_submit(4152147968, 1, {...}) = 1
 1226  io_getevents(-142819328, 1, 128, {...}{0, 0}) = 1
 1227  io_submit(4152147968, 1, {...}) = 1
 1226  io_getevents(-142819328, 1, 128, {...}{0, 0}) = 1
 ...

Oh well ;)

/mjt
Re: linux-aio usable?
Michael Tokarev wrote:
[]
> Apparently that does not quite work. I just re-compiled kvm with
> --enable-linux-aio (actually I just installed libaio-dev on debian and
> qemu-kvm's configure picked it up automatically), and tried a guest.
> But any I/O fails.

It has nothing to do with kvm. It is compat_ioctl32 in the kernel wrt aio
calls. Historically I've a 64bit kernel with 32bit userland, and tried
32bit kvm too, and that does not work. But 64bit kvm works just fine with
aio, and the performance numbers are indeed better.

Thanks!

/mjt
Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation
Avi Kivity wrote:
> On 03/08/2010 04:10 PM, Stefan Bader wrote:
>> Avi Kivity wrote:
>>> On 03/06/2010 03:53 PM, Stefan Bader wrote:
>>>> Hi Avi,
>>>>
>>>> we currently try to integrate this patch for an update into a 2.6.32
>>>> based system (amongst other kvm updates). But as soon as this patch
>>>> gets added kvm will die on startup in kvm_leave_lazy_mmu. This has
>>>> been documented here:
>>>>
>>>> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823
>>>>
>>>> I have placed the backports of your patches, which are currently in
>>>> linux-next and marked for stable here:
>>>>
>>>> git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm
>>>>
>>>> I have tested the failure with a version that got only the following
>>>> patches in:
>>>>
>>>> KVM: x86 emulator: Add Virtual-8086 mode of emulation
>>>> KVM: x86 emulator: fix memory access during x86 emulation
>>>> KVM: x86 emulator: Check IOPL level during io instruction emulation
>>>> KVM: x86 emulator: Fix popf emulation
>>>> KVM: x86 emulator: Check CPL level during privilege instruction emulation
>>>>
>>>> and also with a version that takes all stable patches up to the bad
>>>> one:
>>>>
>>>> KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
>>>> KVM: x86 emulator: Add group8 instruction decoding
>>>> KVM: x86 emulator: Add group9 instruction decoding
>>>> KVM: x86 emulator: Add Virtual-8086 mode of emulation
>>>> KVM: x86 emulator: fix memory access during x86 emulation
>>>>
>>>> But as soon as the fix for memory access gets added, the bug will
>>>> occur. Would you have an idea what might be causing this?
>>>
>>> Does the same guest, using the same qemu-kvm, work on kvm.git or
>>> upstream?
>>
>> The test was done with a kvm user-space package based on 0.12.3 (which
>> seems to be the current upstream version). I try to do a test on the git
>> version.
>
> I meant keep the same userspace without change, and try it on a Linus
> kernel or kvm.git master
> (http://git.kernel.org/?p=virt/kvm/kvm.git;a=summary).

HEAD of kvm.git tree works (with same client and userspace)
Stable 2.6.32.y tree plus all patches marked cc: stable fails.
(32bit host/guest)

Host dmesg:
kvm: emulating exchange as write

Guest dmesg:
...
[3.053503] Freeing initrd memory: 8843k freed
[3.059863] Freeing unused kernel memory: 660k freed
[3.076657] Write protecting the kernel text: 4780k
[3.082863] Write protecting the kernel read-only data: 1912k
[3.08] BUG: unable to handle kernel paging request at c01292e3
[3.088025] IP: [c01292e3] kvm_leave_lazy_mmu+0x43/0x70
[3.088025] *pde = 00910067 *pte = 00129161
[3.088025] Oops: 0003 [#1] SMP
[3.088025] last sysfs file:
[3.088025] Modules linked in:
[3.088025]
[3.088025] Pid: 1, comm: init Not tainted (2.6.32-15-generic #22-Ubuntu) Bochs
[3.088025] EIP: 0060:[c01292e3] EFLAGS: 00010246 CPU: 0
[3.088025] EIP is at kvm_leave_lazy_mmu+0x43/0x70
[3.088025] EAX: 0002 EBX: 0018 ECX: 01802c20 EDX:
[3.088025] ESI: c1802c20 EDI: c1802c20 EBP: df071cb4 ESP: df071ca8
[3.088025] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[3.088025] Process init (pid: 1, ti=df07 task=df068000 task.ti=df07)
[3.088025] Stack:
[3.088025] c000 dce2b000 dce2a844 df071cf0 c01e8b6d 0001 b000
[3.088025] 0 db7ed000 c139d54c c139d54c df133000 db7ed000 1ffef067 b000
[3.088025] 0 bfe1 db44bbfc df071d2c c01e8ce0 c000 df133000 db44bbfc bfe1
[3.088025] Call Trace:
[3.088025] [c01e8b6d] ? move_ptes+0x1ad/0x270
[3.088025] [c01e8ce0] ? move_page_tables+0xb0/0x130
[3.088025] [c020b614] ? shift_arg_pages+0x94/0x180
[3.088025] [c020b885] ? setup_arg_pages+0x185/0x1b0
[3.088025] [c0241243] ? load_elf_binary+0x3c3/0xac0
[3.088025] [c02f1654] ? security_file_permission+0x14/0x20
[3.088025] [c02052f4] ? rw_verify_area+0x64/0xe0
[3.088025] [c0240e80] ? load_elf_binary+0x0/0xac0
[3.088025] [c020bd9f] ? search_binary_handler+0xef/0x2f0
[3.088025] [c020b465] ? kernel_read+0x35/0x50
[3.088025] [c023f7b2] ? load_script+0x1e2/0x270
[3.088025] [c01e4160] ? get_user_pages+0x50/0x60
[3.088025] [c020a662] ? get_arg_page+0x52/0xb0
[3.088025] [c023f5d0] ? load_script+0x0/0x270
[3.088025] [c020bd9f] ? search_binary_handler+0xef/0x2f0
[3.088025] [c020a834] ? copy_strings+0x174/0x190
[3.088025] [c020c2c7] ? do_execve+0x1f7/0x2c0
[3.088025] [c034ed6a] ? strncpy_from_user+0x3a/0x70
[3.088025] [c0101a1d] ? sys_execve+0x2d/0x60
[3.088025] [c01033ec] ? syscall_call+0x7/0xb
[3.088025] [c01070a4] ? kernel_execve+0x24/0x30
[3.088025] [c01012ac] ? run_init_process+0x1c/0x20
[3.088025] [c0101396] ? init_post+0xe6/0x100
[3.088025] [c07d83d0] ? kernel_init+0xb8/0xbf
[3.088025] [c07d8318] ? kernel_init+0x0/0xbf
[3.088025] [c0104087] ? kernel_thread_helper+0x7/0x10
[3.088025] Code: 6c 87 c0 64 a1 40 6a 87 c0 03 3c 85 80 4a 7d c0 8b 9f 00 04 00 00 85 db 74 24 89 fe 31 d2 66 90 8d 8e 00 00 00 40 b8 02 00 00 00 0f 01 c1 01 c6 29 c3 75 ec c7 87
Re: [PATCH 00/10] uq/master: irqchip-in-kernel support
On Thu, Mar 04, 2010 at 05:33:16PM +0100, Jan Kiszka wrote:
> Glauber Costa wrote:
> > Hi guys,
> >
> > This is the same in-kernel irqchip support already posted to qemu-devel,
> > just rebased, retested, etc. It passes my basic tests, so it seem to be
> > still in good shape.
> >
> > It is provided against uq/master as part of the integration efforts
>
> Just as another heads-up: host-guest networking performance over slirp
> and non-virtio NICs suffers with this irqchip support the same way as in
> qemu-kvm. It's not a bug I expect to be directly related to these
> changes, but it is at least triggered by them and should now really be
> addressed.

Isn't it triggered by enablement of the iothread (and if so irqchip support
is unrelated to the problem)?
Re: [PATCH 2/4] KVM: Rework VCPU state writeback API
On Fri, Mar 05, 2010 at 09:37:26PM -0500, Kevin O'Connor wrote:
> On Thu, Mar 04, 2010 at 03:35:52PM -0300, Marcelo Tosatti wrote:
> > On Thu, Mar 04, 2010 at 12:58:58AM -0500, Kevin O'Connor wrote:
> > > On Thu, Mar 04, 2010 at 01:21:12AM -0300, Marcelo Tosatti wrote:
> > > > The regression seems to be caused by seabios commit d7e998f. Kevin,
> > > > the failure can be seen on the attached screenshot, which happens
> > > > on the first reboot of WinXP 32 installation (after copying files
> > > > etc).
> > >
> > > Sorry - I also noticed a bug in that commit recently. I pushed the
> > > fix I had in my local tree.
> >
> > Thanks, it does fix the issue here. Anthony can you please update
> > seabios?
>
> Neither commit d7e998f nor the fix 8f469b96 are on the SeaBIOS stable
> branch. Is qemu ready to pull in bigger changes now?

Anthony pulls in seabios master into qemu.git master periodically.
Re: linux-aio usable?
> It's faster.

Hi Avi,

Could You give some rough estimate on how much faster? I'm stuck with
glibc-2.5 now, but I'm always eager to improve performance, so I wonder if
it would make sense to either port eventfd + aio stuff, or switch to
glibc-2.8 for me...

-- 
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
Re: linux-aio usable?
On Monday 08 March 2010 03:27:36 pm Nikola Ciprich wrote:
> > It's faster.
>
> Hi Avi,
> Could You give some rough estimate on how much faster? I'm stuck with
> glibc-2.5 now, but I'm always eager to improve performance, so I wonder
> if it would make sense to either port eventfd + aio stuff, or switch to
> glibc-2.8 for me...

I saw approx. 10% improvement in sequential i/o. Random i/o was only
marginally faster in our setup. We generally have problems with random i/o
here... Something to do with our setup.
[patch 5/7] kvm-tpr-opt: remove dead code
Simplify code around kvm_enable_tpr_access_reporting.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: qemu-kvm-tpr/qemu-kvm-x86.c
===================================================================
--- qemu-kvm-tpr.orig/qemu-kvm-x86.c
+++ qemu-kvm-tpr/qemu-kvm-x86.c
@@ -597,30 +597,16 @@ int kvm_get_shadow_pages(kvm_context_t k
 }
 
 #ifdef KVM_CAP_VAPIC
-
-static int tpr_access_reporting(CPUState *env, int enabled)
-{
-    int r;
-    struct kvm_tpr_access_ctl tac = {
-        .enabled = enabled,
-    };
-
-    r = kvm_ioctl(kvm_state, KVM_CHECK_EXTENSION, KVM_CAP_VAPIC);
-    if (r <= 0)
-        return -ENOSYS;
-    return kvm_vcpu_ioctl(env, KVM_TPR_ACCESS_REPORTING, &tac);
-}
-
-int kvm_enable_tpr_access_reporting(CPUState *env)
+static int kvm_enable_tpr_access_reporting(CPUState *env)
 {
-    return tpr_access_reporting(env, 1);
-}
+    int r;
+    struct kvm_tpr_access_ctl tac = { .enabled = 1 };
 
-int kvm_disable_tpr_access_reporting(CPUState *env)
-{
-    return tpr_access_reporting(env, 0);
+    r = kvm_ioctl(env->kvm_state, KVM_CHECK_EXTENSION, KVM_CAP_VAPIC);
+    if (r <= 0)
+        return -ENOSYS;
+    return kvm_vcpu_ioctl(env, KVM_TPR_ACCESS_REPORTING, &tac);
 }
-
 #endif
 
 int kvm_qemu_create_memory_alias(uint64_t phys_start,
@@ -1319,7 +1305,7 @@ int kvm_arch_init_vcpu(CPUState *cenv)
 #endif
 
 #ifdef KVM_EXIT_TPR_ACCESS
-    kvm_tpr_vcpu_start(cenv);
+    kvm_enable_tpr_access_reporting(cenv);
 #endif
     kvm_reset_mpstate(cenv);
     return 0;
Index: qemu-kvm-tpr/qemu-kvm.h
===================================================================
--- qemu-kvm-tpr.orig/qemu-kvm.h
+++ qemu-kvm-tpr/qemu-kvm.h
@@ -601,27 +601,6 @@ int kvm_get_pit2(kvm_context_t kvm, stru
 
 #ifdef KVM_CAP_VAPIC
 
-/*!
- * \brief Enable kernel tpr access reporting
- *
- * When tpr access reporting is enabled, the kernel will call the
- * ->tpr_access() callback every time the guest vcpu accesses the tpr.
- *
- * \param kvm Pointer to the current kvm_context
- * \param vcpu vcpu to enable tpr access reporting on
- */
-int kvm_enable_tpr_access_reporting(CPUState *env);
-
-/*!
- * \brief Disable kernel tpr access reporting
- *
- * Undoes the effect of kvm_enable_tpr_access_reporting().
- *
- * \param kvm Pointer to the current kvm_context
- * \param vcpu vcpu to disable tpr access reporting on
- */
-int kvm_disable_tpr_access_reporting(CPUState *env);
-
 int kvm_enable_vapic(CPUState *env, uint64_t vapic);
 
 #endif
@@ -895,7 +874,6 @@ void qemu_kvm_aio_wait_end(void);
 void qemu_kvm_notify_work(void);
 
 void kvm_tpr_access_report(CPUState *env, uint64_t rip, int is_write);
-void kvm_tpr_vcpu_start(CPUState *env);
 
 int qemu_kvm_get_dirty_pages(unsigned long phys_addr, void *buf);
 
Index: qemu-kvm-tpr/kvm-tpr-opt.c
===================================================================
--- qemu-kvm-tpr.orig/kvm-tpr-opt.c
+++ qemu-kvm-tpr/kvm-tpr-opt.c
@@ -318,11 +318,6 @@ void kvm_tpr_access_report(CPUState *env
     patch_instruction(env, rip);
 }
 
-void kvm_tpr_vcpu_start(CPUState *env)
-{
-    kvm_enable_tpr_access_reporting(env);
-}
-
 static void tpr_save(QEMUFile *f, void *s)
 {
     int i;
[patch 4/7] kvm-tpr-opt: clean up usage of bios_enabled
1. bios_enabled must already be set when enable_vapic is called.
2. kvm_tpr_vcpu_start is called during vcpu creation, when bios_enabled
   is always zero.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: qemu-kvm-tpr/kvm-tpr-opt.c
===================================================================
--- qemu-kvm-tpr.orig/kvm-tpr-opt.c
+++ qemu-kvm-tpr/kvm-tpr-opt.c
@@ -250,7 +250,6 @@ int kvm_tpr_enable_vapic(CPUState *env)
 
 static int enable_vapic(CPUState *env)
 {
-    bios_enabled = 1;
     env->update_vapic = 1;
     return 1;
 }
@@ -322,8 +321,6 @@ void kvm_tpr_access_report(CPUState *env
 void kvm_tpr_vcpu_start(CPUState *env)
 {
     kvm_enable_tpr_access_reporting(env);
-    if (bios_enabled)
-        kvm_tpr_enable_vapic(env);
 }
 
 static void tpr_save(QEMUFile *f, void *s)
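The second point of the commit message above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not qemu-kvm code: every `model_*` name is hypothetical, and it assumes the flag is raised only after the guest BIOS has run, which is what the commit message implies. It shows that a check placed in the vcpu start hook never fires, so removing it should not change behaviour.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not the qemu-kvm definitions. */
static int bios_enabled;        /* assumed to be set only after the guest BIOS has run */
static int vapic_enable_calls;  /* how often the modelled kvm_tpr_enable_vapic() ran */

static void model_enable_vapic(void)
{
    vapic_enable_calls++;
}

/* Pre-patch shape of kvm_tpr_vcpu_start(): checks the flag at vcpu creation. */
static void model_vcpu_start_old(void)
{
    if (bios_enabled)
        model_enable_vapic();
}

/* Post-patch shape: the check is gone. */
static void model_vcpu_start_new(void)
{
}

/*
 * Create ncpus vcpus, then let the "BIOS" run, and report how many times
 * the vapic was enabled from the vcpu start hook.
 */
static int model_boot_sequence(void (*vcpu_start)(void), int ncpus)
{
    int i;

    assert(ncpus >= 0);
    bios_enabled = 0;
    vapic_enable_calls = 0;
    for (i = 0; i < ncpus; i++)
        vcpu_start();       /* vcpu creation: flag is still zero */
    bios_enabled = 1;       /* the BIOS only runs after creation */
    return vapic_enable_calls;
}
```

Calling `model_boot_sequence()` with either hook gives the same count, which is the sense in which the removed branch was dead.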
[patch 0/7] kvm-tpr-opt cleanups
Prepare kvm-tpr-opt.c for upstream merge.
[patch 2/7] kvm-tpr-opt: use device_init
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: qemu-kvm-tpr/kvm-tpr-opt.c
===================================================================
--- qemu-kvm-tpr.orig/kvm-tpr-opt.c
+++ qemu-kvm-tpr/kvm-tpr-opt.c
@@ -401,10 +401,12 @@ static void vtpr_ioport_write(void *opaq
     kvm_tpr_enable_vapic(env);
 }
 
-void kvm_tpr_opt_setup(void)
+static void kvm_tpr_opt_setup(void)
 {
     register_savevm("kvm-tpr-opt", 0, 1, tpr_save, tpr_load, NULL);
     register_ioport_write(0x7e, 1, 1, vtpr_ioport_write, NULL);
     register_ioport_write(0x7e, 2, 2, vtpr_ioport_write16, NULL);
 }
 
+device_init(kvm_tpr_opt_setup);
+
Index: qemu-kvm-tpr/qemu-kvm-x86.c
===================================================================
--- qemu-kvm-tpr.orig/qemu-kvm-x86.c
+++ qemu-kvm-tpr/qemu-kvm-x86.c
@@ -157,10 +157,6 @@ int kvm_arch_create(kvm_context_t kvm, u
     if (r < 0)
         return r;
 
-#ifdef KVM_EXIT_TPR_ACCESS
-    kvm_tpr_opt_setup();
-#endif
-
     return 0;
 }
 
Index: qemu-kvm-tpr/qemu-kvm.h
===================================================================
--- qemu-kvm-tpr.orig/qemu-kvm.h
+++ qemu-kvm-tpr/qemu-kvm.h
@@ -894,7 +894,6 @@ void qemu_kvm_aio_wait_end(void);
 
 void qemu_kvm_notify_work(void);
 
-void kvm_tpr_opt_setup(void);
 void kvm_tpr_access_report(CPUState *env, uint64_t rip, int is_write);
 void kvm_tpr_vcpu_start(CPUState *env);
 
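For readers unfamiliar with the registration style this patch switches to, the general pattern can be sketched as follows. This is a simplified model rather than QEMU's implementation: all `model_*` and `MODEL_*` names are hypothetical, and it assumes a GCC or Clang toolchain for `__attribute__((constructor))`. The idea is that the setup function registers itself before main() runs, and a central call later runs every registered function, so no other file needs an explicit call or a prototype.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_INITS 16

typedef void (*model_init_fn)(void);

/* Registry filled in by constructors before main() starts. */
static model_init_fn model_init_table[MODEL_MAX_INITS];
static size_t model_init_count;

static void model_register_init(model_init_fn fn)
{
    assert(model_init_count < MODEL_MAX_INITS);
    model_init_table[model_init_count++] = fn;
}

/* Hypothetical analogue of a device_init()-style macro. */
#define MODEL_DEVICE_INIT(fn)                                          \
    static void __attribute__((constructor)) model_do_init_##fn(void) \
    {                                                                  \
        model_register_init(fn);                                       \
    }

/* Run every registered init function, in registration order. */
static void model_call_device_inits(void)
{
    size_t i;

    for (i = 0; i < model_init_count; i++)
        model_init_table[i]();
}

/* A setup function that can stay static because it registers itself. */
static int model_setup_calls;

static void model_tpr_opt_setup(void)
{
    model_setup_calls++;
}

MODEL_DEVICE_INIT(model_tpr_opt_setup)
```

The setup function does not run at registration time; it runs once, when `model_call_device_inits()` is invoked. That is why the patch can drop both the prototype in the header and the explicit call in kvm_arch_create.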
[patch 3/7] kvm-tpr-opt: qemu-kvm.h -> kvm.h
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: qemu-kvm-tpr/kvm-tpr-opt.c
===================================================================
--- qemu-kvm-tpr.orig/kvm-tpr-opt.c
+++ qemu-kvm-tpr/kvm-tpr-opt.c
@@ -14,7 +14,7 @@
 #include "hw/hw.h"
 #include "hw/isa.h"
 #include "sysemu.h"
-#include "qemu-kvm.h"
+#include "kvm.h"
 #include "cpu.h"
 
 #include <stdio.h>
[patch 1/7] qemu-kvm: move vapic enablement to kvm_arch_load_regs
update_vapic is used for enabling vcpu's vapic on migration. Use the new
writeback states for that.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: qemu-kvm-tpr/qemu-kvm-x86.c
===================================================================
--- qemu-kvm-tpr.orig/qemu-kvm-x86.c
+++ qemu-kvm-tpr/qemu-kvm-x86.c
@@ -988,6 +988,10 @@ void kvm_arch_load_regs(CPUState *env, i
         kvm_arch_load_mpstate(env);
         kvm_load_lapic(env);
     }
+    if (level == KVM_PUT_FULL_STATE) {
+        if (env->update_vapic)
+            kvm_tpr_enable_vapic(env);
+    }
     if (kvm_irqchip_in_kernel()) {
         /* Avoid deadlock: no user space IRQ will ever clear it. */
         env->halted = 0;
@@ -1338,9 +1342,6 @@ int kvm_arch_halt(CPUState *env)
 
 int kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
 {
-    if (env->update_vapic) {
-        kvm_tpr_enable_vapic(env);
-    }
     if (!kvm_irqchip_in_kernel())
         kvm_set_cr8(env, cpu_get_apic_tpr(env));
     return 0;
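The effect of moving the check can be shown with a minimal model. This is a sketch under assumptions, not the qemu-kvm code: the `model_*` and `MODEL_*` names are hypothetical, the two levels other than the full-state one are assumed for illustration, and the real kvm_tpr_enable_vapic() does more than set a flag. What it shows is that the vapic is enabled only on a full-state writeback (as after an incoming migration), not on every ordinary register sync.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical writeback levels; only a "full state" level appears in the patch. */
enum model_put_level {
    MODEL_PUT_RUNTIME_STATE = 1,    /* assumed: ordinary sync while running */
    MODEL_PUT_RESET_STATE   = 2,    /* assumed: sync after a reset */
    MODEL_PUT_FULL_STATE    = 3     /* sync after loading a complete vcpu state */
};

/* Hypothetical, reduced vcpu state. */
struct model_cpu {
    int update_vapic;   /* set when the vapic should be enabled on this vcpu */
    int vapic_enabled;  /* stands in for the effect of kvm_tpr_enable_vapic() */
};

/* Post-patch shape: the vapic check lives in the register writeback path. */
static void model_load_regs(struct model_cpu *cpu, enum model_put_level level)
{
    assert(cpu != NULL);
    if (level == MODEL_PUT_FULL_STATE) {
        if (cpu->update_vapic)
            cpu->vapic_enabled = 1;
    }
}
```

With `update_vapic` set, a runtime-level call leaves `vapic_enabled` at zero, and only a full-state call sets it, so the per-entry check in kvm_arch_pre_run is no longer needed.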