Re: [Qemu-devel] Ideas wiki for GSoC 2010
On 03/16/2010 02:20 AM, Jason wrote:
> In comparing KVM 2.6.31.6b to XenServer 5.5.0, it seems KVM has fewer overall
> VMREADs and VMWRITEs, but there are a lot of VMWRITEs to Host FS_SEL, Host
> GS_SEL, Host FS_BASE, and Host GS_BASE that don't appear in Xen.

Ugh, these should definitely be eliminated, they keep writing the same value most of the time.

> Also, KVM has a lot of MSR accesses to 0xc081-0xc084 that Xen doesn't have.

These are unavoidable. Those MSRs are used for system calls and we need them to keep ordinary userspace going. 2.6.33 should reduce their frequency though.

Usually it doesn't make sense to pass them through, but if we detect the guest is writing them often, we can do so and eliminate the exits.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
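The last point above (pass the syscall MSRs through only once the guest is seen writing them often) could be sketched as a per-MSR write counter. This is a minimal userspace illustration, not KVM code: the structure, the threshold value, and the function name are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold: intercepted writes tolerated before passthrough. */
#define MSR_WRITE_PASSTHROUGH_THRESHOLD 16u

/* Hypothetical per-MSR bookkeeping; not a real KVM structure. */
struct msr_intercept_state {
    uint32_t index;        /* MSR number, e.g. 0xc0000081 */
    uint32_t write_exits;  /* intercepted writes seen so far */
    bool     passthrough;  /* true once the guest may write it directly */
};

/*
 * Called for each intercepted guest write to the MSR.
 * Returns true only for the write that flips the MSR to passthrough.
 */
static bool msr_note_write_exit(struct msr_intercept_state *s)
{
    assert(s != NULL);

    if (s->passthrough)
        return false;               /* already passed through, nothing to do */

    s->write_exits++;
    if (s->write_exits >= MSR_WRITE_PASSTHROUGH_THRESHOLD) {
        s->passthrough = true;      /* stop intercepting from now on */
        return true;
    }
    return false;
}
```

A real implementation would also need some way to fall back to interception, which this sketch leaves out.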
[ kvm-Bugs-2971075 ] Assertion `bmdma->unit != (uint8_t)-1' failed.
Bugs item #2971075, was opened at 2010-03-16 07:02
Message generated for change (Tracker Item Submitted) made by zaphodbrx
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2971075&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: qemu
Group: v1.0 (example)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Andre Weidemann (zaphodbrx)
Assigned to: Nobody/Anonymous (nobody)
Summary: Assertion `bmdma->unit != (uint8_t)-1' failed.

Initial Comment:
Hi,
I cloned the qemu-kvm git repository with "git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git qemu-kvm-2010-03-14", ran configure, compiled it and did a "make install". Everything went fine without warnings or errors. For configure output take a look here: http://pastebin.com/BL4DYCRY

Here is my server hardware:
Asus P5Q Mainboard
Intel Q9300
8GB RAM
RAID5 with mdadm consisting of 4x 1TB disks

I am running Ubuntu 9.10 x86_64 with kernel 2.6.31-20-server. The volume /dev/storage/Windows7test mentioned below is on this RAID5.

I ran my virtual machine with the following command:

qemu-system-x86_64 -cpu core2duo -vga cirrus -boot order=ndc -vnc 192.168.3.42:2 -k de -smp 4,cores=4 -drive file=/vmware/Windows7Test_600G.img,if=ide,index=0,cache=writeback -m 1024 -net nic,model=e1000,macaddr=DE:AD:BE:EF:12:3A -net tap,script=/usr/local/bin/qemu-ifup -monitor pty -name Windows7test,process=Windows7test -drive file=/dev/storage/Windows7test,if=ide,index=1,cache=none,aio=native

Windows7Test_600G.img is a qcow2 file and contains a Windows 7 Pro image. /dev/storage/Windows7test is formatted with XFS (inside the VM).

After starting the machine with the above command line, I booted into an Ubuntu 9.10 x86_64 Live Image via PXE and mounted /dev/sdb1 (/dev/storage/Windows7test) under /mnt. I then did "cd /mnt/" and ran "iozone -Ra -g 2G -b /tmp/iozone-aoi-linux.xls". iozone ran some tests and then kvm simply quit with the following error message:

qemu-system-x86_64: /usr/local/src/qemu-kvm-2010-03-10/hw/ide/internal.h:510: bmdma_active_if: Assertion `bmdma->unit != (uint8_t)-1' failed.

/var/log/syslog contained the following:

Mar 14 09:18:14 server kernel: [318080.627468] kvm: 1361: cpu0 kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
Mar 14 09:18:14 server kernel: [318080.627473] kvm: 1361: cpu0 kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
Mar 14 09:18:14 server kernel: [318080.627476] kvm: 1361: cpu0 unhandled wrmsr: 0x400 data
Mar 14 09:18:14 server kernel: [318080.627506] kvm: 1361: cpu1 kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
Mar 14 09:18:14 server kernel: [318080.627509] kvm: 1361: cpu1 kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
Mar 14 09:18:14 server kernel: [318080.627511] kvm: 1361: cpu1 unhandled wrmsr: 0x400 data
Mar 14 09:18:14 server kernel: [318080.627538] kvm: 1361: cpu2 kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
Mar 14 09:18:14 server kernel: [318080.627540] kvm: 1361: cpu2 kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
Mar 14 09:18:14 server kernel: [318080.627543] kvm: 1361: cpu2 unhandled wrmsr: 0x400 data

I was able to reproduce this error 3 times in a row.
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On 03/16/2010 03:21 AM, Anthony Liguori wrote:
> On 03/15/2010 10:06 AM, Avi Kivity wrote:
>> On 03/15/2010 03:23 PM, Anthony Liguori wrote:
>>> On 03/15/2010 08:11 AM, Avi Kivity wrote:
>>>> On 03/15/2010 03:03 PM, Joerg Roedel wrote:
>>>>>>> I will add another project - iommu emulation. Could be very useful
>>>>>>> for doing device assignment to nested guests, which could make
>>>>>>> testing a lot easier.
>>>>>> Our experiments show that nested device assignment is pretty much
>>>>>> required for I/O performance in nested scenarios.
>>>>> Really? I did a small test with virtio-blk in a nested guest (disk
>>>>> read with dd, so not a real benchmark) and got a reasonable
>>>>> read-performance of around 25MB/s from the disk in the l2-guest.
>>>> Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.
>>>>
>>>> I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we can
>>>> do for other guests.
>>> VMREAD/VMWRITEs are generally optimized by hypervisors as they tend to
>>> be costly. KVM is a bit unusual in terms of how many times the
>>> instructions are executed per exit.
>> Do you know offhand of any unnecessary read/writes? There's
>> update_cr8_intercept(), but on normal exits, I don't see what else we
>> can remove.
> Yeah, there are a number of examples.
>
> vmcs_clear_bits() and vmcs_set_bits() read a field of the VMCS and then
> immediately write it. This is unnecessary as the same information could
> be kept in a shadow variable. In vmx_fpu_activate, we call
> vmcs_clear_bits() followed immediately by vmcs_set_bits(), which means
> we're reading GUEST_CR0 twice and writing it twice.

This should be much better these days (2.6.34-rc1) as vmx_fpu_activate() is called at most once per heavyweight exit (and I have evil plans to reduce it even further). Still, that code should be optimized.

> vmx_get_rflags() reads from the VMCS and we frequently call get_rflags()
> followed by a set_rflags() to update a bit. We also don't cache the
> value between calls and there's a few spots in the code that make
> multiple calls.

We definitely should cache that (and segment access from the emulator as well). But I'd have thought this to be relatively infrequent. At least with Linux, using x2apic and virtio allows you to eliminate most emulator access, if you have npt or ept.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> From: Zhang, Yanmin
>
> Based on the discussion in KVM community, I worked out the patch to support
> perf to collect guest os statistics from host side. This patch is implemented
> with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> critical bug and provided good suggestions with other guys. I really appreciate
> their kind help.
>
> The patch adds new subcommand kvm to perf.
>
>   perf kvm top
>   perf kvm record
>   perf kvm report
>   perf kvm diff
>
> The new perf could profile guest os kernel except guest os user space, but it
> could summarize guest os user space utilization per guest os.
>
> Below are some examples.
> 1) perf kvm top
> [r...@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms --guestmodules=/home/ymzhang/guest/modules top

Excellent, support for guest kernel != host kernel is critical (I can't remember the last time I ran same kernels).

How would we support multiple guests with different kernels? Perhaps a symbol server that perf can connect to (and that would connect to guests in turn)?

> diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
> --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c 2010-03-16 08:59:11.825295404 +0800
> +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c 2010-03-16 09:01:09.976084492 +0800
> @@ -26,6 +26,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include "kvm_cache_regs.h"
>  #include "x86.h"
> @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
>  vmcs_write32(TPR_THRESHOLD, irr);
>  }
> +DEFINE_PER_CPU(int, kvm_in_guest) = {0};
> +
> +static void kvm_set_in_guest(void)
> +{
> + percpu_write(kvm_in_guest, 1);
> +}
> +
> +static int kvm_is_in_guest(void)
> +{
> + return percpu_read(kvm_in_guest);
> +}

There is already PF_VCPU for this.

> +static struct perf_guest_info_callbacks kvm_guest_cbs = {
> + .is_in_guest = kvm_is_in_guest,
> + .is_user_mode = kvm_is_user_mode,
> + .get_guest_ip = kvm_get_guest_ip,
> + .reset_in_guest = kvm_reset_in_guest,
> +};

Should be in common code, not vmx specific.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
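The PF_VCPU remark above suggests answering "is this CPU in a guest?" from a flag on the running task rather than from a new per-CPU variable. The following is a userspace mock of that idea only: `struct mock_task`, `MOCK_PF_VCPU` and the helper names are stand-ins invented for illustration, and the flag value is not taken from any kernel header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's PF_VCPU task flag; the value is illustrative. */
#define MOCK_PF_VCPU 0x10u

/* Stand-in for struct task_struct, reduced to the one field used here. */
struct mock_task {
    unsigned int flags;
};

/* What guest entry would do: mark the task as running guest code. */
static void mock_guest_enter(struct mock_task *t)
{
    t->flags |= MOCK_PF_VCPU;
}

/* What guest exit would do: clear the mark again. */
static void mock_guest_exit(struct mock_task *t)
{
    t->flags &= ~MOCK_PF_VCPU;
}

/* Callback-style query: is this task currently executing a guest? */
static bool mock_is_in_guest(const struct mock_task *t)
{
    assert(t != NULL);
    return (t->flags & MOCK_PF_VCPU) != 0;
}
```

With a flag like this already maintained on guest entry and exit, a callback such as `.is_in_guest` would not need separate state of its own.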
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
* Randy Dunlap [2010-03-15 08:46:31]:

> On Mon, 15 Mar 2010 12:52:15 +0530 Balbir Singh wrote:
>
> Hi,
> If you go ahead with this, please add the boot parameter & its description
> to Documentation/kernel-parameters.txt.
>

I certainly will, thanks for keeping a watch.

--
	Three Cheers,
	Balbir
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
* Chris Webb [2010-03-15 20:23:54]:

> Avi Kivity writes:
>
> > On 03/15/2010 10:07 AM, Balbir Singh wrote:
> >
> > >Yes, it is a virtio call away, but is the cost of paying twice in
> > >terms of memory acceptable?
> >
> > Usually, it isn't, which is why I recommend cache=off.
>
> Hi Avi. One observation about your recommendation for cache=none:
>
> We run hosts of VMs accessing drives backed by logical volumes carved out
> from md RAID1. Each host has 32GB RAM and eight cores, divided between (say)
> twenty virtual machines, which pretty much fill the available memory on the
> host. Our qemu-kvm is new enough that IDE and SCSI drives with writeback
> caching turned on get advertised to the guest as having a write-cache, and
> FLUSH gets translated to fsync() by qemu. (Consequently cache=writeback
> isn't acting as cache=neverflush like it would have done a year ago. I know
> that comparing performance for cache=none against that unsafe behaviour
> would be somewhat unfair!)
>
> Wasteful duplication of page cache between guest and host notwithstanding,
> turning on cache=writeback is a spectacular performance win for our guests.
> For example, even IDE with cache=writeback easily beats virtio with
> cache=none in most of the guest filesystem performance tests I've tried. The
> anecdotal feedback from clients is also very strongly in favour of
> cache=writeback.
>
> With a host full of cache=none guests, IO contention between guests is
> hugely problematic with non-stop seek from the disks to service tiny
> O_DIRECT writes (especially without virtio), many of which needn't have been
> synchronous if only there had been some way for the guest OS to tell qemu
> that. Running with cache=writeback seems to reduce the frequency of disk
> flush per guest to a much more manageable level, and to allow the host's
> elevator to optimise writing out across the guests in between these flushes.

Thanks for the inputs above, they are extremely useful. The goal of these patches is that with cache != none, we allow double caching when needed and then slowly take away unmapped pages, pushing the caching to the host. There are knobs to control how much, etc and the whole feature is enabled via a boot parameter.

--
	Three Cheers,
	Balbir
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On 03/15/2010 07:43 PM, Christoph Hellwig wrote:
> On Mon, Mar 15, 2010 at 06:43:06PM -0500, Anthony Liguori wrote:
>> I knew someone would do this...
>>
>> This really gets down to your definition of "safe" behaviour. As it
>> stands, if you suffer a power outage, it may lead to guest corruption.
>>
>> While we are correct in advertising a write-cache, write-caches are
>> volatile and should a drive lose power, it could lead to data
>> corruption. Enterprise disks tend to have battery backed write caches
>> to prevent this.
>>
>> In the set up you're emulating, the host is acting as a giant write
>> cache. Should your host fail, you can get data corruption.
>>
>> cache=writethrough provides a much stronger data guarantee. Even in the
>> event of a host failure, data integrity will be preserved.
> Actually cache=writeback is as safe as any normal host is with a
> volatile disk cache, except that in this case the disk cache is
> actually a lot larger. With a properly implemented filesystem this
> will never cause corruption.

Metadata corruption, not necessarily corruption of data stored in a file.

> You will lose recent updates after
> the last sync/fsync/etc up to the size of the cache, but filesystem
> metadata should never be corrupted, and data that has been forced to
> disk using fsync/O_SYNC should never be lost either.

Not all software uses fsync as much as they should. And often times, it's for good reason (like ext3). This is mitigated by the fact that there's usually a short window of time before metadata is flushed to disk. Adding another layer increases that delay.

IIUC, an O_DIRECT write using cache=writeback is not actually on the spindle when the write() completes. Rather, an explicit fsync() would be required. That will cause data corruption in many applications (like databases) regardless of whether the fs gets metadata corruption.

You could argue that the software should disable writeback caching on the virtual disk, but we don't currently support that so even if the application did, it's not going to help.

Regards,

Anthony Liguori

> If it is that's
> a bug somewhere in the stack, but in my powerfail testing we never did
> so using xfs or ext3/4 after I fixed up the fsync code in the latter
> two.
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On 03/15/2010 10:06 AM, Avi Kivity wrote:
> On 03/15/2010 03:23 PM, Anthony Liguori wrote:
>> On 03/15/2010 08:11 AM, Avi Kivity wrote:
>>> On 03/15/2010 03:03 PM, Joerg Roedel wrote:
>>>>>> I will add another project - iommu emulation. Could be very useful
>>>>>> for doing device assignment to nested guests, which could make
>>>>>> testing a lot easier.
>>>>> Our experiments show that nested device assignment is pretty much
>>>>> required for I/O performance in nested scenarios.
>>>> Really? I did a small test with virtio-blk in a nested guest (disk
>>>> read with dd, so not a real benchmark) and got a reasonable
>>>> read-performance of around 25MB/s from the disk in the l2-guest.
>>> Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.
>>>
>>> I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we can
>>> do for other guests.
>> VMREAD/VMWRITEs are generally optimized by hypervisors as they tend to
>> be costly. KVM is a bit unusual in terms of how many times the
>> instructions are executed per exit.
> Do you know offhand of any unnecessary read/writes? There's
> update_cr8_intercept(), but on normal exits, I don't see what else we
> can remove.

Yeah, there are a number of examples.

vmcs_clear_bits() and vmcs_set_bits() read a field of the VMCS and then immediately write it. This is unnecessary as the same information could be kept in a shadow variable. In vmx_fpu_activate, we call vmcs_clear_bits() followed immediately by vmcs_set_bits(), which means we're reading GUEST_CR0 twice and writing it twice.

vmx_get_rflags() reads from the VMCS and we frequently call get_rflags() followed by a set_rflags() to update a bit. We also don't cache the value between calls and there's a few spots in the code that make multiple calls.

Regards,

Anthony Liguori
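The shadow-variable idea described above can be illustrated with a small userspace mock. Nothing here is KVM code: `mock_vmread()`/`mock_vmwrite()` stand in for the real VMCS accessors and simply count accesses, and `struct shadow_field` and `shadow_update()` are hypothetical names. The sketch compares a clear-then-set sequence done with read-modify-write helpers against one combined update through a cached copy.

```c
#include <assert.h>
#include <stdint.h>

/* Mock hardware field plus counters, standing in for VMREAD/VMWRITE. */
static uint64_t mock_hw_field;
static unsigned mock_reads, mock_writes;

static uint64_t mock_vmread(void)
{
    mock_reads++;
    return mock_hw_field;
}

static void mock_vmwrite(uint64_t v)
{
    mock_writes++;
    mock_hw_field = v;
}

/* Uncached helpers: each call costs one read and one write. */
static void field_set_bits(uint64_t mask)
{
    mock_vmwrite(mock_vmread() | mask);
}

static void field_clear_bits(uint64_t mask)
{
    mock_vmwrite(mock_vmread() & ~mask);
}

/* Hypothetical shadow copy of the field. */
struct shadow_field {
    uint64_t value;
    int      valid;   /* 0 until the field has been read once */
};

/* Clear and set bits in one step, touching hardware only when needed. */
static void shadow_update(struct shadow_field *s, uint64_t clear, uint64_t set)
{
    assert(s != 0);

    if (!s->valid) {                 /* first use: populate the shadow */
        s->value = mock_vmread();
        s->valid = 1;
    }

    uint64_t next = (s->value & ~clear) | set;
    if (next != s->value) {          /* skip the write if nothing changed */
        s->value = next;
        mock_vmwrite(next);
    }
}
```

The shadow is only correct while software is the sole writer of the field; any field that hardware can change during guest execution would need the shadow invalidated on exit.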
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On Mon, Mar 15, 2010 at 06:43:06PM -0500, Anthony Liguori wrote:
> I knew someone would do this...
>
> This really gets down to your definition of "safe" behaviour. As it
> stands, if you suffer a power outage, it may lead to guest corruption.
>
> While we are correct in advertising a write-cache, write-caches are
> volatile and should a drive lose power, it could lead to data
> corruption. Enterprise disks tend to have battery backed write caches
> to prevent this.
>
> In the set up you're emulating, the host is acting as a giant write
> cache. Should your host fail, you can get data corruption.
>
> cache=writethrough provides a much stronger data guarantee. Even in the
> event of a host failure, data integrity will be preserved.

Actually cache=writeback is as safe as any normal host is with a volatile disk cache, except that in this case the disk cache is actually a lot larger. With a properly implemented filesystem this will never cause corruption. You will lose recent updates after the last sync/fsync/etc up to the size of the cache, but filesystem metadata should never be corrupted, and data that has been forced to disk using fsync/O_SYNC should never be lost either. If it is that's a bug somewhere in the stack, but in my powerfail testing we never did so using xfs or ext3/4 after I fixed up the fsync code in the latter two.
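The guarantee discussed above covers only data the application has explicitly forced to disk. As a minimal sketch of what that looks like from the application side, the helper below reports success only after fsync() has returned; `write_durably()` is a hypothetical name, and the file mode and open flags are illustrative choices.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes to path and return 0 only once fsync() has succeeded.
 * Returns -1 on failure, with errno set by the call that failed.
 */
static int write_durably(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted: retry the write */
            int saved = errno;
            close(fd);
            errno = saved;           /* report the write error, not close's */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Until this returns, the data may live only in volatile caches. */
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

For a newly created file, making the directory entry durable as well requires an fsync() on the containing directory, which this sketch omits.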
Re: [Qemu-devel] Ideas wiki for GSoC 2010
Avi Kivity writes:
>
> On 03/15/2010 03:23 PM, Anthony Liguori wrote:
> > On 03/15/2010 08:11 AM, Avi Kivity wrote:
> >> Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.
> >>
> >> I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we can
> >> do for other guests.
> >
> > VMREAD/VMWRITEs are generally optimized by hypervisors as they tend to
> > be costly. KVM is a bit unusual in terms of how many times the
> > instructions are executed per exit.
>
> Do you know offhand of any unnecessary read/writes? There's
> update_cr8_intercept(), but on normal exits, I don't see what else we
> can remove.
>

In comparing KVM 2.6.31.6b to XenServer 5.5.0, it seems KVM has fewer overall VMREADs and VMWRITEs, but there are a lot of VMWRITEs to Host FS_SEL, Host GS_SEL, Host FS_BASE, and Host GS_BASE that don't appear in Xen. Also, KVM has a lot of MSR accesses to 0xc081-0xc084 that Xen doesn't have.
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On 03/15/2010 03:23 PM, Chris Webb wrote:
> Avi Kivity writes:
>> On 03/15/2010 10:07 AM, Balbir Singh wrote:
>>> Yes, it is a virtio call away, but is the cost of paying twice in
>>> terms of memory acceptable?
>> Usually, it isn't, which is why I recommend cache=off.
> Hi Avi. One observation about your recommendation for cache=none:
>
> We run hosts of VMs accessing drives backed by logical volumes carved out
> from md RAID1. Each host has 32GB RAM and eight cores, divided between (say)
> twenty virtual machines, which pretty much fill the available memory on the
> host. Our qemu-kvm is new enough that IDE and SCSI drives with writeback
> caching turned on get advertised to the guest as having a write-cache, and
> FLUSH gets translated to fsync() by qemu. (Consequently cache=writeback
> isn't acting as cache=neverflush like it would have done a year ago. I know
> that comparing performance for cache=none against that unsafe behaviour
> would be somewhat unfair!)

I knew someone would do this...

This really gets down to your definition of "safe" behaviour. As it stands, if you suffer a power outage, it may lead to guest corruption.

While we are correct in advertising a write-cache, write-caches are volatile and should a drive lose power, it could lead to data corruption. Enterprise disks tend to have battery backed write caches to prevent this.

In the set up you're emulating, the host is acting as a giant write cache. Should your host fail, you can get data corruption.

cache=writethrough provides a much stronger data guarantee. Even in the event of a host failure, data integrity will be preserved.

Regards,

Anthony Liguori
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
Avi Kivity writes:

> On 03/15/2010 10:07 AM, Balbir Singh wrote:
>
> >Yes, it is a virtio call away, but is the cost of paying twice in
> >terms of memory acceptable?
>
> Usually, it isn't, which is why I recommend cache=off.

Hi Avi. One observation about your recommendation for cache=none:

We run hosts of VMs accessing drives backed by logical volumes carved out from md RAID1. Each host has 32GB RAM and eight cores, divided between (say) twenty virtual machines, which pretty much fill the available memory on the host. Our qemu-kvm is new enough that IDE and SCSI drives with writeback caching turned on get advertised to the guest as having a write-cache, and FLUSH gets translated to fsync() by qemu. (Consequently cache=writeback isn't acting as cache=neverflush like it would have done a year ago. I know that comparing performance for cache=none against that unsafe behaviour would be somewhat unfair!)

Wasteful duplication of page cache between guest and host notwithstanding, turning on cache=writeback is a spectacular performance win for our guests. For example, even IDE with cache=writeback easily beats virtio with cache=none in most of the guest filesystem performance tests I've tried. The anecdotal feedback from clients is also very strongly in favour of cache=writeback.

With a host full of cache=none guests, IO contention between guests is hugely problematic with non-stop seek from the disks to service tiny O_DIRECT writes (especially without virtio), many of which needn't have been synchronous if only there had been some way for the guest OS to tell qemu that. Running with cache=writeback seems to reduce the frequency of disk flush per guest to a much more manageable level, and to allow the host's elevator to optimise writing out across the guests in between these flushes.

Cheers,

Chris.
Re: [PATCH] kvm: clean up assigned_device_enable_host_msix
On Sat, Mar 13, 2010 at 03:00:45PM +0800, jing zhang wrote:
> From: Jing Zhang
>
> Date: Sat Mar 13 14:05:27 2010
>
> Cc: Avi Kivity
> Signed-off-by: Jing Zhang

Applied (with a better description), thanks.
Re: [PATCH v4 0/2] qemu-kvm: Save&restore debug registers
On Fri, Mar 12, 2010 at 03:20:48PM +0100, Jan Kiszka wrote:
> Patch 1 is for upstream and should be applied to uq/master as well, patch
> 2 is for qemu-kvm only.
>
> Jan Kiszka (2):
>   KVM: x86: Add debug register saving and restoring
>   qemu-kvm: x86: Add support for saving&restoring debug registers
>
>  kvm-all.c         |   11 ++
>  kvm.h             |    1 +
>  qemu-kvm-x86.c    |    2 +
>  qemu-kvm.c        |    5
>  qemu-kvm.h        |    1 +
>  target-i386/kvm.c |   55 +
>  6 files changed, 75 insertions(+), 0 deletions(-)

Applied, thanks.
Re: [PATCH] KVM: fix the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO failure
On Fri, Mar 12, 2010 at 12:59:06PM +0800, Wei Yongjun wrote:
> This patch changes the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO
> from -EINVAL to -ENXIO if no coalesced mmio dev exists.
>
> Signed-off-by: Wei Yongjun

Applied all, thanks.
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On 03/15/2010 04:27 AM, Avi Kivity wrote:
> That's only beneficial if the cache is shared. Otherwise, you could use
> the balloon to evict cache when memory is tight.
>
> Shared cache is mostly a desktop thing where users run similar
> workloads. For servers, it's much less likely. So a modified-guest
> doesn't help a lot here.

Not really. In many cloud environments, there's a set of common images that are instantiated on each node. Usually this is because you're running a horizontally scalable application or because you're supporting an ephemeral storage model.

In fact, with ephemeral storage, you typically want to use cache=writeback since you aren't providing data guarantees across shutdown/failure.

Regards,

Anthony Liguori
nfs and db servers under kvm?
Hi all,

I'm considering changing much of my current infrastructure so that it runs under an array of VMs. I'm wondering how well database and NFS servers run under KVM. Should I put the data on a host filesystem, or can I put it on the guest filesystem?

--
Take care and have fun,
Mike Diehl.
Re: [PATCH v3 16/30] KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.
On Mon, Mar 15, 2010 at 04:46:20PM +0100, Andre Przywara wrote:
> Gleb Natapov wrote:
> >If LOCK prefix is used dest arg should be memory, otherwise instruction
> >should generate #UD.
> Well, there is one exception:
> There is an AMD specific "lock mov cr0 = mov cr8" equivalence, where
> there is no memory involved (and we intercept this). I am not sure
> if anyone actually uses this code sequence, but it is definitely
> legal.
>

Even without this patch "lock mov cr0" will cause #UD to be injected by the emulator since mov does not have Lock in the opcode table. Also it looks like Intel does not support this extension so no portable program can use it.

> Regards,
> Andre.
>
> >
> >Signed-off-by: Gleb Natapov
> >---
> > arch/x86/kvm/emulate.c |2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> >diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> >index b89a8f2..46a7ee3 100644
> >--- a/arch/x86/kvm/emulate.c
> >+++ b/arch/x86/kvm/emulate.c
> >@@ -1842,7 +1842,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
> > }
> > /* LOCK prefix is allowed only with some instructions */
> >-if (c->lock_prefix && !(c->d & Lock)) {
> >+if (c->lock_prefix && (!(c->d & Lock) || c->dst.type != OP_MEM)) {
> > kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
> > goto done;
> > }
> >
>
> --
> Andre Przywara
> AMD-OSRC (Dresden)
> Tel: x29712

--
	Gleb.
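The condition changed by the quoted diff can be exercised in isolation. The sketch below mirrors the patched expression in a standalone predicate; the `SKETCH_` names, the struct layout and the flag value are stand-ins chosen for illustration and are not the emulator's real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the emulator's per-opcode Lock flag. */
#define SKETCH_LOCK_FLAG (1u << 0)

/* Illustrative stand-in for the emulator's operand types. */
enum sketch_op_type { SKETCH_OP_REG, SKETCH_OP_MEM, SKETCH_OP_IMM };

/* Reduced decode state: just the three inputs the check looks at. */
struct sketch_decode {
    bool lock_prefix;              /* instruction carried a LOCK prefix */
    unsigned int d;                /* opcode table flags */
    enum sketch_op_type dst_type;  /* decoded destination operand type */
};

/*
 * True when the LOCK prefix makes the instruction invalid (#UD):
 * either the opcode does not allow LOCK at all, or it does but the
 * destination is not a memory operand.
 */
static bool lock_prefix_raises_ud(const struct sketch_decode *c)
{
    assert(c != NULL);
    return c->lock_prefix &&
           (!(c->d & SKETCH_LOCK_FLAG) || c->dst_type != SKETCH_OP_MEM);
}
```

Before the patch only the first half of the inner condition existed, so a lockable opcode with a register destination would not have been rejected.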
Re: [PATCH v3 00/30] emulator cleanup
On Mon, Mar 15, 2010 at 04:51:35PM +0100, Andre Przywara wrote:
> Gleb Natapov wrote:
> >This is the first series of patches that tries to cleanup emulator code.
> >This is mix of bug fixes and moving code that does emulation from x86.c
> >to emulator.c while making it KVM independent. The status of the patches:
> >works for me. realtime.flat test now also pass where it failed before.
>
> Patch 1..13, 17:
> Reviewed-by: Andre Przywara
>
> I am still investigating a corner case in patch 14 (calling
> syscall/sysenter from real mode), and there is the issue in patch
> 16. I have only shortly looked over the others.
>

Patch 14 is only a mechanical change. It doesn't change the behaviour of syscall/sysenter emulation.

--
	Gleb.
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On Mon, Mar 15, 2010 at 08:14:29AM -0500, Anthony Liguori wrote:
> On 03/15/2010 07:42 AM, Avi Kivity wrote:
>> On 03/15/2010 02:38 PM, Joerg Roedel wrote:
>>> On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:
>>>> On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
>>>>> Hi there,
>>>>>
>>>>> Our wiki page for the Summer of Code 2010 is doing quite well:
>>>>>
>>>>> http://wiki.qemu.org/Google_Summer_of_Code_2010
>>>>>
>>>> I will add another project - iommu emulation. Could be very useful
>>>> for doing device assignment to nested guests, which could make
>>>> testing a lot easier.
>>> Good idea. If there is interest I could help to mentor this project.
>>
>> Thanks. I volunteered Anthony, but he may be a little overcommitted.
>
> Joerg, feel free to put your name against too.

[x] Done.

Thanks,

	Joerg
Re: [PATCH v3 00/30] emulator cleanup
Gleb Natapov wrote:
> This is the first series of patches that tries to cleanup emulator code.
> This is mix of bug fixes and moving code that does emulation from x86.c
> to emulator.c while making it KVM independent. The status of the patches:
> works for me. realtime.flat test now also pass where it failed before.

Patch 1..13, 17:
Reviewed-by: Andre Przywara

I am still investigating a corner case in patch 14 (calling syscall/sysenter from real mode), and there is the issue in patch 16. I have only shortly looked over the others.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12
Re: [PATCH v3 16/30] KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.
On 03/15/2010 05:46 PM, Andre Przywara wrote:
> Gleb Natapov wrote:
>> If LOCK prefix is used dest arg should be memory, otherwise instruction
>> should generate #UD.
> Well, there is one exception:
> There is an AMD specific "lock mov cr0 = mov cr8" equivalence, where
> there is no memory involved (and we intercept this). I am not sure if
> anyone actually uses this code sequence, but it is definitely legal.

It's better to trap on this instead of emulating it incorrectly as mov cr8. If/when someone adds 32-bit mov cr8 handling, this will need to be addressed.

--
error compiling committee.c: too many arguments to function
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
On Mon, 15 Mar 2010 12:52:15 +0530 Balbir Singh wrote:

> Selectively control Unmapped Page Cache (nospam version)
>
> From: Balbir Singh
>
> This patch implements unmapped page cache control via preferred
> page cache reclaim. The current patch hooks into kswapd and reclaims
> page cache if the user has requested for unmapped page control.
> This is useful in the following scenario
>
> - In a virtualized environment with cache!=none, we see
>   double caching - (one in the host and one in the guest). As
>   we try to scale guests, cache usage across the system grows.
>   The goal of this patch is to reclaim page cache when Linux is running
>   as a guest and get the host to hold the page cache and manage it.
>   There might be temporary duplication, but in the long run, memory
>   in the guests would be used for mapped pages.
> - The option is controlled via a boot option and the administrator
>   can selectively turn it on, on a need to use basis.
>
> A lot of the code is borrowed from zone_reclaim_mode logic for
> __zone_reclaim(). One might argue that with ballooning and
> KSM this feature is not very useful, but even with ballooning,
> we need extra logic to balloon multiple VM machines and it is hard
> to figure out the correct amount of memory to balloon. With these
> patches applied, each guest has a sufficient amount of free memory
> available, that can be easily seen and reclaimed by the balloon driver.
> The additional memory in the guest can be reused for additional
> applications or used to start additional guests/balance memory in
> the host.
>
> KSM currently does not de-duplicate host and guest page cache. The goal
> of this patch is to help automatically balance unmapped page cache when
> instructed to do so.
>
> There are some magic numbers in use in the code, UNMAPPED_PAGE_RATIO
> and the number of pages to reclaim when unmapped_page_control argument
> is supplied. These numbers were chosen to avoid aggressiveness in
> reaping page cache ever so frequently, at the same time providing control.
>
> The sysctl for min_unmapped_ratio provides further control from
> within the guest on the amount of unmapped pages to reclaim.
>
> The patch is applied against mmotm feb-11-2010.

Hi,
If you go ahead with this, please add the boot parameter & its description to Documentation/kernel-parameters.txt.

> TODOS
> -----
> 1. Balance slab cache as well
> 2. Invoke the balance routines from the balloon driver

---
~Randy
Re: [PATCH v3 16/30] KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.
Gleb Natapov wrote:
> If LOCK prefix is used dest arg should be memory, otherwise instruction
> should generate #UD.

Well, there is one exception: There is an AMD specific "lock mov cr0 = mov cr8" equivalence, where there is no memory involved (and we intercept this). I am not sure if anyone actually uses this code sequence, but it is definitely legal.

Regards,
Andre.

> Signed-off-by: Gleb Natapov
> ---
>  arch/x86/kvm/emulate.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index b89a8f2..46a7ee3 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1842,7 +1842,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
>  	}
>
>  	/* LOCK prefix is allowed only with some instructions */
> -	if (c->lock_prefix && !(c->d & Lock)) {
> +	if (c->lock_prefix && (!(c->d & Lock) || c->dst.type != OP_MEM)) {
>  		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
>  		goto done;
>  	}

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On 03/15/2010 03:23 PM, Anthony Liguori wrote:
> On 03/15/2010 08:11 AM, Avi Kivity wrote:
>> On 03/15/2010 03:03 PM, Joerg Roedel wrote:
>>>>> I will add another project - iommu emulation. Could be very useful
>>>>> for doing device assignment to nested guests, which could make
>>>>> testing a lot easier.
>>>> Our experiments show that nested device assignment is pretty much
>>>> required for I/O performance in nested scenarios.
>>> Really? I did a small test with virtio-blk in a nested guest (disk
>>> read with dd, so not a real benchmark) and got a reasonable
>>> read-performance of around 25MB/s from the disk in the l2-guest.
>>
>> Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.
>>
>> I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we can
>> do for other guests.
>
> VMREAD/VMWRITEs are generally optimized by hypervisors as they tend to
> be costly. KVM is a bit unusual in terms of how many times the
> instructions are executed per exit.

Do you know offhand of any unnecessary read/writes? There's update_cr8_intercept(), but on normal exits, I don't see what else we can remove.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v3 00/30] emulator cleanup
On 03/15/2010 04:38 PM, Gleb Natapov wrote:
> This is the first series of patches that tries to cleanup emulator code.
> This is mix of bug fixes and moving code that does emulation from x86.c
> to emulator.c while making it KVM independent. The status of the
> patches: works for me. realtime.flat test now also pass where it failed
> before.

Reviewed-by: Avi Kivity

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On Mon, Mar 15, 2010 at 02:03:11PM +0100, Joerg Roedel wrote:
> On Mon, Mar 15, 2010 at 05:53:13AM -0700, Muli Ben-Yehuda wrote:
> > On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:
> > > On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
> > > > Hi there,
> > > >
> > > > Our wiki page for the Summer of Code 2010 is doing quite well:
> > > >
> > > > http://wiki.qemu.org/Google_Summer_of_Code_2010
> > >
> > > I will add another project - iommu emulation. Could be very
> > > useful for doing device assignment to nested guests, which could
> > > make testing a lot easier.
> >
> > Our experiments show that nested device assignment is pretty much
> > required for I/O performance in nested scenarios.
>
> Really? I did a small test with virtio-blk in a nested guest (disk
> read with dd, so not a real benchmark) and got a reasonable
> read-performance of around 25MB/s from the disk in the l2-guest.

Netperf running in L1 with direct access: ~950 Mbps throughput with 25% CPU utilization. Netperf running in L2 with virtio between L2 and L1 and direct assignment between L1 and L0: roughly the same throughput, but over 90% CPU utilization! Now extrapolate to 10GbE.

Cheers,
Muli
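The extrapolation Muli asks for can be written out as a back-of-the-envelope calculation. The sketch below assumes CPU cost grows linearly with throughput, which nobody in the thread claims; the function name is made up for illustration, and "over 90%" is treated as exactly 90%, so the L2 figure is only a lower bound.

```c
#include <assert.h>

/*
 * Linear extrapolation of CPU cost to a faster link.
 * Assumption (not from the thread): CPU utilization scales in
 * proportion to throughput. Hypothetical helper, for illustration only.
 */
static double cpu_percent_at(double measured_cpu_percent,
			     double measured_mbps,
			     double target_mbps)
{
	assert(measured_mbps > 0.0);
	return measured_cpu_percent * (target_mbps / measured_mbps);
}
```

With the numbers quoted above, cpu_percent_at(25.0, 950.0, 10000.0) comes to roughly 263% (about 2.6 cores) for L1 with direct access, and cpu_percent_at(90.0, 950.0, 10000.0) to roughly 947% (about 9.5 cores) for the L2 setup, under the linear assumption.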
[PATCH v3 08/30] KVM: Provide current eip as part of emulator context.
Eliminate the need to call back into KVM to get it from emulator. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |3 ++- arch/x86/kvm/emulate.c | 12 ++-- arch/x86/kvm/x86.c |1 + 3 files changed, 9 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index b048fd2..0765725 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -141,7 +141,7 @@ struct decode_cache { u8 seg_override; unsigned int d; unsigned long regs[NR_VCPU_REGS]; - unsigned long eip, eip_orig; + unsigned long eip; /* modrm */ u8 modrm; u8 modrm_mod; @@ -160,6 +160,7 @@ struct x86_emulate_ctxt { struct kvm_vcpu *vcpu; unsigned long eflags; + unsigned long eip; /* eip before instruction emulation */ /* Emulated execution mode, represented by an X86EMUL_MODE value. */ int mode; u32 cs_base; diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 8bd0557..2c27aa4 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -667,7 +667,7 @@ static int do_insn_fetch(struct x86_emulate_ctxt *ctxt, int rc; /* x86 instructions are limited to 15 bytes. */ - if (eip + size - ctxt->decode.eip_orig > 15) + if (eip + size - ctxt->eip > 15) return X86EMUL_UNHANDLEABLE; eip += ctxt->cs_base; while (size--) { @@ -927,7 +927,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) /* Shadow copy of register state. Committed on successful emulation. 
*/ memset(c, 0, sizeof(struct decode_cache)); - c->eip = c->eip_orig = kvm_rip_read(ctxt->vcpu); + c->eip = ctxt->eip; ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS); memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs); @@ -1878,7 +1878,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) } } register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1); - c->eip = kvm_rip_read(ctxt->vcpu); + c->eip = ctxt->eip; } if (c->src.type == OP_MEM) { @@ -2447,7 +2447,7 @@ twobyte_insn: goto done; /* Let the processor re-execute the fixed hypercall */ - c->eip = kvm_rip_read(ctxt->vcpu); + c->eip = ctxt->eip; /* Disable writeback. */ c->dst.type = OP_NONE; break; @@ -2551,7 +2551,7 @@ twobyte_insn: | ((u64)c->regs[VCPU_REGS_RDX] << 32); if (kvm_set_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], msr_data)) { kvm_inject_gp(ctxt->vcpu, 0); - c->eip = kvm_rip_read(ctxt->vcpu); + c->eip = ctxt->eip; } rc = X86EMUL_CONTINUE; c->dst.type = OP_NONE; @@ -2560,7 +2560,7 @@ twobyte_insn: /* rdmsr */ if (kvm_get_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], &msr_data)) { kvm_inject_gp(ctxt->vcpu, 0); - c->eip = kvm_rip_read(ctxt->vcpu); + c->eip = ctxt->eip; } else { c->regs[VCPU_REGS_RAX] = (u32)msr_data; c->regs[VCPU_REGS_RDX] = msr_data >> 32; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3b6848e..022d28e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3494,6 +3494,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu, vcpu->arch.emulate_ctxt.vcpu = vcpu; vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu); + vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu); vcpu->arch.emulate_ctxt.mode = (!is_protmode(vcpu)) ? X86EMUL_MODE_REAL : (vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM) -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
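With the starting eip carried in the context, the instruction-length check in do_insn_fetch() depends only on values the emulator already holds. A minimal stand-alone sketch of that check follows; the helper name and its flat signature are illustrative and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 instructions are limited to 15 bytes. */
#define MAX_INSN_BYTES 15u

/*
 * start_eip: eip before emulation began (ctxt->eip in the patch)
 * cur_eip:   current fetch position
 * size:      number of bytes about to be fetched
 * Returns true when the fetch would run past the 15-byte limit.
 */
static bool fetch_exceeds_insn_limit(uint64_t start_eip, uint64_t cur_eip,
				     unsigned int size)
{
	assert(cur_eip >= start_eip);
	return cur_eip + size - start_eip > MAX_INSN_BYTES;
}
```

Fetching the 15th byte is still allowed; the 16th is what trips the limit.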
[PATCH v3 15/30] KVM: x86 emulator: do not call writeback if msr access fails.
Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1393bf0..b89a8f2 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2563,7 +2563,7 @@ twobyte_insn:
 			| ((u64)c->regs[VCPU_REGS_RDX] << 32);
 		if (kvm_set_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], msr_data)) {
 			kvm_inject_gp(ctxt->vcpu, 0);
-			c->eip = ctxt->eip;
+			goto done;
 		}
 		rc = X86EMUL_CONTINUE;
 		c->dst.type = OP_NONE;
@@ -2572,7 +2572,7 @@ twobyte_insn:
 		/* rdmsr */
 		if (kvm_get_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], &msr_data)) {
 			kvm_inject_gp(ctxt->vcpu, 0);
-			c->eip = ctxt->eip;
+			goto done;
 		} else {
 			c->regs[VCPU_REGS_RAX] = (u32)msr_data;
 			c->regs[VCPU_REGS_RDX] = msr_data >> 32;
--
1.6.5
[PATCH v3 20/30] KVM: x86 emulator: Use load_segment_descriptor() instead of kvm_load_segment_descriptor()
Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index db4776c..702bfff 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1508,7 +1508,7 @@ static int emulate_pop_sreg(struct x86_emulate_ctxt *ctxt,
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
-	rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)selector, seg);
+	rc = load_segment_descriptor(ctxt, ops, (u16)selector, seg);
 	return rc;
 }
 
@@ -1683,7 +1683,7 @@ static int emulate_ret_far(struct x86_emulate_ctxt *ctxt,
 	rc = emulate_pop(ctxt, ops, &cs, c->op_bytes);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
-	rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)cs, VCPU_SREG_CS);
+	rc = load_segment_descriptor(ctxt, ops, (u16)cs, VCPU_SREG_CS);
 	return rc;
 }
 
@@ -2717,7 +2717,7 @@ special_insn:
 		if (c->modrm_reg == VCPU_SREG_SS)
 			toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_MOV_SS);
 
-		rc = kvm_load_segment_descriptor(ctxt->vcpu, sel, c->modrm_reg);
+		rc = load_segment_descriptor(ctxt, ops, sel, c->modrm_reg);
 
 		c->dst.type = OP_NONE;	/* Disable writeback. */
 		break;
@@ -2892,8 +2892,8 @@ special_insn:
 		goto jmp;
 	case 0xea: /* jmp far */
 	jump_far:
-		if (kvm_load_segment_descriptor(ctxt->vcpu, c->src2.val,
-						VCPU_SREG_CS))
+		if (load_segment_descriptor(ctxt, ops, c->src2.val,
+					    VCPU_SREG_CS))
 			goto done;
 
 		c->eip = c->src.val;
--
1.6.5
[PATCH v3 13/30] KVM: x86 emulator: fix mov dr to inject #UD when needed.
If CR4.DE=1 access to registers DR4/DR5 cause #UD.

Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c |   18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 836e97b..5afddcf 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2531,9 +2531,12 @@ twobyte_insn:
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x21: /* mov from dr to reg */
-		if (emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]))
-			goto cannot_emulate;
-		rc = X86EMUL_CONTINUE;
+		if ((ops->get_cr(4, ctxt->vcpu) & X86_CR4_DE) &&
+		    (c->modrm_reg == 4 || c->modrm_reg == 5)) {
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
+		}
+		emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]);
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x22: /* mov reg, cr */
@@ -2541,9 +2544,12 @@ twobyte_insn:
 		c->dst.type = OP_NONE;
 		break;
 	case 0x23: /* mov from reg to dr */
-		if (emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]))
-			goto cannot_emulate;
-		rc = X86EMUL_CONTINUE;
+		if ((ops->get_cr(4, ctxt->vcpu) & X86_CR4_DE) &&
+		    (c->modrm_reg == 4 || c->modrm_reg == 5)) {
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
+		}
+		emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]);
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x30:
--
1.6.5
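The condition both hunks add can be isolated as a pure predicate. This is a minimal sketch outside the kernel: the function name is hypothetical, and CR4_DE is a local stand-in for the kernel's X86_CR4_DE (bit 3 of CR4).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_DE (1u << 3)	/* CR4.DE, Debugging Extensions */

/*
 * Returns true when a mov to/from debug register 'dr' must raise #UD:
 * only DR4 and DR5, and only while CR4.DE is set.
 */
static bool dr_access_is_undefined(uint64_t cr4, unsigned int dr)
{
	assert(dr < 8);		/* ModRM.reg is a 3-bit field */
	return (cr4 & CR4_DE) && (dr == 4 || dr == 5);
}
```

With CR4.DE clear the predicate is false for every register, which matches the architectural behaviour where DR4 and DR5 then alias DR6 and DR7.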
[PATCH v3 07/30] KVM: Provide x86_emulate_ctxt callback to get current cpl
Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |1 + arch/x86/kvm/emulate.c | 15 --- arch/x86/kvm/x86.c |6 ++ 3 files changed, 15 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 0c5caa4..b048fd2 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -110,6 +110,7 @@ struct x86_emulate_ops { struct kvm_vcpu *vcpu); ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu); void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); + int (*cpl)(struct kvm_vcpu *vcpu); }; /* Type, address-of, and value of an instruction's operand. */ diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 5e2fa61..8bd0557 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1257,7 +1257,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt, int rc; unsigned long val, change_mask; int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT; - int cpl = kvm_x86_ops->get_cpl(ctxt->vcpu); + int cpl = ops->cpl(ctxt->vcpu); rc = emulate_pop(ctxt, ops, &val, len); if (rc != X86EMUL_CONTINUE) @@ -1758,7 +1758,8 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt) return X86EMUL_CONTINUE; } -static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt) +static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops) { int iopl; if (ctxt->mode == X86EMUL_MODE_REAL) @@ -1766,7 +1767,7 @@ static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt) if (ctxt->mode == X86EMUL_MODE_VM86) return true; iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT; - return kvm_x86_ops->get_cpl(ctxt->vcpu) > iopl; + return ops->cpl(ctxt->vcpu) > iopl; } static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt, @@ -1803,7 +1804,7 @@ static bool emulator_io_permited(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops, u16 port, u16 len) { - if (emulator_bad_iopl(ctxt)) + if (emulator_bad_iopl(ctxt, ops)) if 
(!emulator_io_port_access_allowed(ctxt, ops, port, len)) return false; return true; @@ -1842,7 +1843,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) } /* Privileged instruction can be executed only in CPL=0 */ - if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) { + if ((c->d & Priv) && ops->cpl(ctxt->vcpu)) { kvm_inject_gp(ctxt->vcpu, 0); goto done; } @@ -2378,7 +2379,7 @@ special_insn: c->dst.type = OP_NONE; /* Disable writeback. */ break; case 0xfa: /* cli */ - if (emulator_bad_iopl(ctxt)) + if (emulator_bad_iopl(ctxt, ops)) kvm_inject_gp(ctxt->vcpu, 0); else { ctxt->eflags &= ~X86_EFLAGS_IF; @@ -2386,7 +2387,7 @@ special_insn: } break; case 0xfb: /* sti */ - if (emulator_bad_iopl(ctxt)) + if (emulator_bad_iopl(ctxt, ops)) kvm_inject_gp(ctxt->vcpu, 0); else { toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_STI); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b139334..3b6848e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3442,6 +3442,11 @@ static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu) } } +static int emulator_get_cpl(struct kvm_vcpu *vcpu) +{ + return kvm_x86_ops->get_cpl(vcpu); +} + static struct x86_emulate_ops emulate_ops = { .read_std= kvm_read_guest_virt_system, .fetch = kvm_fetch_guest_virt, @@ -3450,6 +3455,7 @@ static struct x86_emulate_ops emulate_ops = { .cmpxchg_emulated= emulator_cmpxchg_emulated, .get_cr = emulator_get_cr, .set_cr = emulator_set_cr, + .cpl = emulator_get_cpl, }; static void cache_all_regs(struct kvm_vcpu *vcpu) -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
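The point of the new callback is that the emulator asks for the CPL through its ops table instead of reaching into kvm_x86_ops. A reduced model of emulator_bad_iopl() under that scheme is sketched below. All type and function names are simplified stand-ins rather than the kernel's, and the real-mode branch returning false is an assumption, since the quoted hunk cuts off before that return statement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EFLAGS_IOPL_MASK  0x3000u	/* EFLAGS bits 12-13 */
#define EFLAGS_IOPL_SHIFT 12

enum emul_mode { MODE_REAL, MODE_VM86, MODE_PROT };

/* Callback table supplied by whoever embeds the emulator. */
struct emul_ops {
	int (*cpl)(void *vcpu);	/* current privilege level, 0..3 */
};

struct emul_ctxt {
	enum emul_mode mode;
	uint32_t eflags;
	void *vcpu;		/* opaque to the emulator */
};

/* True when CPL is numerically greater than IOPL. */
static bool bad_iopl(const struct emul_ctxt *ctxt, const struct emul_ops *ops)
{
	int iopl;

	assert(ops != NULL && ops->cpl != NULL);
	if (ctxt->mode == MODE_REAL)
		return false;	/* assumed: not visible in the quoted hunk */
	if (ctxt->mode == MODE_VM86)
		return true;
	iopl = (ctxt->eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;
	return ops->cpl(ctxt->vcpu) > iopl;
}

/* Toy backend for illustration: always reports CPL 3. */
static int toy_cpl_user(void *vcpu)
{
	(void)vcpu;
	return 3;
}
```

A different host only has to supply its own cpl function; bad_iopl() itself no longer names anything KVM-specific.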
[PATCH v3 16/30] KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.
If LOCK prefix is used dest arg should be memory, otherwise instruction should generate #UD.

Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b89a8f2..46a7ee3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1842,7 +1842,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	}
 
 	/* LOCK prefix is allowed only with some instructions */
-	if (c->lock_prefix && !(c->d & Lock)) {
+	if (c->lock_prefix && (!(c->d & Lock) || c->dst.type != OP_MEM)) {
 		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
 		goto done;
 	}
--
1.6.5
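The tightened condition can be modelled on its own. In this sketch the struct, the flag name LOCK_OK and the enum are simplified stand-ins for the emulator's decode cache, the Lock decode flag and its operand types; only the boolean logic is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_OK (1u << 0)	/* stand-in for the decode table's Lock flag */

enum op_type { OP_REG, OP_MEM, OP_IMM, OP_NONE };

struct decoded_insn {
	uint32_t flags;		/* decode flags for this opcode */
	bool lock_prefix;	/* a LOCK prefix (0xF0) was seen */
	enum op_type dst_type;	/* kind of destination operand */
};

/*
 * Returns true when the emulator should queue #UD: a LOCK prefix on an
 * opcode that does not accept it, or on a non-memory destination.
 */
static bool lock_prefix_is_invalid(const struct decoded_insn *insn)
{
	assert(insn != NULL);
	if (!insn->lock_prefix)
		return false;
	return !(insn->flags & LOCK_OK) || insn->dst_type != OP_MEM;
}
```

Note that a model like this also rejects the AMD "lock mov cr0" encoding raised in the replies, because that form has no memory destination.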
[PATCH v3 23/30] KVM: x86 emulator: add decoding of X,Y parameters from Intel SDM
Add decoding of X,Y parameters from Intel SDM which are used by string instruction to specify source and destination. Use this new decoding to implement movs, cmps, stos, lods in a generic way. Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c | 125 +--- 1 files changed, 44 insertions(+), 81 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 55b8a8b..6ebd642 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -51,6 +51,7 @@ #define DstReg (2<<1) /* Register operand. */ #define DstMem (3<<1) /* Memory operand. */ #define DstAcc (4<<1) /* Destination Accumulator */ +#define DstDI (5<<1) /* Destination is in ES:(E)DI */ #define DstMask (7<<1) /* Source operand type. */ #define SrcNone (0<<4) /* No source operand. */ @@ -64,6 +65,7 @@ #define SrcOne (7<<4) /* Implied '1' */ #define SrcImmUByte (8<<4) /* 8-bit unsigned immediate operand. */ #define SrcImmU (9<<4) /* Immediate operand, unsigned */ +#define SrcSI (0xa<<4) /* Source is in the DS:RSI */ #define SrcMask (0xf<<4) /* Generic ModRM decode. 
*/ #define ModRM (1<<8) @@ -177,12 +179,12 @@ static u32 opcode_table[256] = { /* 0xA0 - 0xA7 */ ByteOp | DstReg | SrcMem | Mov | MemAbs, DstReg | SrcMem | Mov | MemAbs, ByteOp | DstMem | SrcReg | Mov | MemAbs, DstMem | SrcReg | Mov | MemAbs, - ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String, - ByteOp | ImplicitOps | String, ImplicitOps | String, + ByteOp | SrcSI | DstDI | Mov | String, SrcSI | DstDI | Mov | String, + ByteOp | SrcSI | DstDI | String, SrcSI | DstDI | String, /* 0xA8 - 0xAF */ - 0, 0, ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String, - ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String, - ByteOp | ImplicitOps | String, ImplicitOps | String, + 0, 0, ByteOp | DstDI | Mov | String, DstDI | Mov | String, + ByteOp | SrcSI | DstAcc | Mov | String, SrcSI | DstAcc | Mov | String, + ByteOp | DstDI | String, DstDI | String, /* 0xB0 - 0xB7 */ ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov, @@ -1145,6 +1147,14 @@ done_prefixes: c->src.bytes = 1; c->src.val = 1; break; + case SrcSI: + c->src.type = OP_MEM; + c->src.bytes = (c->d & ByteOp) ? 1 : c->op_bytes; + c->src.ptr = (unsigned long *) + register_address(c, seg_override_base(ctxt, c), +c->regs[VCPU_REGS_RSI]); + c->src.val = 0; + break; } /* @@ -1230,6 +1240,14 @@ done_prefixes: } c->dst.orig_val = c->dst.val; break; + case DstDI: + c->dst.type = OP_MEM; + c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes; + c->dst.ptr = (unsigned long *) + register_address(c, es_base(ctxt), +c->regs[VCPU_REGS_RDI]); + c->dst.val = 0; + break; } done: @@ -2388,6 +2406,16 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt, return rc; } +static void string_addr_inc(struct x86_emulate_ctxt *ctxt, unsigned long base, + int reg, unsigned long **ptr) +{ + struct decode_cache *c = &ctxt->decode; + int df = (ctxt->eflags & EFLG_DF) ? 
-1 : 1; + + register_address_increment(c, &c->regs[reg], df * c->src.bytes); + *ptr = (unsigned long *)register_address(c, base, c->regs[reg]); +} + int x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) { @@ -2750,89 +2778,16 @@ special_insn: c->dst.val = (unsigned long)c->regs[VCPU_REGS_RAX]; break; case 0xa4 ... 0xa5: /* movs */ - c->dst.type = OP_MEM; - c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes; - c->dst.ptr = (unsigned long *)register_address(c, - es_base(ctxt), - c->regs[VCPU_REGS_RDI]); - rc = ops->read_emulated(register_address(c, - seg_override_base(ctxt, c), - c->regs[VCPU_REGS_RSI]), - &c->dst.val, - c->dst.bytes, ctxt->vcpu); - if (rc != X86EMUL_CONTINUE) - goto done; - register_address_increment(c, &c->regs[VCPU_REGS_RSI], - (ctxt->eflags & EFLG_DF) ? -c->dst.bytes -
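The core of string_addr_inc() is the direction-flag step: the index register moves forward by the operand size when EFLAGS.DF is clear and backward when it is set. A simplified sketch follows, with a hypothetical function name; it deliberately ignores the address-size masking that register_address_increment() applies in the real code.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_DF (1u << 10)	/* direction flag */

/*
 * Advance a string index register (RSI or RDI) by one element.
 * Simplification: no 16/32-bit address-size wrap-around is modelled.
 */
static uint64_t string_reg_advance(uint64_t reg, uint32_t eflags,
				   unsigned int op_bytes)
{
	assert(op_bytes == 1 || op_bytes == 2 ||
	       op_bytes == 4 || op_bytes == 8);
	if (eflags & EFLAGS_DF)
		return reg - op_bytes;
	return reg + op_bytes;
}
```

Because the step depends only on DF and the operand size, one helper can serve movs, cmps, stos and lods once SrcSI and DstDI have been decoded generically.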
[PATCH v3 02/30] KVM: x86 emulator: fix RCX access during rep emulation
During rep emulation access length to RCX depends on current address mode.

Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0b70a36..4dce805 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1852,7 +1852,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 
 	if (c->rep_prefix && (c->d & String)) {
 		/* All REP prefixes have the same first termination condition */
-		if (c->regs[VCPU_REGS_RCX] == 0) {
+		if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
 			kvm_rip_write(ctxt->vcpu, c->eip);
 			goto done;
 		}
@@ -1876,7 +1876,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 				goto done;
 			}
 		}
-		c->regs[VCPU_REGS_RCX]--;
+		register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
 		c->eip = kvm_rip_read(ctxt->vcpu);
 	}
 
--
1.6.5
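Why the plain comparison and decrement were wrong is easiest to see with a 16-bit address size, where only CX counts. The sketch below is a stand-alone model: the function names mirror the helpers used in the patch, but the signatures are assumptions (they take the address size in bytes directly instead of a decode cache pointer).

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low ad_bytes bytes of a register. */
static uint64_t addr_mask_bits(unsigned int ad_bytes)
{
	assert(ad_bytes == 2 || ad_bytes == 4 || ad_bytes == 8);
	return ad_bytes == 8 ? UINT64_MAX : (1ULL << (ad_bytes * 8)) - 1;
}

/* Value of the register as seen through the current address size. */
static uint64_t address_mask(unsigned int ad_bytes, uint64_t reg)
{
	return reg & addr_mask_bits(ad_bytes);
}

/* Add inc to the addressed part only; upper bits stay untouched. */
static void register_address_increment(unsigned int ad_bytes,
				       uint64_t *reg, int64_t inc)
{
	uint64_t mask = addr_mask_bits(ad_bytes);

	*reg = (*reg & ~mask) | ((*reg + (uint64_t)inc) & mask);
}
```

With a 16-bit address size, RCX = 0x10000 already means the count is zero, so the rep loop must stop; and decrementing a register holding 0x12340000 wraps only CX, giving 0x1234ffff rather than borrowing from the upper bits.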
[PATCH v3 19/30] KVM: x86 emulator: Emulate task switch in emulator.c
Implement emulation of 16/32 bit task switch in emulator.c Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |5 + arch/x86/kvm/emulate.c | 563 2 files changed, 568 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index f901467..bd46929 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -11,6 +11,8 @@ #ifndef _ASM_X86_KVM_X86_EMULATE_H #define _ASM_X86_KVM_X86_EMULATE_H +#include + struct x86_emulate_ctxt; /* @@ -210,5 +212,8 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops); int x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops); +int emulator_task_switch(struct x86_emulate_ctxt *ctxt, +struct x86_emulate_ops *ops, +u16 tss_selector, int reason); #endif /* _ASM_X86_KVM_X86_EMULATE_H */ diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index d696cbd..db4776c 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -33,6 +33,7 @@ #include #include "x86.h" +#include "tss.h" /* * Opcode effective-address decode tables. @@ -1221,6 +1222,198 @@ done: return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0; } +static u32 desc_limit_scaled(struct desc_struct *desc) +{ + u32 limit = get_desc_limit(desc); + + return desc->g ? (limit << 12) | 0xfff : limit; +} + +static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt, +struct x86_emulate_ops *ops, +u16 selector, struct desc_ptr *dt) +{ + if (selector & 1 << 2) { + struct desc_struct desc; + memset (dt, 0, sizeof *dt); + if (!ops->get_cached_descriptor(&desc, VCPU_SREG_LDTR, ctxt->vcpu)) + return; + + dt->size = desc_limit_scaled(&desc); /* what if limit > 65535? 
*/ + dt->address = get_desc_base(&desc); + } else + ops->get_gdt(dt, ctxt->vcpu); +} + +/* allowed just for 8 bytes segments */ +static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops, + u16 selector, struct desc_struct *desc) +{ + struct desc_ptr dt; + u16 index = selector >> 3; + int ret; + u32 err; + ulong addr; + + get_descriptor_table_ptr(ctxt, ops, selector, &dt); + + if (dt.size < index * 8 + 7) { + kvm_inject_gp(ctxt->vcpu, selector & 0xfffc); + return X86EMUL_PROPAGATE_FAULT; + } + addr = dt.address + index * 8; + ret = ops->read_std(addr, desc, sizeof *desc, ctxt->vcpu, &err); + if (ret == X86EMUL_PROPAGATE_FAULT) + kvm_inject_page_fault(ctxt->vcpu, addr, err); + + return ret; +} + +/* allowed just for 8 bytes segments */ +static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops, + u16 selector, struct desc_struct *desc) +{ + struct desc_ptr dt; + u16 index = selector >> 3; + u32 err; + ulong addr; + int ret; + + get_descriptor_table_ptr(ctxt, ops, selector, &dt); + + if (dt.size < index * 8 + 7) { + kvm_inject_gp(ctxt->vcpu, selector & 0xfffc); + return X86EMUL_PROPAGATE_FAULT; + } + + addr = dt.address + index * 8; + ret = ops->write_std(addr, desc, sizeof *desc, ctxt->vcpu, &err); + if (ret == X86EMUL_PROPAGATE_FAULT) + kvm_inject_page_fault(ctxt->vcpu, addr, err); + + return ret; +} + +static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops, + u16 selector, int seg) +{ + struct desc_struct seg_desc; + u8 dpl, rpl, cpl; + unsigned err_vec = GP_VECTOR; + u32 err_code = 0; + bool null_selector = !(selector & ~0x3); /* -0003 are null */ + int ret; + + memset(&seg_desc, 0, sizeof seg_desc); + + if ((seg <= VCPU_SREG_GS && ctxt->mode == X86EMUL_MODE_VM86) + || ctxt->mode == X86EMUL_MODE_REAL) { + /* set real mode segment descriptor */ + set_desc_base(&seg_desc, selector << 4); + set_desc_limit(&seg_desc, 0x); + seg_desc.type = 3; + 
seg_desc.p = 1; + seg_desc.s = 1; + goto load; + } + + /* NULL selector is not valid for TR, CS and SS */ + if ((seg == VCPU_SREG_CS || seg == VCPU_SREG_SS || seg == VCPU_SREG_TR) + && null_selector) + goto exception; + + /* TR should be in GDT
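Two small pieces of arithmetic carry most of the descriptor lookup above: scaling a 20-bit limit by the granularity bit, and checking that an 8-byte descriptor lies inside the table. A stand-alone sketch, with hypothetical helper names and plain integers in place of struct desc_struct and struct desc_ptr:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte limit of a segment: with G set the limit counts 4 KiB pages. */
static uint32_t limit_scaled(uint32_t limit20, bool granularity)
{
	assert(limit20 <= 0xfffffu);	/* descriptor limit is a 20-bit field */
	return granularity ? (limit20 << 12) | 0xfffu : limit20;
}

/* Selector layout: bits 15..3 index, bit 2 table indicator, bits 1..0 RPL. */
static unsigned int selector_index(uint16_t selector)
{
	return selector >> 3;
}

static bool selector_uses_ldt(uint16_t selector)
{
	return selector & (1u << 2);
}

/* The last byte of the 8-byte descriptor must not exceed the table limit. */
static bool descriptor_fits(uint16_t selector, uint32_t table_limit)
{
	return table_limit >= selector_index(selector) * 8u + 7u;
}
```

When descriptor_fits() is false, the patch injects #GP with the selector (RPL bits cleared) as the error code.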
[PATCH v3 27/30] KVM: x86 emulator: remove saved_eip
c->eip is never written back in case of emulation failure, so no need to set it to old value.

Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c |    9 +--------
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1bedbb6..541f3c9 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2420,7 +2420,6 @@ int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
 	u64 msr_data;
-	unsigned long saved_eip = 0;
 	struct decode_cache *c = &ctxt->decode;
 	int rc = X86EMUL_CONTINUE;
 
@@ -2432,7 +2431,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	 */
 
 	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
-	saved_eip = c->eip;
 
 	if (ctxt->mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
 		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
@@ -2923,11 +2921,7 @@ writeback:
 	kvm_rip_write(ctxt->vcpu, c->eip);
 
 done:
-	if (rc == X86EMUL_UNHANDLEABLE) {
-		c->eip = saved_eip;
-		return -1;
-	}
-	return 0;
+	return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 
 twobyte_insn:
 	switch (c->b) {
@@ -3204,6 +3198,5 @@ twobyte_insn:
 
 cannot_emulate:
 	DPRINTF("Cannot emulate %02x\n", c->b);
-	c->eip = saved_eip;
 	return -1;
 }
--
1.6.5
[PATCH v3 18/30] KVM: x86 emulator: Provide more callbacks for x86 emulator.
Provide get_cached_descriptor(), set_cached_descriptor(), get_segment_selector(), set_segment_selector(), get_gdt(), write_std() callbacks. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h | 16 + arch/x86/kvm/x86.c | 130 +++ 2 files changed, 131 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 0765725..f901467 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -63,6 +63,15 @@ struct x86_emulate_ops { unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error); /* +* write_std: Write bytes of standard (non-emulated/special) memory. +*Used for descriptor writing. +* @addr: [IN ] Linear address to which to write. +* @val: [OUT] Value write to memory, zero-extended to 'u_long'. +* @bytes: [IN ] Number of bytes to write to memory. +*/ + int (*write_std)(unsigned long addr, void *val, +unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error); + /* * fetch: Read bytes of standard (non-emulated/special) memory. *Used for instruction fetch. * @addr: [IN ] Linear address from which to read. 
@@ -108,6 +117,13 @@ struct x86_emulate_ops { const void *new, unsigned int bytes, struct kvm_vcpu *vcpu); + bool (*get_cached_descriptor)(struct desc_struct *desc, + int seg, struct kvm_vcpu *vcpu); + void (*set_cached_descriptor)(struct desc_struct *desc, + int seg, struct kvm_vcpu *vcpu); + u16 (*get_segment_selector)(int seg, struct kvm_vcpu *vcpu); + void (*set_segment_selector)(u16 sel, int seg, struct kvm_vcpu *vcpu); + void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu); ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu); void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); int (*cpl)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 022d28e..2ef83db 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3050,6 +3050,18 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v) return kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, len, v); } +static void kvm_set_segment(struct kvm_vcpu *vcpu, + struct kvm_segment *var, int seg) +{ + kvm_x86_ops->set_segment(vcpu, var, seg); +} + +void kvm_get_segment(struct kvm_vcpu *vcpu, +struct kvm_segment *var, int seg) +{ + kvm_x86_ops->get_segment(vcpu, var, seg); +} + gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error) { u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? 
PFERR_USER_MASK : 0; @@ -3130,14 +3142,18 @@ static int kvm_read_guest_virt_system(gva_t addr, void *val, unsigned int bytes, return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, error); } -static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes, - struct kvm_vcpu *vcpu, u32 *error) +static int kvm_write_guest_virt_helper(gva_t addr, void *val, + unsigned int bytes, + struct kvm_vcpu *vcpu, u32 access, + u32 *error) { void *data = val; int r = X86EMUL_CONTINUE; + access |= PFERR_WRITE_MASK; + while (bytes) { - gpa_t gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, error); + gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error); unsigned offset = addr & (PAGE_SIZE-1); unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset); int ret; @@ -3160,6 +3176,19 @@ out: return r; } +static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes, + struct kvm_vcpu *vcpu, u32 *error) +{ + u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; + return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, access, error); +} + +static int kvm_write_guest_virt_system(gva_t addr, void *val, + unsigned int bytes, + struct kvm_vcpu *vcpu, u32 *error) +{ + return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, 0, error); +} static int emulator_read_emulated(unsigned long addr, void *val, @@ -3447,12 +3476,95 @@ static int emulator_get_cpl(struct kvm_vcpu *vcpu) return kvm_x86_ops->get_cpl(vcpu); } +static void emulator_get_gdt(struct desc_ptr *dt, struct kvm_vcpu *vcpu) +{ + kvm_x86_ops->get_g
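The write helper in the hunk above walks the buffer one page at a time, translating each page separately. A minimal sketch of that chunking arithmetic follows. The names sketch_chunk_len and sketch_count_chunks and the 4 KiB page size are assumptions made for illustration; they are not part of the patch, and no address translation is modelled.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical helper: bytes that can be written at addr without crossing a page. */
unsigned sketch_chunk_len(unsigned long addr, unsigned bytes)
{
	unsigned offset = (unsigned)(addr & (SKETCH_PAGE_SIZE - 1));
	unsigned room = SKETCH_PAGE_SIZE - offset;

	return bytes < room ? bytes : room;
}

/* Hypothetical helper: number of per-page translations a write would need. */
unsigned sketch_count_chunks(unsigned long addr, unsigned bytes)
{
	unsigned chunks = 0;

	while (bytes) {
		unsigned n = sketch_chunk_len(addr, bytes);

		assert(n > 0 && n <= bytes);	/* each pass makes progress */
		bytes -= n;
		addr += n;
		chunks++;
	}
	return chunks;
}
```

A 16-byte write that starts 8 bytes before a page boundary is split into two chunks of 8 bytes, so it needs two translations.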
[PATCH v3 26/30] KVM: x86 emulator: Move string pio emulation into emulator.c
Currently emulation is done outside of the emulator, so things like doing ins/outs to/from mmio are broken. It also makes it hard (if not impossible) to implement single stepping in the future. The implementation in this patch is not efficient, since it exits to userspace for each IO, while the previous implementation did 'ins' in batches. A further patch that implements pio in string read ahead addresses this problem.
Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_host.h |8 -- arch/x86/kvm/emulate.c | 48 +++-- arch/x86/kvm/x86.c | 204 +++ 3 files changed, 31 insertions(+), 229 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4a4fb8d..c072401 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -224,14 +224,9 @@ struct kvm_pv_mmu_op_buffer { struct kvm_pio_request { unsigned long count; - int cur_count; - gva_t guest_gva; int in; int port; int size; - int string; - int down; - int rep; }; /* @@ -590,9 +585,6 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); struct x86_emulate_ctxt; int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port); -int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in, - int size, unsigned long count, int down, - gva_t address, int rep, unsigned port); void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); int kvm_emulate_halt(struct kvm_vcpu *vcpu); int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 873da58..1bedbb6 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -153,8 +153,8 @@ static u32 opcode_table[256] = { 0, 0, 0, 0, /* 0x68 - 0x6F */ SrcImm | Mov | Stack, 0, SrcImmByte | Mov | Stack, 0, - SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps, /* insb, insw/insd */ - SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps, /* outsb, outsw/outsd */ + DstDI | ByteOp | Mov | String, DstDI | Mov | String, /* insb, insw/insd */ + SrcSI |
ByteOp | ImplicitOps | String, SrcSI | ImplicitOps | String, /* outsb, outsw/outsd */ /* 0x70 - 0x77 */ SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte, @@ -2611,47 +2611,29 @@ special_insn: break; case 0x6c: /* insb */ case 0x6d: /* insw/insd */ + c->dst.bytes = min(c->dst.bytes, 4u); if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX], - (c->d & ByteOp) ? 1 : c->op_bytes)) { + c->dst.bytes)) { kvm_inject_gp(ctxt->vcpu, 0); goto done; } - if (kvm_emulate_pio_string(ctxt->vcpu, - 1, - (c->d & ByteOp) ? 1 : c->op_bytes, - c->rep_prefix ? - address_mask(c, c->regs[VCPU_REGS_RCX]) : 1, - (ctxt->eflags & EFLG_DF), - register_address(c, es_base(ctxt), -c->regs[VCPU_REGS_RDI]), - c->rep_prefix, - c->regs[VCPU_REGS_RDX]) == 0) { - c->eip = saved_eip; - return -1; - } - return 0; + if (!ops->pio_in_emulated(c->dst.bytes, c->regs[VCPU_REGS_RDX], + &c->dst.val, 1, ctxt->vcpu)) + goto done; /* IO is needed, skip writeback */ + break; case 0x6e: /* outsb */ case 0x6f: /* outsw/outsd */ + c->src.bytes = min(c->src.bytes, 4u); if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX], - (c->d & ByteOp) ? 1 : c->op_bytes)) { + c->src.bytes)) { kvm_inject_gp(ctxt->vcpu, 0); goto done; } - if (kvm_emulate_pio_string(ctxt->vcpu, - 0, - (c->d & ByteOp) ? 1 : c->op_bytes, - c->rep_prefix ? - address_mask(c, c->regs[VCPU_REGS_RCX]) : 1, - (ctxt->eflags & EFLG_DF), -register_address(c, - seg_override_base(ctxt, c), -c->regs[VCPU_REGS_RSI]), -
[PATCH v3 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.
Currently, when a string instruction is only partially complete, we go back to guest mode, the guest tries to re-execute the instruction and exits again, and at this point emulation continues. Avoid all of this by restarting the instruction without going back to guest mode, but return to guest mode every 1024 iterations to allow interrupt injection. A pending exception causes immediate guest entry too.
Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |1 + arch/x86/kvm/emulate.c | 34 +++--- arch/x86/kvm/x86.c | 19 ++- 3 files changed, 42 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 679245c..7fda16f 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -193,6 +193,7 @@ struct x86_emulate_ctxt { /* interruptibility state, as a result of execution of STI or MOV SS */ int interruptibility; + bool restart; /* restart string instruction after writeback */ /* decode cache */ struct decode_cache decode; }; diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 541f3c9..c4da60e 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -927,8 +927,11 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) int mode = ctxt->mode; int def_op_bytes, def_ad_bytes, group; - /* Shadow copy of register state. Committed on successful emulation. */ + /* we cannot decode insn before we complete previous rep insn */ + WARN_ON(ctxt->restart); + + /* Shadow copy of register state. Committed on successful emulation.
*/ memset(c, 0, sizeof(struct decode_cache)); c->eip = ctxt->eip; ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS); @@ -2422,6 +2425,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) u64 msr_data; struct decode_cache *c = &ctxt->decode; int rc = X86EMUL_CONTINUE; + int saved_dst_type = c->dst.type; ctxt->interruptibility = 0; @@ -2450,8 +2454,11 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) } if (c->rep_prefix && (c->d & String)) { + ctxt->restart = true; /* All REP prefixes have the same first termination condition */ if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) { + string_done: + ctxt->restart = false; kvm_rip_write(ctxt->vcpu, c->eip); goto done; } @@ -2463,17 +2470,13 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) * - if REPNE/REPNZ and ZF = 1 then done */ if ((c->b == 0xa6) || (c->b == 0xa7) || - (c->b == 0xae) || (c->b == 0xaf)) { + (c->b == 0xae) || (c->b == 0xaf)) { if ((c->rep_prefix == REPE_PREFIX) && - ((ctxt->eflags & EFLG_ZF) == 0)) { - kvm_rip_write(ctxt->vcpu, c->eip); - goto done; - } + ((ctxt->eflags & EFLG_ZF) == 0)) + goto string_done; if ((c->rep_prefix == REPNE_PREFIX) && - ((ctxt->eflags & EFLG_ZF) == EFLG_ZF)) { - kvm_rip_write(ctxt->vcpu, c->eip); - goto done; - } + ((ctxt->eflags & EFLG_ZF) == EFLG_ZF)) + goto string_done; } c->eip = ctxt->eip; } @@ -2906,6 +2909,12 @@ writeback: if (rc != X86EMUL_CONTINUE) goto done; + /* +* restore dst type in case the decoding will be reused +* (happens for string instruction ) +*/ + c->dst.type = saved_dst_type; + if ((c->d & SrcMask) == SrcSI) string_addr_inc(ctxt, seg_override_base(ctxt, c), VCPU_REGS_RSI, &c->src); @@ -2913,8 +2922,11 @@ writeback: if ((c->d & DstMask) == DstDI) string_addr_inc(ctxt, es_base(ctxt), VCPU_REGS_RDI, &c->dst); - if (c->rep_prefix && (c->d & String)) + if (c->rep_prefix && (c->d & String)) { register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1); + if 
(!(c->regs[VCPU_REGS_RCX] & 0x3ff)) + ctxt->restart = false; + } /* Commit shadow register state. */ memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b8237ac..cd0043a 100644 --- a/arch/x86/kvm/
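The writeback hunk above clears the restart flag whenever the low ten bits of RCX reach zero after the decrement, which is what bounds the time spent in the emulator without a guest entry. A simplified model of that counting follows, under stated assumptions: sketch_guest_entries is a hypothetical name, address-size masking of RCX is ignored, and exits caused by I/O or pending exceptions are not modelled.

```c
#include <assert.h>

/*
 * Hypothetical model: how many times a rep string instruction with
 * `count` iterations drops back to guest mode when the restart flag
 * is cleared each time (rcx & 0x3ff) becomes zero.
 */
unsigned long sketch_guest_entries(unsigned long count)
{
	unsigned long rcx = count;
	unsigned long entries = 0;

	while (rcx != 0) {
		/* ... one iteration of the string instruction is emulated here ... */
		rcx--;
		if ((rcx & 0x3ff) == 0)
			entries++;	/* restart flag cleared: back to guest mode */
	}

	/* Sanity check of the model: one guest entry per started block of 1024. */
	assert(entries == count / 1024 + (count % 1024 != 0));
	return entries;
}
```

In this model a rep with 3000 iterations re-enters the guest three times (at RCX = 2048, 1024 and 0) instead of once per iteration.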
[PATCH v3 30/30] KVM: small kvm_arch_vcpu_ioctl_run() cleanup.
Unify all conditions that get us back into the emulator after returning from userspace.

Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/x86.c | 32 ++--
 1 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cd0043a..1c00c06 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4505,33 +4505,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	if (!irqchip_in_kernel(vcpu->kvm))
 		kvm_set_cr8(vcpu, kvm_run->cr8);

-	if (vcpu->arch.pio.count) {
-		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
-		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-		if (r == EMULATE_DO_MMIO) {
-			r = 0;
-			goto out;
+	if (vcpu->arch.pio.count || vcpu->mmio_needed ||
+	    vcpu->arch.emulate_ctxt.restart) {
+		if (vcpu->mmio_needed) {
+			memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
+			vcpu->mmio_read_completed = 1;
+			vcpu->mmio_needed = 0;
 		}
-	}
-	if (vcpu->mmio_needed) {
-		memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
-		vcpu->mmio_read_completed = 1;
-		vcpu->mmio_needed = 0;
-
-		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = emulate_instruction(vcpu, vcpu->arch.mmio_fault_cr2, 0,
-					EMULTYPE_NO_DECODE);
-		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-		if (r == EMULATE_DO_MMIO) {
-			/*
-			 * Read-modify-write. Back to userspace.
-			 */
-			r = 0;
-			goto out;
-		}
-	}
-	if (vcpu->arch.emulate_ctxt.restart) {
 		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
 		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
--
1.6.5
[PATCH v3 21/30] KVM: Use task switch from emulator.c
Remove old task switch code from x86.c Signed-off-by: Gleb Natapov --- arch/x86/kvm/x86.c | 557 ++-- 1 files changed, 17 insertions(+), 540 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2ef83db..7d1b481 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4795,553 +4795,30 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, return 0; } -static void seg_desct_to_kvm_desct(struct desc_struct *seg_desc, u16 selector, - struct kvm_segment *kvm_desct) -{ - kvm_desct->base = get_desc_base(seg_desc); - kvm_desct->limit = get_desc_limit(seg_desc); - if (seg_desc->g) { - kvm_desct->limit <<= 12; - kvm_desct->limit |= 0xfff; - } - kvm_desct->selector = selector; - kvm_desct->type = seg_desc->type; - kvm_desct->present = seg_desc->p; - kvm_desct->dpl = seg_desc->dpl; - kvm_desct->db = seg_desc->d; - kvm_desct->s = seg_desc->s; - kvm_desct->l = seg_desc->l; - kvm_desct->g = seg_desc->g; - kvm_desct->avl = seg_desc->avl; - if (!selector) - kvm_desct->unusable = 1; - else - kvm_desct->unusable = 0; - kvm_desct->padding = 0; -} - -static void get_segment_descriptor_dtable(struct kvm_vcpu *vcpu, - u16 selector, - struct desc_ptr *dtable) -{ - if (selector & 1 << 2) { - struct kvm_segment kvm_seg; - - kvm_get_segment(vcpu, &kvm_seg, VCPU_SREG_LDTR); - - if (kvm_seg.unusable) - dtable->size = 0; - else - dtable->size = kvm_seg.limit; - dtable->address = kvm_seg.base; - } - else - kvm_x86_ops->get_gdt(vcpu, dtable); -} - -/* allowed just for 8 bytes segments */ -static int load_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, -struct desc_struct *seg_desc) -{ - struct desc_ptr dtable; - u16 index = selector >> 3; - int ret; - u32 err; - gva_t addr; - - get_segment_descriptor_dtable(vcpu, selector, &dtable); - - if (dtable.size < index * 8 + 7) { - kvm_queue_exception_e(vcpu, GP_VECTOR, selector & 0xfffc); - return X86EMUL_PROPAGATE_FAULT; - } - addr = dtable.address + index * 8; - ret = kvm_read_guest_virt_system(addr, 
seg_desc, sizeof(*seg_desc), -vcpu, &err); - if (ret == X86EMUL_PROPAGATE_FAULT) - kvm_inject_page_fault(vcpu, addr, err); - - return ret; -} - -/* allowed just for 8 bytes segments */ -static int save_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, -struct desc_struct *seg_desc) -{ - struct desc_ptr dtable; - u16 index = selector >> 3; - - get_segment_descriptor_dtable(vcpu, selector, &dtable); - - if (dtable.size < index * 8 + 7) - return 1; - return kvm_write_guest_virt(dtable.address + index*8, seg_desc, sizeof(*seg_desc), vcpu, NULL); -} - -static gpa_t get_tss_base_addr_write(struct kvm_vcpu *vcpu, - struct desc_struct *seg_desc) -{ - u32 base_addr = get_desc_base(seg_desc); - - return kvm_mmu_gva_to_gpa_write(vcpu, base_addr, NULL); -} - -static gpa_t get_tss_base_addr_read(struct kvm_vcpu *vcpu, -struct desc_struct *seg_desc) -{ - u32 base_addr = get_desc_base(seg_desc); - - return kvm_mmu_gva_to_gpa_read(vcpu, base_addr, NULL); -} - -static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg) -{ - struct kvm_segment kvm_seg; - - kvm_get_segment(vcpu, &kvm_seg, seg); - return kvm_seg.selector; -} - -static int kvm_load_realmode_segment(struct kvm_vcpu *vcpu, u16 selector, int seg) -{ - struct kvm_segment segvar = { - .base = selector << 4, - .limit = 0x, - .selector = selector, - .type = 3, - .present = 1, - .dpl = 3, - .db = 0, - .s = 1, - .l = 0, - .g = 0, - .avl = 0, - .unusable = 0, - }; - kvm_x86_ops->set_segment(vcpu, &segvar, seg); - return X86EMUL_CONTINUE; -} - -static int is_vm86_segment(struct kvm_vcpu *vcpu, int seg) -{ - return (seg != VCPU_SREG_LDTR) && - (seg != VCPU_SREG_TR) && - (kvm_get_rflags(vcpu) & X86_EFLAGS_VM); -} - -int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg) -{ - struct kvm_segment kvm_seg; - struct desc_struct seg_desc; - u8 dpl, rpl, cpl; - unsigned err_vec = GP_VECTOR; -
[PATCH v3 24/30] KVM: x86 emulator: during rep emulation decrement ECX only if emulation succeeded
Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c | 15 ---
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6ebd642..a166235 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2407,13 +2407,13 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 }

 static void string_addr_inc(struct x86_emulate_ctxt *ctxt, unsigned long base,
-			    int reg, unsigned long **ptr)
+			    int reg, struct operand *op)
 {
 	struct decode_cache *c = &ctxt->decode;
 	int df = (ctxt->eflags & EFLG_DF) ? -1 : 1;

-	register_address_increment(c, &c->regs[reg], df * c->src.bytes);
-	*ptr = (unsigned long *)register_address(c, base, c->regs[reg]);
+	register_address_increment(c, &c->regs[reg], df * op->bytes);
+	op->ptr = (unsigned long *)register_address(c, base, c->regs[reg]);
 }

 int
@@ -2479,7 +2479,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 				goto done;
 			}
 		}
-		register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
 		c->eip = ctxt->eip;
 	}

@@ -2932,11 +2931,13 @@ writeback:

 	if ((c->d & SrcMask) == SrcSI)
 		string_addr_inc(ctxt, seg_override_base(ctxt, c), VCPU_REGS_RSI,
-				&c->src.ptr);
+				&c->src);

 	if ((c->d & DstMask) == DstDI)
-		string_addr_inc(ctxt, es_base(ctxt), VCPU_REGS_RDI,
-				&c->dst.ptr);
+		string_addr_inc(ctxt, es_base(ctxt), VCPU_REGS_RDI, &c->dst);
+
+	if (c->rep_prefix && (c->d & String))
+		register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);

 	/* Commit shadow register state. */
 	memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
--
1.6.5
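The patch carries no description beyond its subject, so here is a rough sketch of the two behaviours the diff appears to establish: the index register is stepped by the size of the operand being advanced (not always the source operand's size), and the rep counter is decremented at writeback, that is, only after the iteration succeeded. All names below (sketch_operand, sketch_string_addr_inc, sketch_rep_iteration) are hypothetical, and segment bases and address-size masking are left out.

```c
#include <assert.h>

/* Hypothetical, simplified operand: only the fields the sketch needs. */
struct sketch_operand {
	unsigned bytes;		/* operand size in bytes */
	unsigned long addr;	/* current linear address */
};

/* Step the index register by the operand's own size; df_set mirrors EFLAGS.DF. */
void sketch_string_addr_inc(int df_set, unsigned long *reg,
			    struct sketch_operand *op)
{
	long step = df_set ? -(long)op->bytes : (long)op->bytes;

	*reg += (unsigned long)step;
	op->addr = *reg;
}

/*
 * One rep iteration: the counter is decremented only when emulation
 * of the iteration succeeded. Returns 1 on success, 0 on failure.
 */
int sketch_rep_iteration(unsigned long *rcx, int emulation_ok)
{
	assert(*rcx != 0);	/* caller checks the termination condition first */
	if (!emulation_ok)
		return 0;	/* counter left untouched */
	*rcx -= 1;
	return 1;
}
```

With this ordering, an iteration that fails and is retried later does not lose a count.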
[PATCH v3 22/30] KVM: x86 emulator: populate OP_MEM operand during decoding.
All struct operand fields are initialized during decoding for all operand types except OP_MEM, but there is no reason for that. Move OP_MEM operand initialization into decoding stage for consistency. Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c | 66 +--- 1 files changed, 29 insertions(+), 37 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 702bfff..55b8a8b 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1057,6 +1057,10 @@ done_prefixes: if (c->ad_bytes != 8) c->modrm_ea = (u32)c->modrm_ea; + + if (c->rip_relative) + c->modrm_ea += c->eip; + /* * Decode and fetch the source operand: register, memory * or immediate. @@ -1091,6 +1095,8 @@ done_prefixes: break; } c->src.type = OP_MEM; + c->src.ptr = (unsigned long *)c->modrm_ea; + c->src.val = 0; break; case SrcImm: case SrcImmU: @@ -1169,8 +1175,10 @@ done_prefixes: c->src2.val = 1; break; case Src2Mem16: - c->src2.bytes = 2; c->src2.type = OP_MEM; + c->src2.bytes = 2; + c->src2.ptr = (unsigned long *)(c->modrm_ea + c->src.bytes); + c->src2.val = 0; break; } @@ -1192,6 +1200,15 @@ done_prefixes: break; } c->dst.type = OP_MEM; + c->dst.ptr = (unsigned long *)c->modrm_ea; + c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes; + c->dst.val = 0; + if (c->d & BitOp) { + unsigned long mask = ~(c->dst.bytes * 8 - 1); + + c->dst.ptr = (void *)c->dst.ptr + + (c->src.val & mask) / 8; + } break; case DstAcc: c->dst.type = OP_REG; @@ -1215,9 +1232,6 @@ done_prefixes: break; } - if (c->rip_relative) - c->modrm_ea += c->eip; - done: return (rc == X86EMUL_UNHANDLEABLE) ? 
-1 : 0; } @@ -1638,14 +1652,13 @@ static inline int emulate_grp45(struct x86_emulate_ctxt *ctxt, } static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt, - struct x86_emulate_ops *ops, - unsigned long memop) + struct x86_emulate_ops *ops) { struct decode_cache *c = &ctxt->decode; u64 old, new; int rc; - rc = ops->read_emulated(memop, &old, 8, ctxt->vcpu); + rc = ops->read_emulated(c->modrm_ea, &old, 8, ctxt->vcpu); if (rc != X86EMUL_CONTINUE) return rc; @@ -1660,7 +1673,7 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt, new = ((u64)c->regs[VCPU_REGS_RCX] << 32) | (u32) c->regs[VCPU_REGS_RBX]; - rc = ops->cmpxchg_emulated(memop, &old, &new, 8, ctxt->vcpu); + rc = ops->cmpxchg_emulated(c->modrm_ea, &old, &new, 8, ctxt->vcpu); if (rc != X86EMUL_CONTINUE) return rc; ctxt->eflags |= EFLG_ZF; @@ -2378,7 +2391,6 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt, int x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) { - unsigned long memop = 0; u64 msr_data; unsigned long saved_eip = 0; struct decode_cache *c = &ctxt->decode; @@ -2413,9 +2425,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) goto done; } - if (((c->d & ModRM) && (c->modrm_mod != 3)) || (c->d & MemAbs)) - memop = c->modrm_ea; - if (c->rep_prefix && (c->d & String)) { /* All REP prefixes have the same first termination condition */ if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) { @@ -2447,8 +2456,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) } if (c->src.type == OP_MEM) { - c->src.ptr = (unsigned long *)memop; - c->src.val = 0; rc = ops->read_emulated((unsigned long)c->src.ptr, &c->src.val, c->src.bytes, @@ -2459,8 +2466,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) } if (c->src2.type == OP_MEM) { - c->src2.ptr = (unsigned long *)(memop + c->src.bytes); - c->src2.val = 0; rc = ops->read_emulated((unsigned long)c->src2.ptr, &c->src2.val, 
c->src2.bytes, @@ -2473,25 +2478,12 @@
[PATCH v3 29/30] KVM: x86 emulator: introduce pio in string read ahead.
To optimize the "rep ins" instruction, do IO in big chunks ahead of time instead of doing it only when required during instruction emulation.

Signed-off-by: Gleb Natapov
---
 arch/x86/include/asm/kvm_emulate.h |7 ++
 arch/x86/kvm/emulate.c | 43 +++
 2 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 7fda16f..b5e12c5 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -151,6 +151,12 @@ struct fetch_cache {
 	unsigned long end;
 };

+struct read_cache {
+	u8 data[1024];
+	unsigned long pos;
+	unsigned long end;
+};
+
 struct decode_cache {
 	u8 twobyte;
 	u8 b;
@@ -178,6 +184,7 @@ struct decode_cache {
 	void *modrm_ptr;
 	unsigned long modrm_val;
 	struct fetch_cache fetch;
+	struct read_cache io_read;
 };

 struct x86_emulate_ctxt {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c4da60e..d9cf93b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1257,6 +1257,34 @@ done:
 	return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }

+static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
+			   struct x86_emulate_ops *ops,
+			   unsigned int size, unsigned short port,
+			   void *dest)
+{
+	struct read_cache *rc = &ctxt->decode.io_read;
+
+	if (rc->pos == rc->end) { /* refill pio read ahead */
+		struct decode_cache *c = &ctxt->decode;
+		unsigned int in_page, n;
+		unsigned int count = c->rep_prefix ?
+			address_mask(c, c->regs[VCPU_REGS_RCX]) : 1;
+		in_page = (ctxt->eflags & EFLG_DF) ?
+			offset_in_page(c->regs[VCPU_REGS_RDI]) :
+			PAGE_SIZE - offset_in_page(c->regs[VCPU_REGS_RDI]);
+		n = min(min(in_page, (unsigned int)sizeof(rc->data)) / size,
+			count);
+		rc->pos = rc->end = 0;
+		if (!ops->pio_in_emulated(size, port, rc->data, n, ctxt->vcpu))
+			return 0;
+		rc->end = n * size;
+	}
+
+	memcpy(dest, rc->data + rc->pos, size);
+	rc->pos += size;
+	return 1;
+}
+
 static u32 desc_limit_scaled(struct desc_struct *desc)
 {
 	u32 limit = get_desc_limit(desc);
@@ -2618,8 +2646,8 @@ special_insn:
 			kvm_inject_gp(ctxt->vcpu, 0);
 			goto done;
 		}
-		if (!ops->pio_in_emulated(c->dst.bytes, c->regs[VCPU_REGS_RDX],
-					  &c->dst.val, 1, ctxt->vcpu))
+		if (!pio_in_emulated(ctxt, ops, c->dst.bytes,
+				     c->regs[VCPU_REGS_RDX], &c->dst.val))
 			goto done; /* IO is needed, skip writeback */
 		break;
 	case 0x6e: /* outsb */
@@ -2835,8 +2863,7 @@ special_insn:
 			kvm_inject_gp(ctxt->vcpu, 0);
 			goto done;
 		}
-		ops->pio_in_emulated(c->dst.bytes, c->src.val, &c->dst.val, 1,
-				     ctxt->vcpu);
+		pio_in_emulated(ctxt, ops, c->dst.bytes, c->src.val, &c->dst.val);
 		break;
 	case 0xee: /* out al,dx */
 	case 0xef: /* out (e/r)ax,dx */
@@ -2923,8 +2950,14 @@ writeback:
 		string_addr_inc(ctxt, es_base(ctxt), VCPU_REGS_RDI, &c->dst);

 	if (c->rep_prefix && (c->d & String)) {
+		struct read_cache *rc = &ctxt->decode.io_read;
 		register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
-		if (!(c->regs[VCPU_REGS_RCX] & 0x3ff))
+		/*
+		 * Re-enter guest when pio read ahead buffer is empty or,
+		 * if it is not used, after each 1024 iteration.
+		 */
+		if ((rc->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
+		    (rc->end != 0 && rc->end == rc->pos))
 			ctxt->restart = false;
 	}

--
1.6.5
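The refill step sizes each read-ahead batch so that it fits in the 1024-byte cache, does not run past the current destination page, and does not exceed the remaining rep count. A small standalone sketch of that sizing rule and of draining the cache follows. The sketch_* names and the 4 KiB page size are assumptions for illustration; the device I/O that fills the buffer is not modelled.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE   4096u	/* assumption: 4 KiB pages */
#define SKETCH_CACHE_BYTES 1024u	/* matches u8 data[1024] in the patch */

struct sketch_read_cache {
	unsigned char data[SKETCH_CACHE_BYTES];
	unsigned long pos;	/* next byte to hand out */
	unsigned long end;	/* number of valid bytes */
};

/* How many elements of `size` bytes one refill may fetch. */
unsigned sketch_readahead_count(unsigned long rdi, int df_set,
				unsigned size, unsigned count)
{
	unsigned off = (unsigned)(rdi & (SKETCH_PAGE_SIZE - 1));
	unsigned in_page = df_set ? off : SKETCH_PAGE_SIZE - off;
	unsigned cap = in_page < SKETCH_CACHE_BYTES ? in_page : SKETCH_CACHE_BYTES;
	unsigned n;

	assert(size > 0);
	n = cap / size;
	return n < count ? n : count;
}

/* Hand out one element; returns 0 when the cache is empty and needs a refill. */
int sketch_cache_read(struct sketch_read_cache *rc, void *dest, unsigned size)
{
	if (rc->pos == rc->end)
		return 0;
	memcpy(dest, rc->data + rc->pos, size);
	rc->pos += size;
	return 1;
}
```

For a word-sized rep ins at the start of a page with a large count, one refill covers 512 elements, so the exit to userspace happens once per 512 iterations in this model.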
[PATCH v3 17/30] KVM: x86 emulator: cleanup grp3 return value
When x86_emulate_insn() does not know how to emulate an instruction, it exits via the cannot_emulate label in all cases except when emulating grp3. Fix that.

Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c | 12
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 46a7ee3..d696cbd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1397,7 +1397,6 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
 			       struct x86_emulate_ops *ops)
 {
 	struct decode_cache *c = &ctxt->decode;
-	int rc = X86EMUL_CONTINUE;

 	switch (c->modrm_reg) {
 	case 0 ... 1:	/* test */
@@ -1410,11 +1409,9 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
 		emulate_1op("neg", c->dst, ctxt->eflags);
 		break;
 	default:
-		DPRINTF("Cannot emulate %02x\n", c->b);
-		rc = X86EMUL_UNHANDLEABLE;
-		break;
+		return 0;
 	}
-	return rc;
+	return 1;
 }

 static inline int emulate_grp45(struct x86_emulate_ctxt *ctxt,
@@ -2374,9 +2371,8 @@ special_insn:
 		c->dst.type = OP_NONE;	/* Disable writeback. */
 		break;
 	case 0xf6 ... 0xf7:	/* Grp3 */
-		rc = emulate_grp3(ctxt, ops);
-		if (rc != X86EMUL_CONTINUE)
-			goto done;
+		if (!emulate_grp3(ctxt, ops))
+			goto cannot_emulate;
 		break;
 	case 0xf8: /* clc */
 		ctxt->eflags &= ~EFLG_CF;
--
1.6.5
[PATCH v3 25/30] KVM: x86 emulator: fix in/out emulation.
in/out emulation is broken now. The breakage differs depending on where the IO device resides. If it is in userspace, the emulator reports an emulation failure, since it incorrectly interprets the kvm_emulate_pio() return value. If the IO device is in the kernel, emulation of 'in' does nothing, since kvm_emulate_pio() stores the result directly into the vcpu registers, so the emulator overwrites the result of emulation when it commits the shadowed registers.
Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |7 + arch/x86/include/asm/kvm_host.h|3 +- arch/x86/kvm/emulate.c | 49 - arch/x86/kvm/svm.c | 20 +-- arch/x86/kvm/vmx.c | 18 ++-- arch/x86/kvm/x86.c | 213 ++-- 6 files changed, 177 insertions(+), 133 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index bd46929..679245c 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -119,6 +119,13 @@ struct x86_emulate_ops { const void *new, unsigned int bytes, struct kvm_vcpu *vcpu); + + int (*pio_in_emulated)(int size, unsigned short port, void *val, + unsigned int count, struct kvm_vcpu *vcpu); + + int (*pio_out_emulated)(int size, unsigned short port, const void *val, + unsigned int count, struct kvm_vcpu *vcpu); + bool (*get_cached_descriptor)(struct desc_struct *desc, int seg, struct kvm_vcpu *vcpu); void (*set_cached_descriptor)(struct desc_struct *desc, diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 72997aa..4a4fb8d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -589,8 +589,7 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); struct x86_emulate_ctxt; -int kvm_emulate_pio(struct kvm_vcpu *vcpu, int in, -int size, unsigned port); +int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port); int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in, int size, unsigned long count, int down, gva_t address, int rep, unsigned port); diff --git
a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index a166235..873da58 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -210,13 +210,13 @@ static u32 opcode_table[256] = { 0, 0, 0, 0, 0, 0, 0, 0, /* 0xE0 - 0xE7 */ 0, 0, 0, 0, - ByteOp | SrcImmUByte, SrcImmUByte, - ByteOp | SrcImmUByte, SrcImmUByte, + ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc, + ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc, /* 0xE8 - 0xEF */ SrcImm | Stack, SrcImm | ImplicitOps, SrcImmU | Src2Imm16 | No64, SrcImmByte | ImplicitOps, - SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps, - SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps, + SrcNone | ByteOp | DstAcc, SrcNone | DstAcc, + SrcNone | ByteOp | DstAcc, SrcNone | DstAcc, /* 0xF0 - 0xF7 */ 0, 0, 0, 0, ImplicitOps | Priv, ImplicitOps, Group | Group3_Byte, Group | Group3, @@ -2422,8 +2422,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) u64 msr_data; unsigned long saved_eip = 0; struct decode_cache *c = &ctxt->decode; - unsigned int port; - int io_dir_in; int rc = X86EMUL_CONTINUE; ctxt->interruptibility = 0; @@ -2819,14 +2817,10 @@ special_insn: break; case 0xe4: /* inb */ case 0xe5: /* in */ - port = c->src.val; - io_dir_in = 1; - goto do_io; + goto do_io_in; case 0xe6: /* outb */ case 0xe7: /* out */ - port = c->src.val; - io_dir_in = 0; - goto do_io; + goto do_io_out; case 0xe8: /* call (near) */ { long int rel = c->src.val; c->src.val = (unsigned long) c->eip; @@ -2851,25 +2845,28 @@ special_insn: break; case 0xec: /* in al,dx */ case 0xed: /* in (e/r)ax,dx */ - port = c->regs[VCPU_REGS_RDX]; - io_dir_in = 1; - goto do_io; + c->src.val = c->regs[VCPU_REGS_RDX]; + do_io_in: + c->dst.bytes = min(c->dst.bytes, 4u); + if (!emulator_io_permited(ctxt, ops, c->src.val, c->dst.bytes)) { + kvm_inject_gp(ctxt->vcpu, 0); + goto done; + } + ops->pio_in_emulated(c->dst.bytes, c->src.val, &c->dst.val, 1, +ctxt->vcpu); +
[PATCH v3 05/30] KVM: Provide callback to get/set control registers in emulator ops.
Use these callbacks instead of calling kvm functions directly. Also rename realmode_(set|get)_cr to emulator_(set|get)_cr, since the functions have nothing to do with real mode.
Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |3 +- arch/x86/include/asm/kvm_host.h|2 - arch/x86/kvm/emulate.c |7 +- arch/x86/kvm/x86.c | 114 ++-- 4 files changed, 63 insertions(+), 63 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 2666d7a..0c5caa4 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -108,7 +108,8 @@ struct x86_emulate_ops { const void *new, unsigned int bytes, struct kvm_vcpu *vcpu); - + ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu); + void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); }; /* Type, address-of, and value of an instruction's operand. */ diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 8567107..9725856 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -585,8 +585,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address); void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw, unsigned long *rflags); -unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr); -void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value); void kvm_enable_efer_bits(u64); int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data); int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 91450b5..5b060e4 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2483,7 +2483,7 @@ twobyte_insn: break; case 4: /* smsw */ c->dst.bytes = 2; - c->dst.val = realmode_get_cr(ctxt->vcpu, 0); + c->dst.val = ops->get_cr(0, ctxt->vcpu); break; case 6: /* lmsw */ realmode_lmsw(ctxt->vcpu, (u16)c->src.val, @@ -2519,8 +2519,7 @@ twobyte_insn: case 0x20: /* mov cr, reg */ if
(c->modrm_mod != 3) goto cannot_emulate; - c->regs[c->modrm_rm] = - realmode_get_cr(ctxt->vcpu, c->modrm_reg); + c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu); c->dst.type = OP_NONE; /* no writeback */ break; case 0x21: /* mov from dr to reg */ @@ -2534,7 +2533,7 @@ twobyte_insn: case 0x22: /* mov reg, cr */ if (c->modrm_mod != 3) goto cannot_emulate; - realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val); + ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu); c->dst.type = OP_NONE; break; case 0x23: /* mov from reg to dr */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 56cdaa5..fb00ed5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3386,12 +3386,70 @@ void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context) } EXPORT_SYMBOL_GPL(kvm_report_emulation_failure); +static u64 mk_cr_64(u64 curr_cr, u32 new_val) +{ + return (curr_cr & ~((1ULL << 32) - 1)) | new_val; +} + +static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu) +{ + unsigned long value; + + switch (cr) { + case 0: + value = kvm_read_cr0(vcpu); + break; + case 2: + value = vcpu->arch.cr2; + break; + case 3: + value = vcpu->arch.cr3; + break; + case 4: + value = kvm_read_cr4(vcpu); + break; + case 8: + value = kvm_get_cr8(vcpu); + break; + default: + vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr); + return 0; + } + + return value; +} + +static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu) +{ + switch (cr) { + case 0: + kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val)); + break; + case 2: + vcpu->arch.cr2 = val; + break; + case 3: + kvm_set_cr3(vcpu, val); + break; + case 4: + kvm_set_cr4(vcpu, mk_cr_64(kvm_read_cr4(vcpu), val)); + break; + case 8: + kvm_set_cr8(vcpu, val & 0xfUL); + break; + default: + vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr); + } +} + static struct x86_emulate_ops emulate_
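The mk_cr_64() helper added above keeps the upper 32 bits of the current control register value and replaces only the lower 32 bits, which is how the CR0 and CR4 writes in emulator_set_cr() are built. A standalone restatement with fixed-width types follows; sketch_mk_cr_64 is a hypothetical name and the stdint types stand in for the kernel's u64/u32.

```c
#include <assert.h>
#include <stdint.h>

/* Keep bits 63..32 of curr_cr, take bits 31..0 from new_val. */
uint64_t sketch_mk_cr_64(uint64_t curr_cr, uint32_t new_val)
{
	return (curr_cr & ~((1ULL << 32) - 1)) | new_val;
}
```

For example, merging 0x80000011 into 0xAAAAAAAA00000001 gives 0xAAAAAAAA80000011.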
[PATCH v3 14/30] KVM: x86 emulator: fix return values of syscall/sysenter/sysexit emulations
Return X86EMUL_PROPAGATE_FAULT if a fault was injected. Also inject #UD for
those instructions when appropriate.

Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c |   17 +++--
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5afddcf..1393bf0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1600,8 +1600,11 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 	u64 msr_data;
 
 	/* syscall is not available in real mode */
-	if (ctxt->mode == X86EMUL_MODE_REAL || ctxt->mode == X86EMUL_MODE_VM86)
-		return X86EMUL_UNHANDLEABLE;
+	if (ctxt->mode == X86EMUL_MODE_REAL ||
+	    ctxt->mode == X86EMUL_MODE_VM86) {
+		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+		return X86EMUL_PROPAGATE_FAULT;
+	}
 
 	setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1651,14 +1654,16 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
 	/* inject #GP if in real mode */
 	if (ctxt->mode == X86EMUL_MODE_REAL) {
 		kvm_inject_gp(ctxt->vcpu, 0);
-		return X86EMUL_UNHANDLEABLE;
+		return X86EMUL_PROPAGATE_FAULT;
 	}
 
 	/* XXX sysenter/sysexit have not been tested in 64bit mode.
 	* Therefore, we inject an #UD.
 	*/
-	if (ctxt->mode == X86EMUL_MODE_PROT64)
-		return X86EMUL_UNHANDLEABLE;
+	if (ctxt->mode == X86EMUL_MODE_PROT64) {
+		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+		return X86EMUL_PROPAGATE_FAULT;
+	}
 
 	setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1713,7 +1718,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 	if (ctxt->mode == X86EMUL_MODE_REAL ||
 	    ctxt->mode == X86EMUL_MODE_VM86) {
 		kvm_inject_gp(ctxt->vcpu, 0);
-		return X86EMUL_UNHANDLEABLE;
+		return X86EMUL_PROPAGATE_FAULT;
 	}
 
 	setup_syscalls_segments(ctxt, &cs, &ss);
-- 
1.6.5
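The pattern in this patch (queue the exception, then report that a fault must be propagated rather than that emulation is impossible) can be sketched outside the kernel roughly as follows. Everything here is a hypothetical stand-in: struct fake_vcpu, queue_exception(), syscall_mode_check() and the enum values are invented for the example; only the #UD vector number (6) is architectural.

```c
#include <assert.h>

/* Hypothetical stand-ins for the emulator's return codes and modes. */
enum emul_rc { EMUL_CONTINUE, EMUL_UNHANDLEABLE, EMUL_PROPAGATE_FAULT };
enum emul_mode { MODE_REAL, MODE_VM86, MODE_PROT32, MODE_PROT64 };

#define UD_VECTOR	6	/* #UD, invalid opcode */
#define NO_EXCEPTION	(-1)

/* Hypothetical vcpu that only remembers one queued exception vector. */
struct fake_vcpu {
	int pending_vector;
};

static void queue_exception(struct fake_vcpu *vcpu, int vector)
{
	vcpu->pending_vector = vector;
}

/* Mode check as done at the top of emulate_syscall() after the patch. */
static enum emul_rc syscall_mode_check(struct fake_vcpu *vcpu,
				       enum emul_mode mode)
{
	if (mode == MODE_REAL || mode == MODE_VM86) {
		/* The guest sees #UD instead of an emulation failure. */
		queue_exception(vcpu, UD_VECTOR);
		return EMUL_PROPAGATE_FAULT;
	}
	return EMUL_CONTINUE;
}
```

The caller can then tell "a fault was queued for the guest" apart from "the emulator cannot handle this instruction", which is the distinction the commit message draws.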
[PATCH v3 12/30] KVM: x86 emulator: inject #UD on access to non-existing CR
Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c |    7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fa4604e..836e97b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2520,6 +2520,13 @@ twobyte_insn:
 		c->dst.type = OP_NONE;
 		break;
 	case 0x20: /* mov cr, reg */
+		switch (c->modrm_reg) {
+		case 1:
+		case 5 ... 7:
+		case 9 ... 15:
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
+		}
 		c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
-- 
1.6.5
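The switch added above rejects CR1, CR5-CR7 and CR9-CR15, which leaves exactly the registers that emulator_get_cr() in this series handles. A minimal sketch of the same check as a predicate, assuming a hypothetical helper name cr_is_defined() and using portable case labels instead of the GNU case-range extension used in the patch:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true for the control registers that the
 * emulator_get_cr() switch in this series handles (CR0, CR2, CR3,
 * CR4, CR8). Anything else is the "inject #UD" case in the patch.
 */
static bool cr_is_defined(unsigned int cr)
{
	switch (cr) {
	case 0:
	case 2:
	case 3:
	case 4:
	case 8:
		return true;
	default:
		return false;
	}
}
```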
[PATCH v3 11/30] KVM: x86 emulator: 0f (20|21|22|23) ignore mod bits.
A recent spec says that for 0f (20|21|22|23) the 2 bits in the mod field are
ignored. Interestingly enough, an older spec says that 11 is the only valid
encoding.

Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c |    8
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7c7debb..fa4604e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2520,28 +2520,20 @@ twobyte_insn:
 		c->dst.type = OP_NONE;
 		break;
 	case 0x20: /* mov cr, reg */
-		if (c->modrm_mod != 3)
-			goto cannot_emulate;
 		c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x21: /* mov from dr to reg */
-		if (c->modrm_mod != 3)
-			goto cannot_emulate;
 		if (emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]))
 			goto cannot_emulate;
 		rc = X86EMUL_CONTINUE;
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x22: /* mov reg, cr */
-		if (c->modrm_mod != 3)
-			goto cannot_emulate;
 		ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
 		c->dst.type = OP_NONE;
 		break;
 	case 0x23: /* mov from reg to dr */
-		if (c->modrm_mod != 3)
-			goto cannot_emulate;
 		if (emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]))
 			goto cannot_emulate;
 		rc = X86EMUL_CONTINUE;
-- 
1.6.5
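To make the "mod bits are ignored" point concrete, here is a small sketch of ModRM field extraction. The struct and function names are hypothetical, and REX extension bits are left out for brevity. Two ModRM bytes that differ only in the mod field select the same control register (reg field) and the same general register (rm field), so with this patch both are emulated identically.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the three ModRM fields (no REX extension). */
struct modrm_fields {
	uint8_t mod;	/* bits 7:6 */
	uint8_t reg;	/* bits 5:3, the CR/DR number for 0f 20..23 */
	uint8_t rm;	/* bits 2:0, the general purpose register */
};

static struct modrm_fields decode_modrm(uint8_t byte)
{
	struct modrm_fields m;

	m.mod = (byte >> 6) & 0x3;
	m.reg = (byte >> 3) & 0x7;
	m.rm = byte & 0x7;
	return m;
}
```

For example, 0xd8 (mod=3) and 0x18 (mod=0) both decode to reg=3, rm=0, i.e. CR3 and register 0 for opcode 0f 20.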
[PATCH v3 06/30] KVM: remove realmode_lmsw function.
Use the (get|set)_cr callbacks to emulate lmsw inside the emulator.

Signed-off-by: Gleb Natapov
---
 arch/x86/include/asm/kvm_host.h |    2 --
 arch/x86/kvm/emulate.c          |    4 ++--
 arch/x86/kvm/x86.c              |    7 ---
 3 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9725856..72997aa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -582,8 +582,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 void kvm_report_emulation_failure(struct kvm_vcpu *cvpu, const char *context);
 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
 void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-		   unsigned long *rflags);
 
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5b060e4..5e2fa61 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2486,8 +2486,8 @@ twobyte_insn:
 			c->dst.val = ops->get_cr(0, ctxt->vcpu);
 			break;
 		case 6: /* lmsw */
-			realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
-				      &ctxt->eflags);
+			ops->set_cr(0, (ops->get_cr(0, ctxt->vcpu) & ~0x0ful) |
+				    (c->src.val & 0x0f), ctxt->vcpu);
 			c->dst.type = OP_NONE;
 			break;
 		case 7: /* invlpg*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fb00ed5..b139334 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4061,13 +4061,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base)
 	kvm_x86_ops->set_idt(vcpu, &dt);
 }
 
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-		   unsigned long *rflags)
-{
-	kvm_lmsw(vcpu, msw);
-	*rflags = kvm_get_rflags(vcpu);
-}
-
 static int move_to_next_stateful_cpuid_entry(struct kvm_vcpu *vcpu, int i)
 {
 	struct kvm_cpuid_entry2 *e = &vcpu->arch.cpuid_entries[i];
-- 
1.6.5
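The new lmsw path boils down to one bit-merge on CR0. A standalone sketch of just that expression, assuming a hypothetical helper name lmsw_merge(); it mirrors the expression passed to ops->set_cr() in the patch and nothing more:

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the expression in the patch:
 * keep everything in CR0 except the low four bits, and take those
 * four bits from the lmsw source operand.
 */
static unsigned long lmsw_merge(unsigned long cr0, unsigned long src)
{
	return (cr0 & ~0x0ful) | (src & 0x0f);
}
```

Bits of the source operand above bit 3 are dropped, and bits of CR0 above bit 3 (for example PG in bit 31) are preserved.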
[PATCH v3 09/30] KVM: x86 emulator: fix mov r/m, sreg emulation.
mov r/m, sreg generates #UD if sreg is incorrect.

Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c |    7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2c27aa4..c3b9334 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2126,12 +2126,11 @@ special_insn:
 	case 0x8c: { /* mov r/m, sreg */
 		struct kvm_segment segreg;
 
-		if (c->modrm_reg <= 5)
+		if (c->modrm_reg <= VCPU_SREG_GS)
 			kvm_get_segment(ctxt->vcpu, &segreg, c->modrm_reg);
 		else {
-			printk(KERN_INFO "0x8c: Invalid segreg in modrm byte 0x%02x\n",
-			       c->modrm);
-			goto cannot_emulate;
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
 		}
 		c->dst.val = segreg.selector;
 		break;
-- 
1.6.5
[PATCH v3 10/30] KVM: x86 emulator: fix 0f 01 /5 emulation
It is undefined and should generate #UD.

Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c3b9334..7c7debb 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2490,6 +2490,9 @@ twobyte_insn:
 				    (c->src.val & 0x0f), ctxt->vcpu);
 			c->dst.type = OP_NONE;
 			break;
+		case 5: /* not defined */
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
 		case 7: /* invlpg*/
 			emulate_invlpg(ctxt->vcpu, memop);
 			/* Disable writeback. */
-- 
1.6.5
[PATCH v3 00/30] emulator cleanup
This is the first series of patches that tries to clean up the emulator code.
It is a mix of bug fixes and moving code that does emulation from x86.c to
emulator.c while making it KVM independent. The status of the patches: works
for me. The realtime.flat test now also passes where it failed before.

ChangeLog:
v1->v2:
 - A couple of new bugs fixed
 - cpl is now x86_emulator_ops callback
 - during string instruction re-enter guest on each page boundary
 - retain fast path for pio out (do not go through emulator)
v2->v3:
 - use correct operand length for pio instruction with REX prefix
 - check for string instruction before decrementing ecx
 - change guest re-entry condition for string instruction

Gleb Natapov (30):
  KVM: x86 emulator: Fix DstAcc decoding.
  KVM: x86 emulator: fix RCX access during rep emulation
  KVM: x86 emulator: check return value against correct define
  KVM: Remove pointer to rflags from realmode_set_cr parameters.
  KVM: Provide callback to get/set control registers in emulator ops.
  KVM: remove realmode_lmsw function.
  KVM: Provide x86_emulate_ctxt callback to get current cpl
  KVM: Provide current eip as part of emulator context.
  KVM: x86 emulator: fix mov r/m, sreg emulation.
  KVM: x86 emulator: fix 0f 01 /5 emulation
  KVM: x86 emulator: 0f (20|21|22|23) ignore mod bits.
  KVM: x86 emulator: inject #UD on access to non-existing CR
  KVM: x86 emulator: fix mov dr to inject #UD when needed.
  KVM: x86 emulator: fix return values of syscall/sysenter/sysexit emulations
  KVM: x86 emulator: do not call writeback if msr access fails.
  KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.
  KVM: x86 emulator: cleanup grp3 return value
  KVM: x86 emulator: Provide more callbacks for x86 emulator.
  KVM: x86 emulator: Emulate task switch in emulator.c
  KVM: x86 emulator: Use load_segment_descriptor() instead of kvm_load_segment_descriptor()
  KVM: Use task switch from emulator.c
  KVM: x86 emulator: populate OP_MEM operand during decoding.
  KVM: x86 emulator: add decoding of X,Y parameters from Intel SDM
  KVM: x86 emulator: during rep emulation decrement ECX only if emulation succeeded
  KVM: x86 emulator: fix in/out emulation.
  KVM: x86 emulator: Move string pio emulation into emulator.c
  KVM: x86 emulator: remove saved_eip
  KVM: x86 emulator: restart string instruction without going back to a guest.
  KVM: x86 emulator: introduce pio in string read ahead.
  KVM: small kvm_arch_vcpu_ioctl_run() cleanup.

 arch/x86/include/asm/kvm_emulate.h |   41 ++-
 arch/x86/include/asm/kvm_host.h    |   16 +-
 arch/x86/kvm/emulate.c             | 1062 ++-
 arch/x86/kvm/svm.c                 |   20 +-
 arch/x86/kvm/vmx.c                 |   18 +-
 arch/x86/kvm/x86.c                 | 1121 +---
 6 files changed, 1146 insertions(+), 1132 deletions(-)
[PATCH v3 01/30] KVM: x86 emulator: Fix DstAcc decoding.
Set correct operation length. Add RAX (64bit) handling.

Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c |    7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2832a8c..0b70a36 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1194,9 +1194,9 @@ done_prefixes:
 		break;
 	case DstAcc:
 		c->dst.type = OP_REG;
-		c->dst.bytes = c->op_bytes;
+		c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
 		c->dst.ptr = &c->regs[VCPU_REGS_RAX];
-		switch (c->op_bytes) {
+		switch (c->dst.bytes) {
 		case 1:
 			c->dst.val = *(u8 *)c->dst.ptr;
 			break;
@@ -1206,6 +1206,9 @@ done_prefixes:
 		case 4:
 			c->dst.val = *(u32 *)c->dst.ptr;
 			break;
+		case 8:
+			c->dst.val = *(u64 *)c->dst.ptr;
+			break;
 		}
 		c->dst.orig_val = c->dst.val;
 		break;
-- 
1.6.5
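A rough standalone sketch of the two things this patch changes: the destination width is 1 for byte operations regardless of op_bytes, and an 8-byte accumulator read is handled. The helper names are hypothetical. The patch reads through pointer casts; the sketch truncates the value instead, which gives the same result on little-endian x86 and is an assumption of the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: destination width for a DstAcc operand. */
static unsigned int dst_acc_bytes(int is_byte_op, unsigned int op_bytes)
{
	return is_byte_op ? 1 : op_bytes;
}

/*
 * Hypothetical helper: the part of the accumulator that is visible at
 * a given width (AL, AX, EAX or RAX).
 */
static uint64_t read_acc(uint64_t rax, unsigned int bytes)
{
	switch (bytes) {
	case 1:
		return (uint8_t)rax;
	case 2:
		return (uint16_t)rax;
	case 4:
		return (uint32_t)rax;
	case 8:
		return rax;
	default:
		return 0;	/* not a valid operand width */
	}
}
```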
[PATCH v3 03/30] KVM: x86 emulator: check return value against correct define
Check the return value against the correct define instead of open-coding the
value.

Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/emulate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4dce805..670ca8f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -566,7 +566,7 @@ static u32 group2_table[] = {
 #define insn_fetch(_type, _size, _eip)                                  \
 ({	unsigned long _x;						\
 	rc = do_insn_fetch(ctxt, ops, (_eip), &_x, (_size));		\
-	if (rc != 0)							\
+	if (rc != X86EMUL_CONTINUE)					\
 		goto done;						\
 	(_eip) += (_size);						\
 	(_type)_x;							\
-- 
1.6.5
[PATCH v3 04/30] KVM: Remove pointer to rflags from realmode_set_cr parameters.
The mov reg, cr instruction doesn't change flags in any meaningful way, so
there is no need to update rflags after instruction execution.

Signed-off-by: Gleb Natapov
---
 arch/x86/include/asm/kvm_host.h |    3 +--
 arch/x86/kvm/emulate.c          |    3 +--
 arch/x86/kvm/x86.c              |    4 +---
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ea1b6c6..8567107 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -586,8 +586,7 @@ void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
 		   unsigned long *rflags);
 
 unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr);
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value,
-		     unsigned long *rflags);
+void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value);
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 670ca8f..91450b5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2534,8 +2534,7 @@ twobyte_insn:
 	case 0x22: /* mov reg, cr */
 		if (c->modrm_mod != 3)
 			goto cannot_emulate;
-		realmode_set_cr(ctxt->vcpu,
-				c->modrm_reg, c->modrm_val, &ctxt->eflags);
+		realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val);
 		c->dst.type = OP_NONE;
 		break;
 	case 0x23: /* mov from reg to dr */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9d02cc7..56cdaa5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4043,13 +4043,11 @@ unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr)
 	return value;
 }
 
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val,
-		     unsigned long *rflags)
+void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val)
 {
 	switch (cr) {
 	case 0:
 		kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
-		*rflags = kvm_get_rflags(vcpu);
 		break;
 	case 2:
 		vcpu->arch.cr2 = val;
-- 
1.6.5
Re: Fwd: Corrupted filesystem, possible after livemigration with iSCSI storagebackend.
On Mon, Mar 15, 2010 at 08:59:10AM -0500, Anthony Liguori wrote:
> On 03/15/2010 08:46 AM, Espen Berg wrote:
> >In our KVM system we have two iSCSI backends (master/slave
> >configuration) with failover and two KVM hosts supporting live migration.
> >
> >The iSCSI volumes are shared by the host as a block device in KVM, and
> >the volumes are available on both frontends. After a reboot one of the
> >KVMs was not able to start again due to file system corruption. We
> >use XFS and have trouble understanding what caused the corruption.
> >
> >We have ruled out the iSCSI backend as both the master and slave data
> >were consistent at the time.
> >
> >Anyone else had similar problems? What is the recommended way to share
> >an iSCSI drive among the two host machines?
> >
> >Should XFS be ok as a file system for live migration? I'm not able to
> >find any documentation stating that a clustered file system (GFS2 etc.)
> >is recommended. Are there any concurrent writes on the two host
> >machines during a live migration?
> >
>
> You need to use cache=off if you've got one iscsi drive mounted on two
> separate physical machines.

FYI, this can be done by changing the disk XML driver to be

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: Fwd: Corrupted filesystem, possible after livemigration with iSCSI storagebackend.
On 03/15/2010 08:46 AM, Espen Berg wrote:
> In our KVM system we have two iSCSI backends (master/slave
> configuration) with failover and two KVM hosts supporting live migration.
>
> The iSCSI volumes are shared by the host as a block device in KVM, and
> the volumes are available on both frontends. After a reboot one of the
> KVMs was not able to start again due to file system corruption. We
> use XFS and have trouble understanding what caused the corruption.
>
> We have ruled out the iSCSI backend as both the master and slave data
> were consistent at the time.
>
> Anyone else had similar problems? What is the recommended way to share
> an iSCSI drive among the two host machines?
>
> Should XFS be ok as a file system for live migration? I'm not able to
> find any documentation stating that a clustered file system (GFS2 etc.)
> is recommended. Are there any concurrent writes on the two host
> machines during a live migration?

You need to use cache=off if you've got one iscsi drive mounted on two
separate physical machines. The additional layer of caching will result in
inconsistency because iSCSI doesn't have a mechanism to provide cache
coherence between two nodes.

Regards,

Anthony Liguori

> #virsh version
> Compiled against library: libvir 0.7.6
> Using library: libvir 0.7.6
> Using API: QEMU 0.7.6
> Running hypervisor: QEMU 0.11.0
>
> #uname -a
> Linux vm01 2.6.32-bpo.2-amd64 #1 SMP Fri Feb 12 16:50:27 UTC 2010 x86_64 GNU/Linux
>
> Regards
> Espen
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On 03/15/2010 08:24 AM, Joerg Roedel wrote:
> On Mon, Mar 15, 2010 at 03:11:42PM +0200, Avi Kivity wrote:
>> On 03/15/2010 03:03 PM, Joerg Roedel wrote:
>>>>> I will add another project - iommu emulation. Could be very useful
>>>>> for doing device assignment to nested guests, which could make
>>>>> testing a lot easier.
>>>> Our experiments show that nested device assignment is pretty much
>>>> required for I/O performance in nested scenarios.
>>> Really? I did a small test with virtio-blk in a nested guest (disk read
>>> with dd, so not a real benchmark) and got a reasonable read-performance
>>> of around 25MB/s from the disk in the l2-guest.
>>
>> Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.
>>
>> I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we can do
>> for other guests.
>
> Does it matter for the ept-on-ept case? The initial patchset of
> nested-vmx implemented it and they reported a performance drop of around
> 12% between levels which is reasonable. So I expected the loss of
> io-performance for l2 also reasonable in this case. My small measurement
> was also done using npt-on-npt.

But that was something like kernbench IIRC which is actually exit light once
ept is enabled. Network IO is typically exit heavy and becomes something more
of a pathological work load (both for nested ept and nested npt).

Regards,

Anthony Liguori

> Joerg
Re: how to tweak kernel to get the best out of kvm?
On 03/13/10 09:54, Avi Kivity wrote:
>
> If the slowdown is indeed due to I/O, LVM (with cache=off) should
> eliminate it completely.
>

As promised I have installed LVM: The difference is remarkable. My test case
(running 8 vhosts in parallel, each building a Linux kernel) just works. There
is no blocking job (so far), all vhosts can be pinged, great.

Many thanks for your help, and for the nice software, of course.

Regards

Harri
Fwd: Corrupted filesystem, possible after livemigration with iSCSI storagebackend.
In our KVM system we have two iSCSI backends (master/slave configuration)
with failover and two KVM hosts supporting live migration.

The iSCSI volumes are shared by the host as a block device in KVM, and the
volumes are available on both frontends. After a reboot one of the KVMs was
not able to start again due to file system corruption. We use XFS and have
trouble understanding what caused the corruption.

We have ruled out the iSCSI backend as both the master and slave data were
consistent at the time.

Anyone else had similar problems? What is the recommended way to share an
iSCSI drive among the two host machines?

Should XFS be ok as a file system for live migration? I'm not able to find
any documentation stating that a clustered file system (GFS2 etc.) is
recommended. Are there any concurrent writes on the two host machines during
a live migration?

#virsh version
Compiled against library: libvir 0.7.6
Using library: libvir 0.7.6
Using API: QEMU 0.7.6
Running hypervisor: QEMU 0.11.0

#uname -a
Linux vm01 2.6.32-bpo.2-amd64 #1 SMP Fri Feb 12 16:50:27 UTC 2010 x86_64 GNU/Linux

Regards
Espen
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On Mon, Mar 15, 2010 at 03:11:42PM +0200, Avi Kivity wrote:
> On 03/15/2010 03:03 PM, Joerg Roedel wrote:
>>>> I will add another project - iommu emulation. Could be very useful
>>>> for doing device assignment to nested guests, which could make
>>>> testing a lot easier.
>>> Our experiments show that nested device assignment is pretty much
>>> required for I/O performance in nested scenarios.
>>>
>> Really? I did a small test with virtio-blk in a nested guest (disk read
>> with dd, so not a real benchmark) and got a reasonable read-performance
>> of around 25MB/s from the disk in the l2-guest.
>>
>
> Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.
>
> I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we can do
> for other guests.

Does it matter for the ept-on-ept case? The initial patchset of nested-vmx
implemented it and they reported a performance drop of around 12% between
levels which is reasonable. So I expected the loss of io-performance for l2
also reasonable in this case. My small measurement was also done using
npt-on-npt.

	Joerg
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On 03/15/2010 08:11 AM, Avi Kivity wrote:
> On 03/15/2010 03:03 PM, Joerg Roedel wrote:
>>>> I will add another project - iommu emulation. Could be very useful
>>>> for doing device assignment to nested guests, which could make
>>>> testing a lot easier.
>>> Our experiments show that nested device assignment is pretty much
>>> required for I/O performance in nested scenarios.
>> Really? I did a small test with virtio-blk in a nested guest (disk read
>> with dd, so not a real benchmark) and got a reasonable read-performance
>> of around 25MB/s from the disk in the l2-guest.
>
> Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.
>
> I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we can do
> for other guests.

VMREAD/VMWRITEs are generally optimized by hypervisors as they tend to be
costly. KVM is a bit unusual in terms of how many times the instructions are
executed per exit.

Regards,

Anthony Liguori
Re: [PATCH v2 07/30] KVM: Provide x86_emulate_ctxt callback to get current cpl
On Mon, Mar 15, 2010 at 02:16:01PM +0100, Andre Przywara wrote:
> Gleb,
>
> what is the purpose of this patch? Is this a preparation for
> something upcoming? I don't see a reason to change this, in my eyes
> it is not a simplification.
>
To make the emulator independent of KVM. All direct calls from the emulator
to KVM will be changed to callbacks.

> Regards,
> Andre.
>
>
> Gleb Natapov wrote:
> >Signed-off-by: Gleb Natapov
> >---
> > arch/x86/include/asm/kvm_emulate.h |    1 +
> > arch/x86/kvm/emulate.c             |   15 ---
> > arch/x86/kvm/x86.c                 |    6 ++
> > 3 files changed, 15 insertions(+), 7 deletions(-)
> >
> >diff --git a/arch/x86/include/asm/kvm_emulate.h
> >b/arch/x86/include/asm/kvm_emulate.h
> >index 0c5caa4..b048fd2 100644
> >--- a/arch/x86/include/asm/kvm_emulate.h
> >+++ b/arch/x86/include/asm/kvm_emulate.h
> >@@ -110,6 +110,7 @@ struct x86_emulate_ops {
> > 				struct kvm_vcpu *vcpu);
> > 	ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
> > 	void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
> >+	int (*cpl)(struct kvm_vcpu *vcpu);
> > };
> >
> > /* Type, address-of, and value of an instruction's operand. */
> >diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> >index 5e2fa61..8bd0557 100644
> >--- a/arch/x86/kvm/emulate.c
> >+++ b/arch/x86/kvm/emulate.c
> >@@ -1257,7 +1257,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
> > 	int rc;
> > 	unsigned long val, change_mask;
> > 	int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
> >-	int cpl = kvm_x86_ops->get_cpl(ctxt->vcpu);
> >+	int cpl = ops->cpl(ctxt->vcpu);
> >
> > 	rc = emulate_pop(ctxt, ops, &val, len);
> > 	if (rc != X86EMUL_CONTINUE)
> >@@ -1758,7 +1758,8 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
> > 	return X86EMUL_CONTINUE;
> > }
> >
> >-static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
> >+static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt,
> >+			      struct x86_emulate_ops *ops)
> > {
> > 	int iopl;
> > 	if (ctxt->mode == X86EMUL_MODE_REAL)
> >@@ -1766,7 +1767,7 @@ static bool emulator_bad_iopl(struct x86_emulate_ctxt
> >*ctxt)
> > 	if (ctxt->mode == X86EMUL_MODE_VM86)
> > 		return true;
> > 	iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
> >-	return kvm_x86_ops->get_cpl(ctxt->vcpu) > iopl;
> >+	return ops->cpl(ctxt->vcpu) > iopl;
> > }
> >
> > static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,
> >@@ -1803,7 +1804,7 @@ static bool emulator_io_permited(struct
> >x86_emulate_ctxt *ctxt,
> > 				 struct x86_emulate_ops *ops,
> > 				 u16 port, u16 len)
> > {
> >-	if (emulator_bad_iopl(ctxt))
> >+	if (emulator_bad_iopl(ctxt, ops))
> > 		if (!emulator_io_port_access_allowed(ctxt, ops, port, len))
> > 			return false;
> > 	return true;
> >@@ -1842,7 +1843,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct
> >x86_emulate_ops *ops)
> > 	}
> >
> > 	/* Privileged instruction can be executed only in CPL=0 */
> >-	if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) {
> >+	if ((c->d & Priv) && ops->cpl(ctxt->vcpu)) {
> > 		kvm_inject_gp(ctxt->vcpu, 0);
> > 		goto done;
> > 	}
> >@@ -2378,7 +2379,7 @@ special_insn:
> > 		c->dst.type = OP_NONE;	/* Disable writeback. */
> > 		break;
> > 	case 0xfa: /* cli */
> >-		if (emulator_bad_iopl(ctxt))
> >+		if (emulator_bad_iopl(ctxt, ops))
> > 			kvm_inject_gp(ctxt->vcpu, 0);
> > 		else {
> > 			ctxt->eflags &= ~X86_EFLAGS_IF;
> >@@ -2386,7 +2387,7 @@ special_insn:
> > 		}
> > 		break;
> > 	case 0xfb: /* sti */
> >-		if (emulator_bad_iopl(ctxt))
> >+		if (emulator_bad_iopl(ctxt, ops))
> > 			kvm_inject_gp(ctxt->vcpu, 0);
> > 		else {
> > 			toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_STI);
> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >index b08f8a1..3f2a8d3 100644
> >--- a/arch/x86/kvm/x86.c
> >+++ b/arch/x86/kvm/x86.c
> >@@ -3426,6 +3426,11 @@ static void emulator_set_cr(int cr, unsigned long
> >val, struct kvm_vcpu *vcpu)
> > 	}
> > }
> >
> >+static int emulator_get_cpl(struct kvm_vcpu *vcpu)
> >+{
> >+	return kvm_x86_ops->get_cpl(vcpu);
> >+}
> >+
> > static struct x86_emulate_ops emulate_ops = {
> > 	.read_std            = kvm_read_guest_virt_system,
> > 	.fetch               = kvm_fetch_guest_virt,
> >@@ -3434,6 +3439,7 @@ static struct x86_emulate_ops emulate_ops = {
> > 	.cmpxchg_emulated    = emulator_cmpxchg_emulated,
> > 	.get_cr              = emulator_get_cr,
> > 	.set_cr              = emulator_set_cr,
> >+	.cpl                 = emulator_get_cpl,
> > };
> >
> > static void cache_all_regs(struct kvm_vcpu *vcpu)
> >
> -- 
> Andre Przywara
>
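The decoupling described in the reply (the emulator asks for the CPL through a callback table instead of calling into KVM directly) can be sketched in isolation as below. All type and function names here are hypothetical stand-ins, not the kernel's. The IOPL mask and shift are the architectural EFLAGS bits 13:12. The return value for real mode is not visible in the quoted hunk, so returning false there is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define EFLAGS_IOPL_MASK	0x3000ul	/* EFLAGS bits 13:12 */
#define EFLAGS_IOPL_SHIFT	12

/* Hypothetical stand-in for struct kvm_vcpu. */
struct fake_vcpu {
	int cpl;
};

/* Hypothetical callback table, playing the role of x86_emulate_ops. */
struct emul_ops {
	int (*cpl)(struct fake_vcpu *vcpu);
};

enum emul_mode { MODE_REAL, MODE_VM86, MODE_PROT32 };

/* Hypothetical stand-in for struct x86_emulate_ctxt. */
struct emul_ctxt {
	struct fake_vcpu *vcpu;
	unsigned long eflags;
	enum emul_mode mode;
};

/* Emulator side: knows only the ops table, not how CPL is obtained. */
static bool bad_iopl(struct emul_ctxt *ctxt, const struct emul_ops *ops)
{
	int iopl;

	if (ctxt->mode == MODE_REAL)
		return false;	/* assumption, not shown in the quoted hunk */
	if (ctxt->mode == MODE_VM86)
		return true;
	iopl = (ctxt->eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;
	return ops->cpl(ctxt->vcpu) > iopl;
}

/* Host side: the one place that knows where the CPL comes from. */
static int host_get_cpl(struct fake_vcpu *vcpu)
{
	return vcpu->cpl;
}
```

A different host (or a test harness) only has to supply another ops table, and the emulator code stays unchanged.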
Re: [PATCH v2 07/30] KVM: Provide x86_emulate_ctxt callback to get current cpl
Gleb,

what is the purpose of this patch? Is this a preparation for something
upcoming? I don't see a reason to change this, in my eyes it is not a
simplification.

Regards,
Andre.


Gleb Natapov wrote:
> Signed-off-by: Gleb Natapov
> ---
>  arch/x86/include/asm/kvm_emulate.h |    1 +
>  arch/x86/kvm/emulate.c             |   15 ---
>  arch/x86/kvm/x86.c                 |    6 ++
>  3 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
> index 0c5caa4..b048fd2 100644
> --- a/arch/x86/include/asm/kvm_emulate.h
> +++ b/arch/x86/include/asm/kvm_emulate.h
> @@ -110,6 +110,7 @@ struct x86_emulate_ops {
>  				struct kvm_vcpu *vcpu);
>  	ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
>  	void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
> +	int (*cpl)(struct kvm_vcpu *vcpu);
>  };
>
>  /* Type, address-of, and value of an instruction's operand. */
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 5e2fa61..8bd0557 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1257,7 +1257,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
>  	int rc;
>  	unsigned long val, change_mask;
>  	int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
> -	int cpl = kvm_x86_ops->get_cpl(ctxt->vcpu);
> +	int cpl = ops->cpl(ctxt->vcpu);
>
>  	rc = emulate_pop(ctxt, ops, &val, len);
>  	if (rc != X86EMUL_CONTINUE)
> @@ -1758,7 +1758,8 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
>  	return X86EMUL_CONTINUE;
>  }
>
> -static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
> +static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt,
> +			      struct x86_emulate_ops *ops)
>  {
>  	int iopl;
>  	if (ctxt->mode == X86EMUL_MODE_REAL)
> @@ -1766,7 +1767,7 @@ static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
>  	if (ctxt->mode == X86EMUL_MODE_VM86)
>  		return true;
>  	iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
> -	return kvm_x86_ops->get_cpl(ctxt->vcpu) > iopl;
> +	return ops->cpl(ctxt->vcpu) > iopl;
>  }
>
>  static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,
> @@ -1803,7 +1804,7 @@ static bool emulator_io_permited(struct x86_emulate_ctxt *ctxt,
>  				 struct x86_emulate_ops *ops,
>  				 u16 port, u16 len)
>  {
> -	if (emulator_bad_iopl(ctxt))
> +	if (emulator_bad_iopl(ctxt, ops))
>  		if (!emulator_io_port_access_allowed(ctxt, ops, port, len))
>  			return false;
>  	return true;
> @@ -1842,7 +1843,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
>  	}
>
>  	/* Privileged instruction can be executed only in CPL=0 */
> -	if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) {
> +	if ((c->d & Priv) && ops->cpl(ctxt->vcpu)) {
>  		kvm_inject_gp(ctxt->vcpu, 0);
>  		goto done;
>  	}
> @@ -2378,7 +2379,7 @@ special_insn:
>  		c->dst.type = OP_NONE;	/* Disable writeback. */
>  		break;
>  	case 0xfa: /* cli */
> -		if (emulator_bad_iopl(ctxt))
> +		if (emulator_bad_iopl(ctxt, ops))
>  			kvm_inject_gp(ctxt->vcpu, 0);
>  		else {
>  			ctxt->eflags &= ~X86_EFLAGS_IF;
> @@ -2386,7 +2387,7 @@ special_insn:
>  		}
>  		break;
>  	case 0xfb: /* sti */
> -		if (emulator_bad_iopl(ctxt))
> +		if (emulator_bad_iopl(ctxt, ops))
>  			kvm_inject_gp(ctxt->vcpu, 0);
>  		else {
>  			toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_STI);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b08f8a1..3f2a8d3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3426,6 +3426,11 @@ static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu)
>  	}
>  }
>
> +static int emulator_get_cpl(struct kvm_vcpu *vcpu)
> +{
> +	return kvm_x86_ops->get_cpl(vcpu);
> +}
> +
>  static struct x86_emulate_ops emulate_ops = {
>  	.read_std            = kvm_read_guest_virt_system,
>  	.fetch               = kvm_fetch_guest_virt,
> @@ -3434,6 +3439,7 @@ static struct x86_emulate_ops emulate_ops = {
>  	.cmpxchg_emulated    = emulator_cmpxchg_emulated,
>  	.get_cr              = emulator_get_cr,
>  	.set_cr              = emulator_set_cr,
> +	.cpl                 = emulator_get_cpl,
>  };
>
>  static void cache_all_regs(struct kvm_vcpu *vcpu)

-- 
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On 03/15/2010 07:42 AM, Avi Kivity wrote:
> On 03/15/2010 02:38 PM, Joerg Roedel wrote:
>> On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:
>>> On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
>>>> Hi there,
>>>>
>>>> Our wiki page for the Summer of Code 2010 is doing quite well:
>>>>
>>>> http://wiki.qemu.org/Google_Summer_of_Code_2010
>>>
>>> I will add another project - iommu emulation. Could be very useful for
>>> doing device assignment to nested guests, which could make testing a lot
>>> easier.
>>
>> Good idea. If there is interest I could help to mentor this project.
>
> Thanks. I volunteered Anthony, but he may be a little overcommitted.

Joerg, feel free to put your name against it too.

Regards,
Anthony Liguori
Re: [PATCH v2 05/30] KVM: Provide callback to get/set control registers in emulator ops.
On 03/15/2010 03:06 PM, Andre Przywara wrote:
> Gleb Natapov wrote:
>> Use this callback instead of directly call kvm function. Also rename
>> realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing
>> to do with real mode.
>
> Do you mind removing the static before emulator_{set,get}_cr and marking
> it EXPORT_SYMBOL? Then one could use it in vmx.c (and soon in svm.c ;-)
> while handling MOV-CR intercepts. Currently most of the code is actually
> duplicated.

Just do that in your patch, that's standard practice.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On 03/15/2010 03:03 PM, Joerg Roedel wrote:
>>> I will add another project - iommu emulation. Could be very useful
>>> for doing device assignment to nested guests, which could make
>>> testing a lot easier.
>>
>> Our experiments show that nested device assignment is pretty much
>> required for I/O performance in nested scenarios.
>
> Really? I did a small test with virtio-blk in a nested guest (disk read
> with dd, so not a real benchmark) and got a reasonable read-performance
> of around 25MB/s from the disk in the l2-guest.

Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.

I plan to reduce VMREAD/VMWRITE overhead for kvm, but there is not much we can do for other guests.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v2 05/30] KVM: Provide callback to get/set control registers in emulator ops.
On Mon, Mar 15, 2010 at 02:06:48PM +0100, Andre Przywara wrote:
> Gleb Natapov wrote:
> >Use this callback instead of directly call kvm function. Also rename
> >realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing
> >to do with real mode.
> Do you mind removing the static before emulator_{set,get}_cr and
I don't, but this is not the goal of this patch series.

> marking it EXPORT_SYMBOL? Then one could use it in vmx.c (and soon
> in svm.c ;-) while handling MOV-CR intercepts. Currently most of the
> code is actually duplicated.
>
> Also, shouldn't mk_cr_64() not be called mask_cr_64() for better
> readability?
This is how it is called now, the patch only moves it. But this code will be reworked by later patches anyway since functions called from emulator should not inject exceptions behind emulator's back.

>
> Regards,
> Andre.
>
> >Signed-off-by: Gleb Natapov
> >---
> > arch/x86/include/asm/kvm_emulate.h |3 +-
> > arch/x86/include/asm/kvm_host.h|2 -
> > arch/x86/kvm/emulate.c |7 +-
> > arch/x86/kvm/x86.c | 114 ++--
> > 4 files changed, 63 insertions(+), 63 deletions(-)
> >
> >diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
> >index 2666d7a..0c5caa4 100644
> >--- a/arch/x86/include/asm/kvm_emulate.h
> >+++ b/arch/x86/include/asm/kvm_emulate.h
> >@@ -108,7 +108,8 @@ struct x86_emulate_ops {
> > 				const void *new,
> > 				unsigned int bytes,
> > 				struct kvm_vcpu *vcpu);
> >-
> >+	ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
> >+	void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
> > };
> > /* Type, address-of, and value of an instruction's operand.
*/ > >diff --git a/arch/x86/include/asm/kvm_host.h > >b/arch/x86/include/asm/kvm_host.h > >index 3b178d8..e8e108a 100644 > >--- a/arch/x86/include/asm/kvm_host.h > >+++ b/arch/x86/include/asm/kvm_host.h > >@@ -585,8 +585,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, > >unsigned long address); > > void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw, > >unsigned long *rflags); > >-unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr); > >-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value); > > void kvm_enable_efer_bits(u64); > > int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data); > > int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); > >diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c > >index 91450b5..5b060e4 100644 > >--- a/arch/x86/kvm/emulate.c > >+++ b/arch/x86/kvm/emulate.c > >@@ -2483,7 +2483,7 @@ twobyte_insn: > > break; > > case 4: /* smsw */ > > c->dst.bytes = 2; > >-c->dst.val = realmode_get_cr(ctxt->vcpu, 0); > >+c->dst.val = ops->get_cr(0, ctxt->vcpu); > > break; > > case 6: /* lmsw */ > > realmode_lmsw(ctxt->vcpu, (u16)c->src.val, > >@@ -2519,8 +2519,7 @@ twobyte_insn: > > case 0x20: /* mov cr, reg */ > > if (c->modrm_mod != 3) > > goto cannot_emulate; > >-c->regs[c->modrm_rm] = > >-realmode_get_cr(ctxt->vcpu, c->modrm_reg); > >+c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu); > > c->dst.type = OP_NONE; /* no writeback */ > > break; > > case 0x21: /* mov from dr to reg */ > >@@ -2534,7 +2533,7 @@ twobyte_insn: > > case 0x22: /* mov reg, cr */ > > if (c->modrm_mod != 3) > > goto cannot_emulate; > >-realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val); > >+ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu); > > c->dst.type = OP_NONE; > > break; > > case 0x23: /* mov from reg to dr */ > >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > >index a1e671a..bf714df 100644 > >--- a/arch/x86/kvm/x86.c > >+++ b/arch/x86/kvm/x86.c > >@@ -3370,12 +3370,70 @@ 
void kvm_report_emulation_failure(struct kvm_vcpu > >*vcpu, const char *context) > > } > > EXPORT_SYMBOL_GPL(kvm_report_emulation_failure); > >+static u64 mk_cr_64(u64 curr_cr, u32 new_val) > >+{ > >+return (curr_cr & ~((1ULL << 32) - 1)) | new_val; > >+} > >+ > >+static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu) > >+{ > >+unsigned long value; > >+ > >+switch (cr) { > >+case 0: > >+value = kvm_read_cr0(vcpu); > >+break; > >+case 2: > >+value = vcpu->arch.cr2; > >+break; > >+case 3: > >+value = vcpu->arch.cr3; > >+break; > >+case 4: > >+value = kvm_read_cr4(vcpu); > >+break; > >+case 8: > >+value = kvm_get_cr8(vcpu); > >+break; > >+def
[PATCH rework] KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s error handling
kvm_coalesced_mmio_init() keeps holding the addresses of the coalesced mmio ring page and dev even after it has freed them. Also, if this function fails, rare as that may be, it suggests the system is in a serious state, so we had better stop the work that follows kvm_create_vm().

This patch fixes both problems.

We move the coalesced mmio initialization out of kvm_create_vm(). This seems natural because it includes a registration which can be done only once the vm has been successfully created.

Signed-off-by: Takuya Yoshikawa
---
 virt/kvm/coalesced_mmio.c |    2 ++
 virt/kvm/kvm_main.c       |   12 ++++++++----
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5169736..11776b7 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
 	return ret;
 
 out_free_dev:
+	kvm->coalesced_mmio_dev = NULL;
 	kfree(dev);
 out_free_page:
+	kvm->coalesced_mmio_ring = NULL;
 	__free_page(page);
 out_err:
 	return ret;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bcd08b8..c7053aa 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -418,9 +418,6 @@ static struct kvm *kvm_create_vm(void)
 	spin_lock(&kvm_lock);
 	list_add(&kvm->vm_list, &vm_list);
 	spin_unlock(&kvm_lock);
-#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-	kvm_coalesced_mmio_init(kvm);
-#endif
 out:
 	return kvm;
 
@@ -1748,12 +1745,19 @@ static struct file_operations kvm_vm_fops = {
 
 static int kvm_dev_ioctl_create_vm(void)
 {
-	int fd;
+	int fd, r;
 	struct kvm *kvm;
 
 	kvm = kvm_create_vm();
 	if (IS_ERR(kvm))
 		return PTR_ERR(kvm);
+#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
+	r = kvm_coalesced_mmio_init(kvm);
+	if (r < 0) {
+		kvm_put_kvm(kvm);
+		return r;
+	}
+#endif
 	fd = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
 	if (fd < 0)
 		kvm_put_kvm(kvm);
--
1.6.3.3
Re: [PATCH v2 05/30] KVM: Provide callback to get/set control registers in emulator ops.
Gleb Natapov wrote: Use this callback instead of directly call kvm function. Also rename realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing to do with real mode. Do you mind removing the static before emulator_{set,get}_cr and marking it EXPORT_SYMBOL? Then one could use it in vmx.c (and soon in svm.c ;-) while handling MOV-CR intercepts. Currently most of the code is actually duplicated. Also, shouldn't mk_cr_64() not be called mask_cr_64() for better readability? Regards, Andre. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |3 +- arch/x86/include/asm/kvm_host.h|2 - arch/x86/kvm/emulate.c |7 +- arch/x86/kvm/x86.c | 114 ++-- 4 files changed, 63 insertions(+), 63 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 2666d7a..0c5caa4 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -108,7 +108,8 @@ struct x86_emulate_ops { const void *new, unsigned int bytes, struct kvm_vcpu *vcpu); - + ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu); + void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); }; /* Type, address-of, and value of an instruction's operand. 
*/ diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 3b178d8..e8e108a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -585,8 +585,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address); void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw, unsigned long *rflags); -unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr); -void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value); void kvm_enable_efer_bits(u64); int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data); int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 91450b5..5b060e4 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2483,7 +2483,7 @@ twobyte_insn: break; case 4: /* smsw */ c->dst.bytes = 2; - c->dst.val = realmode_get_cr(ctxt->vcpu, 0); + c->dst.val = ops->get_cr(0, ctxt->vcpu); break; case 6: /* lmsw */ realmode_lmsw(ctxt->vcpu, (u16)c->src.val, @@ -2519,8 +2519,7 @@ twobyte_insn: case 0x20: /* mov cr, reg */ if (c->modrm_mod != 3) goto cannot_emulate; - c->regs[c->modrm_rm] = - realmode_get_cr(ctxt->vcpu, c->modrm_reg); + c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu); c->dst.type = OP_NONE; /* no writeback */ break; case 0x21: /* mov from dr to reg */ @@ -2534,7 +2533,7 @@ twobyte_insn: case 0x22: /* mov reg, cr */ if (c->modrm_mod != 3) goto cannot_emulate; - realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val); + ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu); c->dst.type = OP_NONE; break; case 0x23: /* mov from reg to dr */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a1e671a..bf714df 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3370,12 +3370,70 @@ void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context) } EXPORT_SYMBOL_GPL(kvm_report_emulation_failure); +static u64 mk_cr_64(u64 curr_cr, u32 
new_val) +{ + return (curr_cr & ~((1ULL << 32) - 1)) | new_val; +} + +static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu) +{ + unsigned long value; + + switch (cr) { + case 0: + value = kvm_read_cr0(vcpu); + break; + case 2: + value = vcpu->arch.cr2; + break; + case 3: + value = vcpu->arch.cr3; + break; + case 4: + value = kvm_read_cr4(vcpu); + break; + case 8: + value = kvm_get_cr8(vcpu); + break; + default: + vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr); + return 0; + } + + return value; +} + +static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu) +{ + switch (cr) { + case 0: + kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val)); + break; + case 2: + vcpu->arch.cr2 = val; + break; + case 3: + kvm_set_cr3(vcpu, val); +
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On Mon, Mar 15, 2010 at 05:53:13AM -0700, Muli Ben-Yehuda wrote:
> On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:
> > On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
> > > Hi there,
> > >
> > > Our wiki page for the Summer of Code 2010 is doing quite well:
> > >
> > > http://wiki.qemu.org/Google_Summer_of_Code_2010
> >
> > I will add another project - iommu emulation. Could be very useful
> > for doing device assignment to nested guests, which could make
> > testing a lot easier.
>
> Our experiments show that nested device assignment is pretty much
> required for I/O performance in nested scenarios.

Really? I did a small test with virtio-blk in a nested guest (disk read with dd, so not a real benchmark) and got a reasonable read-performance of around 25MB/s from the disk in the l2-guest.

	Joerg
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:
> On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
> > Hi there,
> >
> > Our wiki page for the Summer of Code 2010 is doing quite well:
> >
> > http://wiki.qemu.org/Google_Summer_of_Code_2010
>
> I will add another project - iommu emulation. Could be very useful
> for doing device assignment to nested guests, which could make
> testing a lot easier.

Our experiments show that nested device assignment is pretty much required for I/O performance in nested scenarios.

Cheers,
Muli
Re: [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa
On Mon, Mar 15, 2010 at 04:30:47AM +, Daniel K. wrote:
> Joerg Roedel wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 2883ce8..9f8b02d 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -314,6 +314,19 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr,
>>  	kvm_queue_exception_e(vcpu, PF_VECTOR, error_code)
>>  }
>> +void kvm_propagate_fault(struct kvm_vcpu *vcpu, unsigned long addr, u32 error_code)
>> +{
>> +	u32 nested, error;
>> +
>> +	nested = error_code & PFERR_NESTED_MASK;
>> +	error = error_code & ~PFERR_NESTED_MASK;
>> +
>> +	if (vcpu->arch.mmu.nested && !(error_code && PFERR_NESTED_MASK))
>
> This looks incorrect, nested is unused.
>
> At the very least it should be a binary & operation
>
> 	if (vcpu->arch.mmu.nested && !(error_code & PFERR_NESTED_MASK))
>
> which can be simplified to
>
> 	if (vcpu->arch.mmu.nested && !nested)
>
> but it seems wrong that the condition is that it is nested and not nested
> at the same time.

Yes, this is already fixed in my local patch-stack. I found it during further testing (while fixing another bug). But thanks for your feedback :-)

	Joerg
Re: [long] MINIX 3.1.6 works in QEMU-0.12.3 only with KVM disabled
On 03/15/2010 12:54 PM, Antoine Leca wrote:
> When doing switch, the cached segment selectors are preserved, which
> allows one to use protected mode segments in real-address mode (this is
> called unreal mode).
>
> Now this is a by-product of the implementation inside the BIOS. In fact,
> even if the BIOS enters unreal mode (or the similar big real, more useful
> with segmentation-less architectures), before turning back to the client
> it (should) reset things to normal real mode, as service 15/87 is not a
> usual way to enter unreal mode (for example, this effect is not even
> mentioned in Ralf Brown's list).

The entry into unreal mode is unintentional; the bios is transitioning to protected mode and 'unreal mode' only exists for a few instructions, IIRC.

> As a result (and also and foremost because of 80286 compatibility),
> instead of directly using unreal or big real mode if possible (as done
> eg. in himem.sys), Minix monitor goes to the great pain of going back to
> square #1, and since blocks are at most 64 KB in size and several
> iterations are needed, on the next block Minix sets up the (very similar)
> GDT then does another call to the same BIOS service 15/87.
>
> I knew these parts before, but this is where Avi's answer came in:
>
> KVM on Intel does not yet support unreal mode and requires the cached
> segment descriptors to be valid in real-address mode.
>
> I do not know which virtual BIOS KVM is using, but I notice while reading
> http://bochs.sourceforge.net/cgi-bin/lxr/source/bios/rombios.c:
> [ Slightly edited to fit the width of my post. AL. ]
>
> 3555     case 0x87:
> 3556 #if BX_CPU < 3
> 3557 #  error "Int15 function 87h not supported on < 80386"
> 3558 #endif
> 3559       // +++ should probably have descriptor checks
> 3560       // +++ should have exception handlers
> ...
> 3640       mov eax, cr0
> 3641       or al, #0x01
> 3642       mov cr0, eax
> 3643       ;; far jump to flush CPU queue after transition to prot. mode
> 3644       JMP_AP(0x0020, protected_mode)
> 3645
> 3646 protected_mode:
> 3647       ;; GDT points to valid descriptor table, now load SS, DS, ES
> 3648       mov ax, #0x28 ;; 101 000 = 5th desc. in table, TI=GDT, RPL=00
> 3649       mov ss, ax
> 3650       mov ax, #0x10 ;; 010 000 = 2nd desc. in table, TI=GDT, RPL=00
> 3651       mov ds, ax
> 3652       mov ax, #0x18 ;; 011 000 = 3rd desc. in table, TI=GDT, RPL=00
> 3653       mov es, ax
> 3654       xor si, si
> 3655       xor di, di
> 3656       cld
> 3657       rep
> 3658       movsw ;; move CX words from DS:SI to ES:DI
> 3659
> 3660       ;; make sure DS and ES limits are 64KB
> 3661       mov ax, #0x28
> 3662       mov ds, ax
> 3663       mov es, ax
> 3664
> 3665       ;; reset PG bit in CR0 ???
> 3666       mov eax, cr0
> 3667       and al, #0xFE
> 3668       mov cr0, eax
>
> I must be missing something here... There is no unreal mode at any
> moment, is there?
>
> [ ... some web browsing occurring meanwhile ... Later: ]
>
> Okay, now I got another picture. 8-|
>
> Until recently, KVM (and qemu) used Bochs BIOS, shown above; but they
> switched recently to SeaBIOS... where the applicable code is in
> src/system.c, and looks like (now this is AT&T assembly):
>
>  83 static void
>  84 handle_1587(struct bregs *regs)
>  85 {
>  86     // +++ should probably have descriptor checks
>  87     // +++ should have exception handlers
> 127         // Enable protected mode
> 128         "  movl %%cr0, %%eax\n"
> 129         "  orl $" __stringify(CR0_PE) ", %%eax\n"
> 130         "  movl %%eax, %%cr0\n"
> 131
> 132         // far jump to flush CPU queue after transition to prot. mode
> 133         "  ljmpw $(4<<3), $1f\n"
> 134
> 135         // GDT points to valid descriptor table, now load DS, ES
> 136         "1:movw $(2<<3), %%ax\n" // 2nd descriptor in table, TI=GDT, RPL=00
> 137         "  movw %%ax, %%ds\n"
> 138         "  movw $(3<<3), %%ax\n" // 3rd descriptor in table, TI=GDT, RPL=00
> 139         "  movw %%ax, %%es\n"
> 140
> 141         // move CX words from DS:SI to ES:DI
> 142         "  xorw %%si, %%si\n"
> 143         "  xorw %%di, %%di\n"
> 144         "  rep movsw\n"
> 145
> 146         // Disable protected mode
> 147         "  movl %%cr0, %%eax\n"
> 148         "  andl $~" __stringify(CR0_PE) ", %%eax\n"
> 149         "  movl %%eax, %%cr0\n"
>
> Note that while the basic scheme is the same, the "cleaning up" of lines
> 3660-3663 "make sure DS and ES limits are 64KB" is not present.
> IIUC, the virtualized CPU goes back to real mode with those segments set
> as they are in protected mode, and yes with Minix boot monitor they
> happened to NOT be paragraph-aligned.
>
> Is it possible to add back this "cleaning up" to the BIOS used in KVM?

I think so. This is a longstanding kvm bug, but I can't see any downsides to a workaround in the BIOS.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On 03/15/2010 02:38 PM, Joerg Roedel wrote:
> On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:
>> On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
>>> Hi there,
>>>
>>> Our wiki page for the Summer of Code 2010 is doing quite well:
>>>
>>> http://wiki.qemu.org/Google_Summer_of_Code_2010
>>
>> I will add another project - iommu emulation. Could be very useful for
>> doing device assignment to nested guests, which could make testing a lot
>> easier.
>
> Good idea. If there is interest I could help to mentor this project.

Thanks. I volunteered Anthony, but he may be a little overcommitted.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:
> On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
>> Hi there,
>>
>> Our wiki page for the Summer of Code 2010 is doing quite well:
>>
>> http://wiki.qemu.org/Google_Summer_of_Code_2010
>
> I will add another project - iommu emulation. Could be very useful for
> doing device assignment to nested guests, which could make testing a lot
> easier.

Good idea. If there is interest I could help to mentor this project.

	Joerg
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
> Hi there,
>
> Our wiki page for the Summer of Code 2010 is doing quite well:
>
> http://wiki.qemu.org/Google_Summer_of_Code_2010

I will add another project - iommu emulation. Could be very useful for doing device assignment to nested guests, which could make testing a lot easier.

--
error compiling committee.c: too many arguments to function
[PATCH 0/5] Fix some mmu/emulator atomicity issues (v2)
Currently when we emulate a locked operation into a shadowed guest page table, we perform a write rather than a true atomic. This is indicated by the "emulating exchange as write" message that shows up in dmesg.

In addition, the pte prefetch operation during invlpg suffered from a race. This was fixed by removing the operation.

This patchset fixes both issues and reinstates pte prefetch on invlpg.

v3:
   - rebase against next branch (resolves conflicts via hypercall patch)

v2:
   - fix truncated description for patch 1
   - add new patch 4, which fixes a bug in patch 5

Avi Kivity (5):
  KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write()
  KVM: Make locked operations truly atomic
  KVM: Don't follow an atomic operation by a non-atomic one
  KVM: MMU: Do not instantiate nontrapping spte on unsync page
  KVM: MMU: Reinstate pte prefetch on invlpg

 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |   78 +
 arch/x86/kvm/paging_tmpl.h      |   25 ++-
 arch/x86/kvm/x86.c              |   90 +++
 4 files changed, 127 insertions(+), 67 deletions(-)
[PATCH 1/5] KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write()
kvm_mmu_pte_write() reads guest ptes in two different occasions, both to allow a 32-bit pae guest to update a pte with 4-byte writes. Consolidate these into a single read, which also allows us to consolidate another read from an invlpg speculating a gpte into the shadow page table. Signed-off-by: Avi Kivity --- arch/x86/kvm/mmu.c | 69 +++ 1 files changed, 31 insertions(+), 38 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index b137515..f63c9ad 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2556,36 +2556,11 @@ static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu) } static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, - const u8 *new, int bytes) + u64 gpte) { gfn_t gfn; - int r; - u64 gpte = 0; pfn_t pfn; - if (bytes != 4 && bytes != 8) - return; - - /* -* Assume that the pte write on a page table of the same type -* as the current vcpu paging mode. This is nearly always true -* (might be false while changing modes). Note it is verified later -* by update_pte(). 
-*/ - if (is_pae(vcpu)) { - /* Handle a 32-bit guest writing two halves of a 64-bit gpte */ - if ((bytes == 4) && (gpa % 4 == 0)) { - r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8); - if (r) - return; - memcpy((void *)&gpte + (gpa % 8), new, 4); - } else if ((bytes == 8) && (gpa % 8 == 0)) { - memcpy((void *)&gpte, new, 8); - } - } else { - if ((bytes == 4) && (gpa % 4 == 0)) - memcpy((void *)&gpte, new, 4); - } if (!is_present_gpte(gpte)) return; gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; @@ -2636,7 +2611,34 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, int r; pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes); - mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes); + + switch (bytes) { + case 4: + gentry = *(const u32 *)new; + break; + case 8: + gentry = *(const u64 *)new; + break; + default: + gentry = 0; + break; + } + + /* +* Assume that the pte write on a page table of the same type +* as the current vcpu paging mode. This is nearly always true +* (might be false while changing modes). Note it is verified later +* by update_pte(). 
+*/ + if (is_pae(vcpu) && bytes == 4) { + /* Handle a 32-bit guest writing two halves of a 64-bit gpte */ + gpa &= ~(gpa_t)7; + r = kvm_read_guest(vcpu->kvm, gpa, &gentry, 8); + if (r) + gentry = 0; + } + + mmu_guess_page_from_pte_write(vcpu, gpa, gentry); spin_lock(&vcpu->kvm->mmu_lock); kvm_mmu_access_page(vcpu, gfn); kvm_mmu_free_some_pages(vcpu); @@ -2701,20 +2703,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, continue; } spte = &sp->spt[page_offset / sizeof(*spte)]; - if ((gpa & (pte_size - 1)) || (bytes < pte_size)) { - gentry = 0; - r = kvm_read_guest_atomic(vcpu->kvm, - gpa & ~(u64)(pte_size - 1), - &gentry, pte_size); - new = (const void *)&gentry; - if (r < 0) - new = NULL; - } while (npte--) { entry = *spte; mmu_pte_write_zap_pte(vcpu, sp, spte); - if (new) - mmu_pte_write_new_pte(vcpu, sp, spte, new); + if (gentry) + mmu_pte_write_new_pte(vcpu, sp, spte, &gentry); mmu_pte_write_flush_tlb(vcpu, entry, *spte); ++spte; } -- 1.7.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] KVM: Don't follow an atomic operation by a non-atomic one
Currently emulated atomic operations are immediately followed by a non-atomic operation, so that kvm_mmu_pte_write() can be invoked. This updates the mmu but undoes the whole point of doing things atomically. Fix by only performing the atomic operation and the mmu update, and avoiding the non-atomic write. Signed-off-by: Avi Kivity --- arch/x86/kvm/x86.c | 21 +++-- 1 files changed, 15 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d724a52..2c0f632 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3227,7 +3227,8 @@ static int emulator_write_emulated_onepage(unsigned long addr, const void *val, unsigned int bytes, struct kvm_vcpu *vcpu, - bool guest_initiated) + bool guest_initiated, + bool mmu_only) { gpa_t gpa; u32 error_code; @@ -3247,6 +3248,10 @@ static int emulator_write_emulated_onepage(unsigned long addr, if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE) goto mmio; + if (mmu_only) { + kvm_mmu_pte_write(vcpu, gpa, val, bytes, 1); + return X86EMUL_CONTINUE; + } if (emulator_write_phys(vcpu, gpa, val, bytes)) return X86EMUL_CONTINUE; @@ -3271,7 +3276,8 @@ int __emulator_write_emulated(unsigned long addr, const void *val, unsigned int bytes, struct kvm_vcpu *vcpu, - bool guest_initiated) + bool guest_initiated, + bool mmu_only) { /* Crossing a page boundary? 
*/ if (((addr + bytes - 1) ^ addr) & PAGE_MASK) { @@ -3279,7 +3285,7 @@ int __emulator_write_emulated(unsigned long addr, now = -addr & ~PAGE_MASK; rc = emulator_write_emulated_onepage(addr, val, now, vcpu, -guest_initiated); +guest_initiated, mmu_only); if (rc != X86EMUL_CONTINUE) return rc; addr += now; @@ -3287,7 +3293,7 @@ int __emulator_write_emulated(unsigned long addr, bytes -= now; } return emulator_write_emulated_onepage(addr, val, bytes, vcpu, - guest_initiated); + guest_initiated, mmu_only); } int emulator_write_emulated(unsigned long addr, @@ -3295,7 +3301,7 @@ int emulator_write_emulated(unsigned long addr, unsigned int bytes, struct kvm_vcpu *vcpu) { - return __emulator_write_emulated(addr, val, bytes, vcpu, true); + return __emulator_write_emulated(addr, val, bytes, vcpu, true, false); } EXPORT_SYMBOL_GPL(emulator_write_emulated); @@ -3359,6 +3365,8 @@ static int emulator_cmpxchg_emulated(unsigned long addr, if (!exchanged) return X86EMUL_CMPXCHG_FAILED; + return __emulator_write_emulated(addr, new, bytes, vcpu, true, true); + emul_write: printk_once(KERN_WARNING "kvm: emulating exchange as write\n"); @@ -4013,7 +4021,8 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu) kvm_x86_ops->patch_hypercall(vcpu, instruction); - return __emulator_write_emulated(rip, instruction, 3, vcpu, false); + return __emulator_write_emulated(rip, instruction, 3, vcpu, +false, false); } static u64 mk_cr_64(u64 curr_cr, u32 new_val) -- 1.7.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] KVM: MMU: Do not instantiate nontrapping spte on unsync page
The update_pte() path currently uses a nontrapping spte when a nonpresent (or nonaccessed) gpte is written. This is fine since at present it is only used on sync pages. However, on an unsync page this will cause an endless fault loop as the guest is under no obligation to invlpg a gpte that transitions from nonpresent to present.

Needed for the next patch which reinstates update_pte() on invlpg.

Signed-off-by: Avi Kivity
---
 arch/x86/kvm/paging_tmpl.h |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 81eab9a..4b37e1a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -258,11 +258,17 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 	pt_element_t gpte;
 	unsigned pte_access;
 	pfn_t pfn;
+	u64 new_spte;
 
 	gpte = *(const pt_element_t *)pte;
 	if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
-		if (!is_present_gpte(gpte))
-			__set_spte(spte, shadow_notrap_nonpresent_pte);
+		if (!is_present_gpte(gpte)) {
+			if (page->unsync)
+				new_spte = shadow_trap_nonpresent_pte;
+			else
+				new_spte = shadow_notrap_nonpresent_pte;
+			__set_spte(spte, new_spte);
+		}
 		return;
 	}
 	pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
--
1.7.0.2
[PATCH 2/5] KVM: Make locked operations truly atomic
Once upon a time, locked operations were emulated while holding the mmu mutex.
Since mmu pages were write protected, it was safe to emulate the writes in
a non-atomic manner, since there could be no other writer, either in the
guest or in the kernel.

These days emulation takes place without holding the mmu spinlock, so the
write could be preempted by an unshadowing event, which exposes the page
to writes by the guest.  This may cause corruption of guest page tables.

Fix by using an atomic cmpxchg for these operations.

Signed-off-by: Avi Kivity
---
 arch/x86/kvm/x86.c |   69
 1 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9d02cc7..d724a52 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3299,41 +3299,68 @@ int emulator_write_emulated(unsigned long addr,
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
+#define CMPXCHG_TYPE(t, ptr, old, new) \
+	(cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old))
+
+#ifdef CONFIG_X86_64
+# define CMPXCHG64(ptr, old, new) CMPXCHG_TYPE(u64, ptr, old, new)
+#else
+# define CMPXCHG64(ptr, old, new) \
+	(cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old))
+#endif
+
 static int emulator_cmpxchg_emulated(unsigned long addr,
 				     const void *old,
 				     const void *new,
 				     unsigned int bytes,
 				     struct kvm_vcpu *vcpu)
 {
-	printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
-#ifndef CONFIG_X86_64
-	/* guests cmpxchg8b have to be emulated atomically */
-	if (bytes == 8) {
-		gpa_t gpa;
-		struct page *page;
-		char *kaddr;
-		u64 val;
+	gpa_t gpa;
+	struct page *page;
+	char *kaddr;
+	bool exchanged;
 
-		gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
+	/* guests cmpxchg8b have to be emulated atomically */
+	if (bytes > 8 || (bytes & (bytes - 1)))
+		goto emul_write;
 
-		if (gpa == UNMAPPED_GVA ||
-		   (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
-			goto emul_write;
+	gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
 
-		if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
-			goto emul_write;
+	if (gpa == UNMAPPED_GVA ||
+	    (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
+		goto emul_write;
 
-		val = *(u64 *)new;
+	if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
+		goto emul_write;
 
-		page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+	page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
 
-		kaddr = kmap_atomic(page, KM_USER0);
-		set_64bit((u64 *)(kaddr + offset_in_page(gpa)), val);
-		kunmap_atomic(kaddr, KM_USER0);
-		kvm_release_page_dirty(page);
+	kaddr = kmap_atomic(page, KM_USER0);
+	kaddr += offset_in_page(gpa);
+	switch (bytes) {
+	case 1:
+		exchanged = CMPXCHG_TYPE(u8, kaddr, old, new);
+		break;
+	case 2:
+		exchanged = CMPXCHG_TYPE(u16, kaddr, old, new);
+		break;
+	case 4:
+		exchanged = CMPXCHG_TYPE(u32, kaddr, old, new);
+		break;
+	case 8:
+		exchanged = CMPXCHG64(kaddr, old, new);
+		break;
+	default:
+		BUG();
 	}
+	kunmap_atomic(kaddr, KM_USER0);
+	kvm_release_page_dirty(page);
+
+	if (!exchanged)
+		return X86EMUL_CMPXCHG_FAILED;
+
 emul_write:
-#endif
+	printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
 
 	return emulator_write_emulated(addr, new, bytes, vcpu);
 }
-- 
1.7.0.2
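The size check and per-width dispatch in the patch above can be modelled in user space. The following is only a sketch under stated assumptions: it uses the GCC/Clang `__atomic_compare_exchange_n` builtin in place of the kernel's cmpxchg(), and `cmpxchg_sized`, `DEFINE_CMPXCHG` and the result enum are hypothetical names that do not appear in the patch. Unlike the patch, it also sends bytes == 0 to the fallback path instead of reaching the default case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the emulator's return values. */
enum cmpxchg_result { CMPXCHG_OK, CMPXCHG_FAILED, CMPXCHG_FALLBACK };

/* Generate one typed compare-and-exchange helper per operand width. */
#define DEFINE_CMPXCHG(t)                                               \
	static bool cmpxchg_##t(void *ptr, const void *old,             \
				const void *new_val)                    \
	{                                                               \
		t expected, desired;                                    \
		memcpy(&expected, old, sizeof(t));                      \
		memcpy(&desired, new_val, sizeof(t));                   \
		return __atomic_compare_exchange_n((t *)ptr, &expected, \
				desired, false,                         \
				__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);    \
	}

DEFINE_CMPXCHG(uint8_t)
DEFINE_CMPXCHG(uint16_t)
DEFINE_CMPXCHG(uint32_t)
DEFINE_CMPXCHG(uint64_t)

/*
 * Atomically replace *dst with *new_val if *dst still equals *old.
 * dst must be naturally aligned for the given width.  Widths that are
 * zero, larger than 8, or not a power of two cannot be done with a
 * single cmpxchg and are reported as CMPXCHG_FALLBACK.
 */
static enum cmpxchg_result
cmpxchg_sized(void *dst, const void *old, const void *new_val,
	      unsigned int bytes)
{
	bool exchanged;

	assert(dst && old && new_val);

	if (bytes == 0 || bytes > 8 || (bytes & (bytes - 1)))
		return CMPXCHG_FALLBACK;

	switch (bytes) {
	case 1:
		exchanged = cmpxchg_uint8_t(dst, old, new_val);
		break;
	case 2:
		exchanged = cmpxchg_uint16_t(dst, old, new_val);
		break;
	case 4:
		exchanged = cmpxchg_uint32_t(dst, old, new_val);
		break;
	default: /* 8 */
		exchanged = cmpxchg_uint64_t(dst, old, new_val);
		break;
	}
	return exchanged ? CMPXCHG_OK : CMPXCHG_FAILED;
}
```

A caller would treat CMPXCHG_FAILED as "the guest changed the value first" and CMPXCHG_FALLBACK as the case where the patch jumps to emul_write.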
[PATCH 5/5] KVM: MMU: Reinstate pte prefetch on invlpg
Commit fb341f57 removed the pte prefetch on guest invlpg, citing guest races.
However, the SDM is adamant that prefetch is allowed:

  "The processor may create entries in paging-structure caches for
   translations required for prefetches and for accesses that are a
   result of speculative execution that would never actually occur
   in the executed code path."

And, in fact, there was a race in the prefetch code: we picked up the pte
without the mmu lock held, so an older invlpg could install the pte over
a newer invlpg.

Reinstate the prefetch logic, but this time note whether another invlpg has
executed using a counter.  If a race occurred, do not install the pte.

Signed-off-by: Avi Kivity
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |   37 +++--
 arch/x86/kvm/paging_tmpl.h      |   15 +++
 3 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ea1b6c6..28826c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -389,6 +389,7 @@ struct kvm_arch {
 	unsigned int n_free_mmu_pages;
 	unsigned int n_requested_mmu_pages;
 	unsigned int n_alloc_mmu_pages;
+	atomic_t invlpg_counter;
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
 	/*
 	 * Hash table of struct kvm_mmu_page.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f63c9ad..b3edc46 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2609,20 +2609,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	int flooded = 0;
 	int npte;
 	int r;
+	int invlpg_counter;
 
 	pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
 
-	switch (bytes) {
-	case 4:
-		gentry = *(const u32 *)new;
-		break;
-	case 8:
-		gentry = *(const u64 *)new;
-		break;
-	default:
-		gentry = 0;
-		break;
-	}
+	invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter);
 
 	/*
 	 * Assume that the pte write on a page table of the same type
@@ -2630,16 +2621,34 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	 * (might be false while changing modes).  Note it is verified later
 	 * by update_pte().
 	 */
-	if (is_pae(vcpu) && bytes == 4) {
+	if ((is_pae(vcpu) && bytes == 4) || !new) {
 		/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-		gpa &= ~(gpa_t)7;
-		r = kvm_read_guest(vcpu->kvm, gpa, &gentry, 8);
+		if (is_pae(vcpu)) {
+			gpa &= ~(gpa_t)7;
+			bytes = 8;
+		}
+		r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8));
 		if (r)
 			gentry = 0;
+		new = (const u8 *)&gentry;
+	}
+
+	switch (bytes) {
+	case 4:
+		gentry = *(const u32 *)new;
+		break;
+	case 8:
+		gentry = *(const u64 *)new;
+		break;
+	default:
+		gentry = 0;
+		break;
 	}
 
 	mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
 	spin_lock(&vcpu->kvm->mmu_lock);
+	if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
+		gentry = 0;
 	kvm_mmu_access_page(vcpu, gfn);
 	kvm_mmu_free_some_pages(vcpu);
 	++vcpu->kvm->stat.mmu_pte_write;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 4b37e1a..067797a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -463,6 +463,7 @@ out_unlock:
 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 {
 	struct kvm_shadow_walk_iterator iterator;
+	gpa_t pte_gpa = -1;
 	int level;
 	u64 *sptep;
 	int need_flush = 0;
@@ -476,6 +477,10 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 		if (level == PT_PAGE_TABLE_LEVEL ||
 		    ((level == PT_DIRECTORY_LEVEL && is_large_pte(*sptep))) ||
 		    ((level == PT_PDPE_LEVEL && is_large_pte(*sptep)))) {
+			struct kvm_mmu_page *sp = page_header(__pa(sptep));
+
+			pte_gpa = (sp->gfn << PAGE_SHIFT);
+			pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
 
 			if (is_shadow_present_pte(*sptep)) {
 				rmap_remove(vcpu->kvm, sptep);
@@ -493,7 +498,17 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 
 	if (need_flush)
 		kvm_flush_remote_tlbs(vcpu->kvm);
+
+	atomic_inc(&vcpu->kvm->arch.invlpg_counter);
+
 	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	if (pte_gpa == -1)
+		return;
+
+	if (mmu_topup_memory_caches(vcpu)
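The counter scheme described in the commit message can be illustrated outside the kernel. This is a minimal single-threaded sketch, assuming C11 `<stdatomic.h>` instead of the kernel's atomic_t, and it omits the mmu_lock that the real code holds around the final check. The struct and function names (`shadow_mmu`, `mmu_invlpg`, `prefetch_begin`, `prefetch_install`) are hypothetical and only mirror the roles of invlpg() and kvm_mmu_pte_write() in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* One shadow entry plus the invalidation counter that guards it. */
struct shadow_mmu {
	atomic_int invlpg_counter;  /* bumped by every invalidation */
	uint64_t   spte;            /* the single shadow pte being modelled */
};

/* Invalidation path: drop the entry, then record that it happened. */
static void mmu_invlpg(struct shadow_mmu *mmu)
{
	assert(mmu);
	mmu->spte = 0;
	atomic_fetch_add(&mmu->invlpg_counter, 1);
}

/* Prefetch step 1: snapshot the counter before reading the guest pte. */
static int prefetch_begin(struct shadow_mmu *mmu)
{
	assert(mmu);
	return atomic_load(&mmu->invlpg_counter);
}

/*
 * Prefetch step 2: install the pte only if no invalidation ran since
 * the snapshot.  Returns false when the (possibly stale) pte was dropped.
 */
static bool prefetch_install(struct shadow_mmu *mmu, int snapshot,
			     uint64_t gpte)
{
	assert(mmu);
	if (atomic_load(&mmu->invlpg_counter) != snapshot)
		return false;
	mmu->spte = gpte;
	return true;
}
```

If an invalidation slips in between the two prefetch steps, the counters differ and the stale pte is not installed, which is the behaviour the commit message asks for.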
Re: [PATCH 0/5] Fix some mmu/emulator atomicity issues (v2)
On 03/15/2010 12:16 PM, Marcelo Tosatti wrote:
> On Sun, Mar 14, 2010 at 09:03:47AM +0200, Avi Kivity wrote:
>> On 03/10/2010 04:50 PM, Avi Kivity wrote:
>>> Currently when we emulate a locked operation into a shadowed guest page
>>> table, we perform a write rather than a true atomic.  This is indicated
>>> by the "emulating exchange as write" message that shows up in dmesg.
>>>
>>> In addition, the pte prefetch operation during invlpg suffered from a
>>> race.  This was fixed by removing the operation.
>>>
>>> This patchset fixes both issues and reinstates pte prefetch on invlpg.
>>>
>>> v2:
>>> - fix truncated description for patch 1
>>> - add new patch 4, which fixes a bug in patch 5
>>
>> No comments, but looks like last week's maintainer neglected to merge this.
>
> Looks fine. Can you please regenerate against next branch? (just pushed).

Will send out shortly.

> For the invlpg prefetch it would be good to confirm the original bug is
> not reproducible.

I tried to reproduce the problem with the original revert reverted, but
couldn't.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v2 06/30] KVM: remove realmode_lmsw function.
On 03/15/2010 01:02 PM, Andre Przywara wrote:
> Gleb Natapov wrote:
>> Use (get|set)_cr callback to emulate lmsw inside emulator.
>
> I see that vmx.c:handle_cr() is the only other user of kvm_lmsw(). If we
> fix this place similar like you did below, we could get rid of kvm_lmsw()
> entirely. But I am not sure whether it's OK to remove an exported symbol.

Exported symbols can be changed or removed at will.

--
error compiling committee.c: too many arguments to function
Re: [long] MINIX 3.1.6 works in QEMU-0.12.3 only with KVM disabled
Avi Kivity wrote on 2010-03-10 13:03:25 +0200: > On 03/10/2010 12:26 PM, Erik van der Kouwe wrote: >> I've submitted this bug report a week ago: >> http://sourceforge.net/tracker/?func=detail&aid=2962575&group_id=180599&atid=893831 >> > > MINIX is using big real mode which is currently not well supported by kvm on > Intel hardware: > >> (qemu) info registers >> EAX=0010 EBX=0009 ECX=4920 EDX=a796 >> ESI=0200 EDI=49200200 EBP=0009 ESP=a762 >> EIP=f4a7 EFL=00023002 [---] CPL=3 II=0 A20=1 SMM=0 HLT=0 >> ES = f300 >> CS =f000 000f f300 >> SS =9492 00094920 f300 >> DS =97ce 00097cec f300 > > A ds.base of 0x97cec cannot be translated to a real mode segment. > > There is some work to get this to work, but it is proceeding really slowly. > It should work on AMD hardware though. Hi guys, I searched the issue, and Erik was kind enough to point me to this list where there are knowledgeable people. Erik van der Kouwe wrote in http://groups.google.com/group/minix3/msg/40f44df0c434cfa6: > The situation is as follows: > > The boot monitor runs in real-address mode, but has to copy parts of > the boot image into high memory (>= 1 MB) which is not accessible from > that mode as only 20 bits are available. It calls the BIOS (int 0x15) > to perform the copy. This is done under the ext_copy label in boot/ > boothead.s. Okay. It is my understanding this is where Minix' involvement stops. > The BIOS switches to protected mode, loading a GDT which it receives > from the caller. Before returning to the caller, it copies data using > the segment descriptors in the GDT and switches back to real-address > mode. This is the description of BIOS service 15/87, which have to be implemented (using whatever solution it pleases) by the BIOS. > When doing switch, the cached segment selectors are preserved, > which allows one to use protected mode segments in real-address mode > (this is called unreal mode). Now this is a by-product of the implementation inside the BIOS. 
In fact, even if the BIOS enters unreal mode (or the similar big real, more useful with segmentation-less architectures), before turning back to the client it (should) reset things to normal real mode, as service 15/87 is not an usual way to enter unreal mode (for example, this effect is not even mentionned in Ralf Brown's list). As a result (and also and foremost because of 80286 compatibility), instead of directly using unreal or big real mode if possible (as done eg. in himem.sys), Minix monitor goes to the great pain to going back to square #1, and since blocks are at most 64 KB in size and several iterations are needed, on the next block Minix sets up the (very similar) GDT then does another call to the same BIOS service 15/87. > I knew these parts before, but this is where Avi's answer came in: KVM > on Intel does not yet support unreal mode and requires the cached > segment descriptors to be valid in real-address mode. I do not know which virtual BIOS is using KVM, but I notice while reading http://bochs.sourceforge.net/cgi-bin/lxr/source/bios/rombios.c: [ Slightly edited to fit the width of my post. AL. ] 3555 case 0x87: 3556 #if BX_CPU < 3 3557 # error "Int15 function 87h not supported on < 80386" 3558 #endif 3559 // +++ should probably have descriptor checks 3560 // +++ should have exception handlers ... 3640 mov eax, cr0 3641 or al, #0x01 3642 mov cr0, eax 3643 ;; far jump to flush CPU queue after transition to prot. 
mode 3644 JMP_AP(0x0020, protected_mode) 3645 3646 protected_mode: 3647 ;; GDT points to valid descriptor table, now load SS, DS, ES 3648 mov ax, #0x28 ;; 101 000 = 5th desc.in table, TI=GDT,RPL=00 3649 mov ss, ax 3650 mov ax, #0x10 ;; 010 000 = 2nd desc.in table, TI=GDT,RPL=00 3651 mov ds, ax 3652 mov ax, #0x18 ;; 011 000 = 3rd desc.in table, TI=GDT,RPL=00 3653 mov es, ax 3654 xor si, si 3655 xor di, di 3656 cld 3657 rep 3658 movsw ;; move CX words from DS:SI to ES:DI 3659 3660 ;; make sure DS and ES limits are 64KB 3661 mov ax, #0x28 3662 mov ds, ax 3663 mov es, ax 3664 3665 ;; reset PG bit in CR0 ??? 3666 mov eax, cr0 3667 and al, #0xFE 3668 mov cr0, eax I should be loosing something here... There is no unreal mode at any moment, is it? [ ... some web browsing occuring meanwhile ... Later: ] Okay, now I got another picture. 8-| Until recently, KVM (and qemu) used Bochs BIOS, showed above; but they switched recently to SeaBIOS... where the applicable code is in src/system.c, and looks like (now this is AT&T assembly): 83 static void 84 handle_1587(struct bregs *regs) 85 { 86 // +++ should probably have descriptor checks 87 // +++ should have exception handlers 127 // Enable protected mode 128 " mov
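Avi's remark quoted above, that a ds.base of 0x97cec cannot be translated to a real mode segment, follows from the real-mode rule that a segment base is the selector shifted left by four bits. The check below is only an illustrative sketch of that one condition (it ignores segment limits and access rights); `realmode_segment_ok` is a hypothetical helper, not a KVM function. The sample values are the SS and DS entries from the register dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * In real-address mode the base of a segment is always selector * 16,
 * so a cached base is representable only if it matches that value.
 */
static bool realmode_segment_ok(uint16_t selector, uint32_t base)
{
	return base == ((uint32_t)selector << 4);
}

/* Values taken from the "info registers" dump quoted in this message. */
static void check_minix_dump(void)
{
	assert(realmode_segment_ok(0x9492, 0x00094920));  /* SS: fine */
	assert(!realmode_segment_ok(0x97ce, 0x00097cec)); /* DS: base not selector << 4 */
}
```

For DS the selector 0x97ce would give a base of 0x97ce0, while the cached base is 0x97cec, which is not even 16-byte aligned.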
Re: [PATCH v2 06/30] KVM: remove realmode_lmsw function.
Gleb Natapov wrote: Use (get|set)_cr callback to emulate lmsw inside emulator. I see that vmx.c:handle_cr() is the only other user of kvm_lmsw(). If we fix this place similar like you did below, we could get rid of kvm_lmsw() entirely. But I am not sure whether it's OK to remove an exported symbol. Regards, Andre. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_host.h |2 -- arch/x86/kvm/emulate.c |4 ++-- arch/x86/kvm/x86.c |7 --- 3 files changed, 2 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e8e108a..1e15a0a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -582,8 +582,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu, void kvm_report_emulation_failure(struct kvm_vcpu *cvpu, const char *context); void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address); void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address); -void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw, - unsigned long *rflags); void kvm_enable_efer_bits(u64); int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 5b060e4..5e2fa61 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2486,8 +2486,8 @@ twobyte_insn: c->dst.val = ops->get_cr(0, ctxt->vcpu); break; case 6: /* lmsw */ - realmode_lmsw(ctxt->vcpu, (u16)c->src.val, - &ctxt->eflags); + ops->set_cr(0, (ops->get_cr(0, ctxt->vcpu) & ~0x0ful) | + (c->src.val & 0x0f), ctxt->vcpu); c->dst.type = OP_NONE; break; case 7: /* invlpg*/ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bf714df..b08f8a1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4045,13 +4045,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base) kvm_x86_ops->set_idt(vcpu, &dt); } -void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw, - unsigned long *rflags) -{ - kvm_lmsw(vcpu, msw); - 
*rflags = kvm_get_rflags(vcpu); -} - static int move_to_next_stateful_cpuid_entry(struct kvm_vcpu *vcpu, int i) { struct kvm_cpuid_entry2 *e = &vcpu->arch.cpuid_entries[i]; -- Andre Przywara AMD-OSRC (Dresden) Tel: x29712 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
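The lmsw case in the emulate.c hunk above keeps the upper bits of CR0 and takes only the low four bits from the source operand. A small stand-alone sketch of that expression follows; `lmsw_merge_cr0` is a hypothetical name used here for illustration, and the CR0 value in the example is arbitrary.

```c
#include <assert.h>

/* Replace the low four bits of cr0 with the low four bits of src. */
static unsigned long lmsw_merge_cr0(unsigned long cr0, unsigned long src)
{
	return (cr0 & ~0x0ful) | (src & 0x0f);
}

/* Bits above bit 3 of the operand are ignored, bits above bit 3 of CR0 survive. */
static void lmsw_example(void)
{
	assert(lmsw_merge_cr0(0x80000011ul, 0xfffeul) == 0x8000001eul);
}
```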
Re: [RFC] Moving dirty bitmaps to userspace - Double buffering approach
Avi Kivity wrote:
> On 03/15/2010 10:33 AM, Marcelo Tosatti wrote:
>>> Are there any good ways to solve this kind of problems?
>>
>> You can introduce a new get_dirty_log ioctl that passes the address
>> of the next bitmap in userspace, and use it (after pinning with
>> get_user_pages), instead of vmalloc'ing.

Thank you for your advice!

> No pinning please, put_user_bit() or set_bit_user().
>
> (can be implemented generically using get_user_pages() and
> kmap_atomic(), but x86 should get an optimized implementation)

Given your advice last time, I started this with my colleague.
 -- We were just talking about how to struggle with every architecture.

In line with your comment, we'll start with the generic implementation
plus an optimized one for x86.

Thanks
  Takuya
Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
* Avi Kivity [2010-03-15 11:27:56]: > >>>The knobs are for > >>> > >>>1. Selective enablement > >>>2. Selective control of the % of unmapped pages > >>An alternative path is to enable KSM for page cache. Then we have > >>direct read-only guest access to host page cache, without any guest > >>modifications required. That will be pretty difficult to achieve > >>though - will need a readonly bit in the page cache radix tree, and > >>teach all paths to honour it. > >> > >Yes, it is, I've taken a quick look. I am not sure if de-duplication > >would be the best approach, may be dropping the page in the page cache > >might be a good first step. Data consistency would be much easier to > >maintain that way, as long as the guest is not writing frequently to > >that page, we don't need the page cache in the host. > > Trimming the host page cache should happen automatically under > pressure. Since the page is cached by the guest, it won't be > re-read, so the host page is not frequently used and then dropped. > Yes, agreed, but dropping is easier than tagging cache as read-only and getting everybody to understand read-only cached pages. -- Three Cheers, Balbir -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.
On Mon, Mar 15, 2010 at 12:24:43PM +0200, Avi Kivity wrote: > On 03/15/2010 12:19 PM, Gleb Natapov wrote: > >On Mon, Mar 15, 2010 at 12:15:22PM +0200, Avi Kivity wrote: > >>On 03/15/2010 12:07 PM, Gleb Natapov wrote: > Or we can make the buffer larger for everyone (outside this patchset > though). > > >>>I am not sure what do you mean here. INS read ahead and MMIO read cache are > >>>different beasts. Former is needed to speed-up string pio reads, later > >>>(not yet implemented) is needed to reread previous MMIO read results in > >>>case instruction emulation is restarted due to need to exit to userspace. > >>>MMIO read cache need to be invalidated on each iteration of string > >>>instruction. > >>Instructions with multiple reads or writes need an mmio read/write > >>buffer that can be replayed on re-execution. > >> > >>buffer != cache! A cache can be dropped (perhaps after flushing it > >>to a backing store), but a buffer in general cannot. > >> > >That is just naming. Call it "buffer" if you want. > > > >I still don't understand what do you mean by "Or we can make the buffer > >larger for everyone". Who is this "everyone"? Different instruction need > >different kind of buffers. > > Many instructions can issue multiple reads, ins is just one of them. > A generic mmio buffer can be used by everyone. > No, ins can issue only _one_ io read during one iteration (i.e between each pair of reads there is a commit point). But this is slow, so we do non-architectural hack: do many reads ahead of time into a buffer and use results from this buffer for emulation of subsequent iterations. Other instruction can do multiple reads between instruction fetching and commit of emulation result and need different kind of buffering (actually caching is more appropriate here since we "cache" results of reads from past attempts to emulation same instruction). -- Gleb. 
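The INS read-ahead described above (many port reads done ahead of time, then one buffered value consumed per iteration, with a refill only when the buffer is empty) can be sketched as follows. This is an illustration under assumptions, not the emulator's code: `pio_read_ahead`, `ins_next_byte`, `READ_AHEAD_MAX` and the toy port are hypothetical, and the rule that read-ahead never crosses a page boundary is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define READ_AHEAD_MAX 16

struct pio_read_ahead {
	uint8_t data[READ_AHEAD_MAX];
	size_t  pos;   /* next unread byte */
	size_t  end;   /* number of valid bytes in data[] */
};

/* Hypothetical device callback: fill buf with count bytes from the port. */
typedef void (*pio_read_fn)(uint8_t *buf, size_t count, void *opaque);

/* True when every buffered byte has been consumed. */
static bool read_ahead_empty(const struct pio_read_ahead *ra)
{
	return ra->pos == ra->end;
}

/*
 * Return the value for one INS iteration.  The port is only touched when
 * the buffer is empty; `remaining` is the number of iterations still to
 * run, so the refill never reads more than the instruction will consume.
 */
static uint8_t ins_next_byte(struct pio_read_ahead *ra, size_t remaining,
			     pio_read_fn read_port, void *opaque)
{
	assert(remaining > 0);
	if (read_ahead_empty(ra)) {
		size_t n = remaining < READ_AHEAD_MAX ? remaining
						      : READ_AHEAD_MAX;
		read_port(ra->data, n, opaque);
		ra->pos = 0;
		ra->end = n;
	}
	return ra->data[ra->pos++];
}

/* Toy port for demonstration: returns 0, 1, 2, ... and counts accesses. */
struct toy_port {
	uint8_t  next;
	unsigned calls;
};

static void toy_port_read(uint8_t *buf, size_t count, void *opaque)
{
	struct toy_port *port = opaque;

	for (size_t i = 0; i < count; i++)
		buf[i] = port->next++;
	port->calls++;
}
```

In this model read_ahead_empty() marks the points at which it would be safe to leave the emulation loop, since no prefetched data would be lost there.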
Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.
On 03/15/2010 12:19 PM, Gleb Natapov wrote:
> On Mon, Mar 15, 2010 at 12:15:22PM +0200, Avi Kivity wrote:
>> On 03/15/2010 12:07 PM, Gleb Natapov wrote:
>>>> Or we can make the buffer larger for everyone (outside this patchset
>>>> though).
>>>
>>> I am not sure what do you mean here. INS read ahead and MMIO read cache are
>>> different beasts. Former is needed to speed-up string pio reads, later
>>> (not yet implemented) is needed to reread previous MMIO read results in
>>> case instruction emulation is restarted due to need to exit to userspace.
>>> MMIO read cache need to be invalidated on each iteration of string
>>> instruction.
>>
>> Instructions with multiple reads or writes need an mmio read/write
>> buffer that can be replayed on re-execution.
>>
>> buffer != cache!  A cache can be dropped (perhaps after flushing it
>> to a backing store), but a buffer in general cannot.
>
> That is just naming. Call it "buffer" if you want.
>
> I still don't understand what do you mean by "Or we can make the buffer
> larger for everyone". Who is this "everyone"? Different instruction need
> different kind of buffers.

Many instructions can issue multiple reads, ins is just one of them.
A generic mmio buffer can be used by everyone.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.
On Mon, Mar 15, 2010 at 12:15:22PM +0200, Avi Kivity wrote: > On 03/15/2010 12:07 PM, Gleb Natapov wrote: > > > >>Or we can make the buffer larger for everyone (outside this patchset > >>though). > >> > >I am not sure what do you mean here. INS read ahead and MMIO read cache are > >different beasts. Former is needed to speed-up string pio reads, later > >(not yet implemented) is needed to reread previous MMIO read results in > >case instruction emulation is restarted due to need to exit to userspace. > >MMIO read cache need to be invalidated on each iteration of string > >instruction. > > Instructions with multiple reads or writes need an mmio read/write > buffer that can be replayed on re-execution. > > buffer != cache! A cache can be dropped (perhaps after flushing it > to a backing store), but a buffer in general cannot. > That is just naming. Call it "buffer" if you want. I still don't understand what do you mean by "Or we can make the buffer larger for everyone". Who is this "everyone"? Different instruction need different kind of buffers. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] Fix some mmu/emulator atomicity issues (v2)
On Sun, Mar 14, 2010 at 09:03:47AM +0200, Avi Kivity wrote: > On 03/10/2010 04:50 PM, Avi Kivity wrote: > >Currently when we emulate a locked operation into a shadowed guest page > >table, we perform a write rather than a true atomic. This is indicated > >by the "emulating exchange as write" message that shows up in dmesg. > > > >In addition, the pte prefetch operation during invlpg suffered from a > >race. This was fixed by removing the operation. > > > >This patchset fixes both issues and reinstates pte prefetch on invlpg. > > > >v2: > >- fix truncated description for patch 1 > >- add new patch 4, which fixes a bug in patch 5 > > No comments, but looks like last week's maintainer neglected to merge this. Looks fine. Can you please regenerate against next branch? (just pushed). For the invlpg prefetch it would be good to confirm the original bug is not reproducible. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.
On 03/15/2010 12:07 PM, Gleb Natapov wrote:
>> Or we can make the buffer larger for everyone (outside this patchset
>> though).
>
> I am not sure what do you mean here. INS read ahead and MMIO read cache are
> different beasts. Former is needed to speed-up string pio reads, later
> (not yet implemented) is needed to reread previous MMIO read results in
> case instruction emulation is restarted due to need to exit to userspace.
> MMIO read cache need to be invalidated on each iteration of string
> instruction.

Instructions with multiple reads or writes need an mmio read/write
buffer that can be replayed on re-execution.

buffer != cache!  A cache can be dropped (perhaps after flushing it
to a backing store), but a buffer in general cannot.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.
On Mon, Mar 15, 2010 at 11:56:32AM +0200, Avi Kivity wrote: > On 03/15/2010 11:44 AM, Gleb Natapov wrote: > >On Mon, Mar 15, 2010 at 09:44:26AM +0200, Avi Kivity wrote: > >>On 03/14/2010 08:06 PM, Gleb Natapov wrote: > Suggest simply reentering every N executions. > > >>>This restart mechanism is, in fact, needed for ins read ahead to work. > >>>After reading ahead from IO port we need to avoid entering decoder > >>>until entire cache is consumed otherwise decoder will clear cache and > >>>data will be lost. So we can't just enter guest in arbitrary times, only > >>>when read ahead cache is empty. Since read ahead is never done across > >>>page boundary this is save place to re-enter guest. > >>Please make the two depend on each other directly then. We can't > >>expect the reader of the emulator code know that. > >> > >We can document that. I wouldn't want to have different conditions for > >guest re-entry for different opcodes. > > We now have a write buffer size of one. It's just a matter of > making the emulator know the size of the buffer (extra parameter to > ->write_emulated). > The buffer is maintained inside emulator, so emulator knows about it and can check it, but then for all other string instruction except INS we will re-enter guest on each iteration. > >>Have the emulator ask the buffer when it is empty. > >> > >It will be always empty for all string ops except INS. > > > > Or we can make the buffer larger for everyone (outside this patchset > though). > I am not sure what do you mean here. INS read ahead and MMIO read cache are different beasts. Former is needed to speed-up string pio reads, later (not yet implemented) is needed to reread previous MMIO read results in case instruction emulation is restarted due to need to exit to userspace. MMIO read cache need to be invalidated on each iteration of string instruction. -- Gleb. 
Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.
On 03/15/2010 11:44 AM, Gleb Natapov wrote:
> On Mon, Mar 15, 2010 at 09:44:26AM +0200, Avi Kivity wrote:
>> On 03/14/2010 08:06 PM, Gleb Natapov wrote:
>>>> Suggest simply reentering every N executions.
>>>
>>> This restart mechanism is, in fact, needed for ins read ahead to work.
>>> After reading ahead from IO port we need to avoid entering decoder
>>> until entire cache is consumed otherwise decoder will clear cache and
>>> data will be lost. So we can't just enter guest in arbitrary times, only
>>> when read ahead cache is empty. Since read ahead is never done across
>>> page boundary this is save place to re-enter guest.
>>
>> Please make the two depend on each other directly then.  We can't
>> expect the reader of the emulator code know that.
>
> We can document that. I wouldn't want to have different conditions for
> guest re-entry for different opcodes.

We now have a write buffer size of one.  It's just a matter of
making the emulator know the size of the buffer (extra parameter to
->write_emulated).

>> Have the emulator ask the buffer when it is empty.
>
> It will be always empty for all string ops except INS.

Or we can make the buffer larger for everyone (outside this patchset
though).

--
error compiling committee.c: too many arguments to function
Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.
On Mon, Mar 15, 2010 at 09:44:26AM +0200, Avi Kivity wrote: > On 03/14/2010 08:06 PM, Gleb Natapov wrote: > >>Suggest simply reentering every N executions. > >> > >This restart mechanism is, in fact, needed for ins read ahead to work. > >After reading ahead from IO port we need to avoid entering decoder > >until entire cache is consumed otherwise decoder will clear cache and > >data will be lost. So we can't just enter guest in arbitrary times, only > >when read ahead cache is empty. Since read ahead is never done across > >page boundary this is save place to re-enter guest. > > Please make the two depend on each other directly then. We can't > expect the reader of the emulator code know that. > We can document that. I wouldn't want to have different conditions for guest re-entry for different opcodes. > Have the emulator ask the buffer when it is empty. > It will be always empty for all string ops except INS. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html