Re: vhost-scsi support for ANY_LAYOUT
On 27/01/2015 07:35, Nicholas A. Bellinger wrote:

Hi MST & Paolo,

So I'm currently working on vhost-scsi support for ANY_LAYOUT, and wanted to verify some assumptions based upon your earlier emails..

*) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that virtio-scsi request + response headers will (always..?) be within a single iovec.

*) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that virtio-scsi request + response headers may be (but not always..?) combined with data-out + data-in payloads into a single iovec.

*) When ANY_LAYOUT + T10_PI is negotiated by vhost-scsi, it's expected that PI and data payloads for data-out + data-in may be (but not always..?) within the same iovec. Consequently, both headers + PI + data-payloads may also be within a single iovec.

s/expected/possible/g

Any split between header and payload is possible. It's also possible to split the header in multiple parts, e.g. put the CDB or sense buffer in a separate iovec. Even a single field could be split across two iovecs.

*) Is it still safe to use 'out' + 'in' values from vhost_get_vq_desc() in order to determine the data_direction...? If not, what's the preferred way of determining this information for get_user_pages_fast() permission bits and target_submit_cmd_map_sgls()..?

No, it's not safe. What QEMU does is to check whether the output buffers are collectively larger than the request header, and whether the input buffers are collectively larger than the response header. This lets you compute the data direction.

Also, what is required on the QEMU side in order to start generating ANY_LAYOUT style iovecs to verify the WIP changes..?

Nothing on the QEMU side. You could test any-layout payloads by changing the kernel side. Splitting the sense buffer into its own iovec, for example, is a trivial patch to drivers/scsi/virtio_scsi.c.

It also helps to adhere to coding conventions. For example, in QEMU I never use iov[N], and I restrict usage of iovcnt to (iov, iovcnt) parameter pairs. This was suggested by Michael and indeed found a couple of bugs.

Paolo
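For illustration, the buffer-size check Paolo describes fits in a few lines. A minimal, self-contained sketch (userspace-style, with assumed helper names; not the actual QEMU code):

#include <stddef.h>
#include <sys/uio.h>

/* Sum of all buffer lengths in an iovec array. */
static size_t iov_total(const struct iovec *iov, unsigned int cnt)
{
        size_t len = 0;

        while (cnt--)
                len += (iov++)->iov_len;
        return len;
}

/* Returns 1 for data-out (write), -1 for data-in (read), 0 for no payload. */
static int guess_data_direction(const struct iovec *out_iov, unsigned int out_cnt,
                                const struct iovec *in_iov, unsigned int in_cnt,
                                size_t req_hdr_size, size_t resp_hdr_size)
{
        if (iov_total(out_iov, out_cnt) > req_hdr_size)
                return 1;       /* guest-readable bytes beyond the request header */
        if (iov_total(in_iov, in_cnt) > resp_hdr_size)
                return -1;      /* guest-writable bytes beyond the response header */
        return 0;               /* headers only, no data payload */
}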
Re: [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches
On 27/01/15 13:17, Christoffer Dall wrote: On Tue, Jan 27, 2015 at 11:21:38AM +, Marc Zyngier wrote: On 26/01/15 22:58, Christoffer Dall wrote: On Wed, Jan 21, 2015 at 06:39:46PM +, Marc Zyngier wrote: Trying to emulate the behaviour of set/way cache ops is fairly pointless, as there are too many ways we can end-up missing stuff. Also, there is some system caches out there that simply ignore set/way operations. So instead of trying to implement them, let's convert it to VA ops, and use them as a way to re-enable the trapping of VM ops. That way, we can detect the point when the MMU/caches are turned off, and do a full VM flush (which is what the guest was trying to do anyway). This allows a 32bit zImage to boot on the APM thingy, and will probably help bootloaders in general. Signed-off-by: Marc Zyngier marc.zyng...@arm.com This had some conflicts with dirty page logging. I fixed it up here, and also removed some trailing white space and mixed spaces/tabs that patch complained about: http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git mm-fixes Thanks for doing so. --- arch/arm/include/asm/kvm_emulate.h | 10 + arch/arm/include/asm/kvm_host.h | 3 -- arch/arm/include/asm/kvm_mmu.h | 3 +- arch/arm/kvm/arm.c | 10 - arch/arm/kvm/coproc.c| 64 ++ arch/arm/kvm/coproc_a15.c| 2 +- arch/arm/kvm/coproc_a7.c | 2 +- arch/arm/kvm/mmu.c | 70 - arch/arm/kvm/trace.h | 39 +++ arch/arm64/include/asm/kvm_emulate.h | 10 + arch/arm64/include/asm/kvm_host.h| 3 -- arch/arm64/include/asm/kvm_mmu.h | 3 +- arch/arm64/kvm/sys_regs.c| 75 +--- 13 files changed, 155 insertions(+), 139 deletions(-) diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h index 66ce176..7b01523 100644 --- a/arch/arm/include/asm/kvm_emulate.h +++ b/arch/arm/include/asm/kvm_emulate.h @@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu) vcpu-arch.hcr = HCR_GUEST_MASK; } +static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu) +{ + return vcpu-arch.hcr; +} + +static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr) +{ + vcpu-arch.hcr = hcr; +} + static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu) { return 1; diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 254e065..04b4ea0 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -125,9 +125,6 @@ struct kvm_vcpu_arch { * Anything that is not used directly from assembly code goes * here. */ - /* dcache set/way operation pending */ - int last_pcpu; - cpumask_t require_dcache_flush; /* Don't run the guest on this vcpu */ bool pause; diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 63e0ecc..286644c 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva, #define kvm_virt_to_phys(x) virt_to_idmap((unsigned long)(x)) -void stage2_flush_vm(struct kvm *kvm); +void kvm_set_way_flush(struct kvm_vcpu *vcpu); +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled); #endif /* !__ASSEMBLY__ */ diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 2d6d910..0b0d58a 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) vcpu-cpu = cpu; vcpu-arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state); - /* - * Check whether this vcpu requires the cache to be flushed on - * this physical CPU. 
This is a consequence of doing dcache - * operations by set/way on this vcpu. We do it here to be in - * a non-preemptible section. - */ - if (cpumask_test_and_clear_cpu(cpu, vcpu-arch.require_dcache_flush)) - flush_cache_all(); /* We'd really want v7_flush_dcache_all() */ - kvm_arm_set_running_vcpu(vcpu); } @@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) ret = kvm_call_hyp(__kvm_vcpu_run, vcpu); vcpu-mode = OUTSIDE_GUEST_MODE; - vcpu-arch.last_pcpu = smp_processor_id(); kvm_guest_exit(); trace_kvm_exit(*vcpu_pc(vcpu)); /* diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c index 7928dbd..0afcc00 100644 --- a/arch/arm/kvm/coproc.c +++ b/arch/arm/kvm/coproc.c @@ -189,82 +189,40 @@ static bool access_l2ectlr(struct kvm_vcpu *vcpu, return true; }
Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
Hi all,

I've posted the below mail to the qemu-dev mailing list, but I've got no response there. That's why I decided to re-post it here as well, and besides that I think this could be a kvm-specific issue as well.

Some additional things to note: I can reproduce the issue on my Debian 7 with 3.16.0-0.bpo.4-amd64 kernel as well. I would typically use a max_downtime adjusted to 1 second instead of the default 30 ms. I also noticed that the issue happens much more rarely if I increase the migration bandwidth, i.e. like:

diff --git a/migration.c b/migration.c
index 26f4b65..d2e3b39 100644
--- a/migration.c
+++ b/migration.c
@@ -36,7 +36,7 @@ enum {
     MIG_STATE_COMPLETED,
 };
 
-#define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
+#define MAX_THROTTLE  (90 << 20)      /* Migration speed throttling */

Like I said below, I would be glad to provide you with any additional information.

Thanks,
Mikhail

On 23.01.2015 15:03, Mikhail Sennikovskii wrote:

Hi all,

I'm running a slightly modified "migration over tcp" test in virt-test, which does a migration from one smp=2 VM to another on the same host over TCP, and exposes some dummy CPU load inside the GUEST while migrating, and after a series of runs I'm always getting a CLOCK_WATCHDOG_TIMEOUT BSOD inside the guest, which happens when "An expected clock interrupt was not received on a secondary processor in an MP system within the allocated interval. This indicates that the specified processor is hung and not processing interrupts."

This seems to happen with any qemu version I've tested (1.2 and above, including upstream), and I was testing it with the 3.13.0-44-generic kernel on my Ubuntu 14.04.1 LTS with SMP4 host, as well as on the 3.12.26-1 kernel with Debian 6 with SMP6 host.

One thing I noticed is that exposing a dummy CPU load on the HOST (like running multiple instances of the "while true; do false; done" script) in parallel with doing migration makes the issue quite easily reproducible.

Looking inside the windows crash dump, the second CPU is just running at IRQL 0, and it is apparently not hung, as Windows is able to save its state in the crash dump correctly, which assumes running some code on it. So this apparently seems to be some timing issue (like the host scheduler does not schedule the thread executing the secondary CPU's code in time).

Could you give me some insight on this, i.e. is there a way to customize QEMU/KVM to avoid such an issue? If you think this might be a qemu/kvm issue, I can provide you with any info, like windows crash dumps, or the test-case to reproduce this.
qemu is started as: from-VM: qemu-system-x86_64 \ -S \ -name 'virt-tests-vm1' \ -sandbox off \ -M pc-1.0 \ -nodefaults \ -vga std \ -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait \ -mon chardev=qmp_id_qmp1,mode=control \ -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait \ -device isa-serial,chardev=serial_id_serial0 \ -chardev socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait \ -device isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \ -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \ -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \ -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \ -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05 \ -netdev user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023 \ -m 2G \ -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \ -cpu phenom \ -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \ -vnc :0 \ -rtc base=localtime,clock=host,driftfix=none \ -boot order=cdn,once=c,menu=off \ -enable-kvm to-VM: qemu-system-x86_64 \ -S \ -name 'virt-tests-vm1' \ -sandbox off \ -M pc-1.0 \ -nodefaults \ -vga std \ -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait \ -mon chardev=qmp_id_qmp1,mode=control \ -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait \ -device isa-serial,chardev=serial_id_serial0 \ -chardev socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,nowait \ -device isa-debugcon,chardev=seabioslog_id_20150123-112750-VehjvEqK,iobase=0x402 \ -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \ -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \ -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \ -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idI46M9C,vectors=4,netdev=idl9vRQt,bus=pci.0,addr=05 \ -netdev
Re: [PATCH v5 2/2] x86/xen: allow privcmd hypercalls to be preempted on 64-bit
On Tue, Jan 27, 2015 at 10:06:44AM +, David Vrabel wrote:

On 27/01/15 08:35, Jan Beulich wrote:

On 27.01.15 at 02:51, mcg...@do-not-panic.com wrote:

Even if David told you this would be acceptable, I have to question an abstract model of fixing issues on only 64-bit kernels - this may be acceptable for distro purposes, but seems hardly the right approach for upstream. If 32-bit ones are to become deliberately broken, the XEN config option should become dependent on !X86_32.

I'd rather have something omitted (keeping the current behaviour) than something that has not been tested at all. Obviously it would be preferable to fix both 32-bit and 64-bit x86 (and ARM as well), but I'm not going to block an important bug fix for the majority use case (64-bit x86).

The hunk for 32-bit should indeed only go in once we get a Tested-by. Please let me know if there is anything else needed for this patch.

Luis
Re: [PATCH v3 0/2] x86/arm64: add xenconfig
On Fri, Jan 23, 2015 at 03:19:25PM +, Stefano Stabellini wrote:

On Fri, 23 Jan 2015, Luis R. Rodriguez wrote:

On Wed, Jan 14, 2015 at 11:33:45AM -0800, Luis R. Rodriguez wrote:

From: Luis R. Rodriguez mcg...@suse.com

This v3 addresses Stefano's feedback from the v2 series, namely moving PCI stuff to x86 as it's all x86 specific, and also just removing the CONFIG_TCG_XEN=m from the general config. To be clear, the changes from the v2 series are below.

Luis R. Rodriguez (2):
  x86, platform, xen, kconfig: clarify kvmconfig is for kvm
  x86, arm, platform, xen, kconfig: add xen defconfig helper

 arch/x86/configs/xen.config | 10 ++
 kernel/configs/xen.config   | 26 ++
 scripts/kconfig/Makefile    |  7 ++-
 3 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/configs/xen.config
 create mode 100644 kernel/configs/xen.config

Who could these changes go through?

I would be OK with taking it in the Xen tree, but I would feel more comfortable doing that if you had an ack from Yann Morin (CC'ed).

*Poke*

Luis
Re: vhost-scsi support for ANY_LAYOUT
On Tue, Jan 27, 2015 at 09:42:22AM +0100, Paolo Bonzini wrote:

On 27/01/2015 07:35, Nicholas A. Bellinger wrote:

Hi MST & Paolo,

So I'm currently working on vhost-scsi support for ANY_LAYOUT, and wanted to verify some assumptions based upon your earlier emails..

*) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that virtio-scsi request + response headers will (always..?) be within a single iovec.

*) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that virtio-scsi request + response headers may be (but not always..?) combined with data-out + data-in payloads into a single iovec.

*) When ANY_LAYOUT + T10_PI is negotiated by vhost-scsi, it's expected that PI and data payloads for data-out + data-in may be (but not always..?) within the same iovec. Consequently, both headers + PI + data-payloads may also be within a single iovec.

s/expected/possible/g

Any split between header and payload is possible. It's also possible to split the header in multiple parts, e.g. put the CDB or sense buffer in a separate iovec. Even a single field could be split across two iovecs.

*) Is it still safe to use 'out' + 'in' values from vhost_get_vq_desc() in order to determine the data_direction...? If not, what's the preferred way of determining this information for get_user_pages_fast() permission bits and target_submit_cmd_map_sgls()..?

No, it's not safe. What QEMU does is to check if the output buffers are collectively larger than the request header, and if the input buffers are collectively larger than the response header. This lets you compute the data direction.

Generally true, but I'd like to clarify. I think what we can say is that if there's an 'in' value, the host is expected to write data, and if there's an 'out' value, to read data. But this includes the headers, so generally you need to get the total length for in and/or out (separately) and subtract the header length. But if you know there's e.g. no in header, then you can use the 'in' value to figure out gup flags. Or if you see there's no in data, you know you can use gup with read-only flags.

Also, what is required on the QEMU side in order to start generating ANY_LAYOUT style iovecs to verify the WIP changes..?

Nothing on the QEMU side. You could test any-layout payloads by changing the kernel side. Splitting the sense buffer into its own iovec, for example, is a trivial patch to drivers/scsi/virtio_scsi.c.

It also helps to adhere to coding conventions. For example, in QEMU I never use iov[N], and I restrict usage of iovcnt to (iov, iovcnt) parameter pairs. This was suggested by Michael and indeed found a couple of bugs.

Paolo
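MST's refinement above boils down to subtracting the header sizes from the in/out totals. A small self-contained sketch under the same assumptions as the earlier one (hypothetical names, not the vhost code):

#include <stddef.h>

/* Payload length = total length of the in (or out) iovecs minus the header. */
static size_t payload_len(size_t total, size_t hdr)
{
        return total > hdr ? total - hdr : 0;
}

/* get_user_pages_fast() needs the write flag only if the host will write,
 * i.e. only if there is data-in beyond the response header. */
static int gup_write_flag(size_t in_total, size_t resp_hdr)
{
        return payload_len(in_total, resp_hdr) > 0;
}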
Re: vhost-scsi support for ANY_LAYOUT
On 27/01/2015 09:42, Paolo Bonzini wrote:

*) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that virtio-scsi request + response headers may be (but not always..?) combined with data-out + data-in payloads into a single iovec.

Note that request and data-out can be combined, and response + data-in can be combined, but there still have to be separate iovecs for at least the outgoing and incoming buffers (so 2 buffers for the control and request queues; 1 buffer for the event queue, since it's unidirectional).

Paolo
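Concretely, under ANY_LAYOUT a guest could post a WRITE with as few as two buffers; a purely illustrative sketch (the buffer parameters are hypothetical, shown only to make the two-buffer minimum explicit):

#include <stddef.h>
#include <sys/uio.h>

/* req_and_data holds the request header immediately followed by the
 * data-out bytes; resp_buf receives the response header on its own. */
static void fill_write_iovecs(struct iovec *out, struct iovec *in,
                              void *req_and_data, size_t req_len,
                              size_t data_out_len,
                              void *resp_buf, size_t resp_len)
{
        /* one driver-readable buffer: request header and data-out merged */
        out[0].iov_base = req_and_data;
        out[0].iov_len  = req_len + data_out_len;
        /* one driver-writable buffer: the response header, separate */
        in[0].iov_base = resp_buf;
        in[0].iov_len  = resp_len;
}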
Re: [PATCH v5 2/2] x86/xen: allow privcmd hypercalls to be preempted on 64-bit
On 27/01/15 08:35, Jan Beulich wrote:

On 27.01.15 at 02:51, mcg...@do-not-panic.com wrote:

Even if David told you this would be acceptable, I have to question an abstract model of fixing issues on only 64-bit kernels - this may be acceptable for distro purposes, but seems hardly the right approach for upstream. If 32-bit ones are to become deliberately broken, the XEN config option should become dependent on !X86_32.

I'd rather have something omitted (keeping the current behaviour) than something that has not been tested at all. Obviously it would be preferable to fix both 32-bit and 64-bit x86 (and ARM as well), but I'm not going to block an important bug fix for the majority use case (64-bit x86).

David
Re: [PATCH v5 2/2] x86/xen: allow privcmd hypercalls to be preempted on 64-bit
On 27.01.15 at 02:51, mcg...@do-not-panic.com wrote:

Even if David told you this would be acceptable, I have to question an abstract model of fixing issues on only 64-bit kernels - this may be acceptable for distro purposes, but seems hardly the right approach for upstream. If 32-bit ones are to become deliberately broken, the XEN config option should become dependent on !X86_32.

Jan
Re: [Xen-devel] [PATCH v5 2/2] x86/xen: allow privcmd hypercalls to be preempted on 64-bit
On 27/01/15 08:35, Jan Beulich wrote:

On 27.01.15 at 02:51, mcg...@do-not-panic.com wrote:

Even if David told you this would be acceptable, I have to question an abstract model of fixing issues on only 64-bit kernels - this may be acceptable for distro purposes, but seems hardly the right approach for upstream. If 32-bit ones are to become deliberately broken, the XEN config option should become dependent on !X86_32.

There are still legitimate reasons to prefer 32bit PV guests over 64bit ones. Previous versions of this patch had 32bit support as well. Why did you drop it?

~Andrew
Re: vhost-scsi support for ANY_LAYOUT
On Mon, Jan 26, 2015 at 10:35:17PM -0800, Nicholas A. Bellinger wrote:

Hi MST & Paolo,

So I'm currently working on vhost-scsi support for ANY_LAYOUT, and wanted to verify some assumptions based upon your earlier emails..

*) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that virtio-scsi request + response headers will (always..?) be within a single iovec.

Do you mean a single struct iovec - as opposed to an array? If so this is not a valid assumption I think, it can be split across entries as well. For example:

struct virtio_scsi_cmd_req {
        __u8 lun[8];            /* Logical Unit Number */
        __virtio64 tag;         /* Command identifier */
        __u8 task_attr;         /* Task attribute */
        __u8 prio;              /* SAM command priority field */
        __u8 crn;
        __u8 cdb[VIRTIO_SCSI_CDB_SIZE];
} __attribute__((packed));

We get struct iovec *iov and len = 4, then lun[0-3] might be in iov[0]; lun[4-7], tag, task_attr, prio might be in iov[1]; crn, cdb and data-out might be in iov[2]; data-in must be separate, in iov[3]. But if it makes sense, you can optimize for current guest behaviour.

*) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that virtio-scsi request + response headers may be (but not always..?) combined with data-out + data-in payloads into a single iovec.

No. From the virtio POV, you have to separate out (written by guest, read by host) and in (written by host, read by guest), including both header and data. The rule is that:
1. out comes before in
2. out and in entries are separate

*) When ANY_LAYOUT + T10_PI is negotiated by vhost-scsi, it's expected that PI and data payloads for data-out + data-in may be (but not always..?) within the same iovec. Consequently, both headers + PI + data-payloads may also be within a single iovec.

*) Is it still safe to use 'out' + 'in' values from vhost_get_vq_desc() in order to determine the data_direction...? If not, what's the preferred way of determining this information for get_user_pages_fast() permission bits and target_submit_cmd_map_sgls()..?

Also, what is required on the QEMU side in order to start generating ANY_LAYOUT style iovecs to verify the WIP changes..? I see hw/scsi/virtio-scsi.c has been converted to accept any_layout=1, but AFAICT the changes were only related to code not shared between hw/scsi/vhost-scsi.c.

QEMU needs to white-list this feature bit.

Thank you,

--nab
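Because even a single field can straddle an iovec boundary, as MST notes, a consumer has to gather the header bytes before parsing them. A minimal self-contained sketch (plain memcpy loop; the real vhost code would use its own iov iterator helpers instead):

#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Copy the first 'len' bytes spread across an iovec array into 'dst'.
 * Returns the number of bytes actually gathered; the caller must check
 * it equals 'len', and must remember how far into the last entry the
 * header ended, since data-out may share that entry. */
static size_t iov_gather(void *dst, size_t len,
                         const struct iovec *iov, unsigned int cnt)
{
        size_t done = 0;

        for (; cnt && done < len; iov++, cnt--) {
                size_t n = iov->iov_len < len - done ? iov->iov_len : len - done;

                memcpy((char *)dst + done, iov->iov_base, n);
                done += n;
        }
        return done;
}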
Re: [PATCH v2 2/5] KVM: ARM: on IO mem abort - route the call to KVM MMIO bus
On Sat, Jan 24, 2015 at 03:02:33AM +0200, Nikolay Nikolaev wrote:

On Mon, Jan 12, 2015 at 7:09 PM, Eric Auger eric.au...@linaro.org wrote:

Hi Nikolay,

On 12/07/2014 10:37 AM, Nikolay Nikolaev wrote:

On IO memory abort, try to handle the MMIO access through the KVM registered read/write callbacks. This is done by invoking the relevant kvm_io_bus_* API.

Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
---
 arch/arm/kvm/mmio.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
index 4cb5a93..e42469f 100644
--- a/arch/arm/kvm/mmio.c
+++ b/arch/arm/kvm/mmio.c
@@ -162,6 +162,36 @@ static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	return 0;
 }
 
+/**
+ * handle_kernel_mmio - handle an in-kernel MMIO access
+ * @vcpu:	pointer to the vcpu performing the access
+ * @run:	pointer to the kvm_run structure
+ * @mmio:	pointer to the data describing the access
+ *
+ * returns true if the MMIO access has been performed in kernel space,
+ * and false if it needs to be emulated in user space.
+ */
+static bool handle_kernel_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
+		struct kvm_exit_mmio *mmio)
+{
+	int ret;
+
+	if (mmio->is_write) {
+		ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, mmio->phys_addr,
+				mmio->len, mmio->data);
+
+	} else {
+		ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, mmio->phys_addr,
+				mmio->len, mmio->data);
+	}
+	if (!ret) {
+		kvm_prepare_mmio(run, mmio);
+		kvm_handle_mmio_return(vcpu, run);
+	}
+
+	return !ret;

in case ret < 0 (-EOPNOTSUPP = -95) aren't we returning true too? return (ret == 0)?

+}
+
 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		 phys_addr_t fault_ipa)
 {
@@ -200,6 +230,9 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	if (vgic_handle_mmio(vcpu, run, &mmio))
 		return 1;
 
+	if (handle_kernel_mmio(vcpu, run, &mmio))
+		return 1;
+
 	kvm_prepare_mmio(run, &mmio);
 	return 0;

currently the io_mem_abort returned value is not used by mmu.c code. I think this should be handled in kvm_handle_guest_abort. What do you think?

You're right that the returned value is not handled further after we exit io_mem_abort, it's just passed up the call stack. However I'm not sure how to handle it better. If you have ideas, please share.

I'm confused: the return value from io_mem_abort is assigned to a variable 'ret' in kvm_handle_guest_abort and that determines if we should run the VM again or return to userspace (with some work for userspace to do or with an error).

-Christoffer
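Eric's remark amounts to making the success test explicit. A sketch of the tail of handle_kernel_mmio rewritten that way (behaviour unchanged, since kvm_io_bus_read/write return 0 on success and a negative errno such as -EOPNOTSUPP otherwise):

	if (ret == 0) {		/* an in-kernel device handled the access */
		kvm_prepare_mmio(run, mmio);
		kvm_handle_mmio_return(vcpu, run);
		return true;
	}

	return false;		/* let userspace emulate the access */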
Re: [PATCH v3 2/6] KVM: nVMX: Enable nested virtualize x2apic mode.
On 24/01/2015 11:21, Wincy Van wrote:

+	memset(vmx_msr_bitmap_nested, 0xff, PAGE_SIZE);

Most bytes are always 0xff. It's better to initialize it to 0xff once, and set the bit here if !nested_cpu_has_virt_x2apic_mode(vmcs12).

+	if (nested_cpu_has_virt_x2apic_mode(vmcs12))

Please add braces here, because of the /* */ comment below.

+		/* TPR is allowed */
+		nested_vmx_disable_intercept_for_msr(msr_bitmap,
+				vmx_msr_bitmap_nested,
+				APIC_BASE_MSR + (APIC_TASKPRI >> 4),
+				MSR_TYPE_R | MSR_TYPE_W);

+static inline int nested_vmx_check_virt_x2apic(struct kvm_vcpu *vcpu,
+					       struct vmcs12 *vmcs12)
+{
+	if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
+		return -EINVAL;

No need for this function and nested_cpu_has_virt_x2apic_mode. Just inline them in their caller(s). Same for other cases throughout the series.

Paolo

+	return 0;
+}
+
+static int nested_vmx_check_apicv_controls(struct kvm_vcpu *vcpu,
+					   struct vmcs12 *vmcs12)
+{
+	int r;
+
+	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
+		return 0;
+
+	r = nested_vmx_check_virt_x2apic(vcpu, vmcs12);
+	if (r)
+		goto fail;
+
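Put together, Paolo's suggestion would look roughly like this (a sketch reusing names from the quoted patch: the memset moves to one-time setup, and per-entry code only clears intercept bits for the features L1 actually enabled):

	/* at setup time, once: intercept everything by default */
	memset(vmx_msr_bitmap_nested, 0xff, PAGE_SIZE);

	/* ... later, when merging L1's bitmap on nested vmentry ... */
	if (nested_cpu_has_virt_x2apic_mode(vmcs12)) {
		/* TPR is allowed: clear only this MSR's intercept bits */
		nested_vmx_disable_intercept_for_msr(msr_bitmap,
				vmx_msr_bitmap_nested,
				APIC_BASE_MSR + (APIC_TASKPRI >> 4),
				MSR_TYPE_R | MSR_TYPE_W);
	}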
Re: [PATCH v3 0/6] KVM: nVMX: Enable nested apicv support.
On 24/01/2015 11:18, Wincy Van wrote:

v2 --- v3:
1. Add a new field in nested_vmx to avoid the spin lock in v2.
2. Drop send eoi to L1 when doing nested interrupt delivery.
3. Use hardware MSR bitmap to enable nested virtualize x2apic mode.

I think the patches are mostly okay. I made a few comments. One of the things to do on top could be to avoid rebuilding the whole vmcs02 on every entry. Recomputing the MSR bitmap on every vmentry is not particularly nice, for example. It is not necessary unless the execution controls have changed.

Paolo
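One way to avoid the recomputation Paolo mentions would be to cache the controls the bitmap depends on and rebuild only when they change; a sketch with entirely hypothetical field and function names:

	/* on nested vmentry: rebuild only if the relevant controls changed */
	if (vmx->nested.cached_sec_exec_control !=
	    vmcs12->secondary_vm_exec_control) {
		vmx->nested.cached_sec_exec_control =
			vmcs12->secondary_vm_exec_control;
		nested_vmx_rebuild_msr_bitmap(vcpu, vmcs12);	/* hypothetical */
	}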
Re: [PATCH v2] kvm: iommu: Add cond_resched to legacy device assignment code
On 27/01/2015 11:57, Joerg Roedel wrote:

From: Joerg Roedel jroe...@suse.de

When assigning devices to large memory guests (>=128GB guest memory in the failure case) the functions to create the IOMMU page-tables for the whole guest might run for a very long time. On non-preemptible kernels this might cause Soft-Lockup warnings. Fix these by adding a cond_resched() to the mapping and unmapping loops.

Signed-off-by: Joerg Roedel jroe...@suse.de
---
 arch/x86/kvm/iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/iommu.c b/arch/x86/kvm/iommu.c
index 17b73ee..7dbced3 100644
--- a/arch/x86/kvm/iommu.c
+++ b/arch/x86/kvm/iommu.c
@@ -138,7 +138,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
 
 		gfn += page_size >> PAGE_SHIFT;
 
-
+		cond_resched();
 	}
 
 	return 0;
@@ -306,6 +306,8 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
 
 		kvm_unpin_pages(kvm, pfn, unmap_pages);
 		gfn += unmap_pages;
+
+		cond_resched();
 	}
 }

Applying to kvm/queue, thanks.

Paolo
Re: [PATCH v3 2/6] KVM: nVMX: Enable nested virtualize x2apic mode.
On 24/01/2015 11:21, Wincy Van wrote:

+static void nested_vmx_disable_intercept_for_msr(unsigned long *msr_bitmap_l1,
+					unsigned long *msr_bitmap_nested,
+					u32 msr, int type)
+{
+	int f = sizeof(unsigned long);
+
+	if (!cpu_has_vmx_msr_bitmap())
+		return;
+

Also, make this a WARN_ON.

Paolo
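Paolo's suggestion, sketched: in the kernel, WARN_ON() evaluates to its condition, so the check can both warn and bail out in one line:

	if (WARN_ON(!cpu_has_vmx_msr_bitmap()))
		return;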
[kvm-unit-tests][PATCH] x86: hypercall: fix compile error on i386
There is a failure to build on 32-bit hosts:

x86/hypercall.c: In function ‘test_edge’:
x86/hypercall.c:42:2: error: ‘test_rip’ undeclared (first use in this function)
  test_rip = 0;
  ^

This patch fixes this issue.

Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
---
 x86/hypercall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86/hypercall.c b/x86/hypercall.c
index 1548421..3ac5ff9 100644
--- a/x86/hypercall.c
+++ b/x86/hypercall.c
@@ -24,8 +24,8 @@ static inline long kvm_hypercall0_amd(unsigned int nr)
 }
 
 
-#ifdef __x86_64__
 volatile unsigned long test_rip;
+#ifdef __x86_64__
 extern void gp_tss(void);
 asm ("gp_tss: \n\t"
 	"add $8, %rsp\n\t"	// discard error code
-- 
1.9.1
Re: [kvm-unit-tests][PATCH] x86: hypercall: fix compile error on i386
On 27/01/2015 21:23, Chris J Arges wrote:

There is a failure to build on 32-bit hosts:

x86/hypercall.c: In function ‘test_edge’:
x86/hypercall.c:42:2: error: ‘test_rip’ undeclared (first use in this function)
  test_rip = 0;
  ^

This patch fixes this issue.

Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
---
 x86/hypercall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86/hypercall.c b/x86/hypercall.c
index 1548421..3ac5ff9 100644
--- a/x86/hypercall.c
+++ b/x86/hypercall.c
@@ -24,8 +24,8 @@ static inline long kvm_hypercall0_amd(unsigned int nr)
 }
 
 
-#ifdef __x86_64__
 volatile unsigned long test_rip;
+#ifdef __x86_64__
 extern void gp_tss(void);
 asm ("gp_tss: \n\t"
 	"add $8, %rsp\n\t"	// discard error code

Thanks, applied.

Paolo
[kvm-unit-tests][PATCH] x86: hypercall: a better fix for the compiler error
The last patch moves the ifdef in a way that causes a compiler warning. Here, fix the ifdefs to isolate x86_64 functions and variables for the test.

Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
---
 x86/hypercall.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/x86/hypercall.c b/x86/hypercall.c
index 3ac5ff9..d134146 100644
--- a/x86/hypercall.c
+++ b/x86/hypercall.c
@@ -24,8 +24,8 @@ static inline long kvm_hypercall0_amd(unsigned int nr)
 }
 
 
-volatile unsigned long test_rip;
 #ifdef __x86_64__
+volatile unsigned long test_rip;
 extern void gp_tss(void);
 asm ("gp_tss: \n\t"
 	"add $8, %rsp\n\t"	// discard error code
@@ -34,7 +34,6 @@ asm ("gp_tss: \n\t"
 	"iretq\n\t"
 	"jmp gp_tss\n\t"
 	);
-#endif
 
 static inline int
 test_edge(void)
@@ -47,6 +46,7 @@ test_edge(void)
 	printf("Return from int 13, test_rip = %lx\n", test_rip);
 	return test_rip == (1ul << 47);
 }
+#endif
-- 
1.9.1
Re: [PATCH v3 6/6] KVM: nVMX: Enable nested posted interrupt processing.
On 24/01/2015 11:24, Wincy Van wrote:

 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
 	    !nested_cpu_has_apic_reg_virt(vmcs12) &&
-	    !nested_cpu_has_vid(vmcs12))
+	    !nested_cpu_has_vid(vmcs12) &&
+	    !nested_cpu_has_posted_intr(vmcs12))
 		return 0;
 
 	if (nested_cpu_has_virt_x2apic_mode(vmcs12))
 		r = nested_vmx_check_virt_x2apic(vcpu, vmcs12);
 	if (nested_cpu_has_vid(vmcs12))
 		r |= nested_vmx_check_vid(vcpu, vmcs12);
+	if (nested_cpu_has_posted_intr(vmcs12))
+		r |= nested_vmx_check_posted_intr(vcpu, vmcs12);

These ifs are always true.

Paolo
Re: [PATCH] kvm: update_memslots: clean flags for invalid memslots
On 09/01/2015 09:29, Tiejun Chen wrote:

Indeed, any invalid memslots should have new->npages = 0, new->base_gfn = 0 and new->flags = 0 at the same time.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---

Paolo,

This is just a small cleanup to follow-up commit efbeec7098ee, "fix sorting of memslots with base_gfn == 0".

Tiejun

 virt/kvm/kvm_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1cc6e2e..369c759 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -673,6 +673,7 @@ static void update_memslots(struct kvm_memslots *slots,
 		if (!new->npages) {
 			WARN_ON(!mslots[i].npages);
 			new->base_gfn = 0;
+			new->flags = 0;
 			if (mslots[i].npages)
 				slots->used_slots--;
 		} else {

Applied to kvm/queue.

Paolo
Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
On Tue, Jan 27, 2015 at 5:55 AM, Mikhail Sennikovskii mikhail.sennikovs...@profitbricks.com wrote:

Hi all,

I've posted the below mail to the qemu-dev mailing list, but I've got no response there. That's why I decided to re-post it here as well, and besides that I think this could be a kvm-specific issue as well.

Some additional things to note: I can reproduce the issue on my Debian 7 with 3.16.0-0.bpo.4-amd64 kernel as well. I would typically use a max_downtime adjusted to 1 second instead of the default 30 ms. I also noticed that the issue happens much more rarely if I increase the migration bandwidth, i.e. like:

diff --git a/migration.c b/migration.c
index 26f4b65..d2e3b39 100644
--- a/migration.c
+++ b/migration.c
@@ -36,7 +36,7 @@ enum {
     MIG_STATE_COMPLETED,
 };
 
-#define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
+#define MAX_THROTTLE  (90 << 20)      /* Migration speed throttling */

Like I said below, I would be glad to provide you with any additional information.

Thanks,
Mikhail

Hi Mikhail,

So if you choose to use one vcpu, instead of smp, this issue would not happen, right?

-Jidong

On 23.01.2015 15:03, Mikhail Sennikovskii wrote:

Hi all,

I'm running a slightly modified "migration over tcp" test in virt-test, which does a migration from one smp=2 VM to another on the same host over TCP, and exposes some dummy CPU load inside the GUEST while migrating, and after a series of runs I'm always getting a CLOCK_WATCHDOG_TIMEOUT BSOD inside the guest, which happens when "An expected clock interrupt was not received on a secondary processor in an MP system within the allocated interval. This indicates that the specified processor is hung and not processing interrupts."

This seems to happen with any qemu version I've tested (1.2 and above, including upstream), and I was testing it with the 3.13.0-44-generic kernel on my Ubuntu 14.04.1 LTS with SMP4 host, as well as on the 3.12.26-1 kernel with Debian 6 with SMP6 host.

One thing I noticed is that exposing a dummy CPU load on the HOST (like running multiple instances of the "while true; do false; done" script) in parallel with doing migration makes the issue quite easily reproducible.

Looking inside the windows crash dump, the second CPU is just running at IRQL 0, and it is apparently not hung, as Windows is able to save its state in the crash dump correctly, which assumes running some code on it. So this apparently seems to be some timing issue (like the host scheduler does not schedule the thread executing the secondary CPU's code in time).

Could you give me some insight on this, i.e. is there a way to customize QEMU/KVM to avoid such an issue? If you think this might be a qemu/kvm issue, I can provide you with any info, like windows crash dumps, or the test-case to reproduce this.
qemu is started as: from-VM: qemu-system-x86_64 \ -S \ -name 'virt-tests-vm1' \ -sandbox off \ -M pc-1.0 \ -nodefaults \ -vga std \ -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait \ -mon chardev=qmp_id_qmp1,mode=control \ -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait \ -device isa-serial,chardev=serial_id_serial0 \ -chardev socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait \ -device isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \ -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \ -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \ -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \ -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05 \ -netdev user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023 \ -m 2G \ -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \ -cpu phenom \ -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \ -vnc :0 \ -rtc base=localtime,clock=host,driftfix=none \ -boot order=cdn,once=c,menu=off \ -enable-kvm to-VM: qemu-system-x86_64 \ -S \ -name 'virt-tests-vm1' \ -sandbox off \ -M pc-1.0 \ -nodefaults \ -vga std \ -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait \ -mon chardev=qmp_id_qmp1,mode=control \ -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait \ -device isa-serial,chardev=serial_id_serial0 \ -chardev socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,nowait \ -device isa-debugcon,chardev=seabioslog_id_20150123-112750-VehjvEqK,iobase=0x402 \ -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \ -drive
Re: [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches
On 26/01/15 22:58, Christoffer Dall wrote: On Wed, Jan 21, 2015 at 06:39:46PM +, Marc Zyngier wrote: Trying to emulate the behaviour of set/way cache ops is fairly pointless, as there are too many ways we can end-up missing stuff. Also, there is some system caches out there that simply ignore set/way operations. So instead of trying to implement them, let's convert it to VA ops, and use them as a way to re-enable the trapping of VM ops. That way, we can detect the point when the MMU/caches are turned off, and do a full VM flush (which is what the guest was trying to do anyway). This allows a 32bit zImage to boot on the APM thingy, and will probably help bootloaders in general. Signed-off-by: Marc Zyngier marc.zyng...@arm.com This had some conflicts with dirty page logging. I fixed it up here, and also removed some trailing white space and mixed spaces/tabs that patch complained about: http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git mm-fixes Thanks for doing so. --- arch/arm/include/asm/kvm_emulate.h | 10 + arch/arm/include/asm/kvm_host.h | 3 -- arch/arm/include/asm/kvm_mmu.h | 3 +- arch/arm/kvm/arm.c | 10 - arch/arm/kvm/coproc.c| 64 ++ arch/arm/kvm/coproc_a15.c| 2 +- arch/arm/kvm/coproc_a7.c | 2 +- arch/arm/kvm/mmu.c | 70 - arch/arm/kvm/trace.h | 39 +++ arch/arm64/include/asm/kvm_emulate.h | 10 + arch/arm64/include/asm/kvm_host.h| 3 -- arch/arm64/include/asm/kvm_mmu.h | 3 +- arch/arm64/kvm/sys_regs.c| 75 +--- 13 files changed, 155 insertions(+), 139 deletions(-) diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h index 66ce176..7b01523 100644 --- a/arch/arm/include/asm/kvm_emulate.h +++ b/arch/arm/include/asm/kvm_emulate.h @@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu) vcpu-arch.hcr = HCR_GUEST_MASK; } +static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu) +{ + return vcpu-arch.hcr; +} + +static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr) +{ + vcpu-arch.hcr = hcr; +} + static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu) { return 1; diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 254e065..04b4ea0 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -125,9 +125,6 @@ struct kvm_vcpu_arch { * Anything that is not used directly from assembly code goes * here. */ - /* dcache set/way operation pending */ - int last_pcpu; - cpumask_t require_dcache_flush; /* Don't run the guest on this vcpu */ bool pause; diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 63e0ecc..286644c 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva, #define kvm_virt_to_phys(x) virt_to_idmap((unsigned long)(x)) -void stage2_flush_vm(struct kvm *kvm); +void kvm_set_way_flush(struct kvm_vcpu *vcpu); +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled); #endif /* !__ASSEMBLY__ */ diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 2d6d910..0b0d58a 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) vcpu-cpu = cpu; vcpu-arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state); - /* - * Check whether this vcpu requires the cache to be flushed on - * this physical CPU. This is a consequence of doing dcache - * operations by set/way on this vcpu. 
We do it here to be in - * a non-preemptible section. - */ - if (cpumask_test_and_clear_cpu(cpu, vcpu-arch.require_dcache_flush)) - flush_cache_all(); /* We'd really want v7_flush_dcache_all() */ - kvm_arm_set_running_vcpu(vcpu); } @@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) ret = kvm_call_hyp(__kvm_vcpu_run, vcpu); vcpu-mode = OUTSIDE_GUEST_MODE; - vcpu-arch.last_pcpu = smp_processor_id(); kvm_guest_exit(); trace_kvm_exit(*vcpu_pc(vcpu)); /* diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c index 7928dbd..0afcc00 100644 --- a/arch/arm/kvm/coproc.c +++ b/arch/arm/kvm/coproc.c @@ -189,82 +189,40 @@ static bool access_l2ectlr(struct kvm_vcpu *vcpu, return true; } -/* See note at ARM ARM B1.14.4 */ +/* + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily
[PATCH v2] kvm: iommu: Add cond_resched to legacy device assignment code
From: Joerg Roedel jroe...@suse.de

When assigning devices to large memory guests (>=128GB guest memory in the failure case) the functions to create the IOMMU page-tables for the whole guest might run for a very long time. On non-preemptible kernels this might cause Soft-Lockup warnings. Fix these by adding a cond_resched() to the mapping and unmapping loops.

Signed-off-by: Joerg Roedel jroe...@suse.de
---
 arch/x86/kvm/iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/iommu.c b/arch/x86/kvm/iommu.c
index 17b73ee..7dbced3 100644
--- a/arch/x86/kvm/iommu.c
+++ b/arch/x86/kvm/iommu.c
@@ -138,7 +138,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
 
 		gfn += page_size >> PAGE_SHIFT;
 
-
+		cond_resched();
 	}
 
 	return 0;
@@ -306,6 +306,8 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
 
 		kvm_unpin_pages(kvm, pfn, unmap_pages);
 		gfn += unmap_pages;
+
+		cond_resched();
 	}
 }
-- 
1.9.1
[PATCH 3/6] KVM: MMU: Explicitly set D-bit for writable spte.
This patch avoids unnecessary dirty GPA logging to the PML buffer in the EPT violation path by setting the D-bit manually prior to the occurrence of the write from the guest.

We only set the D-bit manually in set_spte, and leave the fast_page_fault path unchanged, as fast_page_fault is very unlikely to happen in case of PML.

For the hva -> pa change case, the spte is updated to either read-only (host pte is read-only) or be dropped (host pte is writeable), and both cases will be handled by the above changes, therefore no change is necessary.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com
---
 arch/x86/kvm/mmu.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c438224..fb35535 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2597,8 +2597,14 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		}
 	}
 
-	if (pte_access & ACC_WRITE_MASK)
+	if (pte_access & ACC_WRITE_MASK) {
 		mark_page_dirty(vcpu->kvm, gfn);
+		/*
+		 * Explicitly set dirty bit. It is used to eliminate unnecessary
+		 * dirty GPA logging in case of PML is enabled on VMX.
+		 */
+		spte |= shadow_dirty_mask;
+	}
 
 set_pte:
 	if (mmu_spte_update(sptep, spte))
@@ -2914,6 +2920,16 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	 */
 	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
 
+	/*
+	 * Theoretically we could also set dirty bit (and flush TLB) here in
+	 * order to eliminate the unnecessary PML logging. See comments in
+	 * set_spte. But as in case of PML, fast_page_fault is very unlikely to
+	 * happen so we leave it unchanged. This might result in the same GPA
+	 * to be logged in PML buffer again when the write really happens, and
+	 * eventually to be called by mark_page_dirty twice. But it's also no
+	 * harm. This also avoids the TLB flush needed after setting dirty bit
+	 * so non-PML cases won't be impacted.
+	 */
 	if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
 		mark_page_dirty(vcpu->kvm, gfn);
-- 
2.1.0
[PATCH 1/6] KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty
We don't have to write protect guest memory for dirty logging if architecture supports hardware dirty logging, such as PML on VMX, so rename it to be more generic. Signed-off-by: Kai Huang kai.hu...@linux.intel.com --- arch/arm/kvm/mmu.c | 18 -- arch/x86/kvm/mmu.c | 21 +++-- include/linux/kvm_host.h | 2 +- virt/kvm/kvm_main.c | 2 +- 4 files changed, 37 insertions(+), 6 deletions(-) diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index 74aeaba..6034697 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -1081,7 +1081,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot) } /** - * kvm_arch_mmu_write_protect_pt_masked() - write protect dirty pages + * kvm_mmu_write_protect_pt_masked() - write protect dirty pages * @kvm: The KVM pointer * @slot: The memory slot associated with mask * @gfn_offset:The gfn offset in memory slot @@ -1091,7 +1091,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot) * Walks bits set in mask write protects the associated pte's. Caller must * acquire kvm_mmu_lock. */ -void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm, +static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn_offset, unsigned long mask) { @@ -1102,6 +1102,20 @@ void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm, stage2_wp_range(kvm, start, end); } +/* + * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected + * dirty pages. + * + * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to + * enable dirty logging for them. + */ +void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm, + struct kvm_memory_slot *slot, + gfn_t gfn_offset, unsigned long mask) +{ + kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask); +} + static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, struct kvm_memory_slot *memslot, unsigned long hva, unsigned long fault_status) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 0ed9f79..b18e65c 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1216,7 +1216,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, } /** - * kvm_arch_mmu_write_protect_pt_masked - write protect selected PT level pages + * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages * @kvm: kvm instance * @slot: slot to protect * @gfn_offset: start of the BITS_PER_LONG pages we care about @@ -1225,7 +1225,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, * Used when we do not need to care about huge page mappings: e.g. during dirty * logging we do not have any such mappings. */ -void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm, +static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn_offset, unsigned long mask) { @@ -1241,6 +1241,23 @@ void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm, } } +/** + * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected + * PT level pages. + * + * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to + * enable dirty logging for them. + * + * Used when we do not need to care about huge page mappings: e.g. during dirty + * logging we do not have any such mappings. 
+ */ +void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm, + struct kvm_memory_slot *slot, + gfn_t gfn_offset, unsigned long mask) +{ + kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask); +} + static bool rmap_write_protect(struct kvm *kvm, u64 gfn) { struct kvm_memory_slot *slot; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 7d67195..32d0575 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -615,7 +615,7 @@ int kvm_get_dirty_log(struct kvm *kvm, int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log, bool *is_dirty); -void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm, +void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn_offset, unsigned long mask); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index a8490f0..0c28176 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1059,7 +1059,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm, dirty_bitmap_buffer[i] = mask; offset = i * BITS_PER_LONG; - kvm_arch_mmu_write_protect_pt_masked(kvm, memslot,
[PATCH 5/6] KVM: x86: Add new dirty logging kvm_x86_ops for PML
This patch adds new kvm_x86_ops dirty logging hooks to enable/disable dirty logging for particular memory slot, and to flush potentially logged dirty GPAs before reporting slot-dirty_bitmap to userspace. kvm x86 common code calls these hooks when they are available so PML logic can be hidden to VMX specific. Other ARCHs won't be impacted as these hooks are NULL for them. Signed-off-by: Kai Huang kai.hu...@linux.intel.com --- arch/x86/include/asm/kvm_host.h | 25 +++ arch/x86/kvm/mmu.c | 6 +++- arch/x86/kvm/x86.c | 71 - 3 files changed, 93 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 67a98d7..57916ec 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -802,6 +802,31 @@ struct kvm_x86_ops { int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr); void (*sched_in)(struct kvm_vcpu *kvm, int cpu); + + /* +* Arch-specific dirty logging hooks. These hooks are only supposed to +* be valid if the specific arch has hardware-accelerated dirty logging +* mechanism. Currently only for PML on VMX. +* +* - slot_enable_log_dirty: +* called when enabling log dirty mode for the slot. +* - slot_disable_log_dirty: +* called when disabling log dirty mode for the slot. +* also called when slot is created with log dirty disabled. +* - flush_log_dirty: +* called before reporting dirty_bitmap to userspace. +* - enable_log_dirty_pt_masked: +* called when reenabling log dirty for the GFNs in the mask after +* corresponding bits are cleared in slot-dirty_bitmap. +*/ + void (*slot_enable_log_dirty)(struct kvm *kvm, + struct kvm_memory_slot *slot); + void (*slot_disable_log_dirty)(struct kvm *kvm, + struct kvm_memory_slot *slot); + void (*flush_log_dirty)(struct kvm *kvm); + void (*enable_log_dirty_pt_masked)(struct kvm *kvm, + struct kvm_memory_slot *slot, + gfn_t offset, unsigned long mask); }; struct kvm_arch_async_pf { diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 6c24af3..c5833ca 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1335,7 +1335,11 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn_offset, unsigned long mask) { - kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask); + if (kvm_x86_ops-enable_log_dirty_pt_masked) + kvm_x86_ops-enable_log_dirty_pt_masked(kvm, slot, gfn_offset, + mask); + else + kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask); } static bool rmap_write_protect(struct kvm *kvm, u64 gfn) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3a7fcff..442ee7d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3780,6 +3780,12 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) mutex_lock(kvm-slots_lock); + /* +* Flush potentially hardware-cached dirty pages to dirty_bitmap. +*/ + if (kvm_x86_ops-flush_log_dirty) + kvm_x86_ops-flush_log_dirty(kvm); + r = kvm_get_dirty_log_protect(kvm, log, is_dirty); /* @@ -7533,6 +7539,56 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, return 0; } +static void kvm_mmu_slot_apply_flags(struct kvm *kvm, +struct kvm_memory_slot *new) +{ + /* Still write protect RO slot */ + if (new-flags KVM_MEM_READONLY) { + kvm_mmu_slot_remove_write_access(kvm, new); + return; + } + + /* +* Call kvm_x86_ops dirty logging hooks when they are valid. 
+* +* kvm_x86_ops-slot_disable_log_dirty is called when: +* +* - KVM_MR_CREATE with dirty logging is disabled +* - KVM_MR_FLAGS_ONLY with dirty logging is disabled in new flag +* +* The reason is, in case of PML, we need to set D-bit for any slots +* with dirty logging disabled in order to eliminate unnecessary GPA +* logging in PML buffer (and potential PML buffer full VMEXT). This +* guarantees leaving PML enabled during guest's lifetime won't have +* any additonal overhead from PML when guest is running with dirty +* logging disabled for memory slots. +* +* kvm_x86_ops-slot_enable_log_dirty is called when switching new slot +* to dirty logging mode. +* +* If kvm_x86_ops dirty logging hooks are
[PATCH 2/6] KVM: MMU: Add mmu help functions to support PML
This patch adds new MMU layer functions to clear/set the D-bit for a memory slot, and to write protect superpages for a memory slot.

In the case of PML, the CPU logs the dirty GPA automatically to the PML buffer when it updates the D-bit from 0 to 1; therefore we don't have to write protect 4K pages, instead we only need to clear the D-bit in order to log that GPA. For superpages, we still write protect them and let the page fault code handle dirty page logging, as we still need to split superpages into 4K pages under PML.

As PML is always enabled during the guest's lifetime, to eliminate unnecessary PML GPA logging, we set the D-bit manually for slots with dirty logging disabled.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |   9 ++
 arch/x86/kvm/mmu.c              | 195 ++++++++++++++++++++++++++++++++
 2 files changed, 204 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 843bea0..4f6369b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -835,6 +835,15 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
+void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
+				   struct kvm_memory_slot *memslot);
+void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
+					struct kvm_memory_slot *memslot);
+void kvm_mmu_slot_set_dirty(struct kvm *kvm,
+			    struct kvm_memory_slot *memslot);
+void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
+				   struct kvm_memory_slot *slot,
+				   gfn_t gfn_offset, unsigned long mask);
 void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b18e65c..c438224 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1215,6 +1215,60 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
 	return flush;
 }
 
+static bool spte_clear_dirty(struct kvm *kvm, u64 *sptep)
+{
+	u64 spte = *sptep;
+
+	rmap_printk("rmap_clear_dirty: spte %p %llx\n", sptep, *sptep);
+
+	spte &= ~shadow_dirty_mask;
+
+	return mmu_spte_update(sptep, spte);
+}
+
+static bool __rmap_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+	bool flush = false;
+
+	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
+		BUG_ON(!(*sptep & PT_PRESENT_MASK));
+
+		flush |= spte_clear_dirty(kvm, sptep);
+		sptep = rmap_get_next(&iter);
+	}
+
+	return flush;
+}
+
+static bool spte_set_dirty(struct kvm *kvm, u64 *sptep)
+{
+	u64 spte = *sptep;
+
+	rmap_printk("rmap_set_dirty: spte %p %llx\n", sptep, *sptep);
+
+	spte |= shadow_dirty_mask;
+
+	return mmu_spte_update(sptep, spte);
+}
+
+static bool __rmap_set_dirty(struct kvm *kvm, unsigned long *rmapp)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+	bool flush = false;
+
+	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
+		BUG_ON(!(*sptep & PT_PRESENT_MASK));
+
+		flush |= spte_set_dirty(kvm, sptep);
+		sptep = rmap_get_next(&iter);
+	}
+
+	return flush;
+}
+
 /**
  * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
  * @kvm: kvm instance
@@ -1242,6 +1296,32 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 }
 
 /**
+ * kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages
+ * @kvm: kvm instance
+ * @slot: slot to clear D-bit
+ * @gfn_offset: start of the BITS_PER_LONG pages we care about
+ * @mask: indicates which pages we should clear D-bit
+ *
+ * Used for PML to re-log the dirty GPAs after userspace querying dirty_bitmap.
+ */
+void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
+				   struct kvm_memory_slot *slot,
+				   gfn_t gfn_offset, unsigned long mask)
+{
+	unsigned long *rmapp;
+
+	while (mask) {
+		rmapp = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+				      PT_PAGE_TABLE_LEVEL, slot);
+		__rmap_clear_dirty(kvm, rmapp);
+
+		/* clear the first set bit */
+		mask &= mask - 1;
+	}
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_clear_dirty_pt_masked);
+
+/**
  * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
  * PT level pages.
  *
@@ -4368,6 +4448,121 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 	kvm_flush_remote_tlbs(kvm);
 }
 
+void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
+
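For readers unfamiliar with the mask-walking idiom in kvm_mmu_clear_dirty_pt_masked above: mask &= mask - 1 clears the lowest set bit, and __ffs(mask) returns its index, so the loop visits exactly the GFNs whose bits are set in the dirty mask. A minimal userspace sketch of the same idiom (plain C, not the kernel code; base_gfn, gfn_offset and the mask value are made-up example inputs, and __builtin_ctzl plays the role of __ffs):

#include <stdio.h>

int main(void)
{
	unsigned long base_gfn = 0x1000, gfn_offset = 64;
	unsigned long mask = 0x8031;	/* bits 0, 4, 5 and 15 set */

	while (mask) {
		/* index of the lowest set bit, like the kernel's __ffs() */
		unsigned long gfn = base_gfn + gfn_offset + __builtin_ctzl(mask);

		printf("would clear D-bit for gfn 0x%lx\n", gfn);

		mask &= mask - 1;	/* clear the lowest set bit */
	}
	return 0;
}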
[PATCH 4/6] KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access
This patch changes the second parameter of kvm_mmu_slot_remove_write_access from 'slot id' to 'struct kvm_memory_slot *' to align with the kvm_x86_ops dirty logging hooks, which will be introduced in a further patch. A better way would be to change the second parameter of kvm_arch_commit_memory_region from 'struct kvm_userspace_memory_region *' to 'struct kvm_memory_slot *new', but that requires changes to other non-x86 architectures too, so avoid it for now.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/mmu.c              |  5 ++---
 arch/x86/kvm/x86.c              | 10 +++---
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f6369b..67a98d7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -834,7 +834,8 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 		u64 dirty_mask, u64 nx_mask, u64 x_mask);
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
-void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
+void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
+				      struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 				   struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fb35535..6c24af3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4410,14 +4410,13 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
 	init_kvm_mmu(vcpu);
 }
 
-void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
+void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
+				      struct kvm_memory_slot *memslot)
 {
-	struct kvm_memory_slot *memslot;
 	gfn_t last_gfn;
 	int i;
 	bool flush = false;
 
-	memslot = id_to_memslot(kvm->memslots, slot);
 	last_gfn = memslot->base_gfn + memslot->npages - 1;
 
 	spin_lock(&kvm->mmu_lock);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1e10e3f..3a7fcff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7538,7 +7538,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				const struct kvm_memory_slot *old,
 				enum kvm_mr_change change)
 {
-
+	struct kvm_memory_slot *new;
 	int nr_mmu_pages = 0;
 
 	if ((mem->slot >= KVM_USER_MEM_SLOTS) && (change == KVM_MR_DELETE)) {
@@ -7557,6 +7557,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 
 	if (nr_mmu_pages)
 		kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
+
+	/* It's OK to get 'new' slot here as it has already been installed */
+	new = id_to_memslot(kvm->memslots, mem->slot);
+
 	/*
 	 * Write protect all pages for dirty logging.
 	 *
	 *
	 *
 	 * See the comments in fast_page_fault().
 	 */
-	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
-		kvm_mmu_slot_remove_write_access(kvm, mem->slot);
+	if ((change != KVM_MR_DELETE) && (new->flags & KVM_MEM_LOG_DIRTY_PAGES))
+		kvm_mmu_slot_remove_write_access(kvm, new);
 }
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
-- 
2.1.0
[PATCH 6/6] KVM: VMX: Add PML support in VMX
This patch adds PML support in VMX. A new module parameter 'enable_pml' is added to allow the user to enable/disable it manually.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com
---
 arch/x86/include/asm/vmx.h      |   4 +
 arch/x86/include/uapi/asm/vmx.h |   1 +
 arch/x86/kvm/trace.h            |  18 ++++
 arch/x86/kvm/vmx.c              | 195 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |   1 +
 5 files changed, 218 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 45afaee..da772ed 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -69,6 +69,7 @@
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID		0x00001000
 #define SECONDARY_EXEC_SHADOW_VMCS		0x00004000
+#define SECONDARY_EXEC_ENABLE_PML		0x00020000
 #define SECONDARY_EXEC_XSAVES			0x00100000
 
@@ -121,6 +122,7 @@ enum vmcs_field {
 	GUEST_LDTR_SELECTOR             = 0x0000080c,
 	GUEST_TR_SELECTOR               = 0x0000080e,
 	GUEST_INTR_STATUS               = 0x00000810,
+	GUEST_PML_INDEX			= 0x00000812,
 	HOST_ES_SELECTOR                = 0x00000c00,
 	HOST_CS_SELECTOR                = 0x00000c02,
 	HOST_SS_SELECTOR                = 0x00000c04,
@@ -140,6 +142,8 @@ enum vmcs_field {
 	VM_EXIT_MSR_LOAD_ADDR_HIGH      = 0x00002009,
 	VM_ENTRY_MSR_LOAD_ADDR          = 0x0000200a,
 	VM_ENTRY_MSR_LOAD_ADDR_HIGH     = 0x0000200b,
+	PML_ADDRESS			= 0x0000200e,
+	PML_ADDRESS_HIGH		= 0x0000200f,
 	TSC_OFFSET                      = 0x00002010,
 	TSC_OFFSET_HIGH                 = 0x00002011,
 	VIRTUAL_APIC_PAGE_ADDR          = 0x00002012,
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index ff2b8e2..c5f1a1d 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -73,6 +73,7 @@
 #define EXIT_REASON_XSETBV              55
 #define EXIT_REASON_APIC_WRITE          56
 #define EXIT_REASON_INVPCID             58
+#define EXIT_REASON_PML_FULL            62
 #define EXIT_REASON_XSAVES              63
 #define EXIT_REASON_XRSTORS             64
 
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 587149b..a139977 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -846,6 +846,24 @@ TRACE_EVENT(kvm_track_tsc,
 		  __print_symbolic(__entry->host_clock, host_clocks))
 );
 
+/*
+ * Tracepoint for PML full VMEXIT.
+ */
+TRACE_EVENT(kvm_pml_full,
+	TP_PROTO(unsigned int vcpu_id),
+	TP_ARGS(vcpu_id),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	vcpu_id	)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id	= vcpu_id;
+	),
+
+	TP_printk("vcpu %d: PML full", __entry->vcpu_id)
+);
+
 #endif /* CONFIG_X86_64 */
 
 TRACE_EVENT(kvm_ple_window,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c987374..de5ce82 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -101,6 +101,9 @@ module_param(nested, bool, S_IRUGO);
 
 static u64 __read_mostly host_xss;
 
+static bool __read_mostly enable_pml = 1;
+module_param_named(pml, enable_pml, bool, S_IRUGO);
+
 #define KVM_GUEST_CR0_MASK (X86_CR0_NW | X86_CR0_CD)
 #define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST (X86_CR0_WP | X86_CR0_NE)
 #define KVM_VM_CR0_ALWAYS_ON						\
@@ -516,6 +519,10 @@ struct vcpu_vmx {
 	/* Dynamic PLE window. */
 	int ple_window;
 	bool ple_window_dirty;
+
+	/* Support for PML */
+#define PML_ENTITY_NUM		512
+	struct page *pml_pg;
 };
 
 enum segment_cache_field {
@@ -1068,6 +1075,11 @@ static inline bool cpu_has_vmx_shadow_vmcs(void)
 		SECONDARY_EXEC_SHADOW_VMCS;
 }
 
+static inline bool cpu_has_vmx_pml(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_PML;
+}
+
 static inline bool report_flexpriority(void)
 {
 	return flexpriority_enabled;
@@ -2924,7 +2936,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_APIC_REGISTER_VIRT |
 			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
 			SECONDARY_EXEC_SHADOW_VMCS |
-			SECONDARY_EXEC_XSAVES;
+			SECONDARY_EXEC_XSAVES |
+			SECONDARY_EXEC_ENABLE_PML;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -4355,6 +4368,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 	   a current VMCS12 */
 	exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
 
+	/* PML is enabled/disabled in
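To make the PML mechanics above concrete: the pml_pg page holds 512 GPA entries of 8 bytes each, and GUEST_PML_INDEX starts at PML_ENTITY_NUM - 1 and counts down, so after a VMEXIT the valid entries live at indices pml_idx+1 through 511 (an out-of-range index means the buffer filled and wrapped). A standalone sketch of the flush logic under those assumptions; mark_gfn_dirty and the buffer argument are stand-ins, not the actual KVM code:

#include <stdint.h>
#include <stdio.h>

#define PML_ENTITY_NUM	512	/* one 4K page / 8 bytes per GPA */
#define PAGE_SHIFT	12

/* Stand-in for KVM's dirty-bitmap update. */
static void mark_gfn_dirty(uint64_t gfn)
{
	printf("gfn 0x%llx is dirty\n", (unsigned long long)gfn);
}

/*
 * Flush a PML buffer. pml_idx is the value the CPU left in the
 * GUEST_PML_INDEX field: it starts at PML_ENTITY_NUM - 1 and is
 * decremented after each logged GPA, so entries pml_idx+1 .. 511
 * hold the logged GPAs. If nothing was logged, pml_idx is still
 * 511 and the loop below does no work.
 */
static void flush_pml_buffer(const uint64_t *pml_buf, uint16_t pml_idx)
{
	if (pml_idx >= PML_ENTITY_NUM)
		pml_idx = 0;	/* index wrapped: the whole buffer is valid */
	else
		pml_idx++;	/* first valid entry */

	for (; pml_idx < PML_ENTITY_NUM; pml_idx++)
		mark_gfn_dirty(pml_buf[pml_idx] >> PAGE_SHIFT);
}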
RE: [v3 00/26] Add VT-d Posted-Interrupts support
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Wednesday, January 28, 2015 11:44 AM To: Wu, Feng Cc: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org; j...@8bytes.org; jiang@linux.intel.com; eric.au...@linaro.org; linux-ker...@vger.kernel.org; io...@lists.linux-foundation.org; kvm@vger.kernel.org Subject: Re: [v3 00/26] Add VT-d Posted-Interrupts support On Wed, 2015-01-28 at 03:01 +, Wu, Feng wrote: -Original Message- From: Wu, Feng Sent: Wednesday, January 21, 2015 10:26 AM To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org; io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng Subject: RE: [v3 00/26] Add VT-d Posted-Interrupts support -Original Message- From: Wu, Feng Sent: Friday, December 12, 2014 11:15 PM To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org; io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng Subject: [v3 00/26] Add VT-d Posted-Interrupts support VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt. With VT-d Posted-Interrupts enabled, external interrupts from direct-assigned devices can be delivered to guests without VMM intervention when guest is running in non-root mode. You can find the VT-d Posted-Interrtups Spec. in the following URL: http://www.intel.com/content/www/us/en/intelligent-systems/intel-technolog y/vt-directed-io-spec.html v1-v2: * Use VFIO framework to enable this feature, the VFIO part of this series is base on Eric's patch [PATCH v3 0/8] KVM-VFIO IRQ forward control * Rebase this patchset on git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git, then revise some irq logic based on the new hierarchy irqdomain patches provided by Jiang Liu jiang@linux.intel.com v2-v3: * Adjust the Posted-interrupts Descriptor updating logic when vCPU is preempted or blocked. * KVM_DEV_VFIO_DEVICE_POSTING_IRQ -- KVM_DEV_VFIO_DEVICE_POST_IRQ * __KVM_HAVE_ARCH_KVM_VFIO_POSTING -- __KVM_HAVE_ARCH_KVM_VFIO_POST * Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which can be used to change back to remapping mode. * Fix typo This patch series is made of the following groups: 1-6: Some preparation changes in iommu and irq component, this is based on the new hierarchy irqdomain logic. 7-9, 26: IOMMU changes for VT-d Posted-Interrupts, such as, feature detection, command line parameter. 10-17, 22-25: Changes related to KVM itself. 
18-20: Changes in VFIO component, this part was previously sent out as [RFC PATCH v2 0/2] kvm-vfio: implement the vfio skeleton for VT-d Posted-Interrupts 21: x86 irq related changes Feng Wu (26): genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU iommu: Add new member capability to struct irq_remap_ops iommu, x86: Define new irte structure for VT-d Posted-Interrupts iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller iommu, x86: No need to migrating irq for VT-d Posted-Interrupts iommu, x86: Add cap_pi_support() to detect VT-d PI capability iommu, x86: Add intel_irq_remapping_capability() for Intel iommu, x86: define irq_remapping_cap() KVM: change struct pi_desc for VT-d Posted-Interrupts KVM: Add some helper functions for Posted-Interrupts KVM: Initialize VT-d Posted-Interrupts Descriptor KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu KVM: add interfaces to control PI outside vmx KVM: Make struct kvm_irq_routing_table accessible KVM: make kvm_set_msi_irq() public KVM: kvm-vfio: User API for VT-d Posted-Interrupts KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts KVM: x86: kvm-vfio: VT-d posted-interrupts setup x86, irq: Define a global vector for VT-d Posted-Interrupts KVM: Define a wakeup worker thread for vCPU KVM: Update Posted-Interrupts Descriptor when vCPU is preempted KVM: Update Posted-Interrupts Descriptor when vCPU is blocked KVM: Suppress posted-interrupt when 'SN' is set iommu/vt-d:
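For context on what several of the patches above manipulate: VT-d posted interrupts revolve around a 64-byte posted-interrupt descriptor holding a 256-bit posted-interrupt request (PIR) bitmap plus control bits such as ON (outstanding notification) and the SN (suppress notification) bit that patch 25 toggles. A rough standalone sketch of posting a vector into such a descriptor; the field layout is simplified from the VT-d spec, the notification vector and destination fields are omitted, and this is not the series' actual code:

#include <stdint.h>

struct pi_desc {
	uint64_t pir[4];	/* one bit per interrupt vector 0..255 */
	uint64_t control;	/* bit 0: ON, bit 1: SN */
};

#define PI_ON	(1ull << 0)	/* outstanding notification */
#define PI_SN	(1ull << 1)	/* suppress notification */

/* Post a vector: set its PIR bit, then set ON. Returns 1 if a
 * notification event should be sent, i.e. ON was newly set and
 * notifications are not suppressed. A real implementation does
 * this with atomic operations.
 */
static int pi_post_interrupt(struct pi_desc *pi, uint8_t vector)
{
	uint64_t old;

	pi->pir[vector / 64] |= 1ull << (vector % 64);

	old = pi->control;
	pi->control |= PI_ON;

	return !(old & PI_ON) && !(pi->control & PI_SN);
}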
Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
On 2015-01-28 03:10:23, Jidong Xiao wrote:

On Tue, Jan 27, 2015 at 5:55 AM, Mikhail Sennikovskii mikhail.sennikovs...@profitbricks.com wrote:

Hi all,

I've posted the below mail to the qemu-devel mailing list, but I've got no response there; that's why I decided to re-post it here as well, and besides I think this could be a kvm-specific issue.

Some additional things to note: I can reproduce the issue on my Debian 7 with a 3.16.0-0.bpo.4-amd64 kernel as well. I would typically use a max_downtime adjusted to 1 second instead of the default 30 ms. I also noticed that the issue happens much more rarely if I increase the migration bandwidth, e.g. like:

diff --git a/migration.c b/migration.c
index 26f4b65..d2e3b39 100644
--- a/migration.c
+++ b/migration.c
@@ -36,7 +36,7 @@ enum {
     MIG_STATE_COMPLETED,
 };
 
-#define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
+#define MAX_THROTTLE  (90 << 20)      /* Migration speed throttling */

Like I said below, I would be glad to provide you with any additional information.

Thanks,
Mikhail

Hi, Mikhail,

So if you choose to use one vcpu, instead of smp, this issue would not happen, right? I think you can try the cpu feature hv_relaxed, like: -cpu Haswell,hv_relaxed

-Jidong

On 23.01.2015 15:03, Mikhail Sennikovskii wrote:

Hi all,

I'm running a slightly modified migration-over-tcp test in virt-test, which does a migration from one smp=2 VM to another on the same host over TCP and exposes some dummy CPU load inside the GUEST while migrating, and after a series of runs I always get a CLOCK_WATCHDOG_TIMEOUT BSOD inside the guest, which happens when "an expected clock interrupt was not received on a secondary processor in an MP system within the allocated interval", indicating that the specified processor is hung and not processing interrupts.

This seems to happen with any qemu version I've tested (1.2 and above, including upstream), and I was testing it with the 3.13.0-44-generic kernel on my Ubuntu 14.04.1 LTS SMP4 host, as well as with the 3.12.26-1 kernel on a Debian 6 SMP6 host.

One thing I noticed is that exposing a dummy CPU load on the HOST (like running multiple instances of the "while true; do false; done" script) in parallel with the migration makes the issue quite easily reproducible.

Looking inside the Windows crash dump, the second CPU is just running at IRQL 0, and it is apparently not hung, as Windows was able to save its state in the crash dump correctly, which implies it was still running code. So this apparently is some timing issue (like the host scheduler not scheduling the thread executing the secondary CPU's code in time).

Could you give me some insight on this, i.e. is there a way to customize QEMU/KVM to avoid such an issue? If you think this might be a qemu/kvm issue, I can provide you with any info, like Windows crash dumps, or the test case to reproduce this.
qemu is started as: from-VM: qemu-system-x86_64 \ -S \ -name 'virt-tests-vm1' \ -sandbox off \ -M pc-1.0 \ -nodefaults \ -vga std \ -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait \ -mon chardev=qmp_id_qmp1,mode=control \ -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait \ -device isa-serial,chardev=serial_id_serial0 \ -chardev socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait \ -device isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \ -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \ -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \ -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \ -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05 \ -netdev user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023 \ -m 2G \ -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \ -cpu phenom \ -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \ -vnc :0 \ -rtc base=localtime,clock=host,driftfix=none \ -boot order=cdn,once=c,menu=off \ -enable-kvm to-VM: qemu-system-x86_64 \ -S \ -name 'virt-tests-vm1' \ -sandbox off \ -M pc-1.0 \ -nodefaults \ -vga std \ -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait \ -mon chardev=qmp_id_qmp1,mode=control \ -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait \ -device isa-serial,chardev=serial_id_serial0 \ -chardev
Re: [PATCH v3 2/6] KVM: nVMX: Enable nested virtualize x2apic mode.
On Wed, Jan 28, 2015 at 5:37 AM, Paolo Bonzini pbonz...@redhat.com wrote:

On 24/01/2015 11:21, Wincy Van wrote:

+	memset(vmx_msr_bitmap_nested, 0xff, PAGE_SIZE);

Most bytes are always 0xff. It's better to initialize it to 0xff once, and set the bit here if !nested_cpu_has_virt_x2apic_mode(vmcs12).

Indeed, will do.

+	if (nested_cpu_has_virt_x2apic_mode(vmcs12))

Please add braces here, because of the /* */ comment below.

Will do.

+		/* TPR is allowed */
+		nested_vmx_disable_intercept_for_msr(msr_bitmap,
+				vmx_msr_bitmap_nested,
+				APIC_BASE_MSR + (APIC_TASKPRI >> 4),
+				MSR_TYPE_R | MSR_TYPE_W);

+static inline int nested_vmx_check_virt_x2apic(struct kvm_vcpu *vcpu,
+					       struct vmcs12 *vmcs12)
+{
+	if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
+		return -EINVAL;

No need for this function and nested_cpu_has_virt_x2apic_mode. Just inline them in their caller(s). Same for other cases throughout the series.

Do you mean that we should also inline the same functions in the other patches of this patch set? I think these functions keep the code tidy, just like functions such as nested_cpu_has_preemption_timer, nested_cpu_has_ept, etc.
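A note on the APIC_BASE_MSR + (APIC_TASKPRI >> 4) expression quoted above: in x2APIC mode each 16-byte APIC register at xAPIC MMIO offset X becomes MSR 0x800 + (X >> 4); TPR lives at offset 0x80, hence MSR 0x808. A quick standalone illustration (the macro values match the kernel's apicdef.h definitions):

#include <stdio.h>

#define APIC_BASE_MSR	0x800	/* first x2APIC MSR */
#define APIC_TASKPRI	0x80	/* TPR's xAPIC MMIO offset */

int main(void)
{
	/* Each 16-byte xAPIC register maps to one x2APIC MSR. */
	unsigned int tpr_msr = APIC_BASE_MSR + (APIC_TASKPRI >> 4);

	printf("x2APIC TPR MSR: 0x%x\n", tpr_msr);	/* prints 0x808 */
	return 0;
}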
Re: [PATCH v3 0/6] KVM: nVMX: Enable nested apicv support.
On Wed, Jan 28, 2015 at 6:06 AM, Paolo Bonzini pbonz...@redhat.com wrote:

On 24/01/2015 11:18, Wincy Van wrote:

v2 -> v3:
1. Add a new field in nested_vmx to avoid the spin lock in v2.
2. Drop sending EOI to L1 when doing nested interrupt delivery.
3. Use the hardware MSR bitmap to enable nested virtualize x2apic mode.

I think the patches are mostly okay. I made a few comments.

Thank you, Paolo and Yang. I couldn't have accomplished this without your help.

One of the things to do on top could be to avoid rebuilding the whole vmcs02 on every entry. Recomputing the MSR bitmap on every vmentry is not particularly nice, for example. It is not necessary unless the execution controls have changed.

Indeed, I had planned to do that optimization after this patch set.

Thanks,
Wincy

Paolo
Re: [PATCH v3 2/6] KVM: nVMX: Enable nested virtualize x2apic mode.
On Wed, Jan 28, 2015 at 5:39 AM, Paolo Bonzini pbonz...@redhat.com wrote:

On 24/01/2015 11:21, Wincy Van wrote:

+static void nested_vmx_disable_intercept_for_msr(unsigned long *msr_bitmap_l1,
+						 unsigned long *msr_bitmap_nested,
+						 u32 msr, int type)
+{
+	int f = sizeof(unsigned long);
+
+	if (!cpu_has_vmx_msr_bitmap())
+		return;
+

Also, make this a WARN_ON.

Will do.

Thanks,
Wincy

Paolo
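For reference, the VMX MSR bitmap the function above manipulates is one 4K page split into four 1K regions: read-intercept bits for MSRs 0x0000-0x1fff at offset 0x000, read bits for MSRs 0xc0000000-0xc0001fff at 0x400, and the corresponding write bits at 0x800 and 0xc00. A standalone sketch of clearing an intercept under that layout; it mirrors the shape of the function quoted above, not its exact body:

#include <stdint.h>

#define MSR_TYPE_R	1
#define MSR_TYPE_W	2

/* Clear the read/write intercept bits for one MSR in a 4K VMX MSR
 * bitmap. Layout per the SDM: one bit per MSR, reads for low MSRs
 * at offset 0x000, reads for high (0xc0000000-based) MSRs at 0x400,
 * writes at 0x800 and 0xc00 respectively.
 */
static void msr_bitmap_disable_intercept(uint8_t *bitmap, uint32_t msr, int type)
{
	uint32_t read_off, write_off;

	if (msr <= 0x1fff) {
		read_off  = 0x000;
		write_off = 0x800;
	} else if (msr >= 0xc0000000 && msr <= 0xc0001fff) {
		msr &= 0x1fff;		/* bit index within the region */
		read_off  = 0x400;
		write_off = 0xc00;
	} else {
		return;	/* MSRs outside both ranges always intercept */
	}

	if (type & MSR_TYPE_R)
		bitmap[read_off + msr / 8] &= ~(1u << (msr % 8));
	if (type & MSR_TYPE_W)
		bitmap[write_off + msr / 8] &= ~(1u << (msr % 8));
}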
Re: [PATCH v3 6/6] KVM: nVMX: Enable nested posted interrupt processing.
On Wed, Jan 28, 2015 at 5:55 AM, Paolo Bonzini pbonz...@redhat.com wrote:

On 24/01/2015 11:24, Wincy Van wrote:

 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
 	    !nested_cpu_has_apic_reg_virt(vmcs12) &&
-	    !nested_cpu_has_vid(vmcs12))
+	    !nested_cpu_has_vid(vmcs12) &&
+	    !nested_cpu_has_posted_intr(vmcs12))
 		return 0;
 
 	if (nested_cpu_has_virt_x2apic_mode(vmcs12))
 		r = nested_vmx_check_virt_x2apic(vcpu, vmcs12);
 	if (nested_cpu_has_vid(vmcs12))
 		r |= nested_vmx_check_vid(vcpu, vmcs12);
+	if (nested_cpu_has_posted_intr(vmcs12))
+		r |= nested_vmx_check_posted_intr(vcpu, vmcs12);

These ifs are always true.

Why? L1 may configure these features separately, so we should check them one by one. E.g. L1 may enable posted interrupt processing and virtual interrupt delivery but leave virtualize x2apic mode disabled; then nested_cpu_has_virt_x2apic_mode will return false.

Paolo
Re: [PATCH v3 3/5] KVM: ARM VGIC add kvm_io_bus_ frontend
Hi Andre, On Tue, Jan 27, 2015 at 3:31 PM, Andre Przywara andre.przyw...@arm.com wrote: Hi Nikolay, On 24/01/15 11:59, Nikolay Nikolaev wrote: In io_mem_abort remove the call to vgic_handle_mmio. The target is to have a single MMIO handling path - that is through the kvm_io_bus_ API. Register a kvm_io_device in kvm_vgic_init on the whole vGIC MMIO region. Both read and write calls are redirected to vgic_io_dev_access where kvm_exit_mmio is composed to pass it to vm_ops.handle_mmio. Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com --- arch/arm/kvm/mmio.c|3 - include/kvm/arm_vgic.h |3 - virt/kvm/arm/vgic.c| 123 3 files changed, 114 insertions(+), 15 deletions(-) diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c index d852137..8dc2fde 100644 --- a/arch/arm/kvm/mmio.c +++ b/arch/arm/kvm/mmio.c @@ -230,9 +230,6 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, fault_ipa, 0); } - if (vgic_handle_mmio(vcpu, run, mmio)) - return 1; - Why is this (whole patch) actually needed? Is that just to make it nicer by pulling everything under one umbrella? It started from this mail form Christofer: https://lkml.org/lkml/2014/3/28/403 For enabling ioeventfd you actually don't need this patch, right? Yes, we don't need it. (I am asking because this breaks GICv3 emulation, see below) if (handle_kernel_mmio(vcpu, run, mmio)) return 1; diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 7c55dd5..60639b1 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -237,6 +237,7 @@ struct vgic_dist { unsigned long *irq_pending_on_cpu; struct vgic_vm_ops vm_ops; + struct kvm_io_device*io_dev; #endif }; @@ -311,8 +312,6 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num, bool level); void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg); int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu); -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, - struct kvm_exit_mmio *mmio); #define irqchip_in_kernel(k) (!!((k)-arch.vgic.in_kernel)) #define vgic_initialized(k) (!!((k)-arch.vgic.nr_cpus)) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 0cc6ab6..195d2ba 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -31,6 +31,9 @@ #include asm/kvm_emulate.h #include asm/kvm_arm.h #include asm/kvm_mmu.h +#include asm/kvm.h + +#include iodev.h /* * How the whole thing works (courtesy of Christoffer Dall): @@ -77,6 +80,7 @@ #include vgic.h +static int vgic_register_kvm_io_dev(struct kvm *kvm); static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu); static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu); static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr); @@ -97,6 +101,7 @@ static bool queue_sgi(struct kvm_vcpu *vcpu, int irq) int kvm_vgic_map_resources(struct kvm *kvm) { + vgic_register_kvm_io_dev(kvm); return kvm-arch.vgic.vm_ops.map_resources(kvm, vgic); } @@ -776,27 +781,123 @@ bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run, } /** - * vgic_handle_mmio - handle an in-kernel MMIO access for the GIC emulation + * vgic_io_dev_access - handle an in-kernel MMIO access for the GIC emulation * @vcpu: pointer to the vcpu performing the access - * @run: pointer to the kvm_run structure - * @mmio: pointer to the data describing the access + * @this: pointer to the kvm_io_device structure + * @addr: the MMIO address being accessed + * @len: the length of the accessed data + * @val: pointer to the value being written, + * or where the read operation will 
store its result + * @is_write: flag to show whether a write access is performed * - * returns true if the MMIO access has been performed in kernel space, - * and false if it needs to be emulated in user space. + * returns 0 if the MMIO access has been performed in kernel space, + * and 1 if it needs to be emulated in user space. * Calls the actual handling routine for the selected VGIC model. */ -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, - struct kvm_exit_mmio *mmio) +static int vgic_io_dev_access(struct kvm_vcpu *vcpu, struct kvm_io_device *this, + gpa_t addr, int len, void *val, bool is_write) { - if (!irqchip_in_kernel(vcpu-kvm)) - return false; + struct kvm_exit_mmio mmio; + bool ret; + + mmio = (struct kvm_exit_mmio) { + .phys_addr = addr, +
Re: [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches
On Tue, Jan 27, 2015 at 11:21:38AM +, Marc Zyngier wrote: On 26/01/15 22:58, Christoffer Dall wrote: On Wed, Jan 21, 2015 at 06:39:46PM +, Marc Zyngier wrote: Trying to emulate the behaviour of set/way cache ops is fairly pointless, as there are too many ways we can end-up missing stuff. Also, there is some system caches out there that simply ignore set/way operations. So instead of trying to implement them, let's convert it to VA ops, and use them as a way to re-enable the trapping of VM ops. That way, we can detect the point when the MMU/caches are turned off, and do a full VM flush (which is what the guest was trying to do anyway). This allows a 32bit zImage to boot on the APM thingy, and will probably help bootloaders in general. Signed-off-by: Marc Zyngier marc.zyng...@arm.com This had some conflicts with dirty page logging. I fixed it up here, and also removed some trailing white space and mixed spaces/tabs that patch complained about: http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git mm-fixes Thanks for doing so. --- arch/arm/include/asm/kvm_emulate.h | 10 + arch/arm/include/asm/kvm_host.h | 3 -- arch/arm/include/asm/kvm_mmu.h | 3 +- arch/arm/kvm/arm.c | 10 - arch/arm/kvm/coproc.c| 64 ++ arch/arm/kvm/coproc_a15.c| 2 +- arch/arm/kvm/coproc_a7.c | 2 +- arch/arm/kvm/mmu.c | 70 - arch/arm/kvm/trace.h | 39 +++ arch/arm64/include/asm/kvm_emulate.h | 10 + arch/arm64/include/asm/kvm_host.h| 3 -- arch/arm64/include/asm/kvm_mmu.h | 3 +- arch/arm64/kvm/sys_regs.c| 75 +--- 13 files changed, 155 insertions(+), 139 deletions(-) diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h index 66ce176..7b01523 100644 --- a/arch/arm/include/asm/kvm_emulate.h +++ b/arch/arm/include/asm/kvm_emulate.h @@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu) vcpu-arch.hcr = HCR_GUEST_MASK; } +static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu) +{ + return vcpu-arch.hcr; +} + +static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr) +{ + vcpu-arch.hcr = hcr; +} + static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu) { return 1; diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 254e065..04b4ea0 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -125,9 +125,6 @@ struct kvm_vcpu_arch { * Anything that is not used directly from assembly code goes * here. */ - /* dcache set/way operation pending */ - int last_pcpu; - cpumask_t require_dcache_flush; /* Don't run the guest on this vcpu */ bool pause; diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 63e0ecc..286644c 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva, #define kvm_virt_to_phys(x) virt_to_idmap((unsigned long)(x)) -void stage2_flush_vm(struct kvm *kvm); +void kvm_set_way_flush(struct kvm_vcpu *vcpu); +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled); #endif /* !__ASSEMBLY__ */ diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 2d6d910..0b0d58a 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) vcpu-cpu = cpu; vcpu-arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state); - /* - * Check whether this vcpu requires the cache to be flushed on - * this physical CPU. 
This is a consequence of doing dcache - * operations by set/way on this vcpu. We do it here to be in - * a non-preemptible section. - */ - if (cpumask_test_and_clear_cpu(cpu, vcpu-arch.require_dcache_flush)) - flush_cache_all(); /* We'd really want v7_flush_dcache_all() */ - kvm_arm_set_running_vcpu(vcpu); } @@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) ret = kvm_call_hyp(__kvm_vcpu_run, vcpu); vcpu-mode = OUTSIDE_GUEST_MODE; - vcpu-arch.last_pcpu = smp_processor_id(); kvm_guest_exit(); trace_kvm_exit(*vcpu_pc(vcpu)); /* diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c index 7928dbd..0afcc00 100644 --- a/arch/arm/kvm/coproc.c +++ b/arch/arm/kvm/coproc.c @@
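The core of the new approach in this patch is small enough to sketch: on a trapped VM-register write, the old cache-enable state is compared with the new one, the whole VM is flushed on any toggle, and trapping of VM ops is relaxed once the caches are back on (a later set/way op re-arms it). A condensed standalone rendering of that logic, with stubs in place of the real helpers; HCR_TVM is the ARM "trap VM control registers" bit, and cache_enabled stands in for reading the guest's SCTLR:

#include <stdbool.h>

#define HCR_TVM	(1ul << 26)	/* trap writes to VM control registers */

struct kvm_vcpu {
	unsigned long hcr;
	bool cache_enabled;	/* stand-in for the guest's SCTLR C/M bits */
};

static void stage2_flush_vm_stub(struct kvm_vcpu *vcpu) { (void)vcpu; }

/* Called after the guest writes a VM control register: flush the
 * VM when the caches toggle either way (invalidate when turning
 * on, clean when turning off), and stop trapping VM ops once the
 * caches are on again.
 */
static void kvm_toggle_cache_sketch(struct kvm_vcpu *vcpu, bool was_enabled)
{
	bool now_enabled = vcpu->cache_enabled;

	if (now_enabled != was_enabled)
		stage2_flush_vm_stub(vcpu);

	if (now_enabled)
		vcpu->hcr &= ~HCR_TVM;
}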
Re: [PATCH v3 3/5] KVM: ARM VGIC add kvm_io_bus_ frontend
Hi Nikolay, On 24/01/15 11:59, Nikolay Nikolaev wrote: In io_mem_abort remove the call to vgic_handle_mmio. The target is to have a single MMIO handling path - that is through the kvm_io_bus_ API. Register a kvm_io_device in kvm_vgic_init on the whole vGIC MMIO region. Both read and write calls are redirected to vgic_io_dev_access where kvm_exit_mmio is composed to pass it to vm_ops.handle_mmio. Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com --- arch/arm/kvm/mmio.c|3 - include/kvm/arm_vgic.h |3 - virt/kvm/arm/vgic.c| 123 3 files changed, 114 insertions(+), 15 deletions(-) diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c index d852137..8dc2fde 100644 --- a/arch/arm/kvm/mmio.c +++ b/arch/arm/kvm/mmio.c @@ -230,9 +230,6 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, fault_ipa, 0); } - if (vgic_handle_mmio(vcpu, run, mmio)) - return 1; - Why is this (whole patch) actually needed? Is that just to make it nicer by pulling everything under one umbrella? For enabling ioeventfd you actually don't need this patch, right? (I am asking because this breaks GICv3 emulation, see below) if (handle_kernel_mmio(vcpu, run, mmio)) return 1; diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 7c55dd5..60639b1 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -237,6 +237,7 @@ struct vgic_dist { unsigned long *irq_pending_on_cpu; struct vgic_vm_ops vm_ops; + struct kvm_io_device*io_dev; #endif }; @@ -311,8 +312,6 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num, bool level); void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg); int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu); -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, - struct kvm_exit_mmio *mmio); #define irqchip_in_kernel(k) (!!((k)-arch.vgic.in_kernel)) #define vgic_initialized(k) (!!((k)-arch.vgic.nr_cpus)) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 0cc6ab6..195d2ba 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -31,6 +31,9 @@ #include asm/kvm_emulate.h #include asm/kvm_arm.h #include asm/kvm_mmu.h +#include asm/kvm.h + +#include iodev.h /* * How the whole thing works (courtesy of Christoffer Dall): @@ -77,6 +80,7 @@ #include vgic.h +static int vgic_register_kvm_io_dev(struct kvm *kvm); static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu); static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu); static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr); @@ -97,6 +101,7 @@ static bool queue_sgi(struct kvm_vcpu *vcpu, int irq) int kvm_vgic_map_resources(struct kvm *kvm) { + vgic_register_kvm_io_dev(kvm); return kvm-arch.vgic.vm_ops.map_resources(kvm, vgic); } @@ -776,27 +781,123 @@ bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run, } /** - * vgic_handle_mmio - handle an in-kernel MMIO access for the GIC emulation + * vgic_io_dev_access - handle an in-kernel MMIO access for the GIC emulation * @vcpu: pointer to the vcpu performing the access - * @run: pointer to the kvm_run structure - * @mmio: pointer to the data describing the access + * @this: pointer to the kvm_io_device structure + * @addr: the MMIO address being accessed + * @len: the length of the accessed data + * @val: pointer to the value being written, + * or where the read operation will store its result + * @is_write: flag to show whether a write access is performed * - * returns true if the MMIO access has been performed in kernel space, - * and false if it needs to be 
emulated in user space. + * returns 0 if the MMIO access has been performed in kernel space, + * and 1 if it needs to be emulated in user space. * Calls the actual handling routine for the selected VGIC model. */ -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, - struct kvm_exit_mmio *mmio) +static int vgic_io_dev_access(struct kvm_vcpu *vcpu, struct kvm_io_device *this, + gpa_t addr, int len, void *val, bool is_write) { - if (!irqchip_in_kernel(vcpu-kvm)) - return false; + struct kvm_exit_mmio mmio; + bool ret; + + mmio = (struct kvm_exit_mmio) { + .phys_addr = addr, + .len = len, + .is_write = is_write, + }; + + if (is_write) + memcpy(mmio.data, val, len); /* * This will currently call either vgic_v2_handle_mmio() or * vgic_v3_handle_mmio(), which in turn will call * vgic_handle_mmio_range()
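The shape of the conversion in this patch is worth spelling out: kvm_io_bus_ devices expose separate read and write callbacks, while the vgic's existing handle_mmio path wants a single kvm_exit_mmio describing the access, so the device callbacks become thin trampolines that pack their arguments into that struct and copy read results back out. A condensed standalone sketch of the pattern; the types are reduced to stubs, and the real registration additionally goes through kvm_io_bus_register_dev with an address and length:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t gpa_t;

/* Stub of the struct the vgic's existing handler consumes. */
struct kvm_exit_mmio {
	gpa_t	 phys_addr;
	uint8_t	 data[8];
	uint32_t len;
	bool	 is_write;
};

/* Stand-in for vm_ops.handle_mmio(); returns true if handled. */
static bool vgic_handle_mmio_stub(struct kvm_exit_mmio *mmio)
{
	return true;
}

/* One trampoline serves both directions: pack the kvm_io_bus-style
 * arguments into a kvm_exit_mmio, call the old handler, and copy
 * read results back out.
 */
static int vgic_io_access(gpa_t addr, int len, void *val, bool is_write)
{
	struct kvm_exit_mmio mmio = {
		.phys_addr = addr,
		.len	   = len,
		.is_write  = is_write,
	};
	bool handled;

	if (is_write)
		memcpy(mmio.data, val, len);

	handled = vgic_handle_mmio_stub(&mmio);

	if (!is_write)
		memcpy(val, mmio.data, len);

	/* per the patch's convention: 0 = handled in kernel,
	 * 1 = needs userspace emulation */
	return handled ? 0 : 1;
}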
KVM call for agenda for 2015-02-03
Hi,

Please send any topic that you are interested in covering.

Thanks, Juan.

Call details: by popular demand, a google calendar public entry for it:

https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

(Let me know if you have any problems with the calendar entry. I just gave up on getting CEST, CET, EDT and DST right at the same time.)

If you need phone number details, contact me privately.

Thanks, Juan.
Re: [PATCH v3 3/5] KVM: ARM VGIC add kvm_io_bus_ frontend
Hi, On 27/01/15 17:26, Eric Auger wrote: On 01/27/2015 05:51 PM, Nikolay Nikolaev wrote: Hi Andre, On Tue, Jan 27, 2015 at 3:31 PM, Andre Przywara andre.przyw...@arm.com wrote: Hi Nikolay, On 24/01/15 11:59, Nikolay Nikolaev wrote: In io_mem_abort remove the call to vgic_handle_mmio. The target is to have a single MMIO handling path - that is through the kvm_io_bus_ API. Register a kvm_io_device in kvm_vgic_init on the whole vGIC MMIO region. Both read and write calls are redirected to vgic_io_dev_access where kvm_exit_mmio is composed to pass it to vm_ops.handle_mmio. Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com --- arch/arm/kvm/mmio.c|3 - include/kvm/arm_vgic.h |3 - virt/kvm/arm/vgic.c| 123 3 files changed, 114 insertions(+), 15 deletions(-) diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c index d852137..8dc2fde 100644 --- a/arch/arm/kvm/mmio.c +++ b/arch/arm/kvm/mmio.c @@ -230,9 +230,6 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, fault_ipa, 0); } - if (vgic_handle_mmio(vcpu, run, mmio)) - return 1; - Why is this (whole patch) actually needed? Is that just to make it nicer by pulling everything under one umbrella? It started from this mail form Christofer: https://lkml.org/lkml/2014/3/28/403 Hi Nikolay, Andre, I also understood that the target was to handle all kernel mmio through the same API, hence the first patch. This patch shows that at least for GICv2 it was doable without upheavals in vgic code and it also serves ioeventd which is good. Andre do you think the price to pay to integrate missing redistributors and forthcoming components is too high? Hopefully not, actually I reckon that moving the upper level MMIO dispatching out of vgic.c and letting the specific VGIC models register what they need themselves (in their -emul.c files) sounds quite promising. But this particular patch does not serve this purpose: a) we replace two lines with a bunch of more layered code b) we copy the MMIOed data to convert between the interfaces c) we miss GICv3 emulation So this needs to be addressed in a more general way (which maybe I will give a try). That being sad I don't see why we would need to do this right now and hold back ioeventfd by this rather orthogonal issue. Christoffer, what's your take on this? Cheers, Andre. Best Regards Eric For enabling ioeventfd you actually don't need this patch, right? Yes, we don't need it. 
(I am asking because this breaks GICv3 emulation, see below) if (handle_kernel_mmio(vcpu, run, mmio)) return 1; diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 7c55dd5..60639b1 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -237,6 +237,7 @@ struct vgic_dist { unsigned long *irq_pending_on_cpu; struct vgic_vm_ops vm_ops; + struct kvm_io_device*io_dev; #endif }; @@ -311,8 +312,6 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num, bool level); void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg); int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu); -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, - struct kvm_exit_mmio *mmio); #define irqchip_in_kernel(k) (!!((k)-arch.vgic.in_kernel)) #define vgic_initialized(k) (!!((k)-arch.vgic.nr_cpus)) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 0cc6ab6..195d2ba 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -31,6 +31,9 @@ #include asm/kvm_emulate.h #include asm/kvm_arm.h #include asm/kvm_mmu.h +#include asm/kvm.h + +#include iodev.h /* * How the whole thing works (courtesy of Christoffer Dall): @@ -77,6 +80,7 @@ #include vgic.h +static int vgic_register_kvm_io_dev(struct kvm *kvm); static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu); static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu); static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr); @@ -97,6 +101,7 @@ static bool queue_sgi(struct kvm_vcpu *vcpu, int irq) int kvm_vgic_map_resources(struct kvm *kvm) { + vgic_register_kvm_io_dev(kvm); return kvm-arch.vgic.vm_ops.map_resources(kvm, vgic); } @@ -776,27 +781,123 @@ bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run, } /** - * vgic_handle_mmio - handle an in-kernel MMIO access for the GIC emulation + * vgic_io_dev_access - handle an in-kernel MMIO access for the GIC emulation * @vcpu: pointer to the vcpu performing the access - * @run: pointer to the kvm_run structure - * @mmio: pointer to the data describing the access + * @this: pointer to the kvm_io_device structure + * @addr: the MMIO address being
RE: [v3 00/26] Add VT-d Posted-Interrupts support
-Original Message- From: Wu, Feng Sent: Wednesday, January 21, 2015 10:26 AM To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org; io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng Subject: RE: [v3 00/26] Add VT-d Posted-Interrupts support -Original Message- From: Wu, Feng Sent: Friday, December 12, 2014 11:15 PM To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org; io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng Subject: [v3 00/26] Add VT-d Posted-Interrupts support VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt. With VT-d Posted-Interrupts enabled, external interrupts from direct-assigned devices can be delivered to guests without VMM intervention when guest is running in non-root mode. You can find the VT-d Posted-Interrtups Spec. in the following URL: http://www.intel.com/content/www/us/en/intelligent-systems/intel-technolog y/vt-directed-io-spec.html v1-v2: * Use VFIO framework to enable this feature, the VFIO part of this series is base on Eric's patch [PATCH v3 0/8] KVM-VFIO IRQ forward control * Rebase this patchset on git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git, then revise some irq logic based on the new hierarchy irqdomain patches provided by Jiang Liu jiang@linux.intel.com v2-v3: * Adjust the Posted-interrupts Descriptor updating logic when vCPU is preempted or blocked. * KVM_DEV_VFIO_DEVICE_POSTING_IRQ -- KVM_DEV_VFIO_DEVICE_POST_IRQ * __KVM_HAVE_ARCH_KVM_VFIO_POSTING -- __KVM_HAVE_ARCH_KVM_VFIO_POST * Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which can be used to change back to remapping mode. * Fix typo This patch series is made of the following groups: 1-6: Some preparation changes in iommu and irq component, this is based on the new hierarchy irqdomain logic. 7-9, 26: IOMMU changes for VT-d Posted-Interrupts, such as, feature detection, command line parameter. 10-17, 22-25: Changes related to KVM itself. 
18-20: Changes in VFIO component, this part was previously sent out as [RFC PATCH v2 0/2] kvm-vfio: implement the vfio skeleton for VT-d Posted-Interrupts 21: x86 irq related changes Feng Wu (26): genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU iommu: Add new member capability to struct irq_remap_ops iommu, x86: Define new irte structure for VT-d Posted-Interrupts iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller iommu, x86: No need to migrating irq for VT-d Posted-Interrupts iommu, x86: Add cap_pi_support() to detect VT-d PI capability iommu, x86: Add intel_irq_remapping_capability() for Intel iommu, x86: define irq_remapping_cap() KVM: change struct pi_desc for VT-d Posted-Interrupts KVM: Add some helper functions for Posted-Interrupts KVM: Initialize VT-d Posted-Interrupts Descriptor KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu KVM: add interfaces to control PI outside vmx KVM: Make struct kvm_irq_routing_table accessible KVM: make kvm_set_msi_irq() public KVM: kvm-vfio: User API for VT-d Posted-Interrupts KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts KVM: x86: kvm-vfio: VT-d posted-interrupts setup x86, irq: Define a global vector for VT-d Posted-Interrupts KVM: Define a wakeup worker thread for vCPU KVM: Update Posted-Interrupts Descriptor when vCPU is preempted KVM: Update Posted-Interrupts Descriptor when vCPU is blocked KVM: Suppress posted-interrupt when 'SN' is set iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Documentation/kernel-parameters.txt| 1 + Documentation/virtual/kvm/devices/vfio.txt | 9 ++ arch/x86/include/asm/entry_arch.h | 2 + arch/x86/include/asm/hardirq.h | 1 + arch/x86/include/asm/hw_irq.h | 2 + arch/x86/include/asm/irq_remapping.h | 11 ++ arch/x86/include/asm/irq_vectors.h | 1 + arch/x86/include/asm/kvm_host.h| 12 ++ arch/x86/kernel/apic/msi.c | 1 + arch/x86/kernel/entry_64.S | 2 + arch/x86/kernel/irq.c | 27 arch/x86/kernel/irqinit.c | 2 + arch/x86/kvm/Makefile
[PATCH 0/6] KVM: VMX: Page Modification Logging (PML) support
This patch series adds Page Modification Logging (PML) support in VMX.

1) Introduction

PML is a new feature on Intel's Broadwell server platform, targeted at reducing the overhead of the dirty logging mechanism. The specification can be found at:

http://www.intel.com/content/www/us/en/processors/page-modification-logging-vmm-white-paper.html

Currently, dirty logging is done by write protection, which write protects guest memory and marks the dirty GFN in dirty_bitmap on the subsequent write fault. This works fine, except for the overhead of the additional write fault needed to log each dirty GFN. The overhead can be large if the write operations from the guest are intensive.

PML is a hardware-assisted, efficient way of dirty logging. PML logs the dirty GPA automatically to a 4K PML memory buffer when the CPU changes an EPT table's D-bit from 0 to 1. To do this, a new 4K PML buffer base address and a PML index were added to the VMCS. Initially the PML index is set to 511 (the last of the buffer's 512 8-byte GPA entries); the CPU decrements the PML index after logging each GPA, and eventually a PML-buffer-full VMEXIT happens when the PML buffer is fully logged.

With PML, we don't have to use write protection, so the intensive write-fault EPT violations can be avoided, at the cost of one additional PML-buffer-full VMEXIT per 512 dirty GPAs. Theoretically, this reduces hypervisor overhead when the guest is in dirty logging mode, so more CPU cycles can be allocated to the guest, and benchmarks in the guest are expected to perform better compared to non-PML.

2) Design

a. Enable/Disable PML

PML is per-vcpu (per-VMCS), while the EPT table can be shared by vcpus, so we need to enable/disable PML for all vcpus of the guest. A dedicated 4K page will be allocated for each vcpu when PML is enabled for that vcpu.

Currently, we choose to always enable PML for the guest, which means we enable PML when creating a vcpu and never disable it during the guest's lifetime. This avoids the complicated logic of enabling PML on demand while the guest is running. And to eliminate potential unnecessary GPA logging in non-dirty-logging mode, we set the D-bit manually for slots with dirty logging disabled.

b. Flush PML buffer

When userspace queries dirty_bitmap, it's possible that there are GPAs logged in a vcpu's PML buffer, but as the PML buffer is not full, no VMEXIT happens. In this case, we'd better flush the PML buffers for all vcpus manually and update the dirty GPAs to dirty_bitmap. We do the PML buffer flush at the beginning of each VMEXIT; this keeps dirty_bitmap more up to date and also makes the logic of flushing the PML buffers of all vcpus easier: we only need to kick all vcpus out of the guest, and the PML buffer of each vcpu will be flushed automatically.

3) Tests and benchmark results

I tested the specjbb benchmark, which is memory intensive, to measure PML. All tests are done in the below configuration:

Machine (Broadwell server): 16 CPUs (1.4G) + 4G memory
Host kernel: KVM queue branch. Transparent Hugepage disabled. C-state, P-state, S-state disabled. Swap disabled.
Guest: Ubuntu 14.04 with kernel 3.13.0-36-generic
Guest: 4 vcpus + 1G memory. All vcpus are pinned.

a. Compare scores with and without PML enabled. This is to make sure PML won't bring any performance regression, as it's always enabled for the guest.

Booting guest with graphic window (no --nographic)

        NOPML   PML
        109755  109379
        108786  109300
        109234  109663
        109257  107471
        108514  108904
        109740  107623

avg:    109214  108723

performance regression: (109214 - 108723) / 109214 = 0.45%

Booting guest without graphic window (--nographic)

        NOPML   PML
        109090  109686
        109461  110533
        110523  108550
        109960  110775
        109090  109802
        110787  109192

avg:    109818  109756

performance regression: (109818 - 109756) / 109818 = 0.06%

So there's no noticeable performance regression from leaving PML always enabled.

b. Compare specjbb scores between PML and Write Protection. This is used to see how much performance gain PML can bring when the guest is in dirty logging mode. I modified qemu by adding an additional monitoring thread to query dirty_bitmap periodically (once per second). With this thread, we can get the performance gain of PML by comparing specjbb scores under the PML code path and the write protection code path. Again, I got scores both with and without the guest's graphic window.

Booting guest with graphic window (no --nographic)

        PML     WP
        104748  101358
        102934  99895
        103525  98832
        105331  100678
        106038  99476
        104776  99851

avg:    104558  100015  108723 (= PML score in test a)
percent: 96.17% 91.99%  100%
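The monitoring thread described above boils down to a periodic KVM_GET_DIRTY_LOG ioctl per memory slot: KVM returns the slot's accumulated dirty bitmap and rearms dirty tracking, which under PML forces the buffer flush described in section 2b. A minimal userspace sketch; vm_fd, the slot number and the page count are placeholders for the real VM:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/* Query and clear the dirty bitmap of one memory slot once per
 * second, as the benchmark's monitoring thread does. vm_fd must
 * be a KVM VM fd; slot/npages describe the logged memslot.
 */
static void monitor_dirty_log(int vm_fd, uint32_t slot, uint64_t npages)
{
	/* KVM copies back a long-aligned bitmap, so round up to 64 bits. */
	size_t bitmap_bytes = ((npages + 63) / 64) * 8;
	void *bitmap = malloc(bitmap_bytes);
	struct kvm_dirty_log log;

	for (;;) {
		memset(&log, 0, sizeof(log));
		log.slot = slot;
		log.dirty_bitmap = bitmap;

		/* Returns the bitmap accumulated since the last call
		 * and rearms dirty tracking for the slot. */
		if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0)
			break;

		sleep(1);
	}
	free(bitmap);
}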
Re: [v3 00/26] Add VT-d Posted-Interrupts support
On Wed, 2015-01-28 at 03:01 +, Wu, Feng wrote: -Original Message- From: Wu, Feng Sent: Wednesday, January 21, 2015 10:26 AM To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org; io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng Subject: RE: [v3 00/26] Add VT-d Posted-Interrupts support -Original Message- From: Wu, Feng Sent: Friday, December 12, 2014 11:15 PM To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org; io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng Subject: [v3 00/26] Add VT-d Posted-Interrupts support VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt. With VT-d Posted-Interrupts enabled, external interrupts from direct-assigned devices can be delivered to guests without VMM intervention when guest is running in non-root mode. You can find the VT-d Posted-Interrtups Spec. in the following URL: http://www.intel.com/content/www/us/en/intelligent-systems/intel-technolog y/vt-directed-io-spec.html v1-v2: * Use VFIO framework to enable this feature, the VFIO part of this series is base on Eric's patch [PATCH v3 0/8] KVM-VFIO IRQ forward control * Rebase this patchset on git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git, then revise some irq logic based on the new hierarchy irqdomain patches provided by Jiang Liu jiang@linux.intel.com v2-v3: * Adjust the Posted-interrupts Descriptor updating logic when vCPU is preempted or blocked. * KVM_DEV_VFIO_DEVICE_POSTING_IRQ -- KVM_DEV_VFIO_DEVICE_POST_IRQ * __KVM_HAVE_ARCH_KVM_VFIO_POSTING -- __KVM_HAVE_ARCH_KVM_VFIO_POST * Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which can be used to change back to remapping mode. * Fix typo This patch series is made of the following groups: 1-6: Some preparation changes in iommu and irq component, this is based on the new hierarchy irqdomain logic. 7-9, 26: IOMMU changes for VT-d Posted-Interrupts, such as, feature detection, command line parameter. 10-17, 22-25: Changes related to KVM itself. 
18-20: Changes in VFIO component, this part was previously sent out as [RFC PATCH v2 0/2] kvm-vfio: implement the vfio skeleton for VT-d Posted-Interrupts 21: x86 irq related changes Feng Wu (26): genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU iommu: Add new member capability to struct irq_remap_ops iommu, x86: Define new irte structure for VT-d Posted-Interrupts iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller iommu, x86: No need to migrating irq for VT-d Posted-Interrupts iommu, x86: Add cap_pi_support() to detect VT-d PI capability iommu, x86: Add intel_irq_remapping_capability() for Intel iommu, x86: define irq_remapping_cap() KVM: change struct pi_desc for VT-d Posted-Interrupts KVM: Add some helper functions for Posted-Interrupts KVM: Initialize VT-d Posted-Interrupts Descriptor KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu KVM: add interfaces to control PI outside vmx KVM: Make struct kvm_irq_routing_table accessible KVM: make kvm_set_msi_irq() public KVM: kvm-vfio: User API for VT-d Posted-Interrupts KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts KVM: x86: kvm-vfio: VT-d posted-interrupts setup x86, irq: Define a global vector for VT-d Posted-Interrupts KVM: Define a wakeup worker thread for vCPU KVM: Update Posted-Interrupts Descriptor when vCPU is preempted KVM: Update Posted-Interrupts Descriptor when vCPU is blocked KVM: Suppress posted-interrupt when 'SN' is set iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Documentation/kernel-parameters.txt| 1 + Documentation/virtual/kvm/devices/vfio.txt | 9 ++ arch/x86/include/asm/entry_arch.h | 2 + arch/x86/include/asm/hardirq.h | 1 + arch/x86/include/asm/hw_irq.h | 2 + arch/x86/include/asm/irq_remapping.h | 11 ++ arch/x86/include/asm/irq_vectors.h | 1 + arch/x86/include/asm/kvm_host.h| 12 ++ arch/x86/kernel/apic/msi.c | 1 + arch/x86/kernel/entry_64.S