Re: [PATCH RFC] virtio-net: remove useless disable on freeze
On Mon, 28 May 2012 15:53:25 +0300, "Michael S. Tsirkin" <m...@redhat.com> wrote:
> On Wed, Apr 04, 2012 at 12:19:54PM +0300, Michael S. Tsirkin wrote:
> > disable_cb is just an optimization: it can not guarantee that there
> > are no callbacks. I didn't yet figure out whether a callback in
> > freeze will trigger a bug, but disable_cb won't address it in any
> > case. So let's remove the useless calls as a first step.
> >
> > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>
> Looks like this isn't in the 3.5 pull request - just lost in the
> shuffle? disable_cb is advisory so can't be relied upon.

I always (try to?) reply as I accept patches. This one did slip by, but
it's harmless so no need to push AFAICT. Applied.

Thanks!
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Documentation/kvm : Add documentation on Hypercalls
From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>

Thanks to Alex for the KVM_HC_FEATURES input and to Jan for VAPIC_POLL_IRQ.

Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
new file mode 100644
index 000..c79335a
--- /dev/null
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -0,0 +1,46 @@
+KVM Hypercalls Documentation
+===========================
+
+The template for each hypercall is:
+1. Hypercall name, value.
+2. Architecture(s)
+3. Status (deprecated, obsolete, active)
+4. Purpose
+
+1. KVM_HC_VAPIC_POLL_IRQ
+
+Value: 1
+Architecture: x86
+Purpose: Trigger a guest exit so that the host can check for pending
+interrupts on reentry.
+
+2. KVM_HC_MMU_OP
+
+Value: 2
+Architecture: x86
+Status: deprecated.
+Purpose: Support MMU operations such as writing to a PTE,
+flushing the TLB, and releasing a PT.
+
+3. KVM_HC_FEATURES
+
+Value: 3
+Architecture: PPC
+Status: active
+Purpose: Expose hypercall availability to the guest. On x86 platforms,
+cpuid is used to enumerate which hypercalls are available. On PPC, either
+a device tree based lookup (which is also what ePAPR dictates) or a
+KVM-specific enumeration mechanism (this hypercall) can be used.
+
+4. KVM_HC_PPC_MAP_MAGIC_PAGE
+
+Value: 4
+Architecture: PPC
+Status: active
+Purpose: To enable communication between the hypervisor and guest, there
+is a shared page that contains parts of supervisor-visible register state.
+The guest can map this shared page to access its supervisor registers
+through memory using this hypercall.
+
+TODO:
+1. Is more information on inputs and outputs needed?
+2. Add more detail to the purpose of each hypercall.
Re: [PATCH RFC] virtio-net: remove useless disable on freeze
Hi all,

On Wed, 30 May 2012 19:41:47 +0930 Rusty Russell <ru...@rustcorp.com.au> wrote:
> On Mon, 28 May 2012 15:53:25 +0300, "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > On Wed, Apr 04, 2012 at 12:19:54PM +0300, Michael S. Tsirkin wrote:
> > > disable_cb is just an optimization: it can not guarantee that there
> > > are no callbacks. I didn't yet figure out whether a callback in
> > > freeze will trigger a bug, but disable_cb won't address it in any
> > > case. So let's remove the useless calls as a first step.
> > >
> > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> >
> > Looks like this isn't in the 3.5 pull request - just lost in the
> > shuffle? disable_cb is advisory so can't be relied upon.
>
> I always (try to?) reply as I accept patches. This one did slip by, but
> it's harmless so no need to push AFAICT. Applied.

This patch exists in two trees in linux-next already ... Davem's net tree
(so presumably he will send it to Linus shortly) and Michael's vhost tree
(is that tree needed any more?). Presumably it is now also in the rr tree?
--
Cheers,
Stephen Rothwell <s...@canb.auug.org.au>
Re: [PATCH RFC] virtio-net: remove useless disable on freeze
On Thu, May 31, 2012 at 06:35:08PM +1000, Stephen Rothwell wrote:
> Hi all,
>
> On Wed, 30 May 2012 19:41:47 +0930 Rusty Russell <ru...@rustcorp.com.au> wrote:
> > On Mon, 28 May 2012 15:53:25 +0300, "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > > On Wed, Apr 04, 2012 at 12:19:54PM +0300, Michael S. Tsirkin wrote:
> > > > disable_cb is just an optimization: it can not guarantee that there
> > > > are no callbacks. I didn't yet figure out whether a callback in
> > > > freeze will trigger a bug, but disable_cb won't address it in any
> > > > case. So let's remove the useless calls as a first step.
> > > >
> > > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > >
> > > Looks like this isn't in the 3.5 pull request - just lost in the
> > > shuffle? disable_cb is advisory so can't be relied upon.
> >
> > I always (try to?) reply as I accept patches. This one did slip by,
> > but it's harmless so no need to push AFAICT. Applied.
>
> This patch exists in two trees in linux-next already ... Davem's net tree
> (so presumably he will send it to Linus shortly) and Michael's vhost tree
> (is that tree needed any more?).

Yes, and I dropped the patch from there, just did not push yet.

> Presumably it is now also in the rr tree?
> --
> Cheers,
> Stephen Rothwell <s...@canb.auug.org.au>
Re: [PATCH RFC] virtio-net: remove useless disable on freeze
On Thu, May 31, 2012 at 11:47:17AM +0300, Michael S. Tsirkin wrote:
> On Thu, May 31, 2012 at 06:35:08PM +1000, Stephen Rothwell wrote:
> > Hi all,
> >
> > On Wed, 30 May 2012 19:41:47 +0930 Rusty Russell <ru...@rustcorp.com.au> wrote:
> > > On Mon, 28 May 2012 15:53:25 +0300, "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > > > On Wed, Apr 04, 2012 at 12:19:54PM +0300, Michael S. Tsirkin wrote:
> > > > > disable_cb is just an optimization: it can not guarantee that
> > > > > there are no callbacks. I didn't yet figure out whether a callback
> > > > > in freeze will trigger a bug, but disable_cb won't address it in
> > > > > any case. So let's remove the useless calls as a first step.
> > > > >
> > > > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > > >
> > > > Looks like this isn't in the 3.5 pull request - just lost in the
> > > > shuffle? disable_cb is advisory so can't be relied upon.
> > >
> > > I always (try to?) reply as I accept patches. This one did slip by,
> > > but it's harmless so no need to push AFAICT. Applied.
> >
> > This patch exists in two trees in linux-next already ... Davem's net
> > tree (so presumably he will send it to Linus shortly) and Michael's
> > vhost tree (is that tree needed any more?).
>
> Yes, and I dropped the patch from there, just did not push yet.

Pushed. There's usually not too much in my tree but it helps me a lot
that it's in linux-next.

> > Presumably it is now also in the rr tree?
> > --
> > Cheers,
> > Stephen Rothwell <s...@canb.auug.org.au>
Re: [PATCH v2 2/3] KVM: Optimize vcpu-requests slow path slightly
On 05/30/2012 11:03 PM, Marcelo Tosatti wrote:
> On Sun, May 20, 2012 at 04:49:27PM +0300, Avi Kivity wrote:
> > Instead of using an atomic operation per active request, use just one
> > to get all requests at once, then check them with local ops. This
> > probably isn't any faster, since simultaneous requests are rare, but
> > it does reduce code size.
> >
> > Signed-off-by: Avi Kivity <a...@redhat.com>
> > ---
> >  arch/x86/kvm/x86.c |   33 ++++++++++++++++++---------------
> >  1 file changed, 18 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 953e692..c0209eb 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5232,55 +5232,58 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  	bool req_int_win = !irqchip_in_kernel(vcpu->kvm) &&
> >  		vcpu->run->request_interrupt_window;
> >  	bool req_immediate_exit = 0;
> > +	ulong reqs;
> >
> >  	if (unlikely(req_int_win))
> >  		kvm_make_request(KVM_REQ_EVENT, vcpu);
> >
> >  	if (vcpu->requests) {
> > -		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
> > +		reqs = xchg(&vcpu->requests, 0UL);
> > +
> > +		if (test_bit(KVM_REQ_MMU_RELOAD, &reqs))
> >  			kvm_mmu_unload(vcpu);
> > -		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
> > +		if (test_bit(KVM_REQ_MIGRATE_TIMER, &reqs))
> >  			__kvm_migrate_timers(vcpu);
> > -		if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu)) {
> > +		if (test_bit(KVM_REQ_CLOCK_UPDATE, &reqs)) {
> >  			r = kvm_guest_time_update(vcpu);
> >  			if (unlikely(r))
> >  				goto out;
> >  		}
>
> Bailing out loses requests in reqs.

Whoops, good catch.

> Caching the requests makes the following type of sequence behave
> strangely:
>
>     req = xchg(&vcpu->requests, 0)
>     if request is set
>         request handler
>             ... set REQ_EVENT ...
>     prepare for guest entry
>     vcpu->requests set
>     bail

I don't really mind that. But I do want to reduce the overhead of a
request, they're not that rare in normal workloads.

How about

	for_each_set_bit(req, &vcpu->requests, BITS_PER_LONG) {
		clear_bit(req, &vcpu->requests);
		r = request_handlers[req](vcpu);
		if (r)
			goto out;
	}

? That makes for O(1) handling since usually we only have one request
set (KVM_REQ_EVENT). We'll make that the last one to avoid the scenario
above.
--
error compiling committee.c: too many arguments to function
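The dispatch pattern Avi sketches above can be exercised in plain user-space C. This is a minimal sketch, not KVM code: the request bits, the `handled` bookkeeping, and `dispatch_requests` are all hypothetical stand-ins (the kernel version would use `xchg()` on `vcpu->requests` and a real handler table), but it shows the "grab all bits once, then test locally" idea and Marcelo's bail-out caveat:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical request bits, standing in for KVM_REQ_*. */
enum { REQ_MMU_RELOAD, REQ_MIGRATE_TIMER, REQ_CLOCK_UPDATE, NREQ };

static int handled[NREQ];

/* Each handler returns 0 on success, nonzero to bail out. */
static int handle_req(int bit)
{
	handled[bit] = 1;
	return 0;
}

/*
 * Grab all pending requests at once (the kernel uses
 * xchg(&vcpu->requests, 0UL); a plain swap stands in here),
 * then test each bit with cheap local operations.
 */
static int dispatch_requests(unsigned long *requests)
{
	unsigned long reqs = *requests;
	int bit, r;

	*requests = 0;
	for (bit = 0; bit < NREQ; bit++) {
		if (!(reqs & (1UL << bit)))
			continue;
		r = handle_req(bit);
		if (r)
			return r; /* NB: remaining bits in reqs are lost -
				     exactly the bug Marcelo points out */
	}
	return 0;
}
```

Avi's `for_each_set_bit` variant avoids the lost-request problem by clearing each bit from the live word just before handling it, so a bail-out leaves the unhandled bits in place.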
Re: [Autotest] [PATCH] virt: Add Fedora 17 to the list of guests
Hi Lucas, F17 is not working on xen hypervisor without some changes. I'm going to prepare fixes for xen handling. Mirek - Original Message - From: Lucas Meneghel Rodrigues l...@redhat.com To: autot...@test.kernel.org Cc: kvm@vger.kernel.org Sent: Wednesday, May 30, 2012 11:20:55 PM Subject: [Autotest] [PATCH] virt: Add Fedora 17 to the list of guests Also, make it the default guest for KVM autotest. Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/tests/kvm/tests.cfg.sample | 18 - client/tests/libvirt/tests.cfg.sample | 21 ++-- client/virt/guest-os.cfg.sample | 28 ++ client/virt/unattended/Fedora-17.ks | 35 + client/virt/virt_utils.py |6 +++--- 5 files changed, 86 insertions(+), 22 deletions(-) create mode 100644 client/virt/unattended/Fedora-17.ks diff --git a/client/tests/kvm/tests.cfg.sample b/client/tests/kvm/tests.cfg.sample index 9329a05..912232e 100644 --- a/client/tests/kvm/tests.cfg.sample +++ b/client/tests/kvm/tests.cfg.sample @@ -57,8 +57,8 @@ variants: # Subtest choice. 
You can modify that line to add more subtests only unattended_install.cdrom, boot, shutdown -# Runs qemu, f16 64 bit guest OS, install, boot, shutdown -- @qemu_f16_quick: +# Runs qemu, f17 64 bit guest OS, install, boot, shutdown +- @qemu_f17_quick: # We want qemu for this run qemu_binary = /usr/bin/qemu qemu_img_binary = /usr/bin/qemu-img @@ -72,13 +72,13 @@ variants: only no_9p_export only no_pci_assignable only smallpages -only Fedora.16.64 +only Fedora.17.64 only unattended_install.cdrom.extra_cdrom_ks, boot, shutdown # qemu needs -enable-kvm on the cmdline extra_params += ' -enable-kvm' -# Runs qemu-kvm, f16 64 bit guest OS, install, boot, shutdown -- @qemu_kvm_f16_quick: +# Runs qemu-kvm, f17 64 bit guest OS, install, boot, shutdown +- @qemu_kvm_f17_quick: # We want qemu-kvm for this run qemu_binary = /usr/bin/qemu-kvm qemu_img_binary = /usr/bin/qemu-img @@ -90,10 +90,10 @@ variants: only no_9p_export only no_pci_assignable only smallpages -only Fedora.16.64 +only Fedora.17.64 only unattended_install.cdrom.extra_cdrom_ks, boot, shutdown -# Runs qemu-kvm, f16 64 bit guest OS, install, starts qemu-kvm +# Runs qemu-kvm, f17 64 bit guest OS, install, starts qemu-kvm # with 9P support and runs 9P CI tests - @qemu_kvm_9p_export: qemu_binary = /usr/bin/qemu-kvm @@ -106,7 +106,7 @@ variants: only no_pci_assignable only smallpages only 9p_export -only Fedora.16.64 +only Fedora.17.64 only unattended_install.cdrom.extra_cdrom_ks, boot, 9p.9p_ci, shutdown # Runs your own guest image (qcow2, can be adjusted), all migration tests @@ -129,4 +129,4 @@ variants: only migrate # Choose your test list from the testsets defined -only qemu_kvm_f16_quick +only qemu_kvm_f17_quick diff --git a/client/tests/libvirt/tests.cfg.sample b/client/tests/libvirt/tests.cfg.sample index 548caf8..2b17d50 100644 --- a/client/tests/libvirt/tests.cfg.sample +++ b/client/tests/libvirt/tests.cfg.sample @@ -6,8 +6,8 @@ include tests-shared.cfg variants: -# Runs virt-install, f16 64 bit guest OS, 
install, boot, shutdown -- @libvirt_f16_quick: +# Runs virt-install, f17 64 bit guest OS, install, boot, shutdown +- @libvirt_f17_quick: virt_install_binary = /usr/bin/virt-install qemu_img_binary = /usr/bin/qemu-img hvm_or_pv = hvm @@ -18,11 +18,12 @@ variants: only no_9p_export only no_pci_assignable only smallpages -only Fedora.16.64 +only Fedora.17.64 only unattended_install.cdrom.extra_cdrom_ks, boot, reboot, shutdown, remove_guest.with_disk -# Runs virt-install, f16 64 as a 64 bit PV guest OS, install, boot, shutdown -- @libvirt_xenpv_f16_quick: + +# Runs virt-install, f17 64 as a 64 bit PV guest OS, install, boot, shutdown +- @libvirt_xenpv_f17_quick: virt_install_binary = /usr/bin/virt-install qemu_img_binary = /usr/bin/qemu-img hvm_or_pv = pv @@ -34,12 +35,12 @@ variants: only no_9p_export only no_pci_assignable only smallpages -only Fedora.16.64 +only Fedora.17.64 only unattended_install.cdrom.http_ks, boot, reboot, shutdown, remove_guest.with_disk -# Runs virt-install, f16 64 as a 64 bit HVM (full virt) guest OS, +# Runs virt-install, f17 64 as a 64 bit HVM (full virt) guest OS, # install, boot,
Re: [PATCHv5 5/5] kvm: host side for eoi optimization
On Tue, May 22, 2012 at 05:05:47PM +0300, Michael S. Tsirkin wrote: Implementation of PV EOI using shared memory. This reduces the number of exits an interrupt causes as much as by half. The idea is simple: there's a bit, per APIC, in guest memory, that tells the guest that it does not need EOI. We set it before injecting an interrupt and clear before injecting a nested one. Guest tests it using a test and clear operation - this is necessary so that host can detect interrupt nesting - and if set, it can skip the EOI MSR. There's a new MSR to set the address of said register in guest memory. Otherwise not much changed: - Guest EOI is not required - Register is tested ISR is automatically cleared on exit For testing results see description of previous patch 'kvm_para: guest side for eoi avoidance'. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- arch/x86/include/asm/kvm_host.h | 12 arch/x86/kvm/cpuid.c|1 + arch/x86/kvm/lapic.c| 135 +- arch/x86/kvm/lapic.h|2 + arch/x86/kvm/trace.h| 34 ++ arch/x86/kvm/x86.c |5 ++ 6 files changed, 185 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 32297f2..3ded418 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -174,6 +174,13 @@ enum { /* apic attention bits */ #define KVM_APIC_CHECK_VAPIC 0 +/* + * The following bit is set with PV-EOI, unset on EOI. + * We detect PV-EOI changes by guest by comparing + * this bit with PV-EOI in guest memory. + * See the implementation in apic_update_pv_eoi. 
+ */ +#define KVM_APIC_PV_EOI_PENDING 1 /* * We don't want allocation failures within the mmu code, so we preallocate @@ -485,6 +492,11 @@ struct kvm_vcpu_arch { u64 length; u64 status; } osvw; + + struct { + u64 msr_val; + struct gfn_to_hva_cache data; + } pv_eoi; }; struct kvm_lpage_info { diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index eba8345..a9f155a 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -408,6 +408,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, (1 KVM_FEATURE_NOP_IO_DELAY) | (1 KVM_FEATURE_CLOCKSOURCE2) | (1 KVM_FEATURE_ASYNC_PF) | + (1 KVM_FEATURE_PV_EOI) | (1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT); if (sched_info_on()) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 0d2985d..9d804e7 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -309,6 +309,54 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq) irq-level, irq-trig_mode); } +static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val) +{ + + return kvm_write_guest_cached(vcpu-kvm, vcpu-arch.pv_eoi.data, val, + sizeof(val)); +} + +static int pv_eoi_get_user(struct kvm_vcpu *vcpu, u8 *val) +{ + + return kvm_read_guest_cached(vcpu-kvm, vcpu-arch.pv_eoi.data, val, + sizeof(*val)); +} + +static inline bool pv_eoi_enabled(struct kvm_vcpu *vcpu) +{ + return vcpu-arch.pv_eoi.msr_val KVM_MSR_ENABLED; +} + +static bool pv_eoi_get_pending(struct kvm_vcpu *vcpu) +{ + u8 val; + if (pv_eoi_get_user(vcpu, val) 0) + apic_debug(Can't read EOI MSR value: 0x%llx\n, +(unsigned long long)vcpi-arch.pv_eoi.msr_val); + return val 0x1; +} + +static void pv_eoi_set_pending(struct kvm_vcpu *vcpu) +{ + if (pv_eoi_put_user(vcpu, KVM_PV_EOI_ENABLED) 0) { + apic_debug(Can't set EOI MSR value: 0x%llx\n, +(unsigned long long)vcpi-arch.pv_eoi.msr_val); + return; + } + __set_bit(KVM_APIC_PV_EOI_PENDING, vcpu-arch.apic_attention); +} + +static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu) +{ + if (pv_eoi_put_user(vcpu, 
KVM_PV_EOI_DISABLED) 0) { + apic_debug(Can't clear EOI MSR value: 0x%llx\n, +(unsigned long long)vcpi-arch.pv_eoi.msr_val); + return; + } + __clear_bit(KVM_APIC_PV_EOI_PENDING, vcpu-arch.apic_attention); +} + static inline int apic_find_highest_isr(struct kvm_lapic *apic) { int result; @@ -525,15 +573,18 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2) return vcpu1-arch.apic_arb_prio - vcpu2-arch.apic_arb_prio; } -static void apic_set_eoi(struct kvm_lapic *apic) +static int apic_set_eoi(struct kvm_lapic *apic) { int vector = apic_find_highest_isr(apic); + + trace_kvm_eoi(apic, vector);
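The protocol the patch describes — host sets a bit in guest memory before injecting, guest test-and-clears it on EOI and skips the EOI MSR write if it was set — can be modeled outside the kernel. This is a simplified sketch under the assumption of a single shared byte; `host_inject`, `guest_eoi`, and `mmio_eoi_writes` are illustrative names, not KVM symbols:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Shared byte in guest memory, standing in for the word that the
 * PV-EOI MSR points at. Bit 0 set means "no explicit EOI needed".
 */
static unsigned char pv_eoi_word;

static int slow_path_eois; /* EOIs that had to take the exiting path */

/* Host side: before injection, mark the interrupt as EOI-optimizable. */
static void host_inject(bool eoi_optimizable)
{
	pv_eoi_word = eoi_optimizable ? 1 : 0;
}

/*
 * Guest side: a test-and-clear operation. Clearing unconditionally is
 * what lets the host detect interrupt nesting: if the host injects a
 * nested interrupt, it sees the bit already cleared.
 */
static void guest_eoi(void)
{
	unsigned char was_set = pv_eoi_word & 1;

	pv_eoi_word = 0;
	if (!was_set)
		slow_path_eois++; /* fall back to the real EOI (guest exit) */
}
```

The test-and-clear must be a single atomic op in the real guest (`test_and_clear_bit`); the split read/clear here is only for illustration.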
Re: [PATCHv2] kvm: optimize ISR lookups
On Wed, May 30, 2012 at 05:54:42PM -0300, Marcelo Tosatti wrote: On Tue, May 22, 2012 at 03:54:56PM +0300, Michael S. Tsirkin wrote: We perform ISR lookups twice: during interrupt injection and on EOI. Typical workloads only have a single bit set there. So we can avoid ISR scans by 1. counting bits as we set/clear them in ISR 2. if count is 1, caching the vector number 3. if count != 1, invalidating the cache The real purpose of this is enabling PV EOI which needs to quickly validate the vector. But non PV guests might also benefit. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- I am well aware of Thomas and Peter's suggestion of reworking APIC register handling in kvm instead of adding a cache like this patch does. This revision does *not* address that comment yet: it only corrects a bug in the original patch. Posting in this form for ease of testing. Changes from v1: replace ASSERT by BUG_ON, correcting inverted logic arch/x86/kvm/lapic.c | 51 - arch/x86/kvm/lapic.h |2 + 2 files changed, 51 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 93c1574..0d2985d 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -107,6 +107,16 @@ static inline void apic_clear_vector(int vec, void *bitmap) clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); } +static inline int __apic_test_and_set_vector(int vec, void *bitmap) +{ + return __test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); +} + +static inline int __apic_test_and_clear_vector(int vec, void *bitmap) +{ + return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); +} + static inline int apic_hw_enabled(struct kvm_lapic *apic) { return (apic)-vcpu-arch.apic_base MSR_IA32_APICBASE_ENABLE; @@ -210,6 +220,16 @@ static int find_highest_vector(void *bitmap) return fls(word[word_offset 2]) - 1 + (word_offset 5); } +static u8 count_vectors(void *bitmap) +{ + u32 *word = bitmap; + int word_offset; + u8 count = 0; + for (word_offset = 0; word_offset 
MAX_APIC_VECTOR 5; ++word_offset) + count += hweight32(word[word_offset 2]); + return count; +} + static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic) { apic-irr_pending = true; @@ -242,6 +262,25 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic) apic-irr_pending = true; } +static inline void apic_set_isr(int vec, struct kvm_lapic *apic) +{ + if (!__apic_test_and_set_vector(vec, apic-regs + APIC_ISR)) + ++apic-isr_count; + BUG_ON(apic-isr_count MAX_APIC_VECTOR); + if (likely(apic-isr_count == 1)) + apic-isr_cache = vec; + else + apic-isr_cache = -1; +} + +static inline void apic_clear_isr(int vec, struct kvm_lapic *apic) +{ + if (__apic_test_and_clear_vector(vec, apic-regs + APIC_ISR)) + --apic-isr_count; + BUG_ON(apic-isr_count 0); + apic-isr_cache = -1; +} + int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic = vcpu-arch.apic; @@ -273,6 +312,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq) static inline int apic_find_highest_isr(struct kvm_lapic *apic) { int result; + if (!apic-isr_count) + return -1; + if (likely(apic-isr_cache != -1)) assert(isr_count == 1). Looks fine otherwise. Gleb can you review please? I am not convinced we need to keep track of isr_count. isr_cache should be enough. Why setting isr_cache to -1 if isr_count 1? Just overwrite isr_cache values with newly injected vector and use it on next EOI. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
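The count/cache scheme under discussion can be sketched standalone. This is a user-space model, not the lapic code itself: the bitmap layout is simplified (the real APIC ISR is spread across 32-bit registers 16 bytes apart) and the names are illustrative, but the invariant is the same — when exactly one bit is set, the cached vector makes the lookup O(1); otherwise the cache is invalidated and the next lookup scans:

```c
#include <assert.h>

#define MAX_VEC 256

static unsigned char isr[MAX_VEC / 8];
static int isr_count;      /* number of bits currently set in ISR */
static int isr_cache = -1; /* the vector, iff exactly one bit is set */

static int test_and_change(int vec, int set)
{
	int old = (isr[vec / 8] >> (vec % 8)) & 1;

	if (set)
		isr[vec / 8] |= 1 << (vec % 8);
	else
		isr[vec / 8] &= ~(1 << (vec % 8));
	return old;
}

static void isr_set(int vec)
{
	if (!test_and_change(vec, 1))
		++isr_count;
	isr_cache = (isr_count == 1) ? vec : -1;
}

static void isr_clear(int vec)
{
	if (test_and_change(vec, 0))
		--isr_count;
	isr_cache = -1; /* invalidate; next lookup rescans */
}

/* Fast path when the cache holds the single in-service vector. */
static int find_highest_isr(void)
{
	int v;

	if (!isr_count)
		return -1;
	if (isr_cache != -1)
		return isr_cache;
	for (v = MAX_VEC - 1; v >= 0; v--) /* slow path: full scan */
		if ((isr[v / 8] >> (v % 8)) & 1)
			return v;
	return -1;
}
```

Gleb's question amounts to whether `isr_count` can be dropped; Michael's answer is that the data path only checks `isr_cache != -1`, so the count exists to re-establish the cache cheaply and to distinguish "empty" from "multiple bits".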
Re: [PATCHv5 5/5] kvm: host side for eoi optimization
On Thu, May 31, 2012 at 12:57:10PM +0300, Gleb Natapov wrote:
> > @@ -1345,17 +1437,40 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
> >  	apic_set_tpr(vcpu->arch.apic, data & 0xff);
> >  }
> >
> > +/*
> > + * apic_sync_pv_eoi_to_guest - called before vmentry
> > + *
> > + * Detect whether it's safe to enable PV EOI and
> > + * if yes do so.
> > + */
> > +static void apic_sync_pv_eoi_to_guest(struct kvm_vcpu *vcpu,
> > +					struct kvm_lapic *apic)
> > +{
> > +	if (!pv_eoi_enabled(vcpu) ||
> > +	    /* IRR set or many bits in ISR: could be nested. */
> > +	    unlikely(apic->irr_pending) ||
> > +	    unlikely(apic->isr_count != 1) ||
>
> Remind me why pv_eoi should not be set if there is more than one isr?

There's a comment below: it might be safe but we do not bother: no easy
way to know which interrupt has higher priority. In my testing more than
one bit almost never happens in practice so not worth optimizing for.

> > +	    /* Cache not set: safe but we don't bother. */
> > +	    unlikely(apic->isr_cache == -1) ||
> > +	    /* Need EOI to update ioapic. */
> > +	    unlikely(kvm_ioapic_handles_vector(vcpu->kvm, apic->isr_cache)))
> > +		return;
> > +
> > +	pv_eoi_set_pending(apic->vcpu);
> > +}
> > +
>
> apic_sync_pv_eoi_to_guest() is not paired with
> apic_sync_pv_eoi_from_guest() if event injection is canceled. You can
> enter guest with stale pv_eoi bit.

Never. The pv_eoi bit is cleared on each exit. It will stay cleared
unless we set it here. I will add a comment.
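The condition chain being debated above is just a conjunction of safety checks, which can be isolated as a predicate. This is a sketch with a hypothetical snapshot struct, mirroring (not reproducing) the logic of `apic_sync_pv_eoi_to_guest()`:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the lapic state the pre-entry check reads. */
struct lapic_state {
	bool pv_eoi_enabled;
	bool irr_pending;      /* another interrupt queued: could nest */
	int  isr_count;        /* number of in-service interrupts */
	int  isr_cache;        /* cached in-service vector, -1 if unknown */
	bool ioapic_needs_eoi; /* ioapic must observe the EOI (level irq) */
};

/*
 * PV EOI is armed only when exactly one interrupt is in service,
 * nothing else is pending, the cached vector is valid, and no ioapic
 * update hangs on the EOI.
 */
static bool pv_eoi_safe(const struct lapic_state *s)
{
	return s->pv_eoi_enabled &&
	       !s->irr_pending &&
	       s->isr_count == 1 &&
	       s->isr_cache != -1 &&
	       !s->ioapic_needs_eoi;
}
```

As the thread notes, the `isr_count != 1` and `isr_cache == -1` legs are conservatisms rather than correctness requirements: skipping the optimization is always safe, since the guest then simply performs a normal EOI.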
Re: [PATCHv2] kvm: optimize ISR lookups
On Thu, May 31, 2012 at 01:06:57PM +0300, Gleb Natapov wrote: On Wed, May 30, 2012 at 05:54:42PM -0300, Marcelo Tosatti wrote: On Tue, May 22, 2012 at 03:54:56PM +0300, Michael S. Tsirkin wrote: We perform ISR lookups twice: during interrupt injection and on EOI. Typical workloads only have a single bit set there. So we can avoid ISR scans by 1. counting bits as we set/clear them in ISR 2. if count is 1, caching the vector number 3. if count != 1, invalidating the cache The real purpose of this is enabling PV EOI which needs to quickly validate the vector. But non PV guests might also benefit. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- I am well aware of Thomas and Peter's suggestion of reworking APIC register handling in kvm instead of adding a cache like this patch does. This revision does *not* address that comment yet: it only corrects a bug in the original patch. Posting in this form for ease of testing. Changes from v1: replace ASSERT by BUG_ON, correcting inverted logic arch/x86/kvm/lapic.c | 51 - arch/x86/kvm/lapic.h |2 + 2 files changed, 51 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 93c1574..0d2985d 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -107,6 +107,16 @@ static inline void apic_clear_vector(int vec, void *bitmap) clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); } +static inline int __apic_test_and_set_vector(int vec, void *bitmap) +{ + return __test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); +} + +static inline int __apic_test_and_clear_vector(int vec, void *bitmap) +{ + return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); +} + static inline int apic_hw_enabled(struct kvm_lapic *apic) { return (apic)-vcpu-arch.apic_base MSR_IA32_APICBASE_ENABLE; @@ -210,6 +220,16 @@ static int find_highest_vector(void *bitmap) return fls(word[word_offset 2]) - 1 + (word_offset 5); } +static u8 count_vectors(void *bitmap) +{ + u32 *word = bitmap; + int 
word_offset; + u8 count = 0; + for (word_offset = 0; word_offset MAX_APIC_VECTOR 5; ++word_offset) + count += hweight32(word[word_offset 2]); + return count; +} + static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic) { apic-irr_pending = true; @@ -242,6 +262,25 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic) apic-irr_pending = true; } +static inline void apic_set_isr(int vec, struct kvm_lapic *apic) +{ + if (!__apic_test_and_set_vector(vec, apic-regs + APIC_ISR)) + ++apic-isr_count; + BUG_ON(apic-isr_count MAX_APIC_VECTOR); + if (likely(apic-isr_count == 1)) + apic-isr_cache = vec; + else + apic-isr_cache = -1; +} + +static inline void apic_clear_isr(int vec, struct kvm_lapic *apic) +{ + if (__apic_test_and_clear_vector(vec, apic-regs + APIC_ISR)) + --apic-isr_count; + BUG_ON(apic-isr_count 0); + apic-isr_cache = -1; +} + int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic = vcpu-arch.apic; @@ -273,6 +312,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq) static inline int apic_find_highest_isr(struct kvm_lapic *apic) { int result; + if (!apic-isr_count) + return -1; + if (likely(apic-isr_cache != -1)) assert(isr_count == 1). Looks fine otherwise. Gleb can you review please? I am not convinced we need to keep track of isr_count. isr_cache should be enough. Why setting isr_cache to -1 if isr_count 1? So that data path can check cache and if != -1 know it is valid: we check the cache more than we set it. I'll add a comment to explain this. Just overwrite isr_cache values with newly injected vector and use it on next EOI. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv5 5/5] kvm: host side for eoi optimization
On Thu, May 31, 2012 at 01:11:11PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 31, 2012 at 12:57:10PM +0300, Gleb Natapov wrote:
> > > +static void apic_sync_pv_eoi_to_guest(struct kvm_vcpu *vcpu,
> > > +					struct kvm_lapic *apic)
> > > +{
> > > +	if (!pv_eoi_enabled(vcpu) ||
> > > +	    /* IRR set or many bits in ISR: could be nested. */
> > > +	    unlikely(apic->irr_pending) ||
> > > +	    unlikely(apic->isr_count != 1) ||
> >
> > Remind me why pv_eoi should not be set if there is more than one isr?
>
> There's a comment below: it might be safe but we do not bother: no easy
> way to know which interrupt has higher priority.

Last injected interrupt has highest priority.

> In my testing more than one bit almost never happens in practice so not
> worth optimizing for.
>
> > > +	    /* Cache not set: safe but we don't bother. */
> > > +	    unlikely(apic->isr_cache == -1) ||
> > > +	    /* Need EOI to update ioapic. */
> > > +	    unlikely(kvm_ioapic_handles_vector(vcpu->kvm, apic->isr_cache)))
> > > +		return;
> > > +
> > > +	pv_eoi_set_pending(apic->vcpu);
> > > +}
> > > +
> >
> > apic_sync_pv_eoi_to_guest() is not paired with
> > apic_sync_pv_eoi_from_guest() if event injection is canceled. You can
> > enter guest with stale pv_eoi bit.
>
> Never. The pv_eoi bit is cleared on each exit.

There will be no exit since there will be no entry. Search for "goto out"
after kvm_lapic_sync_to_vapic().

> It will stay cleared unless we set it here. I will add a comment.
--
			Gleb.
[PATCH] Fix migration of real mode guests from host with unrestricted guest to a host without it.
For example migration between Westmere and Nehelem hosts. The patch fixes the guest segments similar to enter_rmode function. Signed-off-by: Orit Wasserman owass...@rehdat.com --- arch/x86/kvm/vmx.c | 38 ++ 1 files changed, 38 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 32eb588..2eca18f 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3229,6 +3229,44 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu, vmcs_write32(sf-ar_bytes, ar); __clear_bit(VCPU_EXREG_CPL, (ulong *)vcpu-arch.regs_avail); + + /* +* Fix segments for real mode guest in hosts that don't have +* unrestricted_mode or it was disabled. +* This is done to allow migration of the guests from hosts with +* unrestricted guest like Westmere to older host that don't have +* unrestricted guest like Nehelem. +*/ + if (!enable_unrestricted_guest vmx-rmode.vm86_active) { + switch (seg) { + case VCPU_SREG_CS: + vmcs_write32(GUEST_CS_AR_BYTES, 0xf3); + vmcs_write32(GUEST_CS_LIMIT, 0x); + if (vmcs_readl(GUEST_CS_BASE) == 0x) + vmcs_writel(GUEST_CS_BASE, 0xf); + vmcs_write16(GUEST_CS_SELECTOR, +vmcs_readl(GUEST_CS_BASE) 4); + break; + case VCPU_SREG_ES: + fix_rmode_seg(VCPU_SREG_ES, vmx-rmode.es); + break; + case VCPU_SREG_DS: + fix_rmode_seg(VCPU_SREG_DS, vmx-rmode.ds); + break; + case VCPU_SREG_GS: + fix_rmode_seg(VCPU_SREG_GS, vmx-rmode.gs); + break; + case VCPU_SREG_FS: + fix_rmode_seg(VCPU_SREG_FS, vmx-rmode.fs); + break; + case VCPU_SREG_SS: + vmcs_write16(GUEST_SS_SELECTOR, +vmcs_readl(GUEST_SS_BASE) 4); + vmcs_write32(GUEST_SS_LIMIT, 0x); + vmcs_write32(GUEST_SS_AR_BYTES, 0xf3); + break; + } + } } static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Fix migration of real mode guests from host with unrestricted guest to a host without it.
For example, migration between Westmere and Nehalem hosts. The code that fixes the segments for a real mode guest was moved from enter_rmode() to vmx_set_segment(); enter_rmode() now calls vmx_set_segment() for each segment.

Signed-off-by: Orit Wasserman owass...@rehdat.com
---
 arch/x86/kvm/vmx.c | 70 +++-
 1 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 32eb588..548c9b5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -615,6 +615,10 @@ static void kvm_cpu_vmxon(u64 addr);
 static void kvm_cpu_vmxoff(void);
 static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
 static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr);
+static void vmx_set_segment(struct kvm_vcpu *vcpu,
+			    struct kvm_segment *var, int seg);
+static void vmx_get_segment(struct kvm_vcpu *vcpu,
+			    struct kvm_segment *var, int seg);

 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -2770,6 +2774,7 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
 {
 	unsigned long flags;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct kvm_segment var;

 	if (enable_unrestricted_guest)
 		return;
@@ -2813,20 +2818,23 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
 	if (emulate_invalid_guest_state)
 		goto continue_rmode;

-	vmcs_write16(GUEST_SS_SELECTOR, vmcs_readl(GUEST_SS_BASE) >> 4);
-	vmcs_write32(GUEST_SS_LIMIT, 0xffff);
-	vmcs_write32(GUEST_SS_AR_BYTES, 0xf3);
+	vmx_get_segment(vcpu, &var, VCPU_SREG_SS);
+	vmx_set_segment(vcpu, &var, VCPU_SREG_SS);
+
+	vmx_get_segment(vcpu, &var, VCPU_SREG_CS);
+	vmx_set_segment(vcpu, &var, VCPU_SREG_CS);
+
+	vmx_get_segment(vcpu, &var, VCPU_SREG_ES);
+	vmx_set_segment(vcpu, &var, VCPU_SREG_ES);

-	vmcs_write32(GUEST_CS_AR_BYTES, 0xf3);
-	vmcs_write32(GUEST_CS_LIMIT, 0xffff);
-	if (vmcs_readl(GUEST_CS_BASE) == 0xffff0000)
-		vmcs_writel(GUEST_CS_BASE, 0xf0000);
-	vmcs_write16(GUEST_CS_SELECTOR, vmcs_readl(GUEST_CS_BASE) >> 4);
+	vmx_get_segment(vcpu, &var, VCPU_SREG_DS);
+	vmx_set_segment(vcpu, &var, VCPU_SREG_DS);

-	fix_rmode_seg(VCPU_SREG_ES, &vmx->rmode.es);
-	fix_rmode_seg(VCPU_SREG_DS, &vmx->rmode.ds);
-	fix_rmode_seg(VCPU_SREG_GS, &vmx->rmode.gs);
-	fix_rmode_seg(VCPU_SREG_FS, &vmx->rmode.fs);
+	vmx_get_segment(vcpu, &var, VCPU_SREG_GS);
+	vmx_set_segment(vcpu, &var, VCPU_SREG_GS);
+
+	vmx_get_segment(vcpu, &var, VCPU_SREG_FS);
+	vmx_set_segment(vcpu, &var, VCPU_SREG_FS);

 continue_rmode:
 	kvm_mmu_reset_context(vcpu);
@@ -3229,6 +3237,44 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
 	vmcs_write32(sf->ar_bytes, ar);
 	__clear_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
+
+	/*
+	 * Fix segments for real mode guest in hosts that don't have
+	 * unrestricted_mode or it was disabled.
+	 * This is done to allow migration of the guests from hosts with
+	 * unrestricted guest, like Westmere, to older hosts that don't have
+	 * unrestricted guest, like Nehalem.
+	 */
+	if (!enable_unrestricted_guest && vmx->rmode.vm86_active) {
+		switch (seg) {
+		case VCPU_SREG_CS:
+			vmcs_write32(GUEST_CS_AR_BYTES, 0xf3);
+			vmcs_write32(GUEST_CS_LIMIT, 0xffff);
+			if (vmcs_readl(GUEST_CS_BASE) == 0xffff0000)
+				vmcs_writel(GUEST_CS_BASE, 0xf0000);
+			vmcs_write16(GUEST_CS_SELECTOR,
+				     vmcs_readl(GUEST_CS_BASE) >> 4);
+			break;
+		case VCPU_SREG_ES:
+			fix_rmode_seg(VCPU_SREG_ES, &vmx->rmode.es);
+			break;
+		case VCPU_SREG_DS:
+			fix_rmode_seg(VCPU_SREG_DS, &vmx->rmode.ds);
+			break;
+		case VCPU_SREG_GS:
+			fix_rmode_seg(VCPU_SREG_GS, &vmx->rmode.gs);
+			break;
+		case VCPU_SREG_FS:
+			fix_rmode_seg(VCPU_SREG_FS, &vmx->rmode.fs);
+			break;
+		case VCPU_SREG_SS:
+			vmcs_write16(GUEST_SS_SELECTOR,
+				     vmcs_readl(GUEST_SS_BASE) >> 4);
+			vmcs_write32(GUEST_SS_LIMIT, 0xffff);
+			vmcs_write32(GUEST_SS_AR_BYTES, 0xf3);
+			break;
+		}
+	}
 }

 static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
--
1.7.7.6
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix migration of real mode guests from host with unrestricted guest to a host without it.
On 05/31/2012 02:13 PM, Orit Wasserman wrote:

For example, migration between Westmere and Nehalem hosts. The patch fixes the guest segments similarly to the enter_rmode() function.

Signed-off-by: Orit Wasserman owass...@rehdat.com
---
 arch/x86/kvm/vmx.c | 38 ++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 32eb588..2eca18f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3229,6 +3229,44 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
 	vmcs_write32(sf->ar_bytes, ar);
 	__clear_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
+
+	/*
+	 * Fix segments for real mode guest in hosts that don't have
+	 * unrestricted_mode or it was disabled.
+	 * This is done to allow migration of the guests from hosts with
+	 * unrestricted guest, like Westmere, to older hosts that don't have
+	 * unrestricted guest, like Nehalem.
+	 */
+	if (!enable_unrestricted_guest && vmx->rmode.vm86_active) {
+		switch (seg) {
+		case VCPU_SREG_CS:
+			vmcs_write32(GUEST_CS_AR_BYTES, 0xf3);
+			vmcs_write32(GUEST_CS_LIMIT, 0xffff);
+			if (vmcs_readl(GUEST_CS_BASE) == 0xffff0000)
+				vmcs_writel(GUEST_CS_BASE, 0xf0000);
+			vmcs_write16(GUEST_CS_SELECTOR,
+				     vmcs_readl(GUEST_CS_BASE) >> 4);
+			break;
+		case VCPU_SREG_ES:
+			fix_rmode_seg(VCPU_SREG_ES, &vmx->rmode.es);
+			break;
+		case VCPU_SREG_DS:
+			fix_rmode_seg(VCPU_SREG_DS, &vmx->rmode.ds);
+			break;
+		case VCPU_SREG_GS:
+			fix_rmode_seg(VCPU_SREG_GS, &vmx->rmode.gs);
+			break;
+		case VCPU_SREG_FS:
+			fix_rmode_seg(VCPU_SREG_FS, &vmx->rmode.fs);
+			break;
+		case VCPU_SREG_SS:
+			vmcs_write16(GUEST_SS_SELECTOR,
+				     vmcs_readl(GUEST_SS_BASE) >> 4);
+			vmcs_write32(GUEST_SS_LIMIT, 0xffff);
+			vmcs_write32(GUEST_SS_AR_BYTES, 0xf3);
+			break;
+		}
+	}
 }

This duplicates the code in enter_rmode(). Please change enter_rmode() to vmx_set_segment(vmx_get_segment()) to avoid duplication.
--
error compiling committee.c: too many arguments to function
Re: [PATCHv2] kvm: optimize ISR lookups
On Thu, May 31, 2012 at 01:06:57PM +0300, Gleb Natapov wrote:
On Wed, May 30, 2012 at 05:54:42PM -0300, Marcelo Tosatti wrote:
On Tue, May 22, 2012 at 03:54:56PM +0300, Michael S. Tsirkin wrote:

We perform ISR lookups twice: during interrupt injection and on EOI. Typical workloads only have a single bit set there. So we can avoid ISR scans by:
1. counting bits as we set/clear them in the ISR,
2. caching the vector number if the count is 1,
3. invalidating the cache if the count != 1.

The real purpose of this is enabling PV EOI, which needs to quickly validate the vector, but non-PV guests might also benefit.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
I am well aware of Thomas and Peter's suggestion of reworking APIC register handling in kvm instead of adding a cache like this patch does. This revision does *not* address that comment yet: it only corrects a bug in the original patch. Posting in this form for ease of testing.

Changes from v1: replace ASSERT by BUG_ON, correcting inverted logic.

 arch/x86/kvm/lapic.c | 51 -
 arch/x86/kvm/lapic.h |  2 +
 2 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 93c1574..0d2985d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -107,6 +107,16 @@ static inline void apic_clear_vector(int vec, void *bitmap)
 	clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
 }

+static inline int __apic_test_and_set_vector(int vec, void *bitmap)
+{
+	return __test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
+}
+
+static inline int __apic_test_and_clear_vector(int vec, void *bitmap)
+{
+	return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
+}
+
 static inline int apic_hw_enabled(struct kvm_lapic *apic)
 {
 	return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
@@ -210,6 +220,16 @@ static int find_highest_vector(void *bitmap)
 	return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
 }

+static u8 count_vectors(void *bitmap)
+{
+	u32 *word = bitmap;
+	int word_offset;
+	u8 count = 0;
+	for (word_offset = 0; word_offset < MAX_APIC_VECTOR >> 5; ++word_offset)
+		count += hweight32(word[word_offset << 2]);
+	return count;
+}
+
 static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
 {
 	apic->irr_pending = true;
@@ -242,6 +262,25 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
 	apic->irr_pending = true;
 }

+static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
+{
+	if (!__apic_test_and_set_vector(vec, apic->regs + APIC_ISR))
+		++apic->isr_count;
+	BUG_ON(apic->isr_count > MAX_APIC_VECTOR);
+	if (likely(apic->isr_count == 1))
+		apic->isr_cache = vec;
+	else
+		apic->isr_cache = -1;
+}
+
+static inline void apic_clear_isr(int vec, struct kvm_lapic *apic)
+{
+	if (__apic_test_and_clear_vector(vec, apic->regs + APIC_ISR))
+		--apic->isr_count;
+	BUG_ON(apic->isr_count < 0);
+	apic->isr_cache = -1;
+}
+
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
@@ -273,6 +312,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
 static inline int apic_find_highest_isr(struct kvm_lapic *apic)
 {
 	int result;
+	if (!apic->isr_count)
+		return -1;
+	if (likely(apic->isr_cache != -1))

assert(isr_count == 1). Looks fine otherwise. Gleb can you review please?

I am not convinced we need to keep track of isr_count. isr_cache should be enough. Why set isr_cache to -1 if isr_count > 1? Just overwrite isr_cache with the newly injected vector and use it on the next EOI.

I thought about this some more. This counter is a win because apic_find_highest_isr() is called twice:
1. apic_set_eoi() calls apic_find_highest_isr() before clearing,
2. apic_update_ppr() calls apic_find_highest_isr() after clearing.
So with isr_count you never scan the ISR unless there is nesting. So we do need isr_count. I agree we can make this work for more cases because the latest ISR set is always the highest one.
Ronen suggested such an optimization offline too, and it makes code simpler so I'll implement this optimization.

--
Gleb.
Re: [PATCHv5 5/5] kvm: host side for eoi optimization
On Thu, May 31, 2012 at 01:15:13PM +0300, Gleb Natapov wrote:

+	    /* Cache not set: safe but we don't bother. */
+	    unlikely(apic->isr_cache == -1) ||
+	    /* Need EOI to update ioapic. */
+	    unlikely(kvm_ioapic_handles_vector(vcpu->kvm, apic->isr_cache)))
+		return;
+
+	pv_eoi_set_pending(apic->vcpu);
+}
+

apic_sync_pv_eoi_to_guest() is not paired with apic_sync_pv_eoi_from_guest() if event injection is canceled. You can enter the guest with a stale pv_eoi bit.

Never. The pv_eoi bit is cleared on each exit.

There will be no exit since there will be no entry. Search for "goto out" after kvm_lapic_sync_to_vapic().

I think you've found a bug, thanks a bunch. I have fixed it, but the x86 guys asked me not to post more patches until the merge window closes :( So I'll sit on the fix for several days.
Re: [PATCH] Documentation/kvm : Add documentation on Hypercalls
On 05/31/2012 01:01 AM, Raghavendra K T wrote:

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Thanks Alex for KVM_HC_FEATURES inputs and Jan for VAPIC_POLL_IRQ

Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
new file mode 100644
index 000..c79335a
--- /dev/null
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -0,0 +1,46 @@
+KVM Hypercalls Documentation
+===========================
+The template for each hypercall is:
+1. Hypercall name, value.
+2. Architecture(s)
+3. Status (deprecated, obsolete, active)
+4. Purpose
+
+1. KVM_HC_VAPIC_POLL_IRQ
+
+Value: 1
+Architecture: x86

No Status, does that imply Active ?

+Purpose: Trigger guest exit so that the host can check for pending
+interrupts on reentry.
+
+2. KVM_HC_MMU_OP
+
+Value: 2
+Architecture: x86
+Status: deprecated.
+Purpose: Support MMU operations such as writing to PTE,
+flushing TLB, release PT.
+
+3. KVM_HC_FEATURES
+
+Value: 3
+Architecture: PPC
+Status: active
+Purpose: Expose hypercall availability to the guest. On x86 platforms, cpuid
+used to enumerate which hypercalls are available. On PPC, either device tree
+based lookup (which is also what EPAPR dictates) OR KVM specific enumeration
+mechanism (which is this hypercall) can be used.
+
+4. KVM_HC_PPC_MAP_MAGIC_PAGE
+
+Value: 4
+Architecture: PPC
+Status: active
+Purpose: To enable communication between the hypervisor and guest there is a
+shared page that contains parts of supervisor visible register state.
+The guest can map this shared page to access its supervisor register through
+memory using this hypercall.
+
+TODO:
+1. more information on input and output needed?
+2. Add more detail to purpose of hypercalls.

--
~Randy
Re: [PATCH] Documentation/kvm : Add documentation on Hypercalls
On 05/31/2012 01:01 AM, Raghavendra K T wrote:

+TODO:
+1. more information on input and output needed?
+2. Add more detail to purpose of hypercalls.

1. definitely, including the hypercall ABI.

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
Re: [PATCH] Documentation/kvm : Add documentation on Hypercalls
On 2012-05-31 19:44, Randy Dunlap wrote:

On 05/31/2012 01:01 AM, Raghavendra K T wrote:

+1. KVM_HC_VAPIC_POLL_IRQ
+
+Value: 1
+Architecture: x86

No Status, does that imply Active ?

It should actually state that this call is in use.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [PATCH] Documentation/kvm : Add documentation on Hypercalls
On 05/31/2012 11:55 AM, Jan Kiszka wrote:

No Status, does that imply Active ?

It should actually state that this call is in use.

It might be nice to point out when a call was introduced, and, for deprecated calls, when it was deprecated.

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
Re: [PATCH v2] Add PCID/INVPCID test
On Fri, May 18, 2012 at 06:27:27AM +, Mao, Junjie wrote:

Add a unit test for the basic functionality of the PCID/INVPCID feature exposed by kvm.

Changes from v1: hard-code the 'invpcid' instruction.

Signed-off-by: Junjie Mao junjie@intel.com
---
 config-x86-common.mak |   2 +
 config-x86_64.mak     |   3 +-
 x86/README            |   1 +
 x86/pcid.c            | 185 +
 x86/unittests.cfg     |   4 +
 5 files changed, 194 insertions(+), 1 deletions(-)
 create mode 100644 x86/pcid.c

Applied, thanks.
Re: [PATCH] KVM: ia64: Mark ia64 KVM as BROKEN
On Thu, May 17, 2012 at 01:14:08PM +0300, Avi Kivity wrote:

Practically all patches to ia64 KVM are build fixes; numerous warnings remain; the last patch from the maintainer was committed more than three years ago. It is clear that no one is using this thing. Mark as BROKEN to ensure people don't get hit by pointless build problems.

Signed-off-by: Avi Kivity a...@redhat.com

Applied, thanks.
[ANNOUNCE] qemu-kvm-1.1-rc4
qemu-kvm-1.1-rc4 is now available. This release is based on the upstream qemu 1.1-rc4, plus kvm-specific enhancements. Please see the original QEMU 1.1-rc4 release announcement [1] for details.

This release can be used with the kvm kernel modules provided by your distribution kernel, or with the modules in the kvm-kmod package, such as kvm-kmod-3.1.

http://www.linux-kvm.org

[1] http://lists.gnu.org/archive/html/qemu-devel/2012-05/msg04401.html
Re: [PATCH v2] virtio_blk: unlock vblk-lock during kick
Hello Stefan,

On 05/30/2012 09:19 PM, Stefan Hajnoczi wrote:

Holding vblk->lock across the kick causes poor scalability in SMP guests. If one CPU is doing a virtqueue kick and another CPU touches vblk->lock, it will have to spin until the virtqueue kick completes.

This patch reduces system% CPU utilization in SMP guests that are running multithreaded I/O-bound workloads. The improvements are small but show as iops and SMP are increased.

Khoa Huynh k...@us.ibm.com provided initial performance data that indicates this optimization is worthwhile at high iops. Asias He as...@redhat.com reports the following fio results:

Host: Linux 3.4.0+ #302 SMP x86_64 GNU/Linux
Guest: same as host kernel

Average of 3 runs, with locked kick:
read  iops=119907.50 bw=59954.00  runt=35018.50  io=2048.00
write iops=217187.00 bw=108594.00 runt=19312.00  io=2048.00
read  iops=33948.00  bw=16974.50  runt=186820.50 io=3095.70
write iops=35014.00  bw=17507.50  runt=181151.00 io=3095.70
clat (usec) max=3484.10 avg=121085.38 stdev=174416.11 min=0.00
clat (usec) max=3438.30 avg=59863.35  stdev=116607.69 min=0.00
clat (usec) max=3745.65 avg=454501.30 stdev=332699.00 min=0.00
clat (usec) max=4089.75 avg=442374.99 stdev=304874.62 min=0.00
cpu sys=615.12  majf=24080.50 ctx=64253616.50  usr=68.08  minf=17907363.00
cpu sys=1235.95 majf=23389.00 ctx=59788148.00  usr=98.34  minf=20020008.50
cpu sys=764.96  majf=28414.00 ctx=848279274.00 usr=36.39  minf=19737254.00
cpu sys=714.13  majf=21853.50 ctx=854608972.00 usr=33.56  minf=18256760.50

With unlocked kick:
read  iops=118559.00 bw=59279.66  runt=35400.66  io=2048.00
write iops=227560.00 bw=113780.33 runt=18440.00  io=2048.00
read  iops=34567.66  bw=17284.00  runt=183497.33 io=3095.70
write iops=34589.33  bw=17295.00  runt=183355.00 io=3095.70
clat (usec) max=3485.56 avg=121989.58 stdev=197355.15 min=0.00
clat (usec) max=3222.33 avg=57784.11  stdev=141002.89 min=0.00
clat (usec) max=4060.93 avg=447098.65 stdev=315734.33 min=0.00
clat (usec) max=3656.30 avg=447281.70 stdev=314051.33 min=0.00
cpu sys=683.78  majf=24501.33 ctx=64435364.66  usr=68.91  minf=17907893.33
cpu sys=1218.24 majf=25000.33 ctx=60451475.00  usr=101.04 minf=19757720.00
cpu sys=740.39  majf=24809.00 ctx=845290443.66 usr=37.25  minf=19349958.33
cpu sys=723.63  majf=27597.33 ctx=850199927.33 usr=35.35  minf=19092343.00

FIO config file:

[global]
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
group_reporting
norandommap
ioscheduler=noop
thread
bs=512
size=4MB
direct=1
filename=/dev/vdb
numjobs=256
ioengine=aio
iodepth=64
loops=3

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 drivers/block/virtio_blk.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 693187d..1a50f41 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -201,8 +201,14 @@ static void do_virtblk_request(struct request_queue *q)
 		issued++;
 	}

-	if (issued)
-		virtqueue_kick(vblk->vq);
+	if (!issued)
+		return;
+
+	if (virtqueue_kick_prepare(vblk->vq)) {
+		spin_unlock(&vblk->lock);
+		virtqueue_notify(vblk->vq);
+		spin_lock(&vblk->lock);
+	}
 }

Could you use vblk->disk->queue->queue_lock to reference the lock, so that this patch will work on top of this one: "virtio-blk: Use block layer provided spinlock"?

BTW, why was the function name changed to virtqueue_notify() from virtqueue_kick_notify()? It seems Rusty renamed it. I think the latter name is better because it is more consistent and easier to remember:

virtqueue_kick()
virtqueue_kick_prepare()
virtqueue_kick_notify()

I believe you used virtqueue_kick_notify() in your original patch. See: http://www.spinics.net/lists/linux-virtualization/msg14616.html

--
Asias