Re: KVM call for agenda for 2014-09-16
Juan Quintela quint...@redhat.com wrote:

> Hi
>
> Please, send any topic that you are interested in covering.
>
> People have complained in the past that I don't cancel the call until the
> very last minute. So, what do you think of making the deadline for
> submitting topics 23:00 UTC on Monday? If there are no topics, there is
> no call.
>
> Have a productive week.
>
> Later, Juan.

Call details:

15:00 CEST
13:00 UTC
09:00 EDT

Every two weeks.

By popular demand, a google calendar public entry with it:

  https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

(Let me know if you have any problems with the calendar entry.)

If you need phone number details, contact me privately.

Thanks, Juan.
Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off
On 16/09/2014 04:06, Andrew Jones wrote:
> We shouldn't try Load-Exclusive instructions unless we've enabled memory
> management, as these instructions depend on the data cache unit's
> coherency monitor. This patch adds a new setup boolean, initialized to
> false, that is used to guard Load-Exclusive instructions. Eventually
> we'll add more setup code that sets it true.
>
> Note: This problem became visible on boards with Cortex-A7 processors.
> Testing with Cortex-A15 didn't expose it, as those may have an external
> coherency monitor that still allows the instruction to execute (on A7s
> we got data aborts). Although even on A15's it's not clear from the spec
> if the instructions will behave as expected while caches are off, so we
> no longer allow Load-Exclusive instructions on those processors without
> caches enabled either.
>
> Signed-off-by: Andrew Jones drjo...@redhat.com
> ---
>  lib/arm/asm/setup.h |  2 ++
>  lib/arm/setup.c     |  1 +
>  lib/arm/spinlock.c  | 10 ++++++++++
>  3 files changed, 13 insertions(+)
>
> diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h
> index 21445ef2085fc..9c54c184e2866 100644
> --- a/lib/arm/asm/setup.h
> +++ b/lib/arm/asm/setup.h
> @@ -20,6 +20,8 @@ extern phys_addr_t __phys_offset, __phys_end;
>  #define PHYS_SIZE		(1ULL << PHYS_SHIFT)
>  #define PHYS_MASK		(PHYS_SIZE - 1ULL)
>
> +extern bool mem_caches_enabled;
> +
>  #define L1_CACHE_SHIFT		6
>  #define L1_CACHE_BYTES		(1 << L1_CACHE_SHIFT)
>  #define SMP_CACHE_BYTES		L1_CACHE_BYTES
> diff --git a/lib/arm/setup.c b/lib/arm/setup.c
> index 3941c9757dcb2..f7ed639c9d499 100644
> --- a/lib/arm/setup.c
> +++ b/lib/arm/setup.c
> @@ -25,6 +25,7 @@ u32 cpus[NR_CPUS] = { [0 ... NR_CPUS-1] = (~0UL) };
>  int nr_cpus;
>
>  phys_addr_t __phys_offset, __phys_end;
> +bool mem_caches_enabled;
>
>  static void cpu_set(int fdtnode __unused, u32 regval, void *info __unused)
>  {
> diff --git a/lib/arm/spinlock.c b/lib/arm/spinlock.c
> index d8a6d4c3383d6..43539c5e84062 100644
> --- a/lib/arm/spinlock.c
> +++ b/lib/arm/spinlock.c
> @@ -1,12 +1,22 @@
>  #include <libcflat.h>
>  #include <asm/spinlock.h>
>  #include <asm/barrier.h>
> +#include <asm/setup.h>
>
>  void spin_lock(struct spinlock *lock)
>  {
>  	u32 val, fail;
>
>  	dmb();
> +
> +	/*
> +	 * Without caches enabled Load-Exclusive instructions may fail.
> +	 * In that case we do nothing, and just hope the caller knows
> +	 * what they're doing.
> +	 */
> +	if (!mem_caches_enabled)
> +		return;

Should it at least write 1 to the spinlock?

Paolo

> +
>  	do {
>  		asm volatile(
>  		"1:	ldrex	%0, [%2]\n"
Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
On 15/09/2014 20:14, Matt Mullins wrote:
> On Tue, Sep 09, 2014 at 11:53:49PM -0700, Matt Mullins wrote:
>> On Mon, Sep 08, 2014 at 06:18:46PM +0200, Paolo Bonzini wrote:
>>> What version of QEMU? Can you try the 12.04 qemu (which IIRC is 1.0)
>>> on top of the newer kernel?
>>
>> I did reproduce this on qemu 1.0.1. What would you like me to test
>> next? I've got both VM hosts currently running 3.17-rc4, so I'll know
>> tomorrow if that works.
>
> I've been looking into this off-and-on for a while now, and this time
> around I may have found other folks experiencing the same issue:
>
>   https://issues.apache.org/jira/browse/CLOUDSTACK-6788
>
> That one's a little empty on the details (do y'all know more about to
> whom that bug was known than I do?), but
> https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1297218 sees
> similar symptoms due to running NTP on the host. I may try disabling
> that after the current round of testing-3.17-rc4 finishes up.
>
> A summary of my testing from last week:
>  * 3.17-rc4 (with qemu 2.0) seems to have had the same problem as well.
>  * Disabling ntpd didn't help either.
>  * <timer name='kvmclock' present='no'/> _does_ seem to make my VM
>    migrate without issue.
>
> I'm not really sure what to look at from here. I suppose leaving
> kvmclock disabled is a workaround for now, but is there a major
> disadvantage to doing so?

Sorry for not following up. I think we have QEMU patches to fix this
issue. I'll reply as soon as they are available in a git tree.

Paolo
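For anyone looking for the same workaround: the timer element Matt
mentions lives in the libvirt domain XML, inside the <clock> block. A
minimal sketch of the relevant fragment (the offset attribute is just an
assumed example, not something from this thread):

  <clock offset='utc'>
    <!-- Hide the kvmclock paravirtual timer from the guest; the guest
         then falls back to another clocksource (TSC, HPET, ...). -->
    <timer name='kvmclock' present='no'/>
  </clock>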
Re: [PATCH] Using the tlb flush util function where applicable
On 16/09/2014 00:49, Liang Chen wrote:
>> ---
>> (And what about a possible followup patch that replaces
>> kvm_mmu_flush_tlb() with kvm_make_request() again? It would free the
>> namespace a bit and we could call something similarly named from
>> vcpu_enter_guest() to do the job.)
>
> That seems good. I can take the labor to make the followup patch ;)

Ok, so I'll not apply your first patch. Thanks!

Paolo
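For context, a rough sketch of the followup direction being discussed,
i.e. raising a vcpu request instead of calling the flush helper
directly (the service point shown in vcpu_enter_guest() is illustrative,
not a quote of the eventual patch):

	/* Callers stop invoking kvm_mmu_flush_tlb() directly and
	 * instead queue a request against the vcpu: */
	kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);

	/* ... which vcpu_enter_guest() then services before guest
	 * entry, with a suitably named helper doing the actual flush: */
	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
		kvm_x86_ops->tlb_flush(vcpu);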
Re: [Qemu-devel] QEMU with KVM does not start Win8 on kernel 3.4.67 and core2duo
On September 12, 2014 at 7:29 PM Jan Kiszka jan.kis...@siemens.com wrote:
> On 2014-09-12 19:15, Jan Kiszka wrote:
>> On 2014-09-12 14:29, Erik Rull wrote:
>>> On September 11, 2014 at 3:32 PM Jan Kiszka jan.kis...@siemens.com wrote:
>>>> On 2014-09-11 15:25, Erik Rull wrote:
>>>>> On August 6, 2014 at 1:19 PM Erik Rull erik.r...@rdsoftware.de wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I did already several tests and I'm not completely sure what's
>>>>>> going wrong, but here is my scenario:
>>>>>>
>>>>>> When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a
>>>>>> vanilla kernel 3.4.67 to run a Windows 8.0 guest, the guest freezes
>>>>>> at boot without any error. When I dump the CPU registers via "info
>>>>>> registers", nothing changes, which means the system really
>>>>>> stalled. The same happens with QEMU 2.0.0. But - when I run the
>>>>>> very same guest using kernel 2.6.32.12 and QEMU 1.7.0 on the host
>>>>>> side, it works on the Core2Duo. Also, on the system above but with
>>>>>> an i3 or i5 CPU, it works, too. I already disabled networking and
>>>>>> USB for the guest and changed the graphics card - no effect. I
>>>>>> assume that some mean bits and bytes have to be set up properly to
>>>>>> get the thing running. Any hint what to change / test would be
>>>>>> really appreciated.
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Erik
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I opened a qemu bug report on that and Jan helped me create a kvm
>>>>> trace. I attached it to the bug report.
>>>>>
>>>>> https://bugs.launchpad.net/qemu/+bug/1366836
>>>>>
>>>>> If you have further questions, please let me know.
>>>>
>>>> File possibly truncated. Need at least 346583040, but file size is
>>>> 133414912. Does trace-cmd report work for you? Is your file larger?
>>>>
>>>> Again, please also validate the behavior on the latest next branch
>>>> from kvm.git.
>>>>
>>>> Jan
>>>
>>> Hi all,
>>>
>>> Confirmed. The issue still exists in the kvm.git version of the
>>> kernel. The trace.tgz was uploaded to the bugtracker.
>>
>> Thanks. Could you provide a good case of your setup as well, i.e. with
>> that older kernel version? At least I'm not yet seeing something
>> obviously wrong. Well, except that we have continuous
>> EXTERNAL_INTERRUPTs, vector 0xf6, throughout most of the trace. Maybe
>> a self-IPI (this is single-core), maybe something external that is
>> stuck.
>
> You could do a full trace (-e all) and check what happens after things
> like
>
>   kvm_exit: reason EXTERNAL_INTERRUPT rip 0x8168ed83 info 0 80ef
>
> Jan

The huge number of interrupts seems to be rescheduling interrupts from
qemu/kvm. I disabled SMP (kernel cmdline nosmp) and retried - same
effect, Windows 8 does not boot. But I was able to get rid of the
rescheduling interrupts.
The trace after / around a kvm_exit looks like this:

 qemu-system-x86-954 [001] 261013.227405: kvm_entry: vcpu 0
 qemu-system-x86-952 [000] 261013.227405: kmem_cache_free: call_site=c10ef001 ptr=0xf1d2ae48
 qemu-system-x86-952 [000] 261013.227406: mm_filemap_delete_from_page_cache: dev 0:3 ino 0 page=0xf5bcc9c0 pfn=4122790336 ofs=507641856
 qemu-system-x86-952 [000] 261013.227406: kmem_cache_free: call_site=c10ef001 ptr=0xf1d2ae10
 qemu-system-x86-952 [000] 261013.227406: mm_filemap_delete_from_page_cache: dev 0:3 ino 0 page=0xf5bcc9e0 pfn=4122790368 ofs=507645952
 qemu-system-x86-954 [001] 261013.227406: kvm_exit: reason EXCEPTION_NMI rip 0x812a1d83 info 80201120 8b0e
 qemu-system-x86-954 [001] 261013.227407: kvm_page_fault: address 80201120 error_code 3
 qemu-system-x86-952 [000] 261013.227407: mm_page_free_batched: page=0xf5bcc9e0 pfn=4122790368 order=0 cold=0
 qemu-system-x86-954 [001] 261013.227407: kvm_mmu_pagetable_walk: addr 80201120 pferr 3 P|W
 qemu-system-x86-952 [000] 261013.227407: mm_page_free: page=0xf5bcc9e0 pfn=4122790368 order=0
 qemu-system-x86-954 [001] 261013.227407: kvm_mmu_paging_element: pte 188001 level 3
 qemu-system-x86-952 [000] 261013.227407: mm_page_free_batched: page=0xf5bcc9c0 pfn=4122790336 order=0 cold=0
 qemu-system-x86-954 [001] 261013.227407: kvm_mmu_paging_element: pte 39b863 level 2

or

 qemu-system-x86-954 [001] 261013.276282: kvm_mmu_paging_element: pte 188001 level 3
 qemu-system-x86-954 [001] 261013.276283: kvm_mmu_paging_element: pte 39b863 level 2
 qemu-system-x86-954 [001] 261013.276283: kvm_mmu_paging_element: pte 80188963 level 1
 qemu-system-x86-954 [001] 261013.276284: rcu_utilization: Start context switch
 qemu-system-x86-954 [001] 261013.276284: rcu_utilization: End context switch
 qemu-system-x86-954 [001] 261013.276284: kvm_entry: vcpu 0
 qemu-system-x86-954 [001] 261013.276285: kvm_exit: reason EXCEPTION_NMI rip 0x812a1d83 info 80201120 8b0e
 qemu-system-x86-954 [001] 261013.276286: kvm_page_fault: address 80201120 error_code 3
 qemu-system-x86-954 [001] 261013.276286: kvm_mmu_pagetable_walk: addr 80201120 pferr 3 P|W
 qemu-system-x86-954 [001] 261013.276286: kvm_mmu_paging_element: pte 188001 level 3
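For anyone reproducing this, the kind of full trace Jan asks for is
typically captured along these lines (buffer size and duration are
arbitrary examples, not values from this thread):

  # record every event, not just the kvm ones, then inspect the report
  trace-cmd record -e all -b 20000 sleep 30
  trace-cmd report | grep -A 10 'reason EXTERNAL_INTERRUPT'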
Re: [PATCH v3 0/4] Make kvm_device_ops registration dynamic
On 02/09/2014 11:27, Will Deacon wrote:
> The mpic, flic and xics are still not ported over, as I don't want to
> risk breaking those devices

Actually FLIC is ported. :)

>  arch/s390/kvm/kvm-s390.c |   3 +-
>  arch/s390/kvm/kvm-s390.h |   1 +
>  include/linux/kvm_host.h |   4 +-
>  include/uapi/linux/kvm.h |  22 +--
>  virt/kvm/arm/vgic.c      | 157 ---
>  virt/kvm/kvm_main.c      |  57 +
>  virt/kvm/vfio.c          |  22 ---
>  7 files changed, 142 insertions(+), 124 deletions(-)

Thanks, applying to kvm/queue. Alex (Graf) and Paul, can you look at
MPIC and XICS?

Paolo
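For MPIC/XICS owners wondering what a port involves, a rough sketch of
the registration pattern this series introduces, modeled on the vgic
conversion (the callback names below are hypothetical placeholders):

	static struct kvm_device_ops mpic_ops_sketch = {
		.name     = "kvm-mpic",
		.create   = mpic_create,	/* hypothetical callbacks */
		.destroy  = mpic_destroy,
		.set_attr = mpic_set_attr,
	};

	static int mpic_register_sketch(void)
	{
		/* register the ops for a device type at runtime instead
		 * of occupying a slot in a static table */
		return kvm_register_device_ops(&mpic_ops_sketch,
					       KVM_DEV_TYPE_FSL_MPIC_20);
	}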
[PATCH v6 5/6] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.
This patch only handles the situation where the L1 and L2 vm share one
apic access page.

When the L1 vm is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0, and reload the apic
access page physical address for all the vcpus' vmcs (which is done by
patch 5/6). And when it enters the L2 vm, L2's vmcs will be updated in
prepare_vmcs02() called by nested_vm_run(). So we need to do nothing.

When the L2 vm is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0, and reload the apic
access page physical address for all L2 vmcs. And this patch requests an
apic access page reload on the L2->L1 vmexit.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx.c              | 6 ++++++
 arch/x86/kvm/x86.c              | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 514183e..92b3e72 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1046,6 +1046,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
 
 void kvm_define_shared_msr(unsigned index, u32 msr);
 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a1a9797..d0d5981 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8795,6 +8795,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	}
 
 	/*
+	 * We are now running in L2, mmu_notifier will force to reload the
+	 * page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
+	 */
+	kvm_vcpu_reload_apic_access_page(vcpu);
+
+	/*
 	 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 	 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 	 * success or failure flag accordingly.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 27c3d30..3f458b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,7 +5989,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 	kvm_apic_update_tmr(vcpu, tmr);
 }
 
-static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
 	/*
 	 * apic access page could be migrated. When the page is being migrated,
@@ -6001,6 +6001,7 @@ static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 	kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
 			page_to_phys(vcpu->kvm->arch.apic_access_page));
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
-- 
1.8.3.1
[PATCH v6 6/6] kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.
To make the apic access page migratable, we do not pin it in memory now.
When it is migrated, we should reload its physical address for all
vmcses. But when we tried to do this, all vcpus would access
kvm->arch.apic_access_page without any locking. This is not safe.

Actually, we do not need kvm->arch.apic_access_page anymore. Since the
apic access page is not pinned in memory now, we can remove
kvm->arch.apic_access_page. When we need to write its physical address
into the vmcs, use gfn_to_page() to get its page struct, which will also
pin it. And unpin it afterwards.

Suggested-by: Gleb Natapov g...@kernel.org
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx.c              | 15 +--
 arch/x86/kvm/x86.c              | 15 +--
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 92b3e72..9daf754 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -576,7 +576,7 @@ struct kvm_arch {
 	struct kvm_apic_map *apic_map;
 
 	unsigned int tss_addr;
-	struct page *apic_access_page;
+	bool apic_access_page_done;
 
 	gpa_t wall_clock;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d0d5981..61f3854 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4002,7 +4002,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
 	int r = 0;
 
 	mutex_lock(&kvm->slots_lock);
-	if (kvm->arch.apic_access_page)
+	if (kvm->arch.apic_access_page_done)
 		goto out;
 	kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
 	kvm_userspace_mem.flags = 0;
@@ -4018,7 +4018,12 @@ static int alloc_apic_access_page(struct kvm *kvm)
 		goto out;
 	}
 
-	kvm->arch.apic_access_page = page;
+	/*
+	 * Do not pin apic access page in memory so that memory hotplug
+	 * process is able to migrate it.
+	 */
+	put_page(page);
+	kvm->arch.apic_access_page_done = true;
 out:
 	mutex_unlock(&kvm->slots_lock);
 	return r;
@@ -4534,8 +4539,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 	}
 
 	if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
-		vmcs_write64(APIC_ACCESS_ADDR,
-			     page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
+		kvm_vcpu_reload_apic_access_page(vcpu);
 
 	if (vmx_vm_has_apicv(vcpu->kvm))
 		memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
@@ -7995,8 +7999,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
 			exec_control |=
 				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-			vmcs_write64(APIC_ACCESS_ADDR,
-				page_to_phys(vcpu->kvm->arch.apic_access_page));
+			kvm_vcpu_reload_apic_access_page(vcpu);
 		}
 
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3f458b2..9094e13 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,15 +5991,20 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
+	struct page *page;
+
 	/*
 	 * apic access page could be migrated. When the page is being migrated,
 	 * GUP will wait till the migrate entry is replaced with the new pte
 	 * entry pointing to the new page.
 	 */
-	vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
-			APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
-	kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
-			page_to_phys(vcpu->kvm->arch.apic_access_page));
+	page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+	kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm, page_to_phys(page));
+	/*
+	 * Do not pin apic access page in memory so that memory hotplug
+	 * process is able to migrate it.
+	 */
+	put_page(page);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
 
@@ -7253,8 +7258,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kfree(kvm->arch.vpic);
 	kfree(kvm->arch.vioapic);
 	kvm_free_vcpus(kvm);
-	if (kvm->arch.apic_access_page)
-		put_page(kvm->arch.apic_access_page);
 	kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 }
 
-- 
1.8.3.1
[PATCH v6 4/6] kvm, mem-hotplug: Reload L1's apic access page on migration in vcpu_enter_guest().
The apic access page is pinned in memory. As a result, it cannot be
migrated/hot-removed. Actually, it does not need to be pinned.

The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
pointer. When the page is migrated, kvm_mmu_notifier_invalidate_page()
will invalidate the corresponding ept entry. This patch introduces a new
vcpu request named KVM_REQ_APIC_PAGE_RELOAD, makes this request to all
the vcpus at this time, and forces all the vcpus to exit the guest and
re-enter it until they have updated the VMCS APIC_ACCESS_ADDR pointer to
the new apic access page address, and updated kvm->arch.apic_access_page
to the new page.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c              |  6 ++++++
 arch/x86/kvm/vmx.c              |  6 ++++++
 arch/x86/kvm/x86.c              | 15 +++++++++++++++
 include/linux/kvm_host.h        |  2 ++
 virt/kvm/kvm_main.c             | 12 ++++++++++++
 6 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..514183e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
 	void (*hwapic_isr_update)(struct kvm *kvm, int isr);
 	void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+	void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
 	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
 	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..f2eacc4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 	return;
 }
 
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+	return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
 	return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
 	.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+	.set_apic_access_page_addr = svm_set_apic_access_page_addr,
 	.vm_has_apicv = svm_vm_has_apicv,
 	.load_eoi_exitmap = svm_load_eoi_exitmap,
 	.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 72a0470..a1a9797 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7090,6 +7090,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 	vmx_set_msr_bitmap(vcpu);
 }
 
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+	vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
 	u16 status;
@@ -8909,6 +8914,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
 	.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+	.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
 	.vm_has_apicv = vmx_vm_has_apicv,
 	.load_eoi_exitmap = vmx_load_eoi_exitmap,
 	.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..27c3d30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 	kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * apic access page could be migrated. When the page is being migrated,
+	 * GUP will wait till the migrate entry is replaced with the new pte
+	 * entry pointing to the new page.
+	 */
+	vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+			APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+	kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+			page_to_phys(vcpu->kvm->arch.apic_access_page));
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace. Otherwise, the value will be returned to the
@@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_deliver_pmi(vcpu);
 		if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
 			vcpu_scan_ioapic(vcpu);
+		if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+			kvm_vcpu_reload_apic_access_page(vcpu);
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index
[PATCH v6 3/6] kvm: Make init_rmode_identity_map() return 0 on success.
In init_rmode_identity_map(), there are two variables indicating the
return value, r and ret, and the function returns 0 on error, 1 on
success. The function is only called by vmx_create_vcpu(), and r is
redundant.

This patch removes the redundant variable r, and makes
init_rmode_identity_map() return 0 on success, -errno on failure.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/kvm/vmx.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4fb84ad..72a0470 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-	int i, idx, r, ret = 0;
+	int i, idx, ret = 0;
 	pfn_t identity_map_pfn;
 	u32 tmp;
 
 	if (!enable_ept)
-		return 1;
+		return 0;
 
 	/* Protect kvm->arch.ept_identity_pagetable_done. */
 	mutex_lock(&kvm->slots_lock);
 
-	if (likely(kvm->arch.ept_identity_pagetable_done)) {
-		ret = 1;
+	if (likely(kvm->arch.ept_identity_pagetable_done))
 		goto out2;
-	}
 
 	identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
 
-	r = alloc_identity_pagetable(kvm);
-	if (r)
+	ret = alloc_identity_pagetable(kvm);
+	if (ret)
 		goto out2;
 
 	idx = srcu_read_lock(&kvm->srcu);
-	r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-	if (r < 0)
+	ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+	if (ret)
 		goto out;
 	/* Set up identity-mapping pagetable for EPT in real mode */
 	for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
 		tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
 			_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-		r = kvm_write_guest_page(kvm, identity_map_pfn,
+		ret = kvm_write_guest_page(kvm, identity_map_pfn,
 				&tmp, i * sizeof(tmp), sizeof(tmp));
-		if (r < 0)
+		if (ret)
 			goto out;
 	}
 	kvm->arch.ept_identity_pagetable_done = true;
-	ret = 1;
+
 out:
 	srcu_read_unlock(&kvm->srcu, idx);
-
 out2:
 	mutex_unlock(&kvm->slots_lock);
 	return ret;
@@ -7604,11 +7601,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto free_vcpu;
 
+	/* Set err to -ENOMEM to handle memory allocation error. */
+	err = -ENOMEM;
+
 	vmx->guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
 	BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) * sizeof(vmx->guest_msrs[0])
 		     > PAGE_SIZE);
 
-	err = -ENOMEM;
 	if (!vmx->guest_msrs) {
 		goto uninit_vcpu;
 	}
@@ -7641,8 +7640,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 		if (!kvm->arch.ept_identity_map_addr)
 			kvm->arch.ept_identity_map_addr =
 				VMX_EPT_IDENTITY_PAGETABLE_ADDR;
-		err = -ENOMEM;
-		if (!init_rmode_identity_map(kvm))
+		err = init_rmode_identity_map(kvm);
+		if (err < 0)
 			goto free_vmcs;
 	}
 
-- 
1.8.3.1
[PATCH v6 0/6] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.
The ept identity pagetable and apic access page in kvm are pinned in
memory. As a result, they cannot be migrated/hot-removed. But actually
they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, the guest will be able to find
the new page in the next ept violation.

[For apic access page]
The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
pointer. When the apic access page is migrated, we update the VMCS
APIC_ACCESS_ADDR pointer for each vcpu in addition.

NOTE: Tested with -cpu xxx,-x2apic option.

But since a nested vm pins some other pages in memory, if the user uses
a nested vm, memory hot-remove will not work.

Change log v5 -> v6:
1. Patch 1/6 has been applied by Paolo Bonzini pbonz...@redhat.com, just
   resend it.
2. Simplify comment in alloc_identity_pagetable() and add a BUG_ON() in
   patch 2/6.
3. Move err initialization forward in patch 3/6.
4. Rename vcpu_reload_apic_access_page() to
   kvm_vcpu_reload_apic_access_page() and use it instead of
   kvm_reload_apic_access_page() in nested_vmx_vmexit() in patch 5/6.
5. Reuse kvm_vcpu_reload_apic_access_page() in prepare_vmcs02() and
   vmx_vcpu_reset() in patch 6/6.
6. Remove original patch 7 since we are not able to handle the situation
   in a nested vm.

Tang Chen (6):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1's apic access page on migration in
    vcpu_enter_guest().
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2
    is running.
  kvm, mem-hotplug: Unpin and remove kvm_arch->apic_access_page.

 arch/x86/include/asm/kvm_host.h |  5 ++-
 arch/x86/kvm/svm.c              |  9 +++-
 arch/x86/kvm/vmx.c              | 95 +++++--
 arch/x86/kvm/x86.c              | 25 +--
 include/linux/kvm_host.h        |  2 +
 virt/kvm/kvm_main.c             | 12 ++
 6 files changed, 99 insertions(+), 49 deletions(-)

-- 
1.8.3.1
[PATCH v6 2/6] kvm: Remove ept_identity_pagetable from struct kvm_arch.
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page.
But it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. if the ept page is allocated
2. if a memory slot for the identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the
ept identity pagetable is initialized. So we can remove
ept_identity_pagetable.

NOTE: In the original code, the ept identity pagetable page is pinned in
memory. As a result, it cannot be migrated/hot-removed. After this
patch, since kvm_arch->ept_identity_pagetable is removed, the ept
identity pagetable page is no longer pinned in memory. And it can be
migrated/hot-removed.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Reviewed-by: Gleb Natapov g...@kernel.org
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx.c              | 47 +++++----
 arch/x86/kvm/x86.c              |  2 --
 3 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {
 
 	gpa_t wall_clock;
 
-	struct page *ept_identity_pagetable;
 	bool ept_identity_pagetable_done;
 	gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..4fb84ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment *var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:
 
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-	int i, idx, r, ret;
+	int i, idx, r, ret = 0;
 	pfn_t identity_map_pfn;
 	u32 tmp;
 
 	if (!enable_ept)
 		return 1;
-	if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-		printk(KERN_ERR "EPT: identity-mapping pagetable "
-			"haven't been allocated!\n");
-		return 0;
+
+	/* Protect kvm->arch.ept_identity_pagetable_done. */
+	mutex_lock(&kvm->slots_lock);
+
+	if (likely(kvm->arch.ept_identity_pagetable_done)) {
+		ret = 1;
+		goto out2;
 	}
-	if (likely(kvm->arch.ept_identity_pagetable_done))
-		return 1;
-	ret = 0;
+
 	identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+	r = alloc_identity_pagetable(kvm);
+	if (r)
+		goto out2;
+
 	idx = srcu_read_lock(&kvm->srcu);
 	r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
 	if (r < 0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
 	ret = 1;
 out:
 	srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+	mutex_unlock(&kvm->slots_lock);
 	return ret;
 }
 
@@ -4019,31 +4029,20 @@ out:
 
 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-	struct page *page;
+	/* Called with kvm->slots_lock held. */
+
 	struct kvm_userspace_memory_region kvm_userspace_mem;
 	int r = 0;
 
-	mutex_lock(&kvm->slots_lock);
-	if (kvm->arch.ept_identity_pagetable)
-		goto out;
+	BUG_ON(kvm->arch.ept_identity_pagetable_done);
+
 	kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
 	kvm_userspace_mem.flags = 0;
 	kvm_userspace_mem.guest_phys_addr =
 		kvm->arch.ept_identity_map_addr;
 	kvm_userspace_mem.memory_size = PAGE_SIZE;
 	r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
-	if (r)
-		goto out;
-
-	page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-	if (is_error_page(page)) {
-		r = -EFAULT;
-		goto out;
-	}
 
-	kvm->arch.ept_identity_pagetable = page;
-out:
-	mutex_unlock(&kvm->slots_lock);
 	return r;
 }
 
@@ -7643,8 +7642,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 			kvm->arch.ept_identity_map_addr =
 				VMX_EPT_IDENTITY_PAGETABLE_ADDR;
 		err = -ENOMEM;
-		if (alloc_identity_pagetable(kvm) != 0)
-			goto free_vmcs;
 		if (!init_rmode_identity_map(kvm))
 			goto free_vmcs;
 	}
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..e05bd58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7239,8 +7239,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_free_vcpus(kvm);
 	if
[PATCH v6 1/6] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the
address of the apic access page. So use this macro.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Reviewed-by: Gleb Natapov g...@kernel.org
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	svm->asid_generation = 0;
 	init_vmcb(svm);
 
-	svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+	svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+				   MSR_IA32_APICBASE_ENABLE;
 	if (kvm_vcpu_is_bsp(&svm->vcpu))
 		svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
 		goto out;
 	kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
 	kvm_userspace_mem.flags = 0;
-	kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
+	kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
 	kvm_userspace_mem.memory_size = PAGE_SIZE;
 	r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
 	if (r)
 		goto out;
 
-	page = gfn_to_page(kvm, 0xfee00);
+	page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
 	if (is_error_page(page)) {
 		r = -EFAULT;
 		goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
 	kvm_set_cr8(&vmx->vcpu, 0);
-	apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+	apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
 	if (kvm_vcpu_is_bsp(&vmx->vcpu))
 		apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
 	apic_base_msr.host_initiated = true;
-- 
1.8.3.1
Re: [PATCH v6 4/6] kvm, mem-hotplug: Reload L1's apic access page on migration in vcpu_enter_guest().
On 16/09/2014 12:42, Tang Chen wrote:
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 33712fb..0df82c1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -210,6 +210,11 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
>  	make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
>  }
>
> +void kvm_reload_apic_access_page(struct kvm *kvm)
> +{
> +	make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
> +}
> +
>  int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>  {
>  	struct page *page;
> @@ -294,6 +299,13 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
>  	if (need_tlb_flush)
>  		kvm_flush_remote_tlbs(kvm);
>
> +	/*
> +	 * The physical address of apic access page is stored in VMCS.
> +	 * Update it when it becomes invalid.
> +	 */
> +	if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT))
> +		kvm_reload_apic_access_page(kvm);

This cannot be in the generic code. It is architecture-specific. Please
add a new function kvm_arch_mmu_notifier_invalidate_page, and call it
outside the mmu_lock.

kvm_reload_apic_access_page need not be in virt/kvm/kvm_main.c, either.

Paolo

>  	spin_unlock(&kvm->mmu_lock);
>  	srcu_read_unlock(&kvm->srcu, idx);
>  }
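Concretely, the split Paolo asks for could look roughly like this sketch
(the hook name is the one he suggests; the placement after dropping
mmu_lock is the point of his comment, and everything else is
illustrative, not the eventual patch):

	/* arch/x86/kvm/x86.c -- the architecture-specific part: */
	void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
						   unsigned long address)
	{
		/* on x86, only the apic access page needs attention here */
		if (address == gfn_to_hva(kvm,
					  APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT))
			kvm_reload_apic_access_page(kvm);
	}

	/* virt/kvm/kvm_main.c -- generic notifier, hook called with
	 * mmu_lock already released: */
	spin_unlock(&kvm->mmu_lock);
	kvm_arch_mmu_notifier_invalidate_page(kvm, address);
	srcu_read_unlock(&kvm->srcu, idx);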
Re: [PATCH v6 5/6] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.
On 16/09/2014 12:42, Tang Chen wrote:
> This patch only handles the situation where the L1 and L2 vm share one
> apic access page.
>
> When the L1 vm is running, if the shared apic access page is migrated,
> mmu_notifier will request all vcpus to exit to L0, and reload the apic
> access page physical address for all the vcpus' vmcs (which is done by
> patch 5/6). And when it enters the L2 vm, L2's vmcs will be updated in
> prepare_vmcs02() called by nested_vm_run(). So we need to do nothing.
>
> When the L2 vm is running, if the shared apic access page is migrated,
> mmu_notifier will request all vcpus to exit to L0, and reload the apic
> access page physical address for all L2 vmcs. And this patch requests
> an apic access page reload on the L2->L1 vmexit.
>
> Signed-off-by: Tang Chen tangc...@cn.fujitsu.com

But if kvm_vcpu_reload_apic_access_page is called when the active VMCS
is a VMCS02, the APIC access address will be corrupted, no?

So, even if you are not touching the pages pinned by nested virt, you
need an

	if (!is_guest_mode(vcpu) ||
	    !(vmx->nested.current_vmcs12->secondary_vm_exec_control &
	      SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))

as suggested by Gleb in the review of v5.

Paolo

> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/vmx.c              | 6 ++++++
>  arch/x86/kvm/x86.c              | 3 ++-
>  3 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 514183e..92b3e72 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1046,6 +1046,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
>  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
>  int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
>  void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
> +void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
>
>  void kvm_define_shared_msr(unsigned index, u32 msr);
>  void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a1a9797..d0d5981 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -8795,6 +8795,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>  	}
>
>  	/*
> +	 * We are now running in L2, mmu_notifier will force to reload the
> +	 * page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
> +	 */
> +	kvm_vcpu_reload_apic_access_page(vcpu);
> +
> +	/*
>  	 * Exiting from L2 to L1, we're now back to L1 which thinks it just
>  	 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
>  	 * success or failure flag accordingly.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 27c3d30..3f458b2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5989,7 +5989,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
>  	kvm_apic_update_tmr(vcpu, tmr);
>  }
>
> -static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
> +void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
>  {
>  	/*
>  	 * apic access page could be migrated. When the page is being migrated,
> @@ -6001,6 +6001,7 @@ static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
>  			page_to_phys(vcpu->kvm->arch.apic_access_page));
>  }
> +EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
>
>  /*
>   * Returns 1 to let __vcpu_run() continue the guest execution loop without
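Spelled out, the guarded reload Gleb and Paolo ask for could look
roughly like this (the wrapper name is hypothetical; the condition is
the one quoted above):

	static void vmx_reload_apic_access_page_checked(struct kvm_vcpu *vcpu)
	{
		struct vcpu_vmx *vmx = to_vmx(vcpu);

		/* only rewrite APIC_ACCESS_ADDR when the active VMCS is
		 * L1's, or when vmcs12 doesn't virtualize apic accesses */
		if (!is_guest_mode(vcpu) ||
		    !(vmx->nested.current_vmcs12->secondary_vm_exec_control &
		      SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
			kvm_vcpu_reload_apic_access_page(vcpu);
	}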
[PATCH] kvm: Make init_rmode_tss() return 0 on success.
In init_rmode_tss(), there are two variables indicating the return
value, r and ret, and the function returns 0 on error, 1 on success. The
function is only called by vmx_set_tss_addr(), and r is redundant.

This patch removes the redundant variable, by making init_rmode_tss()
return 0 on success, -errno on failure.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/vmx.c | 13 +++----------
 1 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 41bbddb..1e6b493 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3930,7 +3930,7 @@ static int init_rmode_tss(struct kvm *kvm)
 {
 	gfn_t fn;
 	u16 data = 0;
-	int r, idx, ret = 0;
+	int idx, r;
 
 	idx = srcu_read_lock(&kvm->srcu);
 	fn = kvm->arch.tss_addr >> PAGE_SHIFT;
@@ -3952,13 +3952,9 @@ static int init_rmode_tss(struct kvm *kvm)
 	r = kvm_write_guest_page(kvm, fn, &data,
 				 RMODE_TSS_SIZE - 2 * PAGE_SIZE - 1,
 				 sizeof(u8));
-	if (r < 0)
-		goto out;
-
-	ret = 1;
 out:
 	srcu_read_unlock(&kvm->srcu, idx);
-	return ret;
+	return r;
 }
 
 static int init_rmode_identity_map(struct kvm *kvm)
@@ -4750,10 +4746,7 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
 	if (ret)
 		return ret;
 	kvm->arch.tss_addr = addr;
-	if (!init_rmode_tss(kvm))
-		return -ENOMEM;
-
-	return 0;
+	return init_rmode_tss(kvm);
 }
 
 static bool rmode_exception(struct kvm_vcpu *vcpu, int vec)
-- 
1.7.1
Re: [PATCH v6 0/6] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.
On 16/09/2014 12:41, Tang Chen wrote:
> The ept identity pagetable and apic access page in kvm are pinned in
> memory. As a result, they cannot be migrated/hot-removed. But actually
> they don't need to be pinned in memory.
>
> [For ept identity page]
> Just do not pin it. When it is migrated, the guest will be able to
> find the new page in the next ept violation.
>
> [For apic access page]
> The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
> pointer. When the apic access page is migrated, we update the VMCS
> APIC_ACCESS_ADDR pointer for each vcpu in addition.
>
> NOTE: Tested with -cpu xxx,-x2apic option.
>
> But since a nested vm pins some other pages in memory, if the user
> uses a nested vm, memory hot-remove will not work.
>
> Change log v5 -> v6:
> 1. Patch 1/6 has been applied by Paolo Bonzini pbonz...@redhat.com,
>    just resend it.
> 2. Simplify comment in alloc_identity_pagetable() and add a BUG_ON()
>    in patch 2/6.
> 3. Move err initialization forward in patch 3/6.
> 4. Rename vcpu_reload_apic_access_page() to
>    kvm_vcpu_reload_apic_access_page() and use it instead of
>    kvm_reload_apic_access_page() in nested_vmx_vmexit() in patch 5/6.
> 5. Reuse kvm_vcpu_reload_apic_access_page() in prepare_vmcs02() and
>    vmx_vcpu_reset() in patch 6/6.
> 6. Remove original patch 7 since we are not able to handle the
>    situation in a nested vm.

I'll push 1-3 soon to kvm/queue. I think v7 will be good. :)

Paolo
Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off
----- Original Message -----
> On 16/09/2014 04:06, Andrew Jones wrote:
>> We shouldn't try Load-Exclusive instructions unless we've enabled
>> memory management, as these instructions depend on the data cache
>> unit's coherency monitor. This patch adds a new setup boolean,
>> initialized to false, that is used to guard Load-Exclusive
>> instructions. Eventually we'll add more setup code that sets it true.
>>
>> Note: This problem became visible on boards with Cortex-A7 processors.
>> Testing with Cortex-A15 didn't expose it, as those may have an
>> external coherency monitor that still allows the instruction to
>> execute (on A7s we got data aborts). Although even on A15's it's not
>> clear from the spec if the instructions will behave as expected while
>> caches are off, so we no longer allow Load-Exclusive instructions on
>> those processors without caches enabled either.
>>
>> Signed-off-by: Andrew Jones drjo...@redhat.com
>> ---
>>  lib/arm/asm/setup.h |  2 ++
>>  lib/arm/setup.c     |  1 +
>>  lib/arm/spinlock.c  | 10 ++++++++++
>>  3 files changed, 13 insertions(+)
>>
>> diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h
>> index 21445ef2085fc..9c54c184e2866 100644
>> --- a/lib/arm/asm/setup.h
>> +++ b/lib/arm/asm/setup.h
>> @@ -20,6 +20,8 @@ extern phys_addr_t __phys_offset, __phys_end;
>>  #define PHYS_SIZE	(1ULL << PHYS_SHIFT)
>>  #define PHYS_MASK	(PHYS_SIZE - 1ULL)
>>
>> +extern bool mem_caches_enabled;
>> +
>>  #define L1_CACHE_SHIFT	6
>>  #define L1_CACHE_BYTES	(1 << L1_CACHE_SHIFT)
>>  #define SMP_CACHE_BYTES	L1_CACHE_BYTES
>> diff --git a/lib/arm/setup.c b/lib/arm/setup.c
>> index 3941c9757dcb2..f7ed639c9d499 100644
>> --- a/lib/arm/setup.c
>> +++ b/lib/arm/setup.c
>> @@ -25,6 +25,7 @@ u32 cpus[NR_CPUS] = { [0 ... NR_CPUS-1] = (~0UL) };
>>  int nr_cpus;
>>
>>  phys_addr_t __phys_offset, __phys_end;
>> +bool mem_caches_enabled;
>>
>>  static void cpu_set(int fdtnode __unused, u32 regval, void *info __unused)
>>  {
>> diff --git a/lib/arm/spinlock.c b/lib/arm/spinlock.c
>> index d8a6d4c3383d6..43539c5e84062 100644
>> --- a/lib/arm/spinlock.c
>> +++ b/lib/arm/spinlock.c
>> @@ -1,12 +1,22 @@
>>  #include <libcflat.h>
>>  #include <asm/spinlock.h>
>>  #include <asm/barrier.h>
>> +#include <asm/setup.h>
>>
>>  void spin_lock(struct spinlock *lock)
>>  {
>>  	u32 val, fail;
>>
>>  	dmb();
>> +
>> +	/*
>> +	 * Without caches enabled Load-Exclusive instructions may fail.
>> +	 * In that case we do nothing, and just hope the caller knows
>> +	 * what they're doing.
>> +	 */
>> +	if (!mem_caches_enabled)
>> +		return;
>
> Should it at least write 1 to the spinlock?

I thought about that. So on one hand we might get a somewhat functional
synchronization mechanism, which may be enough for some unit test that
doesn't enable caches, but still needs it. On the other hand, we know
it's broken, so we don't really want any unit tests that need
synchronization and don't enable caches. I chose not to write a 1 in the
hope that if a unit test introduces a race, that race will be easier to
expose and fix. That said, I'm not strongly biased, as we'd still have a
race, which may or may not be easy to expose, either way. So if the
majority prefers a best-effort approach, then I'll spin a v2.

drew

> Paolo
>
>>  	do {
>>  		asm volatile(
>>  		"1:	ldrex	%0, [%2]\n"
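For reference, a sketch of the best-effort variant being debated. The
ldrex/strex loop body and the lock field name 'v' are reconstructions,
not quoted from the patch, which only shows the first ldrex line:

	void spin_lock(struct spinlock *lock)
	{
		u32 val, fail;

		dmb();
		if (!mem_caches_enabled) {
			/* plain store: keeps unlock/relock on one cpu
			 * working, but gives no real cross-cpu mutual
			 * exclusion */
			lock->v = 1;
			return;
		}
		do {
			asm volatile(
			"1:	ldrex	%0, [%2]\n"
			"	teq	%0, #0\n"
			"	bne	1b\n"
			"	mov	%0, #1\n"
			"	strex	%1, %0, [%2]\n"
			: "=&r" (val), "=&r" (fail)
			: "r" (&lock->v)
			: "cc" );
		} while (fail);
		dmb();
	}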
[PATCH 3/3] KVM: x86: Using cpuid structs in KVM
Using cpuid structs in KVM to eliminate cryptic code with many bit
operations. The code does not introduce functional changes.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/cpuid.c | 36 ++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..a7479ab 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -18,6 +18,7 @@
 #include <linux/uaccess.h>
 #include <asm/user.h>
 #include <asm/xsave.h>
+#include <asm/cpuid_def.h>
 #include "cpuid.h"
 #include "lapic.h"
 #include "mmu.h"
@@ -357,16 +358,18 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	}
 	/* function 4 has additional index. */
 	case 4: {
-		int i, cache_type;
+		int i;
 
 		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
 		/* read more entries until cache_type is zero */
 		for (i = 1; ; ++i) {
+			union cpuid4_eax eax;
+
 			if (*nent >= maxnent)
 				goto out;
 
-			cache_type = entry[i - 1].eax & 0x1f;
-			if (!cache_type)
+			eax.full = entry[i - 1].eax;
+			if (!eax.split.cache_type)
 				break;
 			do_cpuid_1_ent(&entry[i], function, i);
 			entry[i].flags |=
@@ -423,16 +426,18 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	}
 	/* function 0xb has additional index. */
 	case 0xb: {
-		int i, level_type;
+		int i;
 
 		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
 		/* read more entries until level_type is zero */
 		for (i = 1; ; ++i) {
+			union cpuid11_ecx ecx;
+
 			if (*nent >= maxnent)
 				goto out;
 
-			level_type = entry[i - 1].ecx & 0xff00;
-			if (!level_type)
+			ecx.full = entry[i - 1].ecx;
+			if (!ecx.split.level_type)
 				break;
 			do_cpuid_1_ent(&entry[i], function, i);
 			entry[i].flags |=
@@ -505,13 +510,13 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		entry->eax = entry->ebx = entry->ecx = 0;
 		break;
 	case 0x80000008: {
-		unsigned g_phys_as = (entry->eax >> 16) & 0xff;
-		unsigned virt_as = max((entry->eax >> 8) & 0xff, 48U);
-		unsigned phys_as = entry->eax & 0xff;
+		union cpuid_8_8_eax eax;
 
-		if (!g_phys_as)
-			g_phys_as = phys_as;
-		entry->eax = g_phys_as | (virt_as << 8);
+		eax.full = entry->eax;
+		eax.split.virt_as = max_t(int, eax.split.virt_as, 48);
+		if (!eax.split.guest_phys_as)
+			eax.split.guest_phys_as = eax.split.phys_as;
+		entry->eax = eax.full;
 		entry->ebx = entry->edx = 0;
 		break;
 	}
@@ -724,13 +729,16 @@ EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
+	union cpuid_8_8_eax eax;
 
 	best = kvm_find_cpuid_entry(vcpu, 0x80000000, 0);
 	if (!best || best->eax < 0x80000008)
 		goto not_found;
 	best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
-	if (best)
-		return best->eax & 0xff;
+	if (best) {
+		eax.full = best->eax;
+		return eax.split.phys_as;
+	}
 not_found:
 	return 36;
 }
-- 
1.9.1
[PATCH 2/3] x86: Use new cpuid structs in cpuid functions
The current code that decodes cpuid fields is somewhat cryptic, since it
uses many bit operations. Use cpuid structs instead to clarify the code.
This introduces no functional change.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kernel/cpu/common.c | 56 +++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e4ab2b4..b57c160 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -41,6 +41,7 @@
 #include <asm/pat.h>
 #include <asm/microcode.h>
 #include <asm/microcode_intel.h>
+#include <asm/cpuid_def.h>
 
 #ifdef CONFIG_X86_LOCAL_APIC
 #include <asm/uv/uv.h>
@@ -444,13 +445,17 @@ static void get_model_name(struct cpuinfo_x86 *c)
 
 void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
 {
-	unsigned int n, dummy, ebx, ecx, edx, l2size;
+	unsigned int n, dummy, dummy2, l2size;
+	union cpuid8_5_ecx_edx ecx5, edx5;
+	union cpuid8_6_ebx ebx6;
+	union cpuid8_6_ecx ecx6;
 
 	n = c->extended_cpuid_level;
 
 	if (n >= 0x80000005) {
-		cpuid(0x80000005, &dummy, &ebx, &ecx, &edx);
-		c->x86_cache_size = (ecx >> 24) + (edx >> 24);
+		cpuid(0x80000005, &dummy, &dummy2, &ecx5.full, &edx5.full);
+		c->x86_cache_size = ecx5.split.cache_size +
+				    edx5.split.cache_size;
 #ifdef CONFIG_X86_64
 		/* On K8 L1 TLB is inclusive, so don't count it */
 		c->x86_tlbsize = 0;
@@ -460,11 +465,11 @@ void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
 	if (n < 0x80000006)	/* Some chips just has a large L1. */
 		return;
 
-	cpuid(0x80000006, &dummy, &ebx, &ecx, &edx);
-	l2size = ecx >> 16;
+	cpuid(0x80000006, &dummy, &ebx6.full, &ecx6.full, &dummy2);
+	l2size = ecx6.split.cache_size;
 
 #ifdef CONFIG_X86_64
-	c->x86_tlbsize += ((ebx >> 16) & 0xfff) + (ebx & 0xfff);
+	c->x86_tlbsize += ebx6.split.dtlb_entries + ebx6.split.itlb_entries;
 #else
 	/* do processor-specific cache resizing */
 	if (this_cpu->legacy_cache_size)
@@ -505,9 +510,10 @@ void cpu_detect_tlb(struct cpuinfo_x86 *c)
 void detect_ht(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_X86_HT
-	u32 eax, ebx, ecx, edx;
+	u32 eax, ecx, edx;
 	int index_msb, core_bits;
 	static bool printed;
+	union cpuid1_ebx ebx;
 
 	if (!cpu_has(c, X86_FEATURE_HT))
 		return;
@@ -518,9 +524,9 @@ void detect_ht(struct cpuinfo_x86 *c)
 	if (cpu_has(c, X86_FEATURE_XTOPOLOGY))
 		return;
 
-	cpuid(1, &eax, &ebx, &ecx, &edx);
+	cpuid(1, &eax, &ebx.full, &ecx, &edx);
 
-	smp_num_siblings = (ebx & 0xff0000) >> 16;
+	smp_num_siblings = ebx.split.max_logical_proc;
 
 	if (smp_num_siblings == 1) {
 		printk_once(KERN_INFO "CPU0: Hyper-Threading is disabled\n");
@@ -591,20 +597,22 @@ void cpu_detect(struct cpuinfo_x86 *c)
 		c->x86 = 4;
 	/* Intel-defined flags: level 0x00000001 */
 	if (c->cpuid_level >= 0x00000001) {
-		u32 junk, tfms, cap0, misc;
+		u32 junk, cap0;
+		union cpuid1_eax tfms;
+		union cpuid1_ebx misc;
 
-		cpuid(0x00000001, &tfms, &misc, &junk, &cap0);
-		c->x86 = (tfms >> 8) & 0xf;
-		c->x86_model = (tfms >> 4) & 0xf;
-		c->x86_mask = tfms & 0xf;
+		cpuid(0x00000001, &tfms.full, &misc.full, &junk, &cap0);
+		c->x86 = tfms.split.family_id;
+		c->x86_model = tfms.split.model;
+		c->x86_mask = tfms.split.stepping_id;
 
 		if (c->x86 == 0xf)
-			c->x86 += (tfms >> 20) & 0xff;
+			c->x86 += tfms.split.extended_family_id;
 		if (c->x86 >= 0x6)
-			c->x86_model += ((tfms >> 16) & 0xf) << 4;
+			c->x86_model += tfms.split.extended_model_id << 4;
 
-		if (cap0 & (1<<19)) {
-			c->x86_clflush_size = ((misc >> 8) & 0xff) * 8;
+		if (cap0 & (1 << (X86_FEATURE_CLFLUSH & 31))) {
+			c->x86_clflush_size = misc.split.clflush_size * 8;
 			c->x86_cache_alignment = c->x86_clflush_size;
 		}
 	}
@@ -654,10 +662,11 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 	}
 
 	if (c->extended_cpuid_level >= 0x80000008) {
-		u32 eax = cpuid_eax(0x80000008);
+		union cpuid_8_8_eax eax;
 
-		c->x86_virt_bits = (eax >> 8) & 0xff;
-		c->x86_phys_bits = eax & 0xff;
+		eax.full = cpuid_eax(0x80000008);
+		c->x86_virt_bits = eax.split.virt_as;
+		c->x86_phys_bits = eax.split.phys_as;
 	}
 #ifdef CONFIG_X86_32
 	else if (cpu_has(c, X86_FEATURE_PAE) || cpu_has(c, X86_FEATURE_PSE36))
@@ -814,7 +823,10 @@ static void generic_identify(struct cpuinfo_x86 *c)
[PATCH 0/3] x86: structs for cpuid info in x86
The code that deals with x86 cpuid fields is hard to follow since it
performs many bit operations and does not refer to cpuid fields
explicitly. To eliminate the need of opening a spec whenever dealing
with cpuid fields, this patch-set introduces structs that reflect the
various cpuid functions.

Thanks for reviewing the patch-set.

Nadav Amit (3):
  x86: Adding structs to reflect cpuid fields
  x86: Use new cpuid structs in cpuid functions
  KVM: x86: Using cpuid structs in KVM

 arch/x86/include/asm/cpuid_def.h | 163 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c     |  56 ++++++-----
 arch/x86/kvm/cpuid.c             |  36 +++----
 3 files changed, 219 insertions(+), 36 deletions(-)
 create mode 100644 arch/x86/include/asm/cpuid_def.h

-- 
1.9.1
[PATCH 1/3] x86: Adding structs to reflect cpuid fields
Adding structs that reflect various cpuid fields in the x86 architecture. Structs were added only for functions that are not pure bitmaps.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/include/asm/cpuid_def.h | 163 +++
 1 file changed, 163 insertions(+)
 create mode 100644 arch/x86/include/asm/cpuid_def.h

diff --git a/arch/x86/include/asm/cpuid_def.h b/arch/x86/include/asm/cpuid_def.h
new file mode 100644
index 000..0cf43ba
--- /dev/null
+++ b/arch/x86/include/asm/cpuid_def.h
@@ -0,0 +1,163 @@
+#ifndef ARCH_X86_KVM_CPUID_DEF_H
+#define ARCH_X86_KVM_CPUID_DEF_H
+
+union cpuid1_eax {
+	struct {
+		unsigned int stepping_id:4;
+		unsigned int model:4;
+		unsigned int family_id:4;
+		unsigned int processor_type:2;
+		unsigned int reserved:2;
+		unsigned int extended_model_id:4;
+		unsigned int extended_family_id:8;
+		unsigned int reserved2:4;
+	} split;
+	unsigned int full;
+};
+
+union cpuid1_ebx {
+	struct {
+		unsigned int brand_index:8;
+		unsigned int clflush_size:8;
+		unsigned int max_logical_proc:8;
+		unsigned int initial_apicid:8;
+	} split;
+	unsigned int full;
+};
+
+union cpuid4_eax {
+	struct {
+		unsigned int cache_type:5;
+		unsigned int cache_level:3;
+		unsigned int self_init_cache_level:1;
+		unsigned int fully_associative:1;
+		unsigned int reserved:4;
+		unsigned int max_logical_proc:12;
+		unsigned int max_package_proc:6;
+	} split;
+	unsigned int full;
+};
+
+union cpuid4_ebx {
+	struct {
+		unsigned int coherency_line_size:12;
+		unsigned int physical_line_partitions:10;
+		unsigned int ways:10;
+	} split;
+	unsigned int full;
+};
+
+union cpuid5_eax {
+	struct {
+		unsigned int min_monitor_line_size:16;
+		unsigned int reserved:16;
+	} split;
+	unsigned int full;
+};
+
+union cpuid5_ebx {
+	struct {
+		unsigned int max_monitor_line_size:16;
+		unsigned int reserved:16;
+	} split;
+	unsigned int full;
+};
+
+union cpuid5_ecx {
+	struct {
+		unsigned int monitor_mwait_ext:1;
+		unsigned int interrupt_as_break:1;
+	} split;
+	unsigned int full;
+};
+
+union cpuid5_edx {
+	struct {
+		unsigned int c0_sub_states:4;
+		unsigned int c1_sub_states:4;
+		unsigned int c2_sub_states:4;
+		unsigned int c3_sub_states:4;
+		unsigned int c4_sub_states:4;
+		unsigned int c5_sub_states:4;
+		unsigned int c6_sub_states:4;
+		unsigned int c7_sub_states:4;
+	} split;
+	unsigned int full;
+};
+
+union cpuid11_eax {
+	struct {
+		unsigned int x2apic_shift:5;
+		unsigned int reserved:27;
+	} split;
+	unsigned int full;
+};
+
+union cpuid11_ebx {
+	struct {
+		unsigned int logical_proc:16;
+		unsigned int reserved:16;
+	} split;
+	unsigned int full;
+};
+
+union cpuid11_ecx {
+	struct {
+		unsigned int level_number:8;
+		unsigned int level_type:8;
+		unsigned int reserved:16;
+	} split;
+	unsigned int full;
+};
+
+union cpuid8_5_eax_ebx {
+	struct {
+		unsigned int itlb_entries:8;
+		unsigned int itlb_assoc:8;
+		unsigned int dtlb_entries:8;
+		unsigned int dtlb_assoc:8;
+	} split;
+	unsigned int full;
+};
+
+union cpuid8_5_ecx_edx {
+	struct {
+		unsigned int line_size:8;
+		unsigned int lines_per_tag:8;
+		unsigned int associativity:8;
+		unsigned int cache_size:8;
+	} split;
+	unsigned int full;
+};
+
+union cpuid8_6_ebx {
+	struct {
+		unsigned int itlb_entries:12;
+		unsigned int itlb_assoc:4;
+		unsigned int dtlb_entries:12;
+		unsigned int dtlb_assoc:4;
+	} split;
+	unsigned int full;
+};
+
+union cpuid8_6_ecx {
+	struct {
+		unsigned int cache_line_size:8;
+		unsigned int lines_per_tag:4;
+		unsigned int l2_associativity:4;
+		unsigned int cache_size:16;
+	} split;
+	unsigned int full;
+};
+
+union cpuid_8_8_eax {
+	struct {
+		unsigned int phys_as:8;
+		unsigned int virt_as:8;
+		unsigned int guest_phys_as:8;
+		unsigned int reserved:8;
+	} split;
+	unsigned int full;
+};
+
+#endif
--
1.9.1
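To see what the series buys in readability, here is a minimal usage sketch (illustrative only, not part of the patch; cpuid_eax() is the existing helper from asm/processor.h):

	#include <asm/processor.h>
	#include <asm/cpuid_def.h>

	/* Decode the family from CPUID.1.EAX via the proposed union
	 * instead of open-coded shifts and masks. */
	static unsigned int cpu_family(void)
	{
		union cpuid1_eax eax = { .full = cpuid_eax(1) };
		unsigned int family = eax.split.family_id;

		/* Per the SDM, the extended field only applies to family 0xf. */
		if (family == 0xf)
			family += eax.split.extended_family_id;
		return family;
	}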
Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off
On 16/09/2014 14:12, Andrew Jones wrote:
> > Should it at least write 1 to the spinlock?
>
> I thought about that. On one hand we might get a somewhat functional
> synchronization mechanism, which may be enough for a unit test that
> doesn't enable caches but still needs it. On the other hand, we know
> it's broken, so we don't really want any unit tests that need
> synchronization and don't enable caches. I chose not to write a 1 in
> the hope that if a unit test introduces a race, that race will be
> easier to expose and fix. That said, I'm not strongly biased, as we'd
> still have a race, which may or may not be easy to expose, either way.
> So if the majority prefers a best-effort approach, then I'll spin a v2.

The case I was thinking about was something like

	spin_lock()
	enable caches
	start other processors
	spin_unlock()

I'm not sure if it makes sense though. :)

Paolo
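Concretely, the best-effort variant under discussion would look something like this (hypothetical sketch, not the posted patch; it assumes the lock word is lock->v as in the kvm-unit-tests headers):

	void spin_lock(struct spinlock *lock)
	{
		u32 val, fail;

		dmb();
		if (!mem_caches_enabled) {
			lock->v = 1;	/* plain store: no real mutual exclusion */
			return;
		}
		do {
			asm volatile(
			"1:	ldrex	%0, [%2]\n"
			"	teq	%0, #0\n"
			"	bne	1b\n"
			"	mov	%0, #1\n"
			"	strex	%1, %0, [%2]\n"
			: "=&r" (val), "=&r" (fail)
			: "r" (&lock->v)
			: "cc" );
		} while (fail);
		dmb();
	}

As the thread notes, the plain store still leaves a race either way; the question is only whether a best-effort store makes such races harder to spot.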
Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off
----- Original Message -----
> On 16/09/2014 14:12, Andrew Jones wrote:
> > [...]
>
> The case I was thinking about was something like
>
> 	spin_lock()
> 	enable caches
> 	start other processors
> 	spin_unlock()
>
> I'm not sure if it makes sense though. :)

I don't think we need to worry about this case. AFAIU, enabling the caches for a particular cpu shouldn't require any synchronization. So we should be able to do

	enable caches
	spin_lock
	start other processors
	spin_unlock

drew
Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off
On 16/09/2014 14:43, Andrew Jones wrote:
> I don't think we need to worry about this case. AFAIU, enabling the
> caches for a particular cpu shouldn't require any synchronization.
> So we should be able to do
>
> 	enable caches
> 	spin_lock
> 	start other processors
> 	spin_unlock

Ok, I'll test and apply your patch then.

Once you change the code to enable caches, please consider hanging on spin_lock with caches disabled.

Paolo
Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off
----- Original Message -----
> > [...]
> >
> > I don't think we need to worry about this case. AFAIU, enabling the
> > caches for a particular cpu shouldn't require any synchronization.
> > So we should be able to do
> >
> > 	enable caches
> > 	spin_lock
> > 	start other processors
> > 	spin_unlock

Oh drat. I just introduced an issue with enabling caches without working spin locks: this new boolean will need to be per cpu. I'll send a v2.

drew
Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off
----- Original Message -----
> On 16/09/2014 14:43, Andrew Jones wrote:
> > I don't think we need to worry about this case. AFAIU, enabling the
> > caches for a particular cpu shouldn't require any synchronization.
> > [...]
>
> Ok, I'll test and apply your patch then.

Actually, yeah, please apply now in order to get A7 boards working. I'll do a follow-on patch to fix the case above (which will require deciding how to handle per cpu data).

drew

> Once you change the code to enable caches, please consider hanging on
> spin_lock with caches disabled.
>
> Paolo
Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off
----- Original Message -----
> On 16/09/2014 14:43, Andrew Jones wrote:
> > [...]
>
> Ok, I'll test and apply your patch then.
>
> Once you change the code to enable caches, please consider hanging on
> spin_lock with caches disabled.

Unfortunately I can't do that without changing spin_lock into a wrapper function. Early setup code calls functions that use spin_locks, e.g. puts(), and we don't want to move the cache enablement into early setup code, as that should be left for unit tests to turn on/off as they wish. Thus we either need to be able to change the spin_lock implementation dynamically, or just leave the test/return as is.

drew
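Switching the implementation dynamically could be as simple as a function pointer behind the wrapper; a rough sketch under that assumption (hypothetical, not code proposed in the thread):

	/* Hypothetical wrapper: start with a no-op lock for early setup,
	 * switch to the real ldrex/strex lock once caches are enabled. */
	static void spin_lock_none(struct spinlock *lock)
	{
		dmb();	/* no exclusives available yet; nothing else we can do */
	}

	static void spin_lock_excl(struct spinlock *lock)
	{
		/* the existing ldrex/strex loop from lib/arm/spinlock.c
		 * would move here */
	}

	static void (*spin_lock_impl)(struct spinlock *) = spin_lock_none;

	void spin_lock(struct spinlock *lock)
	{
		spin_lock_impl(lock);
	}

	void spinlock_enable(void)	/* call after turning the caches on */
	{
		spin_lock_impl = spin_lock_excl;
	}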
Re: [PATCH 0/3] x86: structs for cpuid info in x86
* Nadav Amit na...@cs.technion.ac.il wrote:

> The code that deals with x86 cpuid fields is hard to follow since it
> performs many bit operations and does not refer to cpuid fields
> explicitly. To eliminate the need of opening a spec whenever dealing
> with cpuid fields, this patch-set introduces structs that reflect the
> various cpuid functions.
>
> Thanks for reviewing the patch-set.
>
> Nadav Amit (3):
>   x86: Adding structs to reflect cpuid fields
>   x86: Use new cpuid structs in cpuid functions
>   KVM: x86: Using cpuid structs in KVM
>
>  arch/x86/include/asm/cpuid_def.h | 163 +++
>  arch/x86/kernel/cpu/common.c     |  56 --
>  arch/x86/kvm/cpuid.c             |  36 +
>  3 files changed, 219 insertions(+), 36 deletions(-)
>  create mode 100644 arch/x86/include/asm/cpuid_def.h

I personally like bitfields in theory (they provide type clarity and abstract robustness, compared to the open-coded bitmask numeric literals that are often used in cpuid-using code, obfuscating cpuid usage), with the big caveat that for many years I didn't like bitfields in practice: older versions of GCC did a really poor job of optimizing them.

So such a series would only be acceptable if it's demonstrated that both 'latest' and 'reasonably old' GCC versions do a good job in that department, compared to the old open-coded bitmask ops ...

Comparing the 'size vmlinux' output of before/after kernels would probably be a good start in seeing the impact of such a change.

If those results are positive then this technique could be propagated to all cpuid-using code in arch/x86/, of which there's plenty.

Thanks,

	Ingo
Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem
On 15/09/2014 22:11, Andres Lagar-Cavilla wrote:
> +	if (!locked) {
> +		BUG_ON(npages != -EBUSY);

VM_BUG_ON perhaps?

> @@ -1177,9 +1210,15 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
>  		npages = get_user_page_nowait(current, current->mm, addr,
>  					      write_fault, page);
>  		up_read(&current->mm->mmap_sem);
> -	} else
> -		npages = get_user_pages_fast(addr, 1, write_fault,
> -					     page);
> +	} else {
> +		/*
> +		 * By now we have tried gup_fast, and possible async_pf, and we
> +		 * are certainly not atomic. Time to retry the gup, allowing
> +		 * mmap semaphore to be relinquished in the case of IO.
> +		 */
> +		npages = kvm_get_user_page_retry(current, current->mm, addr,
> +						 write_fault, page);

This is a separate logical change. Was this:

	down_read(&mm->mmap_sem);
	npages = get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
	up_read(&mm->mmap_sem);

the intention rather than get_user_pages_fast?

I think a first patch should introduce kvm_get_user_page_retry ("Retry a fault after a gup with FOLL_NOWAIT.") and the second would add FOLL_TRIED ("This properly relinquishes mmap semaphore if the filemap/swap has to wait on page lock (and retries the gup to completion after that)").

Apart from this, the patch looks good. The mm/ parts are minimal, so I think it's best to merge it through the KVM tree with someone's Acked-by.

Paolo
Re: [Qemu-devel] KVM call for agend for 2014-09-16
On 16 September 2014 01:10, Juan Quintela quint...@redhat.com wrote:
> Juan Quintela quint...@redhat.com wrote:
> > Hi
> >
> > Please, send any topic that you are interested in covering.
> >
> > People have complained in the past that I don't cancel the call until
> > the very last minute. So, what do you think of a deadline for
> > submitting topics of 23:00 UTC on Monday?
>
> ok, no topics no call.

I can't make it, but weren't the people discussing reverse-execution saying they wanted to talk about that on the call?

-- PMM
Re: KVM call for agend for 2014-09-16
Peter Maydell peter.mayd...@linaro.org wrote:
> On 16 September 2014 01:10, Juan Quintela quint...@redhat.com wrote:
> > ok, no topics no call.
>
> I can't make it, but weren't the people discussing reverse-execution
> saying they wanted to talk about that on the call?

Dunno, but clearly they didn't reply to the call for topics; I didn't see that thread. I can put it up for the next call.

Later, Juan.
[PATCH] kvmtool/arm{,64}: fix ARM initrd functionality
lkvm -i is currently broken on ARM/ARM64. We should not try to convert smaller-than-4GB addresses into 64-bit big endian and then stuff them into u32 variables if we expect to read anything other than 0 out of them. Adjust the type to u64 to write the proper address in BE format into the /chosen node (matching the address size we formerly posted) and thus let Linux read the right values.

Signed-off-by: Andre Przywara andre.przyw...@arm.com
---
 tools/kvm/arm/fdt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/arm/fdt.c b/tools/kvm/arm/fdt.c
index 186a718..1a76b07c 100644
--- a/tools/kvm/arm/fdt.c
+++ b/tools/kvm/arm/fdt.c
@@ -119,8 +119,8 @@ static int setup_fdt(struct kvm *kvm)
 
 	/* Initrd */
 	if (kvm->arch.initrd_size != 0) {
-		u32 ird_st_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start);
-		u32 ird_end_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start +
+		u64 ird_st_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start);
+		u64 ird_end_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start +
 						kvm->arch.initrd_size);
 
 		_FDT(fdt_property(fdt, "linux,initrd-start",
--
1.7.9.5
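The failure mode is easy to see with concrete values (illustrative arithmetic, not from the patch):

	/* cpu_to_fdt64() byte-swaps on a little-endian host, so for an
	 * initrd below 4GB the interesting bytes end up in the HIGH half
	 * of the 64-bit result; a u32 keeps only the low half, i.e. 0. */
	u64 addr = 0x80000000;			/* initrd start, below 4GB  */
	u64 ok   = cpu_to_fdt64(addr);		/* 0x0000008000000000 on LE */
	u32 bad  = cpu_to_fdt64(addr);		/* truncated to 0x00000000  */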
Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off
----- Original Message -----
> > > [...]
> >
> > Ok, I'll test and apply your patch then.
>
> Actually, yeah, please apply now in order to get A7 boards working.
> I'll do a follow-on patch to fix the case above (which will require
> deciding how to handle per cpu data).

Post coffee, I don't see why I shouldn't just use SCTLR.C as my boolean, which is of course per cpu, and means the same thing, i.e. caches enabled or not. I'll send a v2 that drops mem_caches_enabled and modifies the logic of the spin_lock asm.

drew

> > Once you change the code to enable caches, please consider hanging
> > on spin_lock with caches disabled.
> >
> > Paolo
Re: [question] virtio-blk performance degradation happened with virito-serial
> > > If virtio-blk and virtio-serial share an IRQ, the guest operating
> > > system has to check each virtqueue for activity. Maybe there is
> > > some inefficiency doing that. AFAIK virtio-serial registers 64
> > > virtqueues (on 31 ports + console) even if everything is unused.
> >
> > That could be the case if MSI is disabled. Do the Windows virtio
> > drivers enable MSIs, in their inf file?
>
> It depends on the version of the drivers, but it is a reasonable guess
> at what differs between Linux and Windows. Haoyu, can you give us the
> output of lspci from a Linux guest?

I made a test with fio on a RHEL 6.5 guest; the same degradation happened there too, and it can be reproduced on the RHEL 6.5 guest 100% of the time.

With the virtio_console module installed:
	64K sequential write: 285 MB/s, 4380 IOPS
With the virtio_console module uninstalled:
	64K sequential write: 370 MB/s, 5670 IOPS

I used "top -d 1 -H -p qemu-pid" to monitor the cpu usage, and found that with the virtio_console module installed the qemu main thread cpu usage is 98%, while with it uninstalled it is 60%.

I found that the statement "err = register_virtio_driver(&virtio_console);" in the virtio_console module's init() function causes the degradation: if I return directly before that statement, the degradation disappears; if I return directly after it, the degradation is still there.

I will try the test cases below:

1. Do not emulate the virtio-serial device, then install/uninstall the virtio_console driver in the guest, to see whether there is a difference in virtio-blk performance and cpu usage.
2. Do not emulate the virtio-serial device, then install the virtio_balloon driver (while also not emulating the virtio-balloon device), to see whether the virtio-blk performance degradation still happens.
3. Emulate a virtio-balloon device instead of the virtio-serial device, then see whether virtio-blk performance is hampered.

Based on the test results, corresponding analysis will be performed.

Any ideas?

Thanks,
Zhang Haoyu
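The per-interrupt cost being discussed comes from the legacy INTx path, which has to poll every virtqueue; a simplified sketch of that pattern (modeled loosely on the virtio PCI driver, not its exact source):

	/* With a shared legacy interrupt, each IRQ makes the driver scan
	 * all of its virtqueues -- so 64 idle virtio-serial queues add
	 * work to every virtio-blk completion on the same line. */
	static irqreturn_t vp_shared_interrupt(int irq, void *opaque)
	{
		struct virtio_pci_device *vp_dev = opaque;
		struct virtqueue *vq;
		irqreturn_t ret = IRQ_NONE;

		list_for_each_entry(vq, &vp_dev->vdev.vqs, list)
			if (vring_interrupt(irq, vq) == IRQ_HANDLED)
				ret = IRQ_HANDLED;
		return ret;
	}

With MSI-X enabled, each virtqueue gets its own vector instead, so the scan (and the cross-device sharing) goes away; that is why the inf-file question matters.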
Re: [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read()
On Tue, 16 Sep 2014 08:27:40 +0800 Amos Kong ak...@redhat.com wrote:
> Set timeout to 10: non-smp guest with quick backend: 1.2M/s -> about 490K/s

That sounds like an awful lot. This is a 60% loss in throughput. I don't think we can live with that.

-- 
Michael
Re: [PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
Amos Kong ak...@redhat.com writes:
> On Sun, Sep 14, 2014 at 01:12:58AM +0800, Amos Kong wrote:
> > On Thu, Sep 11, 2014 at 09:08:03PM +0930, Rusty Russell wrote:
> > > Amos Kong ak...@redhat.com writes:
> > > > When I check hwrng attributes in sysfs, the cat process always
> > > > gets stuck if the guest has only 1 vcpu and uses a slow rng
> > > > backend. Currently we check if there are any tasks waiting to be
> > > > run on the current cpu in rng_dev_read() by need_resched(). But
> > > > need_resched() doesn't work because rng_dev_read() is executing
> > > > in user context.
> > >
> > > I don't understand this explanation? I'd expect the sysfs process
> > > to be woken by the mutex_unlock().
> >
> > But actually the sysfs process isn't always woken; this is why the
> > process gets stuck.
>
> Hi Rusty,
>
> Reference:
> http://www.linuxgrill.com/anonymous/fire/netfilter/kernel-hacking-HOWTO-2.html

Sure, that was true when I wrote it, and is still true when preempt is off.

> The read() syscall of /dev/hwrng enters the kernel; the read operation
> is rng_dev_read(), and it executes in user context (not interrupt
> context). User context doesn't allow other user contexts to run on
> that CPU, unless the kernel code sleeps for some reason.

This is true assuming preempt is off, yes.

> In this case, need_resched() doesn't work.

This is exactly what need_resched() is for: it should return true if there's another process of sufficient priority waiting to be run. It implies that schedule() would run it.

git blame doesn't offer any enlightenment here, as to why we use schedule_timeout_interruptible() at all. I would expect mutex_unlock() to wake the other reader. The code certainly seems to, so it should now be runnable and need_resched() should return true.

I suspect something else is happening which makes this work.

Cheers,
Rusty.
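For readers following along, the loop in question in drivers/char/hw_random/core.c is shaped roughly like this (simplified from the 3.x-era source; error paths and byte accounting trimmed):

	static ssize_t rng_dev_read(struct file *filp, char __user *buf,
				    size_t size, loff_t *offp)
	{
		while (size) {
			if (mutex_lock_interruptible(&rng_mutex))
				return -ERESTARTSYS;
			/* ... pull bytes from current_rng, copy_to_user ... */
			mutex_unlock(&rng_mutex);

			/* The check under discussion: yield the cpu between
			 * chunks so a 1-vcpu guest can run other tasks. */
			if (need_resched())
				schedule_timeout_interruptible(1);
			if (signal_pending(current))
				return -ERESTARTSYS;
		}
		return 0;	/* simplified; the real code returns bytes read */
	}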
Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem
Apologies to all. Resend, as the lists rejected my gmail-formatted version; now in plain text. Won't happen again.

On Tue, Sep 16, 2014 at 6:51 AM, Paolo Bonzini pbonz...@redhat.com wrote:
> On 15/09/2014 22:11, Andres Lagar-Cavilla wrote:
> > +	if (!locked) {
> > +		BUG_ON(npages != -EBUSY);
>
> VM_BUG_ON perhaps?

Sure.

> > +	} else {
> > +		/*
> > +		 * By now we have tried gup_fast, and possible async_pf, and we
> > +		 * are certainly not atomic. Time to retry the gup, allowing
> > +		 * mmap semaphore to be relinquished in the case of IO.
> > +		 */
> > +		npages = kvm_get_user_page_retry(current, current->mm, addr,
> > +						 write_fault, page);
>
> This is a separate logical change. Was this:
>
> 	down_read(&mm->mmap_sem);
> 	npages = get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
> 	up_read(&mm->mmap_sem);
>
> the intention rather than get_user_pages_fast?

Nope. The intention was to pass FAULT_FLAG_RETRY to the vma fault handler (without _NOWAIT). And once you do that, if you come back without holding the mmap sem, you need to call yet again. By that point in the call chain I felt comfortable dropping the _fast. All paths that get there have already tried _fast (and some have tried _NOWAIT).

> I think a first patch should introduce kvm_get_user_page_retry ("Retry
> a fault after a gup with FOLL_NOWAIT.") and the second would add
> FOLL_TRIED ("This properly relinquishes mmap semaphore if the
> filemap/swap has to wait on page lock (and retries the gup to
> completion after that)").

That's not what FOLL_TRIED does. The relinquishing of the mmap semaphore is done by this patch minus the FOLL_TRIED bits. FOLL_TRIED will let the fault handler (e.g. filemap) know that we've been there and waited on the IO already, so in the common case we won't need to redo the IO. Have a look at how FAULT_FLAG_TRIED is used in e.g. arch/x86/mm/fault.c.

> Apart from this, the patch looks good. The mm/ parts are minimal, so I
> think it's best to merge it through the KVM tree with someone's
> Acked-by.

Thanks!
Andres

-- 
Andres Lagar-Cavilla | Google Cloud Platform | andre...@google.com | 647-778-4380
Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off
On Tue, Sep 16, 2014 at 10:38:11AM -0400, Andrew Jones wrote:
> Post coffee, I don't see why I shouldn't just use SCTLR.C as my
> boolean, which is of course per cpu, and means the same thing, i.e.
> caches enabled or not. I'll send a v2 that drops mem_caches_enabled
> and modifies the logic of the spin_lock asm.

Scratch this idea. It breaks the use of spin_lock from USR mode. I think plan B is the best, which is: apply this patch, and then I'll make mem_caches_enabled per cpu later, once we support per cpu data.

> > Once you change the code to enable caches, please consider hanging
> > on spin_lock with caches disabled.
> >
> > Paolo
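For context, the discarded check would presumably have looked something like this (hypothetical ARMv7 sketch, not code from the thread); the privileged MRC read of SCTLR is exactly what rules it out for USR mode:

	/* Read SCTLR and test the C bit (bit 2, data cache enable).
	 * Requires a privileged mode -- hence "breaks the use of
	 * spin_lock from USR mode" above. */
	static inline bool dcache_enabled(void)
	{
		unsigned int sctlr;

		asm volatile("mrc p15, 0, %0, c1, c0, 0" : "=r" (sctlr));
		return sctlr & (1U << 2);
	}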
Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem
On 16/09/2014 18:52, Andres Lagar-Cavilla wrote:
> > Was this:
> >
> > 	down_read(&mm->mmap_sem);
> > 	npages = get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
> > 	up_read(&mm->mmap_sem);
> >
> > the intention rather than get_user_pages_fast?

I meant the intention of the original author, not yours.

> By that point in the call chain I felt comfortable dropping the _fast.
> All paths that get there have already tried _fast (and some have tried
> _NOWAIT).

Yes, understood.

> > I think a first patch should introduce kvm_get_user_page_retry
> > ("Retry a fault after a gup with FOLL_NOWAIT.") and the second would
> > add FOLL_TRIED ("This properly relinquishes mmap semaphore if the
> > filemap/swap has to wait on page lock (and retries the gup to
> > completion after that)").
>
> That's not what FOLL_TRIED does. The relinquishing of the mmap
> semaphore is done by this patch minus the FOLL_TRIED bits. FOLL_TRIED
> will let the fault handler (e.g. filemap) know that we've been there
> and waited on the IO already, so in the common case we won't need to
> redo the IO.

Yes, that's not what FOLL_TRIED does. But it's the difference between get_user_pages and kvm_get_user_page_retry, right?

Paolo
Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem
On Tue, Sep 16, 2014 at 11:29 AM, Paolo Bonzini pbonz...@redhat.com wrote:
> On 16/09/2014 18:52, Andres Lagar-Cavilla wrote:
> > > Was this [...] the intention rather than get_user_pages_fast?
>
> I meant the intention of the original author, not yours.

Yes, in all likelihood. I hope!

> > By that point in the call chain I felt comfortable dropping the
> > _fast. All paths that get there have already tried _fast (and some
> > have tried _NOWAIT).
>
> Yes, understood.
>
> Yes, that's not what FOLL_TRIED does. But it's the difference between
> get_user_pages and kvm_get_user_page_retry, right?

Unfortunately get_user_pages does not expose the param (int *nonblocking) that __get_user_pages will use to set FAULT_FLAG_ALLOW_RETRY. So that's one difference. The second difference is that kvm_get_user_page_retry will call twice if necessary (the second time without _RETRY but with _TRIED).

Thanks,
Andres
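Putting those two differences together, the helper as posted is shaped roughly like this (reconstructed from the hunks quoted in this thread, so details -- including the debated BUG_ON/VM_BUG_ON, omitted here -- may differ from the version finally merged):

	int kvm_get_user_page_retry(struct task_struct *tsk, struct mm_struct *mm,
				    unsigned long addr, bool write_fault,
				    struct page **pagep)
	{
		int flags = FOLL_TOUCH | FOLL_HWPOISON |
			    (pagep ? FOLL_GET : 0) |
			    (write_fault ? FOLL_WRITE : 0);
		int locked = 1;
		int npages;

		down_read(&mm->mmap_sem);
		/* First pass: &locked enables FAULT_FLAG_ALLOW_RETRY, so the
		 * fault handler may drop mmap_sem while it waits on IO. */
		npages = __get_user_pages(tsk, mm, addr, 1, flags, pagep, NULL,
					  &locked);
		if (!locked) {
			/* mmap_sem was released; the IO has been waited on.
			 * Second pass: FOLL_TRIED tells the fault handler not
			 * to schedule the IO again. */
			down_read(&mm->mmap_sem);
			npages = __get_user_pages(tsk, mm, addr, 1,
						  flags | FOLL_TRIED, pagep,
						  NULL, NULL);
		}
		up_read(&mm->mmap_sem);
		return npages;
	}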
Re: [PATCH] kvmtool/arm{,64}: fix ARM initrd functionality
On 2014-09-16 15:37, Andre Przywara wrote:
> lkvm -i is currently broken on ARM/ARM64. We should not try to convert
> smaller-than-4GB addresses into 64-bit big endian and then stuff them
> into u32 variables if we expect to read anything other than 0 out of
> them.
> [...]
>
> -		u32 ird_st_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start);
> -		u32 ird_end_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start +
> +		u64 ird_st_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start);
> +		u64 ird_end_prop = cpu_to_fdt64(kvm->arch.initrd_guest_start +

Who needs an initrd anyway? ;-)

Acked-by: Marc Zyngier marc.zyng...@arm.com

	M.
-- 
Fast, cheap, reliable. Pick two.
Re: [PATCH 0/3] x86: structs for cpuid info in x86
On 9/16/14 4:22 PM, Ingo Molnar wrote:
> I personally like bitfields in theory (they provide type clarity and
> abstract robustness, compared to the open-coded bitmask numeric
> literals that are often used in cpuid-using code), with the big caveat
> that for many years I didn't like bitfields in practice: older
> versions of GCC did a really poor job of optimizing them.
>
> So such a series would only be acceptable if it's demonstrated that
> both 'latest' and 'reasonably old' GCC versions do a good job in that
> department, compared to the old open-coded bitmask ops ...

Thanks for the quick response. I was not aware GCC behaves this way. I made some small experiments with GCC 4.8 and GCC 4.4, and in brief my conclusions are:

1. The assembled code of bitmasks and bitfields is indeed different.
2. GCC 4.8 and GCC 4.4 behave pretty much the same, yet GCC 4.8 appears to do better instruction reordering.
3. Loading/storing a single bitfield seems to be pretty well optimized (a marginal code-size advantage for bitmasks, the same number of instructions).
4. Loading/storing multiple bitfields seems to be somewhat under-optimized: multiple accesses to the original value result in ~30% more instructions and code size.

So you are correct: bitfields are less optimized. Nonetheless, since cpuid data is mostly used during startup, and otherwise a single bitfield is usually accessed in each function, I wonder whether it is worth keeping the optimized but obfuscated code. Obviously, I can guess your answer to this question...

Nadav
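To make the comparison concrete, the two styles under discussion look like this (illustrative only, using the union from patch 1/3; model is bits 7:4 of CPUID.1.EAX):

	/* Open-coded bitmask style. */
	unsigned int model_bitmask(unsigned int eax)
	{
		return (eax >> 4) & 0xf;
	}

	/* Bitfield style, via the proposed union. */
	unsigned int model_bitfield(unsigned int eax)
	{
		union cpuid1_eax reg = { .full = eax };

		return reg.split.model;
	}

For a single field like this the generated code is essentially the same; the ~30% penalty in point 4 shows up when several fields of the same register are extracted in one function.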
Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem
2014-09-15 13:11-0700, Andres Lagar-Cavilla:
> +int kvm_get_user_page_retry(struct task_struct *tsk, struct mm_struct *mm,

The suffix '_retry' is not best suited for this. On first reading, I imagined we would be retrying something from before, possibly calling it in a loop, but we are actually doing the first and last try in one call. It is hard to find something that conveys our lock-dropping mechanic; '_polite' is my best candidate at the moment.

> +	int flags = FOLL_TOUCH | FOLL_HWPOISON |

(FOLL_HWPOISON wasn't used before, but it's harmless.)

2014-09-16 15:51+0200, Paolo Bonzini:
> On 15/09/2014 22:11, Andres Lagar-Cavilla wrote:
> > +	} else {
> > +		/*
> > +		 * By now we have tried gup_fast, and possible async_pf, and we

(If we really tried get_user_pages_fast, we wouldn't be here, so I'd prepend two underscores here as well.)

> > +		 * are certainly not atomic. Time to retry the gup, allowing
> > +		 * mmap semaphore to be relinquished in the case of IO.
> > +		 */
> > +		npages = kvm_get_user_page_retry(current, current->mm, addr,
> > +						 write_fault, page);
>
> This is a separate logical change. Was this:
>
> 	down_read(&mm->mmap_sem);
> 	npages = get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
> 	up_read(&mm->mmap_sem);
>
> the intention rather than get_user_pages_fast?

I believe so as well. (Looking at get_user_pages_fast and __get_user_pages_fast made my abstraction detector very sad.)

> I think a first patch should introduce kvm_get_user_page_retry ("Retry
> a fault after a gup with FOLL_NOWAIT.") and the second would add
> FOLL_TRIED ("This properly relinquishes mmap semaphore if the
> filemap/swap has to wait on page lock (and retries the gup to
> completion after that)").

Not sure if that would help to understand the goal ...

> Apart from this, the patch looks good. The mm/ parts are minimal, so I
> think it's best to merge it through the KVM tree with someone's
> Acked-by.

I would prefer to have the last hunk in a separate patch, but still,

Acked-by: Radim Krčmář rkrc...@redhat.com
Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem
On Tue, Sep 16, 2014 at 1:51 PM, Radim Krčmář rkrc...@redhat.com wrote:
> The suffix '_retry' is not best suited for this. On first reading, I
> imagined we would be retrying something from before, possibly calling
> it in a loop, but we are actually doing the first and last try in one
> call.

We are doing ... the second and third in most scenarios. async_pf did the first with _NOWAIT. We call this from the async pf retrier, or if async pf couldn't be notified to the guest.

> It is hard to find something that conveys our lock-dropping mechanic;
> '_polite' is my best candidate at the moment.

I'm at a loss towards finding a better name than '_retry'.

> > +	int flags = FOLL_TOUCH | FOLL_HWPOISON |
>
> (FOLL_HWPOISON wasn't used before, but it's harmless.)

Ok. Wasn't 100% sure, TBH.

> (If we really tried get_user_pages_fast, we wouldn't be here, so I'd
> prepend two underscores here as well.)

Yes, async pf tries and fails to do fast, and then we fall back to slow, and so on.

> I believe so as well. (Looking at get_user_pages_fast and
> __get_user_pages_fast made my abstraction detector very sad.)

It's clunky, but a separate battle.

> I would prefer to have the last hunk in a separate patch, but still,
>
> Acked-by: Radim Krčmář rkrc...@redhat.com

Awesome, thanks much. I'll recut with the VM_BUG_ON from Paolo and your Ack. LMK if anything else from this email should go into the recut.

Andres
-- 
Andres Lagar-Cavilla | Google Kernel Team | andre...@google.com
Re: [PATCH v4 0/4 resend] Introduce device assignment flag operation helper function
On Tue, Sep 09, 2014 at 10:21:24AM +0800, Ethan Zhao wrote:
> This patch set introduces three PCI device flag operation helper
> functions for setting a PCI device PF/VF to assigned or deassigned
> status, and for checking that status. Patches 2, 3 and 4 apply these
> helper functions to KVM, Xen and PCI.
>
> v2: simplify unnecessary ternary operation in function
>     pci_is_dev_assigned().
> v3: amend helper function naming.
> v4: fix incorrect type in return expression warning found by
>     Fengguang Wu.
>
> Appreciate suggestions from alex.william...@redhat.com,
> david.vra...@citrix.com, alexander.h.du...@intel.com
>
> Resend for v3.18 building.

Applied to pci/virtualization for v3.18, thanks.

> Thanks,
> Ethan
> ---
> Ethan Zhao (4):
>   PCI: introduce helper functions for device flag operation
>   KVM: use pci device flag operation helper functions
>   xen-pciback: use pci device flag operation helper function
>   PCI: use device flag operation helper function in iov.c
>
>  drivers/pci/iov.c                  |  2 +-
>  drivers/xen/xen-pciback/pci_stub.c |  4 ++--
>  include/linux/pci.h                | 13 +
>  virt/kvm/assigned-dev.c            |  2 +-
>  virt/kvm/iommu.c                   |  4 ++--
>  5 files changed, 19 insertions(+), 6 deletions(-)
Re: [PATCH v3 1/4] kvmtool: ARM: Use KVM_ARM_PREFERRED_TARGET vm ioctl to determine target cpu
On Thu, Sep 11, 2014 at 9:24 PM, Andre Przywara andre.przyw...@arm.com wrote:
> Hi Anup,
>
> On 08/09/14 09:17, Anup Patel wrote:
> > Instead of trying out each and every target type, we should use the
> > KVM_ARM_PREFERRED_TARGET vm ioctl to determine the target type for
> > KVM ARM/ARM64. If the KVM_ARM_PREFERRED_TARGET vm ioctl fails, then
> > we fall back to the old method of trying all known target types. If
> > it succeeds but the returned target type is not known to KVMTOOL,
> > then we forcefully init the VCPU with the target type returned by
> > the ioctl.
> >
> > Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org
> > Signed-off-by: Anup Patel anup.pa...@linaro.org
> > ---
> >  tools/kvm/arm/kvm-cpu.c | 52 +--
> >  1 file changed, 41 insertions(+), 11 deletions(-)
> >
> > diff --git a/tools/kvm/arm/kvm-cpu.c b/tools/kvm/arm/kvm-cpu.c
> > index aeaa4cf..ba7a762 100644
> > --- a/tools/kvm/arm/kvm-cpu.c
> > +++ b/tools/kvm/arm/kvm-cpu.c
> > @@ -33,7 +33,8 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
> >  	struct kvm_arm_target *target;
> >  	struct kvm_cpu *vcpu;
> >  	int coalesced_offset, mmap_size, err = -1;
> > -	unsigned int i;
> > +	unsigned int i, target_type;
> > +	struct kvm_vcpu_init preferred_init;
> >  	struct kvm_vcpu_init vcpu_init = {
> >  		.features = ARM_VCPU_FEATURE_FLAGS(kvm, cpu_id)
> >  	};
> > @@ -55,19 +56,47 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
> >  	if (vcpu->kvm_run == MAP_FAILED)
> >  		die("unable to mmap vcpu fd");
> >
> > -	/* Find an appropriate target CPU type. */
> > -	for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
> > -		if (!kvm_arm_targets[i])
> > -			continue;
> > -		target = kvm_arm_targets[i];
> > -		vcpu_init.target = target->id;
> > -		err = ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_INIT, &vcpu_init);
> > -		if (!err)
> > -			break;
> > +	/*
> > +	 * If the preferred target ioctl is successful then use the
> > +	 * preferred target, else try each and every target type.
> > +	 */
> > +	err = ioctl(kvm->vm_fd, KVM_ARM_PREFERRED_TARGET, &preferred_init);
> > +	if (!err) {
> > +		/* Match preferred target CPU type. */
> > +		target = NULL;
> > +		for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
> > +			if (!kvm_arm_targets[i])
> > +				continue;
> > +			if (kvm_arm_targets[i]->id == preferred_init.target) {
> > +				target = kvm_arm_targets[i];
> > +				target_type = kvm_arm_targets[i]->id;
> > +				break;
> > +			}
> > +		}
> > +		if (!target) {
> > +			target = kvm_arm_targets[0];
>
> I think you missed the part of the patch which adds the now magic zero
> member of kvm_arm_targets[]. A simple static initializer should work.

Yes, good catch. I will fix this.

> > +			target_type = preferred_init.target;
>
> Can't you move that out of the loop, in front of it actually? Then you
> can get rid of the line above setting the target_type also, since you
> always use the same value now, regardless of whether you found that
> CPU in the list or not.

Sure, I'll rearrange this.

> > +		}
> > +	} else {
> > +		/* Find an appropriate target CPU type. */
> > +		for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
> > +			if (!kvm_arm_targets[i])
> > +				continue;
> > +			target = kvm_arm_targets[i];
> > +			target_type = target->id;
> > +			vcpu_init.target = target_type;
> > +			err = ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_INIT, &vcpu_init);
> > +			if (!err)
> > +				break;
> > +		}
> > +		if (err)
> > +			die("Unable to find matching target");
> >  	}
> >
> > +	vcpu_init.target = target_type;
> > +	err = ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_INIT, &vcpu_init);
>
> You should do this only in the if-branch above, since you (try to)
> call KVM_ARM_VCPU_INIT already in the else branch before. Otherwise in
> the latter case you would do it twice.

I was trying to follow the old code, which was doing KVM_ARM_VCPU_INIT twice for the same VCPU, but it makes sense to do it only once.

Regards,
Anup

> Regards,
> Andre.
>
> >  	if (err || target->init(vcpu))
> > -		die("Unable to initialise ARM vcpu");
> > +		die("Unable to initialise vcpu");
> >
> >  	coalesced_offset = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION,
> >  				 KVM_CAP_COALESCED_MMIO);
> > @@ -81,6 +110,7 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
> >  	vcpu->cpu_type		= target->id;
> >  	vcpu->cpu_compatible	= target->compatible;
> >  	vcpu->is_running	= true;
> > +
> >  	return vcpu;
> >  }
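For clarity, the flow Andre is asking for would look roughly like this (hypothetical sketch of the suggested rework, not code from the thread; it assumes kvm_arm_targets[0] is the generic fallback entry he mentions):

	/* Query the preferred target once; fall back to probing. Either
	 * way, KVM_ARM_VCPU_INIT is issued exactly once per vcpu. */
	err = ioctl(kvm->vm_fd, KVM_ARM_PREFERRED_TARGET, &preferred_init);
	if (!err) {
		target = kvm_arm_targets[0];	/* generic fallback entry */
		for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i)
			if (kvm_arm_targets[i] &&
			    kvm_arm_targets[i]->id == preferred_init.target) {
				target = kvm_arm_targets[i];
				break;
			}
		vcpu_init.target = preferred_init.target;
		err = ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_INIT, &vcpu_init);
	} else {
		for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
			if (!kvm_arm_targets[i])
				continue;
			target = kvm_arm_targets[i];
			vcpu_init.target = target->id;
			err = ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_INIT, &vcpu_init);
			if (!err)
				break;
		}
	}
	if (err || target->init(vcpu))
		die("Unable to initialise vcpu");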
Re: [PATCH v3 2/4] kvmtool: ARM64: Add target type potenza for aarch64
On Thu, Sep 11, 2014 at 9:37 PM, Andre Przywara andre.przyw...@arm.com wrote:
> > > On 08/09/14 09:17, Anup Patel wrote:
> > > > The VCPU target type KVM_ARM_TARGET_XGENE_POTENZA is available
> > > > in Linux 3.16-rcX or higher, hence register the aarch64 target
> > > > type for it. This patch enables us to run KVMTOOL on an X-Gene
> > > > Potenza host.
> > >
> > > Why do you still need this if the previous patch got rid of the
> > > need for naming each and every CPU in kvmtool? Do you care about
> > > kernels older than 3.12? I wouldn't bother so much since you'd
> > > need a much newer kvmtool anyway.
> >
> > Yes, actually the APM SW team uses the 3.12 kernel.
>
> Can you consider dropping this patch then? I'd rather avoid adding
> CPUs to this list needlessly from now on.

I think let's keep this patch, because there are quite a few X-Gene users on older kernels.

Regards,
Anup

> Regards,
> Andre.
>
> > Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org
> > Signed-off-by: Anup Patel anup.pa...@linaro.org
> > ---
> >  tools/kvm/arm/aarch64/arm-cpu.c | 9 +-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/kvm/arm/aarch64/arm-cpu.c b/tools/kvm/arm/aarch64/arm-cpu.c
> > index ce5ea2f..ce526e3 100644
> > --- a/tools/kvm/arm/aarch64/arm-cpu.c
> > +++ b/tools/kvm/arm/aarch64/arm-cpu.c
> > @@ -41,10 +41,17 @@ static struct kvm_arm_target target_cortex_a57 = {
> >  	.init = arm_cpu__vcpu_init,
> >  };
> >
> > +static struct kvm_arm_target target_potenza = {
> > +	.id		= KVM_ARM_TARGET_XGENE_POTENZA,
> > +	.compatible	= "arm,arm-v8",
> > +	.init		= arm_cpu__vcpu_init,
> > +};
> > +
> >  static int arm_cpu__core_init(struct kvm *kvm)
> >  {
> >  	return (kvm_cpu__register_kvm_arm_target(&target_aem_v8) ||
> >  		kvm_cpu__register_kvm_arm_target(&target_foundation_v8) ||
> > -		kvm_cpu__register_kvm_arm_target(&target_cortex_a57));
> > +		kvm_cpu__register_kvm_arm_target(&target_cortex_a57) ||
> > +		kvm_cpu__register_kvm_arm_target(&target_potenza));
> >  }
> >
> >  core_init(arm_cpu__core_init);
Re: [PATCH v3 3/4] kvmtool: Handle exit reason KVM_EXIT_SYSTEM_EVENT
On Thu, Sep 11, 2014 at 9:56 PM, Andre Przywara andre.przyw...@arm.com wrote:
> On 08/09/14 09:17, Anup Patel wrote:
> > The KVM_EXIT_SYSTEM_EVENT exit reason was added to define
> > architecture independent system-wide events for a guest. Currently,
> > it is used by the in-kernel PSCI 0.2 emulation of KVM ARM/ARM64 to
> > inform user space about a PSCI SYSTEM_OFF or PSCI SYSTEM_RESET
> > request. For now, we simply treat all system-wide guest events as
> > shutdown requests in KVMTOOL.
>
> Is it really a good idea to default to exit_kvm? I find a shutdown a
> rather drastic default. Also I'd like to see RESET not easily mapped
> to shutdown. If the user resets the box explicitly, it's probably
> expected to come up again (to load an updated kernel or proceed with
> an install).

Absolutely correct, but we don't have a VM reboot API in KVMTOOL, so I chose this route.

> So what about a more explicit message like: "... please restart the VM
> until we gain proper reboot support in kvmtool"?

Sure, I will print additional info for the reset event, such as:

	INFO: VM reboot support not available
	INFO: Please restart the VM manually

Regards,
Anup

> > Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org
> > Signed-off-by: Anup Patel anup.pa...@linaro.org
> > ---
> >  tools/kvm/kvm-cpu.c | 19 +++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
> > index ee0a8ec..6d01192 100644
> > --- a/tools/kvm/kvm-cpu.c
> > +++ b/tools/kvm/kvm-cpu.c
> > @@ -160,6 +160,25 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
> >  			goto exit_kvm;
> >  		case KVM_EXIT_SHUTDOWN:
> >  			goto exit_kvm;
> > +		case KVM_EXIT_SYSTEM_EVENT:
> > +			/*
> > +			 * Print the type of system event and
> > +			 * treat all system events as shutdown request.
> > +			 */
> > +			switch (cpu->kvm_run->system_event.type) {
> > +			case KVM_SYSTEM_EVENT_SHUTDOWN:
> > +				printf(" # Info: shutdown system event\n");
> > +				break;
> > +			case KVM_SYSTEM_EVENT_RESET:
> > +				printf(" # Info: reset system event\n");
> > +				break;
> > +			default:
> > +				printf(" # Warning: unknown system event type=%d\n",
> > +				       cpu->kvm_run->system_event.type);
> > +				break;
> > +			};
> > +			printf(" # Info: exiting KVMTOOL\n");
> > +			goto exit_kvm;
> >  		default: {
> >  			bool ret;
Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem
[Emergency posting to fix the tag; I couldn't find the unmangled Cc list, so some recipients were dropped, sorry. (I guess you are glad, though.)]

2014-09-16 14:01-0700, Andres Lagar-Cavilla:
> On Tue, Sep 16, 2014 at 1:51 PM, Radim Krčmář rkrc...@redhat.com wrote:
> > The suffix '_retry' is not best suited for this. On first reading, I
> > imagined we would be retrying something from before, possibly
> > calling it in a loop, but we are actually doing the first and last
> > try in one call.
>
> We are doing ... the second and third in most scenarios. async_pf did
> the first with _NOWAIT. We call this from the async pf retrier, or if
> async pf couldn't be notified to the guest.

I was thinking more about what the function does, not how we currently use it -- nothing prevents us from using it as the first attempt somewhere -- but yeah, even the comments would be off then.

> > Apart from this, the patch looks good. The mm/ parts are minimal, so
> > I think it's best to merge it through the KVM tree with someone's
> > Acked-by.
> >
> > I would prefer to have the last hunk in a separate patch, but still,
> >
> > Acked-by: Radim Krčmář rkrc...@redhat.com
>
> Awesome, thanks much. I'll recut with the VM_BUG_ON from Paolo and
> your Ack. LMK if anything else from this email should go into the
> recut.

Ah, sorry, I'm not maintaining mm ... what I meant was

Reviewed-by: Radim Krčmář rkrc...@redhat.com

and I had to leave before I could find a good apology for VM_WARN_ON_ONCE(), so if you are replacing the BUG_ON, you might want to look at that one as well.
Re: [PATCH v3 3/4] kvmtool: Handle exit reason KVM_EXIT_SYSTEM_EVENT
On Wed, Sep 17, 2014 at 3:59 AM, Anup Patel apa...@apm.com wrote:
> [same reply as above, including an auto-appended confidentiality notice]

Please ignore that notice. I used the wrong email address in my last reply.

Regards,
Anup
Re: [PATCH v3 2/4] kvmtool: ARM64: Add target type potenza for aarch64
On Wed, Sep 17, 2014 at 3:54 AM, Anup Patel apa...@apm.com wrote:
On Thu, Sep 11, 2014 at 9:37 PM, Andre Przywara andre.przyw...@arm.com wrote:

Anup,

On 08/09/14 09:17, Anup Patel wrote:

The VCPU target type KVM_ARM_TARGET_XGENE_POTENZA is available in Linux-3.16-rcX and higher, hence we register an aarch64 target type for it. This patch enables us to run KVMTOOL on an X-Gene Potenza host.

Why do you still need this if the previous patch got rid of the need for naming each and every CPU in kvmtool? Do you care about kernels older than 3.12? I wouldn't bother so much, since you'd need a much newer kvmtool anyway.

Yes, actually the APM SW team uses a 3.12 kernel.

Can you consider dropping this patch then? I'd rather avoid adding CPUs to this list needlessly from now on.

I think let's keep this patch, because there are quite a few X-Gene users on older kernels.

Regards,
Anup

Regards,
Andre.

Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org
Signed-off-by: Anup Patel anup.pa...@linaro.org
---
 tools/kvm/arm/aarch64/arm-cpu.c | 9 +++++++--
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/kvm/arm/aarch64/arm-cpu.c b/tools/kvm/arm/aarch64/arm-cpu.c
index ce5ea2f..ce526e3 100644
--- a/tools/kvm/arm/aarch64/arm-cpu.c
+++ b/tools/kvm/arm/aarch64/arm-cpu.c
@@ -41,10 +41,17 @@ static struct kvm_arm_target target_cortex_a57 = {
 	.init = arm_cpu__vcpu_init,
 };

+static struct kvm_arm_target target_potenza = {
+	.id		= KVM_ARM_TARGET_XGENE_POTENZA,
+	.compatible	= "arm,arm-v8",
+	.init		= arm_cpu__vcpu_init,
+};
+
 static int arm_cpu__core_init(struct kvm *kvm)
 {
 	return (kvm_cpu__register_kvm_arm_target(&target_aem_v8) ||
 		kvm_cpu__register_kvm_arm_target(&target_foundation_v8) ||
-		kvm_cpu__register_kvm_arm_target(&target_cortex_a57));
+		kvm_cpu__register_kvm_arm_target(&target_cortex_a57) ||
+		kvm_cpu__register_kvm_arm_target(&target_potenza));
 }
 core_init(arm_cpu__core_init);
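Why the static entry still matters on pre-3.12 hosts can be sketched as a fallback probe. This is illustrative only (init_vcpu_compat() is not a kvmtool function), but it shows how a known target id such as KVM_ARM_TARGET_XGENE_POTENZA helps when the host kernel lacks KVM_ARM_PREFERRED_TARGET:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int init_vcpu_compat(int vm_fd, int vcpu_fd,
			    const unsigned int *targets, int nr_targets)
{
	struct kvm_vcpu_init init;
	int i;

	/* Newer hosts (3.12+) tell us what target to use. */
	if (ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init) == 0)
		return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);

	/* Older hosts: probe the registered target list instead. */
	memset(&init, 0, sizeof(init));
	for (i = 0; i < nr_targets; i++) {
		init.target = targets[i];
		if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init) == 0)
			return 0;
	}
	return -1;
}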
Re: [PATCH] Using the tlb flush util function where applicable
Hi Radim,

On Mon, Sep 15, 2014 at 09:33:52PM +0200, Radim Krčmář wrote:
2014-09-12 17:06-0400, Liang Chen:

Use kvm_mmu_flush_tlb() as the other places do, to make sure the vcpu stat is incremented.

Signed-off-by: Liang Chen liangchen.li...@gmail.com
---

Good catch.

 arch/x86/kvm/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..439682e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1810,7 +1810,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct desc_ptr *gdt = &__get_cpu_var(host_gdt);
 	unsigned long sysenter_esp;

-	kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+	kvm_mmu_flush_tlb(vcpu);

 	local_irq_disable();
 	crash_disable_local_vmclear(cpu);

And to hijack this thread ... I noticed three other deficits in stat.tlb_flush; patch below. Do you prefer the current behavior?

--- >8 ---
KVM: x86: count actual tlb flushes

- we count KVM_REQ_TLB_FLUSH requests, not actual flushes

So there may be multiple requests accumulated at the point of kvm_check_request(); does your patch account for these accumulations correctly?

Regards,
Wanpeng Li

  (KVM can have multiple requests for one flush)
- flushes from kvm_flush_remote_tlbs() aren't counted
- it's easy to make a direct request by mistake

Solve these by postponing the counting to kvm_check_request().

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
(And what about a possible followup patch that replaces kvm_mmu_flush_tlb() with kvm_make_request() again? It would free the namespace a bit, and we could call something similarly named from vcpu_enter_guest() to do the job.)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..b41fd97 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3452,7 +3452,6 @@ static void nonpaging_init_context(struct kvm_vcpu *vcpu,

 void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
 {
-	++vcpu->stat.tlb_flush;
 	kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_flush_tlb);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 916e895..0f0ad08 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6041,8 +6041,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	}
 	if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
 		kvm_mmu_sync_roots(vcpu);
-	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
+	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
+		++vcpu->stat.tlb_flush;
 		kvm_x86_ops->tlb_flush(vcpu);
+	}
 	if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
 		vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
 		r = 0;
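The accumulation Wanpeng asks about is resolved by the request helpers themselves: a request is a single bit in vcpu->requests, so any number of kvm_make_request(KVM_REQ_TLB_FLUSH, ...) calls before the next guest entry produce exactly one kvm_check_request() hit, and thus one counted flush. Lightly trimmed from include/linux/kvm_host.h of that era:

static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
{
	set_bit(req, &vcpu->requests);	/* idempotent: same bit every time */
}

static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
{
	if (test_bit(req, &vcpu->requests)) {
		clear_bit(req, &vcpu->requests);
		return true;	/* one hit, however many requests were made */
	}
	return false;
}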
Re: [RFC PATCH 07/17] COLO buffer: implement colo buffer as well as QEMUFileOps based on it
Hi

On 08/01/2014 10:52 PM, Dr. David Alan Gilbert wrote:
* Yang Hongyang (yan...@cn.fujitsu.com) wrote:

We need a buffer to store migration data.

On the save side: all saved data is written into the colo buffer first, so that we can know the total size of the migration data. This also separates the data transmission from the colo control data; we use colo control data over the socket fd to synchronise both sides' state.

On the restore side: all migration data is read into the colo buffer first, then loaded from the buffer. If a network error happens during data transmission, the slave side can still function, because the migration data has not yet been loaded.

This is very similar to the QEMUSizedBuffer-based QEMUFiles that Stefan Berger wrote and that I use in both my postcopy and BER patchsets: http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg00846.html (and to the similar code from Isaku Yamahata). I think we should be able to use a shared version, even if we need some changes.

I've been using this patch as the COLO buffer; see my latest branch: https://github.com/macrosheep/qemu/tree/colo_v0.4

But QEMUSizedBuffer does not exactly match our needs, although it can work. We need a static buffer that lives through the COLO process so that we don't need to alloc/free buffers at every checkpoint. Currently, opening the buffer for write always allocates a new buffer. I think I need the following modifications:
1. the ability to open an existing QEMUSizedBuffer for write
2. a reset_buf() API that clears the buffered data; or just rewinding the QEMUFile related to the buffer may be enough

(see the sketch after this message)

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 migration-colo.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)

diff --git a/migration-colo.c b/migration-colo.c
index d566b9d..b90d9b6 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -11,6 +11,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/thread.h"
 #include "block/coroutine.h"
+#include "qemu/error-report.h"
 #include "migration/migration-colo.h"

 static QEMUBH *colo_bh;
@@ -20,14 +21,122 @@ bool colo_supported(void)
     return true;
 }

+/* colo buffer */
+
+#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
+#define COLO_BUFFER_MAX_SIZE (1000*1000*1000*10ULL)

Powers of 2 are nicer!

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

--
Thanks,
Yang.
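A toy version of the reuse pattern Yang describes, with illustrative names only (QEMUSizedBuffer has no such reset today): allocate once, rewind at each checkpoint. Per Dave's remark, the size constants would also read more naturally in binary units, e.g. 4ULL << 20 and 10ULL << 30 rather than decimal millions.

#include <stdint.h>
#include <stdlib.h>

/* Stand-in for a resettable sized buffer; not QEMU API. */
struct colo_buf {
	uint8_t *data;
	size_t   size;	/* allocated once, lives across checkpoints */
	size_t   used;	/* valid bytes written this checkpoint */
};

static struct colo_buf *colo_buf_create(size_t size)
{
	struct colo_buf *cb = calloc(1, sizeof(*cb));

	if (!cb)
		return NULL;
	cb->data = malloc(size);
	if (!cb->data) {
		free(cb);
		return NULL;
	}
	cb->size = size;
	return cb;
}

/* Checkpoint boundary: forget the old contents, keep the allocation. */
static void colo_buf_reset(struct colo_buf *cb)
{
	cb->used = 0;
}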
How the KVM hypervisor allocates physical pages to the VM
Dear KVM Developers:

I have some questions about how the KVM hypervisor requests and allocates physical pages for the VM. I am using kernel version 3.2.14.

I run a microbenchmark in the VM which declares an array of a certain size and then assigns a value to every element of the array. This causes page faults that are captured by the hypervisor, which then allocates the machine physical pages. Depending on the array size, I got two different trace results.

(1) When the array size is 10 MB or 20 MB, the trace file has page allocation events for the qemu-kvm process as follows:

qemu-kvm-14402 [004] 1912063.581683: kmalloc_node: call_site=813df54a ptr=880779a42800 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT node=-1
qemu-kvm-14402 [004] 1912063.581701: kmem_cache_alloc_node: call_site=813e302c ptr=8800229ff500 bytes_req=256 bytes_alloc=256 gfp_flags=GFP_KERNEL|GFP_REPEAT node=-1
qemu-kvm-14402 [004] 1912063.581701: kmalloc_node: call_site=813df54a ptr=880779a42800 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT node=-1
qemu-kvm-14402 [004] 1912063.581710: kmem_cache_alloc_node: call_site=813e302c ptr=8800229ff800 bytes_req=256 bytes_alloc=256 gfp_flags=GFP_KERNEL|GFP_REPEAT node=-1
qemu-kvm-14402 [004] 1912063.581710: kmalloc_node: call_site=813df54a ptr=880779a47800 bytes_req=640 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT node=-1
qemu-kvm-14402 [004] 1912063.581728: kmem_cache_alloc_node: call_site=813e302c ptr=8800229ffe00 bytes_req=256 bytes_alloc=256 gfp_flags=GFP_KERNEL|GFP_REPEAT
...

(2) When the array size is 40 MB, the trace file has page allocation events such as:

qemu-kvm-14450 [005] 1911006.440538: mm_page_alloc: page=ea0002570f40 pfn=613437 order=0 migratetype=2 gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO
qemu-kvm-14450 [005] 1911006.440542: mm_page_alloc: page=ea0002577480 pfn=613842 order=0 migratetype=2 gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO
qemu-kvm-14450 [005] 1911006.440545: mm_page_alloc: page=ea70a3c0 pfn=115343 order=0 migratetype=2 gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO
qemu-kvm-14450 [005] 1911006.440549: mm_page_alloc: page=ea0001a9a500 pfn=435860 order=0 migratetype=2 gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO
qemu-kvm-14450 [005] 1911006.440552: mm_page_alloc: page=ea00016f7e80 pfn=376314 order=0 migratetype=2 gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO
qemu-kvm-14450 [005] 1911006.440556: mm_page_alloc: page=ea0001a9d700 pfn=436060 order=0 migratetype=2 gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO
qemu-kvm-14450 [005] 1911006.440559: mm_page_alloc: page=ea0002576880 pfn=613794 order=0 migratetype=2 gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO
qemu-kvm-14450 [005] 1911006.440563: mm_page_alloc: page=ea00016f7b40 pfn=376301 order=0 migratetype=2 gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO
qemu-kvm-14450 [005] 1911006.440569: mm_page_alloc: page=ea00016f7f80 pfn=376318 order=0 migratetype=2 gfp_flags=GFP_HIGHUSER_MOVABLE|GFP_ZERO

When the size is 10 MB or 20 MB, it looks like KVM uses kmem_cache_alloc_node and kmalloc_node to allocate physical pages. However, when the size is 40 MB, the KVM hypervisor uses mm_page_alloc to allocate physical pages. The former is based on the slab allocator, while the latter allocates directly from the buddy allocator.

So what heuristic does KVM use to determine when to allocate from the slab allocator and when to allocate directly from the buddy allocator? Or is there something wrong with my trace file?

Thanks in advance.
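For reference, the benchmark described above amounts to something like the following (a reconstruction, not the poster's actual code; SIZE corresponds to the 10/20/40 MB experiments):

#include <stdlib.h>
#include <string.h>

#define SIZE (40UL * 1024 * 1024)	/* vary: 10, 20, 40 MB */

int main(void)
{
	char *array = malloc(SIZE);

	if (!array)
		return 1;
	memset(array, 1, SIZE);	/* touching every page triggers the faults */
	free(array);
	return 0;
}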
- Hui
Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem
On Tue, Sep 16, 2014 at 3:34 PM, Radim Krčmář rkrc...@redhat.com wrote:

[Emergency posting to fix the tag; I couldn't find the unmangled Cc list, so some recipients were dropped, sorry. (I guess you are glad, though.)]

2014-09-16 14:01-0700, Andres Lagar-Cavilla:
On Tue, Sep 16, 2014 at 1:51 PM, Radim Krčmář rkrc...@redhat.com wrote:
2014-09-15 13:11-0700, Andres Lagar-Cavilla:

+int kvm_get_user_page_retry(struct task_struct *tsk, struct mm_struct *mm,

The suffix '_retry' is not best suited for this. On first reading, I imagined we would be retrying something from before, possibly calling it in a loop, but we are actually doing the first and last try in one call.

We are doing ... the second and third in most scenarios. async_pf did the first with _NOWAIT. We call this from the async pf retrier, or if async pf couldn't be notified to the guest.

I was thinking more about what the function does, not how we currently use it -- nothing prevents us from using it as the first try somewhere -- but yeah, even the comments would be off then.

Good point. Happy to expand the comments. What about _complete? _io? _full?

Apart from this, the patch looks good. The mm/ parts are minimal, so I think it's best to merge it through the KVM tree with someone's Acked-by. I would prefer to have the last hunk in a separate patch, but still,

Acked-by: Radim Krčmář rkrc...@redhat.com

Awesome, thanks much. I'll recut with the VM_BUG_ON from Paolo and your Ack. LMK if anything else from this email should go into the recut.

Ah, sorry, I'm not maintaining mm ... what I meant was

Reviewed-by: Radim Krčmář rkrc...@redhat.com

Cool cool cool

Andres and I had to leave before I could find a good apology for VM_WARN_ON_ONCE(), so if you are replacing BUG_ON, you might want to look at that one as well.