Re: Seeing DMAR errors after multiple load/unload with SR-IOV
On Tue, Jun 7, 2011 at 4:04 AM, Chris Wright chr...@sous-sol.org wrote: * Alex Williamson (alex.william...@redhat.com) wrote: On Mon, 2011-06-06 at 14:39 +0530, padmanabh ratnakar wrote: Hi, I am using linux kernel 2.6.39. I have a IBM x3650 M3 system. I have used following boot options - intel_iommu=on iommu=pt I was loading/unloading my NIC driver(be2net) with num_vfs=7. After some iterations I get following DMAR errors - Jun 4 03:50:20 rhel6 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0. Jun 4 03:50:20 rhel6 kernel: Do you have a strange power saving mode enabled? Jun 4 03:50:20 rhel6 kernel: Dazed and confused, but trying to continue Jun 4 03:50:20 rhel6 kernel: DRHD: handling fault status reg 2 Jun 4 03:50:20 rhel6 kernel: DMAR:[DMA Read] Request device [1a:00.2] fault addr 78077000 Jun 4 03:50:20 rhel6 kernel: DMAR:[fault reason 02] Present bit in context entry is clear I was trying to debug this. I dont understand iommu code much. The physical address belongs the printed PCI function and there should not have been an error. I am unable to see pci_dev(pdev) of VFs getting removed from si_domain-devices list(intel-iommu.c) when driver gets unloaded calling pci_disable_sriov() freeing VF pdevs. Looks like issue happens when when freed pdev is allocated again and as it is already in list, required initializations dont happen. I dont know if my understanding is correct. Can anyone point me to what the issue may be? Yes, that's correct. The (now replaced) check identity_mapping() will succeed when the pci_dev is recycled (it's freed, but never removed from the list, this is an issue with passtrhough mode and device creation/desctruction). This false match happens w/ a brand new pci_dev which still has default 32bit DMA mask, so it is removed from pt domain. During removal domain_remove_one_dev_info() test that matches only on bus/devfn (now also segment) will match despite the fact that the info-pdev != pdev-dev.archdata.iommu. 
Then...Oops Typically devices are removed from the domain via drivers/pci/intel-iommu.c:device_notifier(), which is called as the device is unbound from the driver. However, this seems to get skipped when running in passthrough mode, so I'm not sure where that's supposed to occur. Does it happen w/o passthrough? I had tried without passthrough on RHEL 6.1 GA kernel. Was seeing hangs and panics. Will check if non passthrough mode works on latest kernel. If you blacklist the driver then a create/delete may do similar (haven't tested that idea). Also note that some intel-iommu fixes have rolled into 3.0.0-rc2, you might want to update and see if anything is better there. Thanks, The change in identity_mapping() means we won't demote to 32-bit DMA (drop out of pt domain), so I don't think we'll see the same issue. For testing I had made a hack in 2.6.39 kernel which will prevent demoting to 32bit DMA mask and thereby prevent calling of domain_remove_one_dev_info() for the specific VF device I was using and it had worked. So as you said I may not hit the issue in latest kernel. Will try that. thanks, -chris Thanks for the response and suggestions. Padmanabh -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: Fix build warnings
On Tue, May 31, 2011 at 12:26:55PM +0200, Ingo Molnar wrote:
> * Avi Kivity a...@redhat.com wrote:
> > On 05/31/2011 10:38 AM, Ingo Molnar wrote:
> > > * Borislav Petkov b...@alien8.de wrote:
> > > > +++ b/arch/x86/kvm/paging_tmpl.h
> > > > @@ -121,7 +121,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
> > > >  				    gva_t addr, u32 access)
> > > >  {
> > > >  	pt_element_t pte;
> > > > -	pt_element_t __user *ptep_user;
> > > > +	pt_element_t __user *uninitialized_var(ptep_user);
> > >
> > > Note that doing this is actually actively dangerous for two reasons.
> >
> > snip lots of good advice
> >
> > > Please fix it instead.
> >
> > s/instead/in addition/; while all those changes are good, they are
> > much too large for 3.0. Let's push the simple fix for 3.0 and queue
> > the bigger refactoring to 3.1.
>
> Yeah, that's probably wise, this is a tricky function.

So, any progress on this front? Warning is still there in -rc2.

Thanks.

--
Regards/Gruss,
Boris.
Re: [PATCH] kvm: Fix build warnings
On 06/07/2011 10:28 AM, Borislav Petkov wrote:
> So, any progress on this front? Warning is still there in -rc2.

Thanks for the reminder, applied and queued.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [SeaBIOS] Graphics card pass-through working with two pass pci-initialization
On 2011-06-06 08:30, Gerd Hoffmann wrote:
> Hi,
>
> > As Jan points out though, is a dynamic PCI region really needed? Those
> > that need a large PCI region are also likely to need a large amount of
> > memory. Maybe the space for PCI should just be increased.
>
> Just changing it will not work as it will break live migration.

Changing logic in the BIOS won't break migration (the active BIOS is included in the migration of RAM, current mappings are part of the device states). Changing the 4G mapping in qemu's hw/pc_piix.c would break it and needs to be coupled to the machine version.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [PATCH] pci-assign: Do not reset the device unless the kernel supports it
On 06/07/2011 01:04 AM, Jan Kiszka wrote: On 2011-06-06 23:48, Alex Williamson wrote: On Mon, 2011-06-06 at 23:30 +0200, Jan Kiszka wrote: From: Jan Kiszkajan.kis...@siemens.com At least kernels 2.6.38 and 2.6.39 do not properly support issuing a reset on an assigned device and corrupt its config space. Prevent this by checking for a host kernel with the required support, tagged by the to-be-introduced KVM_CAP_DEVICE_RESET. Wouldn't it be easier just to revert ed78661f in 2.6.39 stable? I guess we don't have an option to do that for .38 since stable is done there, but there are also some intel-iommu breakages that won't make stable for that release. It seems like the userspace invoked reset resolves known, demonstrable issues of devices continuing to DMA into guest memory while ed78661f is mostly a theoretical change. Easier would be this patch. But I don't mind reverting the problematic commit in 39, whatever is preferred. We should just resolve the issue finally. Kernel problems should be solved in the kernel (with exceptions of course, but don't see the need here). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pci-assign: Do not reset the device unless the kernel supports it
On 2011-06-07 10:06, Avi Kivity wrote: On 06/07/2011 01:04 AM, Jan Kiszka wrote: On 2011-06-06 23:48, Alex Williamson wrote: On Mon, 2011-06-06 at 23:30 +0200, Jan Kiszka wrote: From: Jan Kiszkajan.kis...@siemens.com At least kernels 2.6.38 and 2.6.39 do not properly support issuing a reset on an assigned device and corrupt its config space. Prevent this by checking for a host kernel with the required support, tagged by the to-be-introduced KVM_CAP_DEVICE_RESET. Wouldn't it be easier just to revert ed78661f in 2.6.39 stable? I guess we don't have an option to do that for .38 since stable is done there, but there are also some intel-iommu breakages that won't make stable for that release. It seems like the userspace invoked reset resolves known, demonstrable issues of devices continuing to DMA into guest memory while ed78661f is mostly a theoretical change. Easier would be this patch. But I don't mind reverting the problematic commit in 39, whatever is preferred. We should just resolve the issue finally. Kernel problems should be solved in the kernel (with exceptions of course, but don't see the need here). Then please file a revert for stable ASAP. Jan signature.asc Description: OpenPGP digital signature
Re: KVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGS
On 06/06/2011 08:27 PM, Marcelo Tosatti wrote:
> Only decache guest CR3 value if vcpu->arch.cr3 is stale.
> Fixes loadvm with live guest.
>
> @@ -2049,7 +2049,9 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
>  					unsigned long cr0,
>  					struct kvm_vcpu *vcpu)
>  {
> -	vmx_decache_cr3(vcpu);
> +
> +	if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
> +		vmx_decache_cr3(vcpu);
>  	if (!(cr0 & X86_CR0_PG)) {
>  		/* From paging/starting to nonpaging */
>  		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,

Applied and queued, but I think there is something rotten here. How does arch.cr3 get into GUEST_CR3 after KVM_SET_SREGS? arch.cr3 is supposed to be a write-through cache - it only has a bit in regs_avail, not regs_dirty.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
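The "decache only if stale" rule in the patch above can be illustrated with a small standalone model. This is only a sketch: `struct vcpu_model`, `hw_cr3` and the function names are hypothetical stand-ins rather than the kernel's types, and it assumes that the KVM_SET_SREGS path marks the software copy as available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the kernel's struct kvm_vcpu. */
enum { EXREG_CR3 = 0 };

struct vcpu_model {
    uint64_t cr3;             /* software copy (vcpu->arch.cr3 in the patch) */
    uint64_t hw_cr3;          /* stands in for the GUEST_CR3 VMCS field */
    unsigned long regs_avail; /* bit set => software copy is up to date */
};

/* Unconditional refresh from "hardware", as the old code path did. */
void decache_cr3(struct vcpu_model *v)
{
    assert(v != NULL);
    v->cr3 = v->hw_cr3;
    v->regs_avail |= 1ul << EXREG_CR3;
}

/* Patched behaviour: refresh only when the software copy is stale. */
void decache_cr3_if_stale(struct vcpu_model *v)
{
    if (!(v->regs_avail & (1ul << EXREG_CR3)))
        decache_cr3(v);
}

/* Assumed model of KVM_SET_SREGS: userspace supplies an up-to-date CR3. */
void set_sregs_cr3(struct vcpu_model *v, uint64_t cr3)
{
    v->cr3 = cr3;
    v->regs_avail |= 1ul << EXREG_CR3;
}
```

In this model a value written by `set_sregs_cr3()` survives a later `decache_cr3_if_stale()`, whereas the unconditional `decache_cr3()` would overwrite it with the old hardware value. The model does not answer the question raised in the reply, namely how the software copy reaches GUEST_CR3.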
[PATCH] virtio-spec: Fix wrong bit number of device status
From: Amos Kong ak...@redhat.com

qemu-kvm/hw/virtio_config.h:
 #define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
 #define VIRTIO_CONFIG_S_DRIVER 2
 #define VIRTIO_CONFIG_S_DRIVER_OK 4
 #define VIRTIO_CONFIG_S_FAILED 0x80

virtio-spec:
 ACKNOWLEDGE(1) :
 DRIVER(2) :
 DRIVER_OK(3) :
 FAILED(128):

The spec refers to bit numbers and the headers use absolute numbers, so they are not consistent. It should be 'FAILED(8)'.
 2^(8-1) = 128

Signed-off-by: Amos Kong ak...@redhat.com
---
 virtio-spec.lyx |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 448af76..41b7657 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -1552,7 +1552,7 @@ FAILED
 \begin_inset space ~
 \end_inset
 
-(128) Indicates that something went wrong in the guest, and it has given
+(7) Indicates that something went wrong in the guest, and it has given
  up on the device.
  This could be an internal error, or the driver didn't like the device for
  some reason, or even a fatal error during device operation.
--
1.7.1
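The relation between the spec's bit numbers and the header's mask values can be checked with a few lines of C. This is a minimal sketch assuming the 1-based numbering used in the commit message (2^(8-1) = 128); the helper name `status_bit_to_mask` is hypothetical, and the constants are copied from the header quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Values as quoted from qemu-kvm/hw/virtio_config.h in the message. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
#define VIRTIO_CONFIG_S_DRIVER      2
#define VIRTIO_CONFIG_S_DRIVER_OK   4
#define VIRTIO_CONFIG_S_FAILED      0x80

/* Hypothetical helper: 1-based bit number -> mask, i.e. 2^(bit - 1). */
static inline uint8_t status_bit_to_mask(unsigned bit)
{
    assert(bit >= 1 && bit <= 8);
    return (uint8_t)(1u << (bit - 1));
}
```

Under this numbering, bits 1, 2 and 3 give 1, 2 and 4, and bit 8 gives 0x80. With zero-based numbering the same mask would be bit 7, which is the value the diff hunk writes.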
[PATCH 0/15] KVM: optimize for MMIO handled
The idea of this patchset is from Avi: | We could cache the result of a miss in an spte by using a reserved bit, and | checking the page fault error code (or seeing if we get an ept violation or | ept misconfiguration), so if we get repeated mmio on a page, we don't need to | search the slot list/tree. | (https://lkml.org/lkml/2011/2/22/221) The aim of this patchset is to support fast mmio emulate, it reduce searching mmio gfn from memslots which is very expensive since we need to walk all slots for mmio gfn, and the other advantage is: we can reduce guest page table walking for soft mmu. Lockless walk shadow page table is introduced in this patchset, it is the light way to check the page fault is the real mmio page fault or something is running out of our mind. And, if shadow_notrap_nonpresent_pte is enabled(bypass_guest_pf=1), mmio page fault and normal page fault is mixed(the reserved is set for all page fault), it has little regression, if the box can generate lots of mmio access, for example, the network server, it can disable shadow_notrap_nonpresent_pte and enable mmio pf, after all, we can enable/disable mmio pf at the runtime. The performance test result: Netperf (TCP_RR): === ept is enabled: Before After 1st 709.58 734.60 2nd 715.40 723.75 3rd 713.45 724.22 ept=0 bypass_guest_pf=0: Before After 1st 706.10 709.63 2nd 709.38 715.80 3rd 695.90 710.70 Kernbech (do not redirect output to /dev/null) == ept is enabled: Before After 1st 2m34.749s 2m33.482s 2nd 2m34.651s 2m33.161s 3rd 2m34.543s 2m34.271s ept=0 bypass_guest_pf=0: Before After 1st 4m43.467s 4m41.873s 2nd 4m45.225s 4m41.668s 3rd 4m47.029s 4m40.128s -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
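The caching idea quoted from Avi (mark an spte with a reserved bit so that a repeated MMIO fault can be recognised without searching the memslots) might look roughly like the following. This is a sketch under stated assumptions: the marker bit, the gfn encoding and all three helper names are hypothetical and are not taken from the patches in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed marker: one high bit treated as "reserved"; illustrative only. */
#define MMIO_SPTE_MARK (1ull << 51)
#define GFN_SHIFT      12

/* Build an spte that records "this gfn is MMIO". */
uint64_t make_mmio_spte(uint64_t gfn)
{
    /* The gfn must fit below the marker bit once shifted. */
    assert(gfn < (1ull << (51 - GFN_SHIFT)));
    return MMIO_SPTE_MARK | (gfn << GFN_SHIFT);
}

/* A fault on an spte carrying the marker is an MMIO fault. */
bool is_mmio_spte(uint64_t spte)
{
    return (spte & MMIO_SPTE_MARK) != 0;
}

/* Recover the cached gfn without walking the memslots. */
uint64_t mmio_spte_gfn(uint64_t spte)
{
    return (spte & ~MMIO_SPTE_MARK) >> GFN_SHIFT;
}
```

On a later fault the handler would test `is_mmio_spte()` first and go straight to emulation with the cached gfn, which is the slot search the cover letter wants to avoid.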
[PATCH 01/15] KVM: MMU: fix walking shadow page table
Properly check the last mapping, and do not walk to the next level if last spte is met

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2d14434..cda666a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1517,10 +1517,6 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
 	if (iterator->level < PT_PAGE_TABLE_LEVEL)
 		return false;
 
-	if (iterator->level == PT_PAGE_TABLE_LEVEL)
-		if (is_large_pte(*iterator->sptep))
-			return false;
-
 	iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level);
 	iterator->sptep	= ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
 	return true;
@@ -1528,6 +1524,11 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
 
 static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
 {
+	if (is_last_spte(*iterator->sptep, iterator->level)) {
+		iterator->level = 0;
+		return;
+	}
+
 	iterator->shadow_addr = *iterator->sptep & PT64_BASE_ADDR_MASK;
 	--iterator->level;
 }
--
1.7.4.4
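The effect of stopping at the last spte can be seen in a toy model of the walk. This is a sketch only: the one-entry-per-level array and `walk_levels()` are hypothetical simplifications, and `is_last_spte()` is written here from the patch description (a level-1 entry or a large-page entry ends the walk), not copied from mmu.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_TABLE_LEVEL 1
#define PAGE_SIZE_MASK   (1ull << 7)  /* large-page bit in an x86 entry */

/* An entry is "last" at the lowest level or when it maps a large page. */
bool is_last_spte(uint64_t spte, int level)
{
    if (level == PAGE_TABLE_LEVEL)
        return true;
    return (spte & PAGE_SIZE_MASK) != 0;
}

/*
 * Toy walk: sptes[l] is the entry seen at level l (index 0 is unused).
 * Returns how many levels were visited.
 */
int walk_levels(const uint64_t *sptes, int top_level)
{
    int visited = 0;
    int level = top_level;

    assert(top_level >= PAGE_TABLE_LEVEL);
    while (level >= PAGE_TABLE_LEVEL) {       /* role of shadow_walk_okay() */
        visited++;
        if (is_last_spte(sptes[level], level))
            level = 0;                        /* patched shadow_walk_next() */
        else
            --level;
    }
    return visited;
}
```

With a large-page entry at level 2 of a four-level walk, the model visits levels 4, 3 and 2 and then stops instead of descending through the large mapping.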
[PATCH 02/15] KVM: MMU: do not update slot bitmap if spte is nonpresent
Set slot bitmap only if the spte is present

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |   15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cda666a..125f78d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -743,9 +743,6 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 	struct kvm_mmu_page *sp;
 	unsigned long *rmapp;
 
-	if (!is_rmap_spte(*spte))
-		return 0;
-
 	sp = page_header(__pa(spte));
 	kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
 	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
@@ -2078,11 +2075,13 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (!was_rmapped && is_large_pte(*sptep))
 		++vcpu->kvm->stat.lpages;
 
-	page_header_update_slot(vcpu->kvm, sptep, gfn);
-	if (!was_rmapped) {
-		rmap_count = rmap_add(vcpu, sptep, gfn);
-		if (rmap_count > RMAP_RECYCLE_THRESHOLD)
-			rmap_recycle(vcpu, sptep, gfn);
+	if (is_shadow_present_pte(*sptep)) {
+		page_header_update_slot(vcpu->kvm, sptep, gfn);
+		if (!was_rmapped) {
+			rmap_count = rmap_add(vcpu, sptep, gfn);
+			if (rmap_count > RMAP_RECYCLE_THRESHOLD)
+				rmap_recycle(vcpu, sptep, gfn);
+		}
 	}
 	kvm_release_pfn_clean(pfn);
 	if (speculative) {
--
1.7.4.4
[PATCH 03/15] KVM: x86: avoid unnecessarily guest page table walking
We already get the guest physical address, so use it to read guest data directly to avoid walking guest page table again

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/x86.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 694538a..8be9ff6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3930,8 +3930,7 @@ static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
 	if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
 		goto mmio;
 
-	if (kvm_read_guest_virt(ctxt, addr, val, bytes, exception)
-				== X86EMUL_CONTINUE)
+	if (!kvm_read_guest(vcpu->kvm, gpa, val, bytes))
 		return X86EMUL_CONTINUE;
 
 mmio:
--
1.7.4.4
[PATCH 04/15] KVM: MMU: cache mmio info on page fault path
If the page fault is caused by mmio, we can cache the mmio info, later, we do not need to walk guest page table and quickly know it is a mmio fault while we emulate the mmio instruction Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/include/asm/kvm_host.h |5 +++ arch/x86/kvm/mmu.c | 21 +-- arch/x86/kvm/mmu.h | 23 + arch/x86/kvm/paging_tmpl.h | 21 ++- arch/x86/kvm/x86.c | 52 ++ arch/x86/kvm/x86.h | 36 +++ 6 files changed, 126 insertions(+), 32 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d167039..326af42 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -414,6 +414,11 @@ struct kvm_vcpu_arch { u64 mcg_ctl; u64 *mce_banks; + /* Cache MMIO info */ + u64 mmio_gva; + unsigned access; + gfn_t mmio_gfn; + /* used for guest single stepping over the given code position */ unsigned long singlestep_rip; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 125f78d..415030e 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -217,11 +217,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, } EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes); -static bool is_write_protection(struct kvm_vcpu *vcpu) -{ - return kvm_read_cr0_bits(vcpu, X86_CR0_WP); -} - static int is_cpuid_PSE36(void) { return 1; @@ -243,11 +238,6 @@ static int is_large_pte(u64 pte) return pte PT_PAGE_SIZE_MASK; } -static int is_writable_pte(unsigned long pte) -{ - return pte PT_WRITABLE_MASK; -} - static int is_dirty_gpte(unsigned long pte) { return pte PT_DIRTY_MASK; @@ -2238,15 +2228,17 @@ static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct * send_sig_info(SIGBUS, info, tsk); } -static int kvm_handle_bad_page(struct kvm *kvm, gfn_t gfn, pfn_t pfn) +static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gva_t gva, + unsigned access, gfn_t gfn, pfn_t pfn) { kvm_release_pfn_clean(pfn); if (is_hwpoison_pfn(pfn)) { - kvm_send_hwpoison_signal(gfn_to_hva(kvm, 
gfn), current); + kvm_send_hwpoison_signal(gfn_to_hva(vcpu-kvm, gfn), current); return 0; } else if (is_fault_pfn(pfn)) return -EFAULT; + vcpu_cache_mmio_info(vcpu, gva, gfn, access); return 1; } @@ -2328,7 +2320,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn, /* mmio */ if (is_error_pfn(pfn)) - return kvm_handle_bad_page(vcpu-kvm, gfn, pfn); + return kvm_handle_bad_page(vcpu, v, ACC_ALL, gfn, pfn); spin_lock(vcpu-kvm-mmu_lock); if (mmu_notifier_retry(vcpu, mmu_seq)) @@ -2555,6 +2547,7 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu) if (!VALID_PAGE(vcpu-arch.mmu.root_hpa)) return; + vcpu_clear_mmio_info(vcpu, ~0ull); trace_kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC); if (vcpu-arch.mmu.root_level == PT64_ROOT_LEVEL) { hpa_t root = vcpu-arch.mmu.root_hpa; @@ -2701,7 +2694,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, /* mmio */ if (is_error_pfn(pfn)) - return kvm_handle_bad_page(vcpu-kvm, gfn, pfn); + return kvm_handle_bad_page(vcpu, 0, 0, gfn, pfn); spin_lock(vcpu-kvm-mmu_lock); if (mmu_notifier_retry(vcpu, mmu_seq)) goto out_unlock; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 7086ca8..05310b1 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -76,4 +76,27 @@ static inline int is_present_gpte(unsigned long pte) return pte PT_PRESENT_MASK; } +static inline int is_writable_pte(unsigned long pte) +{ + return pte PT_WRITABLE_MASK; +} + +static inline bool is_write_protection(struct kvm_vcpu *vcpu) +{ + return kvm_read_cr0_bits(vcpu, X86_CR0_WP); +} + +static inline bool check_write_user_access(struct kvm_vcpu *vcpu, + bool write_fault, bool user_fault, + unsigned long pte) +{ + if (unlikely(write_fault !is_writable_pte(pte) + (user_fault || is_write_protection(vcpu + return false; + + if (unlikely(user_fault !(pte PT_USER_MASK))) + return false; + + return true; +} #endif diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 6c4dc01..b0c8184 100644 --- 
a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -201,11 +201,8 @@ walk: break; } - if (unlikely(write_fault
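The `check_write_user_access()` helper added to mmu.h in this patch lost its logical operators in the archive. A standalone reading of it might look as follows. This is a sketch: the `&&` placement is inferred from the surrounding conditions, the `cr0_wp` parameter is a hypothetical replacement for the `vcpu` argument, and the two mask values are the standard x86 R/W and U/S page-table bits.

```c
#include <assert.h>
#include <stdbool.h>

#define PT_WRITABLE_MASK (1ul << 1)  /* x86 R/W bit */
#define PT_USER_MASK     (1ul << 2)  /* x86 U/S bit */

/*
 * cr0_wp stands in for is_write_protection(vcpu) in the patch.
 * Returns true when the access is permitted by this pte.
 */
bool check_write_user_access(bool cr0_wp, bool write_fault,
                             bool user_fault, unsigned long pte)
{
    /* A write to a read-only pte fails for user mode, or when CR0.WP is set. */
    if (write_fault && !(pte & PT_WRITABLE_MASK) &&
        (user_fault || cr0_wp))
        return false;

    /* A user-mode access to a supervisor-only pte always fails. */
    if (user_fault && !(pte & PT_USER_MASK))
        return false;

    return true;
}

/* Usage example covering the four interesting combinations. */
void example_checks(void)
{
    assert(check_write_user_access(false, true, false, 0));
    assert(!check_write_user_access(true, true, false, 0));
    assert(!check_write_user_access(false, true, true, PT_USER_MASK));
    assert(!check_write_user_access(false, false, true, PT_WRITABLE_MASK));
}
```

The first case is the one that depends on CR0.WP: a supervisor write to a read-only page is allowed only while write protection is off.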
[PATCH 05/15] KVM: MMU: optimize to handle dirty bit
If dirty bit is not set, we can make the pte access read-only to avoid handing dirty bit everywhere Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 13 ++--- arch/x86/kvm/paging_tmpl.h | 30 ++ 2 files changed, 16 insertions(+), 27 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 415030e..a10afd4 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1923,7 +1923,7 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn, static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access, int user_fault, - int write_fault, int dirty, int level, + int write_fault, int level, gfn_t gfn, pfn_t pfn, bool speculative, bool can_unsync, bool host_writable) { @@ -1938,8 +1938,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, spte = PT_PRESENT_MASK; if (!speculative) spte |= shadow_accessed_mask; - if (!dirty) - pte_access = ~ACC_WRITE_MASK; + if (pte_access ACC_EXEC_MASK) spte |= shadow_x_mask; else @@ -2014,7 +2013,7 @@ done: static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pt_access, unsigned pte_access, -int user_fault, int write_fault, int dirty, +int user_fault, int write_fault, int *ptwrite, int level, gfn_t gfn, pfn_t pfn, bool speculative, bool host_writable) @@ -2050,7 +2049,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, } if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault, - dirty, level, gfn, pfn, speculative, true, + level, gfn, pfn, speculative, true, host_writable)) { if (write_fault) *ptwrite = 1; @@ -2120,7 +2119,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, for (i = 0; i ret; i++, gfn++, start++) mmu_set_spte(vcpu, start, ACC_ALL, -access, 0, 0, 1, NULL, +access, 0, 0, NULL, sp-role.level, gfn, page_to_pfn(pages[i]), true, true); @@ -2184,7 +2183,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, unsigned pte_access = ACC_ALL; mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, 
pte_access, -0, write, 1, pt_write, +0, write, pt_write, level, gfn, pfn, prefault, map_writable); direct_pte_prefetch(vcpu, iterator.sptep); ++vcpu-stat.pf_fixed; diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index b0c8184..67971da 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -106,6 +106,9 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte) unsigned access; access = (gpte (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK; + if (!is_dirty_gpte(gpte)) + access = ~ACC_WRITE_MASK; + #if PTTYPE == 64 if (vcpu-arch.mmu.nx) access = ~(gpte PT64_NX_SHIFT); @@ -378,7 +381,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, * vcpu-arch.update_pte.pfn was fetched from get_user_pages(write = 1). */ mmu_set_spte(vcpu, spte, sp-role.access, pte_access, 0, 0, -is_dirty_gpte(gpte), NULL, PT_PAGE_TABLE_LEVEL, +NULL, PT_PAGE_TABLE_LEVEL, gpte_to_gfn(gpte), pfn, true, true); } @@ -429,7 +432,6 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw, unsigned pte_access; gfn_t gfn; pfn_t pfn; - bool dirty; if (spte == sptep) continue; @@ -444,16 +446,15 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw, pte_access = sp-role.access FNAME(gpte_access)(vcpu, gpte); gfn = gpte_to_gfn(gpte); - dirty = is_dirty_gpte(gpte); pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn, - (pte_access ACC_WRITE_MASK) dirty); + pte_access ACC_WRITE_MASK); if (is_error_pfn(pfn)) { kvm_release_pfn_clean(pfn); break; } mmu_set_spte(vcpu, spte, sp-role.access, pte_access, 0, 0, -
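The core of this patch is that a guest pte whose dirty bit is clear is treated as read-only when its access bits are computed, so the `dirty` argument no longer has to be threaded through `set_spte()` and `mmu_set_spte()`. A standalone sketch of that computation follows; the NX handling is left out, the function is a simplified stand-in for `FNAME(gpte_access)`, and the `ACC_*` values are assumed here to mirror the x86 R/W and U/S bits.

```c
#include <assert.h>
#include <stdint.h>

#define PT_WRITABLE_MASK (1ull << 1)  /* x86 R/W bit */
#define PT_USER_MASK     (1ull << 2)  /* x86 U/S bit */
#define PT_DIRTY_MASK    (1ull << 6)  /* x86 dirty bit */

/* Assumed to mirror the pte bits above, plus an exec bit in bit 0. */
#define ACC_EXEC_MASK  1u
#define ACC_WRITE_MASK 2u
#define ACC_USER_MASK  4u

/* Simplified stand-in for FNAME(gpte_access); NX handling omitted. */
unsigned gpte_access(uint64_t gpte)
{
    unsigned access =
        (unsigned)(gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;

    /* Not dirty yet: drop write access so the first write faults. */
    if (!(gpte & PT_DIRTY_MASK))
        access &= ~ACC_WRITE_MASK;

    assert((access & ~(ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)) == 0);
    return access;
}
```

A writable, user-accessible pte therefore yields write access only once its dirty bit is set; before that the result keeps user and exec access but not write.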
[PATCH 06/15] KVM: MMU: cleanup for FNAME(fetch)
gw-pte_access is the final access permission, since it is unified with gw-pt_access when we walked guest page table: FNAME(walk_addr_generic): pte_access = pt_access FNAME(gpte_access)(vcpu, pte); Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/paging_tmpl.h |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 67971da..95da29e 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -477,7 +477,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, if (!is_present_gpte(gw-ptes[gw-level - 1])) return NULL; - direct_access = gw-pt_access gw-pte_access; + direct_access = gw-pte_access; top_level = vcpu-arch.mmu.root_level; if (top_level == PT32E_ROOT_LEVEL) @@ -535,7 +535,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, link_shadow_page(it.sptep, sp); } - mmu_set_spte(vcpu, it.sptep, access, gw-pte_access access, + mmu_set_spte(vcpu, it.sptep, access, gw-pte_access, user_fault, write_fault, ptwrite, it.level, gw-gfn, pfn, prefault, map_writable); FNAME(pte_prefetch)(vcpu, gw, it.sptep); -- 1.7.4.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/15] KVM: MMU: rename 'pt_write' to 'emulate'
If 'pt_write' is true, we need to emulate the fault. And in later patch, we need to emulate the fault even though it is not a pt_write event, so rename it to better fit the meaning Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 10 +- arch/x86/kvm/paging_tmpl.h | 16 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index a10afd4..05e604d 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2014,7 +2014,7 @@ done: static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pt_access, unsigned pte_access, int user_fault, int write_fault, -int *ptwrite, int level, gfn_t gfn, +int *emulate, int level, gfn_t gfn, pfn_t pfn, bool speculative, bool host_writable) { @@ -2052,7 +2052,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, level, gfn, pfn, speculative, true, host_writable)) { if (write_fault) - *ptwrite = 1; + *emulate = 1; kvm_mmu_flush_tlb(vcpu); } @@ -2175,7 +2175,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, { struct kvm_shadow_walk_iterator iterator; struct kvm_mmu_page *sp; - int pt_write = 0; + int emulate = 0; gfn_t pseudo_gfn; for_each_shadow_entry(vcpu, (u64)gfn PAGE_SHIFT, iterator) { @@ -2183,7 +2183,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, unsigned pte_access = ACC_ALL; mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access, -0, write, pt_write, +0, write, emulate, level, gfn, pfn, prefault, map_writable); direct_pte_prefetch(vcpu, iterator.sptep); ++vcpu-stat.pf_fixed; @@ -2211,7 +2211,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, | shadow_accessed_mask); } } - return pt_write; + return emulate; } static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *tsk) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 95da29e..8353b69 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ 
-465,7 +465,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw, static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, struct guest_walker *gw, int user_fault, int write_fault, int hlevel, -int *ptwrite, pfn_t pfn, bool map_writable, +int *emulate, pfn_t pfn, bool map_writable, bool prefault) { unsigned access = gw-pt_access; @@ -536,7 +536,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, } mmu_set_spte(vcpu, it.sptep, access, gw-pte_access, -user_fault, write_fault, ptwrite, it.level, +user_fault, write_fault, emulate, it.level, gw-gfn, pfn, prefault, map_writable); FNAME(pte_prefetch)(vcpu, gw, it.sptep); @@ -570,7 +570,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, int user_fault = error_code PFERR_USER_MASK; struct guest_walker walker; u64 *sptep; - int write_pt = 0; + int emulate = 0; int r; pfn_t pfn; int level = PT_PAGE_TABLE_LEVEL; @@ -631,19 +631,19 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, if (!force_pt_level) transparent_hugepage_adjust(vcpu, walker.gfn, pfn, level); sptep = FNAME(fetch)(vcpu, addr, walker, user_fault, write_fault, -level, write_pt, pfn, map_writable, prefault); +level, emulate, pfn, map_writable, prefault); (void)sptep; - pgprintk(%s: shadow pte %p %llx ptwrite %d\n, __func__, -sptep, *sptep, write_pt); + pgprintk(%s: shadow pte %p %llx emulate %d\n, __func__, +sptep, *sptep, emulate); - if (!write_pt) + if (!emulate) vcpu-arch.last_pt_write_count = 0; /* reset fork detector */ ++vcpu-stat.pf_fixed; trace_kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT); spin_unlock(vcpu-kvm-mmu_lock); - return write_pt; + return emulate; out_unlock: spin_unlock(vcpu-kvm-mmu_lock); -- 1.7.4.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body
[PATCH 08/15] KVM: MMU: count used shadow pages on preparing path
Move counting used shadow pages from committing path to preparing path to reduce tlb flush on some paths Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 05e604d..43e7ca1 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1039,7 +1039,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr) percpu_counter_add(kvm_total_used_mmu_pages, nr); } -static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp) +static void kvm_mmu_free_page(struct kvm_mmu_page *sp) { ASSERT(is_empty_shadow_page(sp-spt)); hlist_del(sp-hash_link); @@ -1048,7 +1048,6 @@ static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp) if (!sp-role.direct) free_page((unsigned long)sp-gfns); kmem_cache_free(mmu_page_header_cache, sp); - kvm_mod_used_mmu_pages(kvm, -1); } static unsigned kvm_page_table_hashfn(gfn_t gfn) @@ -1655,6 +1654,7 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp, /* Count self */ ret++; list_move(sp-link, invalid_list); + kvm_mod_used_mmu_pages(kvm, -1); } else { list_move(sp-link, kvm-arch.active_mmu_pages); kvm_reload_remote_mmus(kvm); @@ -1678,7 +1678,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm, do { sp = list_first_entry(invalid_list, struct kvm_mmu_page, link); WARN_ON(!sp-role.invalid || sp-root_count); - kvm_mmu_free_page(kvm, sp); + kvm_mmu_free_page(sp); } while (!list_empty(invalid_list)); } @@ -1704,8 +1704,8 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages) page = container_of(kvm-arch.active_mmu_pages.prev, struct kvm_mmu_page, link); kvm_mmu_prepare_zap_page(kvm, page, invalid_list); - kvm_mmu_commit_zap_page(kvm, invalid_list); } + kvm_mmu_commit_zap_page(kvm, invalid_list); goal_nr_mmu_pages = kvm-arch.n_used_mmu_pages; } @@ -3290,9 +3290,9 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu 
*vcpu) sp = container_of(vcpu-kvm-arch.active_mmu_pages.prev, struct kvm_mmu_page, link); kvm_mmu_prepare_zap_page(vcpu-kvm, sp, invalid_list); - kvm_mmu_commit_zap_page(vcpu-kvm, invalid_list); ++vcpu-kvm-stat.mmu_recycled; } + kvm_mmu_commit_zap_page(vcpu-kvm, invalid_list); } int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code, -- 1.7.4.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/15] KVM: MMU: split kvm_mmu_free_page
Split kvm_mmu_free_page() into kvm_mmu_free_lock_parts() and
kvm_mmu_free_unlock_parts(). The first frees the parts that must be freed
under the mmu lock; the second frees the parts that can be freed outside the
mmu lock.

This is used by a later patch.

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
 arch/x86/kvm/mmu.c |   16 +++++++++++++---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 43e7ca1..9f3a746 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1039,17 +1039,27 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
 	percpu_counter_add(&kvm_total_used_mmu_pages, nr);
 }

-static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
+static void kvm_mmu_free_lock_parts(struct kvm_mmu_page *sp)
 {
 	ASSERT(is_empty_shadow_page(sp->spt));
 	hlist_del(&sp->hash_link);
-	list_del(&sp->link);
-	free_page((unsigned long)sp->spt);
 	if (!sp->role.direct)
 		free_page((unsigned long)sp->gfns);
+}
+
+static void kvm_mmu_free_unlock_parts(struct kvm_mmu_page *sp)
+{
+	list_del(&sp->link);
+	free_page((unsigned long)sp->spt);
 	kmem_cache_free(mmu_page_header_cache, sp);
 }

+static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
+{
+	kvm_mmu_free_lock_parts(sp);
+	kvm_mmu_free_unlock_parts(sp);
+}
+
 static unsigned kvm_page_table_hashfn(gfn_t gfn)
 {
 	return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
--
1.7.4.4
[PATCH 10/15] KVM: MMU: lockless walking shadow page table
Use RCU to protect shadow page tables from being freed while they are being
walked, so that we can walk them safely without taking the mmu lock. The walk
should be fast, and it is needed by the mmio page fault path.

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
 arch/x86/include/asm/kvm_host.h |    4 ++
 arch/x86/kvm/mmu.c              |   79 ++-
 arch/x86/kvm/mmu.h              |    4 +-
 arch/x86/kvm/vmx.c              |    2 +-
 4 files changed, 69 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 326af42..260582b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -232,6 +232,8 @@ struct kvm_mmu_page {
 	unsigned int unsync_children;
 	unsigned long parent_ptes;	/* Reverse mapping for parent_pte */
 	DECLARE_BITMAP(unsync_child_bitmap, 512);
+
+	struct rcu_head rcu;
 };

 struct kvm_pv_mmu_op_buffer {
@@ -478,6 +480,8 @@ struct kvm_arch {
 	u64 hv_guest_os_id;
 	u64 hv_hypercall;

+	atomic_t reader_counter;
+
 	#ifdef CONFIG_KVM_MMU_AUDIT
 	int audit_point;
 	#endif
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9f3a746..52d4682 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1675,6 +1675,30 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 	return ret;
 }

+static void free_mmu_pages_unlock_parts(struct list_head *invalid_list)
+{
+	struct kvm_mmu_page *sp;
+
+	list_for_each_entry(sp, invalid_list, link)
+		kvm_mmu_free_lock_parts(sp);
+}
+
+static void free_invalid_pages_rcu(struct rcu_head *head)
+{
+	struct kvm_mmu_page *next, *sp;
+
+	sp = container_of(head, struct kvm_mmu_page, rcu);
+	while (sp) {
+		if (!list_empty(&sp->link))
+			next = list_first_entry(&sp->link,
+				      struct kvm_mmu_page, link);
+		else
+			next = NULL;
+		kvm_mmu_free_unlock_parts(sp);
+		sp = next;
+	}
+}
+
 static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 				    struct list_head *invalid_list)
 {
@@ -1685,6 +1709,14 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,

 	kvm_flush_remote_tlbs(kvm);

+	if (atomic_read(&kvm->arch.reader_counter)) {
+		free_mmu_pages_unlock_parts(invalid_list);
+		sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
+		list_del_init(invalid_list);
+		call_rcu(&sp->rcu, free_invalid_pages_rcu);
+		return;
+	}
+
 	do {
 		sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
 		WARN_ON(!sp->role.invalid || sp->root_count);
@@ -2601,6 +2633,35 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
 	return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access);
 }

+int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
+				      u64 sptes[4])
+{
+	struct kvm_shadow_walk_iterator iterator;
+	int nr_sptes = 0;
+
+	rcu_read_lock();
+
+	atomic_inc(&vcpu->kvm->arch.reader_counter);
+	/* Increase the counter before walking shadow page table */
+	smp_mb__after_atomic_inc();
+
+	for_each_shadow_entry(vcpu, addr, iterator) {
+		sptes[iterator.level-1] = *iterator.sptep;
+		nr_sptes++;
+		if (!is_shadow_present_pte(*iterator.sptep))
+			break;
+	}
+
+	/* Decrease the counter after walking shadow page table finished */
+	smp_mb__before_atomic_dec();
+	atomic_dec(&vcpu->kvm->arch.reader_counter);
+
+	rcu_read_unlock();
+
+	return nr_sptes;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_walk_shadow_page_lockless);
+
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 				u32 error_code, bool prefault)
 {
@@ -3684,24 +3745,6 @@ out:
 	return r;
 }

-int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
-{
-	struct kvm_shadow_walk_iterator iterator;
-	int nr_sptes = 0;
-
-	spin_lock(&vcpu->kvm->mmu_lock);
-	for_each_shadow_entry(vcpu, addr, iterator) {
-		sptes[iterator.level-1] = *iterator.sptep;
-		nr_sptes++;
-		if (!is_shadow_present_pte(*iterator.sptep))
-			break;
-	}
-	spin_unlock(&vcpu->kvm->mmu_lock);
-
-	return nr_sptes;
-}
-EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy);
-
 void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
 {
 	ASSERT(vcpu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 05310b1..e7725c4 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,7 +48,9 @@
 #define PFERR_RSVD_MASK (1U << 3)
 #define PFERR_FETCH_MASK (1U << 4)

-int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64
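The decision made in kvm_mmu_commit_zap_page() above (free immediately when no lockless walker is active, otherwise defer) can be modelled in userspace with C11 atomics. This is only a sketch of the counter check: the RCU grace period itself is not modelled, the default sequentially consistent atomics stand in for the atomic_inc()/smp_mb__after_atomic_inc() pairing, and `struct mmu_model` and all function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the state touched by the commit path. */
struct mmu_model {
	atomic_int reader_counter;  /* number of lockless walkers in progress */
	unsigned freed_now;         /* pages freed directly by the commit path */
	unsigned deferred;          /* pages handed to a deferred free */
};

/* A walker announces itself before touching the shadow page table. */
static void walk_begin(struct mmu_model *m)
{
	atomic_fetch_add(&m->reader_counter, 1);	/* seq_cst: full ordering */
}

/* A walker signs off once it no longer dereferences shadow pages. */
static void walk_end(struct mmu_model *m)
{
	atomic_fetch_sub(&m->reader_counter, 1);
}

/* Commit path: returns true when the free had to be deferred. */
static bool commit_zap(struct mmu_model *m, unsigned nr_pages)
{
	if (atomic_load(&m->reader_counter)) {
		m->deferred += nr_pages;	/* the patch uses call_rcu() here */
		return true;
	}
	m->freed_now += nr_pages;
	return false;
}

/* Demo helper: run one commit with or without a walker in flight. */
static bool commit_is_deferred(bool walker_active)
{
	struct mmu_model m;
	bool deferred;

	atomic_init(&m.reader_counter, 0);
	m.freed_now = 0;
	m.deferred = 0;

	if (walker_active)
		walk_begin(&m);
	deferred = commit_zap(&m, 3);
	if (walker_active)
		walk_end(&m);
	return deferred;
}
```

The counter only selects between the two paths; in the patch it is rcu_read_lock() in the walker together with call_rcu() in the commit path that keeps the deferred pages alive until the walker has finished.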
[PATCH 11/15] KVM: MMU: filter out the mmio pfn from the fault pfn
If a page fault is caused by mmio, the gfn cannot be found in the memslots and
'bad_pfn' is returned on the gfn_to_hva path, so we can use 'bad_pfn' to
identify an mmio page fault.

To keep the meaning of the mmio pfn clear, return the fault page instead of
the bad page when the gfn is not allowed to be prefetched.

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
 arch/x86/kvm/mmu.c       |    4 ++--
 include/linux/kvm_host.h |    5 +++++
 virt/kvm/kvm_main.c      |   16 ++++++++++++++--
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 52d4682..7286d2a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2133,8 +2133,8 @@ static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,

 	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log);
 	if (!slot) {
-		get_page(bad_page);
-		return page_to_pfn(bad_page);
+		get_page(fault_page);
+		return page_to_pfn(fault_page);
 	}

 	hva = gfn_to_hva_memslot(slot, gfn);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b9c3299..16d6d3f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -326,12 +326,17 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
 static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; }

 extern struct page *bad_page;
+extern struct page *fault_page;
+
 extern pfn_t bad_pfn;
+extern pfn_t fault_pfn;

 int is_error_page(struct page *page);
 int is_error_pfn(pfn_t pfn);
 int is_hwpoison_pfn(pfn_t pfn);
 int is_fault_pfn(pfn_t pfn);
+int is_mmio_pfn(pfn_t pfn);
+int is_invalid_pfn(pfn_t pfn);
 int kvm_is_error_hva(unsigned long addr);
 int kvm_set_memory_region(struct kvm *kvm,
 			  struct kvm_userspace_memory_region *mem,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f78ddb8..93a1ce1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -97,8 +97,8 @@ static bool largepages_enabled = true;
 static struct page *hwpoison_page;
 static pfn_t hwpoison_pfn;

-static struct page *fault_page;
-static pfn_t fault_pfn;
+struct page *fault_page;
+pfn_t fault_pfn;

 inline int kvm_is_mmio_pfn(pfn_t pfn)
 {
@@ -926,6 +926,18 @@ int is_fault_pfn(pfn_t pfn)
 }
 EXPORT_SYMBOL_GPL(is_fault_pfn);

+int is_mmio_pfn(pfn_t pfn)
+{
+	return pfn == bad_pfn;
+}
+EXPORT_SYMBOL_GPL(is_mmio_pfn);
+
+int is_invalid_pfn(pfn_t pfn)
+{
+	return pfn == hwpoison_pfn || pfn == fault_pfn;
+}
+EXPORT_SYMBOL_GPL(is_invalid_pfn);
+
 static inline unsigned long bad_hva(void)
 {
 	return PAGE_OFFSET;
--
1.7.4.4
[PATCH 12/15] KVM: MMU: abstract some functions to handle fault pfn
Introduce handle_abnormal_pfn() to handle a fault pfn on the page fault path,
and mmu_invalid_pfn() to handle a fault pfn on the prefetch path.

This is preparatory work for mmio page fault support.

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
 arch/x86/kvm/mmu.c         |   47 ---
 arch/x86/kvm/paging_tmpl.h |   12 +-
 2 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7286d2a..4f475ab 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2269,18 +2269,15 @@ static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *
 	send_sig_info(SIGBUS, &info, tsk);
 }

-static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gva_t gva,
-			       unsigned access, gfn_t gfn, pfn_t pfn)
+static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, pfn_t pfn)
 {
 	kvm_release_pfn_clean(pfn);
 	if (is_hwpoison_pfn(pfn)) {
 		kvm_send_hwpoison_signal(gfn_to_hva(vcpu->kvm, gfn), current);
 		return 0;
-	} else if (is_fault_pfn(pfn))
-		return -EFAULT;
+	}

-	vcpu_cache_mmio_info(vcpu, gva, gfn, access);
-	return 1;
+	return -EFAULT;
 }

 static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
@@ -2325,6 +2322,33 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 	}
 }

+static bool mmu_invalid_pfn(pfn_t pfn)
+{
+	return unlikely(is_invalid_pfn(pfn) || is_mmio_pfn(pfn));
+}
+
+static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+				pfn_t pfn, unsigned access, int *ret_val)
+{
+	bool ret = true;
+
+	/* The pfn is invalid, report the error! */
+	if (unlikely(is_invalid_pfn(pfn))) {
+		*ret_val = kvm_handle_bad_page(vcpu, gfn, pfn);
+		goto exit;
+	}
+
+	if (unlikely(is_mmio_pfn(pfn))) {
+		vcpu_cache_mmio_info(vcpu, gva, gfn, ACC_ALL);
+		*ret_val = 1;
+		goto exit;
+	}
+
+	ret = false;
+exit:
+	return ret;
+}
+
 static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 			 gva_t gva, pfn_t *pfn, bool write, bool *writable);

@@ -2359,9 +2383,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
 	if (try_async_pf(vcpu, prefault, gfn, v, &pfn, write, &map_writable))
 		return 0;

-	/* mmio */
-	if (is_error_pfn(pfn))
-		return kvm_handle_bad_page(vcpu, v, ACC_ALL, gfn, pfn);
+	if (handle_abnormal_pfn(vcpu, v, gfn, pfn, ACC_ALL, &r))
+		return r;

 	spin_lock(&vcpu->kvm->mmu_lock);
 	if (mmu_notifier_retry(vcpu, mmu_seq))
@@ -2762,9 +2785,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable))
 		return 0;

-	/* mmio */
-	if (is_error_pfn(pfn))
-		return kvm_handle_bad_page(vcpu, 0, 0, gfn, pfn);
+	if (handle_abnormal_pfn(vcpu, 0, gfn, pfn, ACC_ALL, &r))
+		return r;
+
 	spin_lock(&vcpu->kvm->mmu_lock);
 	if (mmu_notifier_retry(vcpu, mmu_seq))
 		goto out_unlock;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 8353b69..4f960b2 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -371,7 +371,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
 	pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
 	pfn = gfn_to_pfn_atomic(vcpu->kvm, gpte_to_gfn(gpte));
-	if (is_error_pfn(pfn)) {
+	if (mmu_invalid_pfn(pfn)) {
 		kvm_release_pfn_clean(pfn);
 		return;
 	}
@@ -448,7 +448,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 		gfn = gpte_to_gfn(gpte);
 		pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
 				      pte_access & ACC_WRITE_MASK);
-		if (is_error_pfn(pfn)) {
+		if (mmu_invalid_pfn(pfn)) {
 			kvm_release_pfn_clean(pfn);
 			break;
 		}
@@ -618,10 +618,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 			 &map_writable))
 		return 0;

-	/* mmio */
-	if (is_error_pfn(pfn))
-		return kvm_handle_bad_page(vcpu, mmu_is_nested(vcpu) ? 0 :
-					      addr, walker.pte_access, walker.gfn, pfn);
+	if (handle_abnormal_pfn(vcpu, mmu_is_nested(vcpu) ? 0 : addr,
+				walker.gfn, pfn, walker.pte_access, &r))
+		return r;
+
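The dispatch that handle_abnormal_pfn() performs in this patch can be tried out in userspace. The sketch below keeps the three outcomes (invalid pfn, mmio pfn, ordinary pfn) but drops the vcpu state and the hwpoison signal. The sentinel values, `CONTINUE_MAPPING` and `fault_result()` are hypothetical; in the series the sentinels are the pfns of pages that KVM allocates itself.

```c
#include <errno.h>
#include <stdbool.h>

typedef unsigned long long pfn_t;

/* Hypothetical sentinel pfns standing in for bad_pfn, hwpoison_pfn, fault_pfn. */
enum { BAD_PFN = 1, HWPOISON_PFN = 2, FAULT_PFN = 3 };

/* Hypothetical marker meaning "no abnormal pfn, go on and map the page". */
enum { CONTINUE_MAPPING = 100 };

/* A gfn outside every memslot resolves to bad_pfn, which marks mmio. */
static bool is_mmio_pfn(pfn_t pfn)
{
	return pfn == BAD_PFN;
}

/* hwpoison and fault pfns are real errors. */
static bool is_invalid_pfn(pfn_t pfn)
{
	return pfn == HWPOISON_PFN || pfn == FAULT_PFN;
}

/* Returns true when the caller must stop and return *ret_val. */
static bool handle_abnormal_pfn(pfn_t pfn, int *ret_val)
{
	if (is_invalid_pfn(pfn)) {
		/* Mirrors kvm_handle_bad_page(): 0 after hwpoison, else -EFAULT. */
		*ret_val = (pfn == HWPOISON_PFN) ? 0 : -EFAULT;
		return true;
	}

	if (is_mmio_pfn(pfn)) {
		*ret_val = 1;	/* the mmio case returns 1, as in the patch */
		return true;
	}

	return false;
}

/* Demo wrapper shaped like the call sites in the page fault handlers. */
static int fault_result(pfn_t pfn)
{
	int r;

	if (handle_abnormal_pfn(pfn, &r))
		return r;
	return CONTINUE_MAPPING;
}
```

Keeping the boolean "handled" result separate from the return code is what lets each call site shrink to `if (handle_abnormal_pfn(...)) return r;`.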
[PATCH 13/15] KVM: VMX: modify the default value of nontrap shadow pte
Modify the default value so that the nontrap shadow pte can be told apart
from the mmio shadow pte, which will be introduced in a later patch.

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
 arch/x86/kvm/vmx.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 20dbf7f..8c3d343 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7110,7 +7110,7 @@ static int __init vmx_init(void)
 		kvm_disable_tdp();

 	if (bypass_guest_pf)
-		kvm_mmu_set_nonpresent_ptes(~0xffeull, 0ull);
+		kvm_mmu_set_nonpresent_ptes(0xfull << 49 | 1ull, 0ull);

 	return 0;

--
1.7.4.4
[PATCH 14/15] KVM: MMU: mmio page fault support
The idea is from Avi:

| We could cache the result of a miss in an spte by using a reserved bit, and
| checking the page fault error code (or seeing if we get an ept violation or
| ept misconfiguration), so if we get repeated mmio on a page, we don't need to
| search the slot list/tree.
| (https://lkml.org/lkml/2011/2/22/221)

When the page fault is caused by mmio, we cache the info in the shadow page
table and also set the reserved bits in the shadow page table, so if the mmio
happens again, we can quickly identify it and emulate it directly.

Searching for an mmio gfn in the memslots is expensive since we need to walk
all memslots. This feature reduces that cost, and also avoids walking the
guest page table for soft mmu.

This feature can be disabled/enabled at runtime. If
shadow_notrap_nonpresent_pte is enabled, PFERR.RSVD is always set and we need
to walk the shadow page table for all page faults, so disable this feature if
shadow_notrap_nonpresent_pte is enabled.

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
 arch/x86/kvm/mmu.c         |  149 ---
 arch/x86/kvm/mmu.h         |    4 +-
 arch/x86/kvm/paging_tmpl.h |   32 +-
 arch/x86/kvm/vmx.c         |   12 +++-
 4 files changed, 180 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4f475ab..227cf10 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -91,6 +91,9 @@ module_param(dbg, bool, 0644);
 static int oos_shadow = 1;
 module_param(oos_shadow, bool, 0644);

+static int __read_mostly mmio_pf = 1;
+module_param(mmio_pf, bool, 0644);
+
 #ifndef MMU_DEBUG
 #define ASSERT(x) do { } while (0)
 #else
@@ -193,6 +196,44 @@ static u64 __read_mostly shadow_x_mask;	/* mutual exclusive with nx_mask */
 static u64 __read_mostly shadow_user_mask;
 static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
+static u64 __read_mostly shadow_mmio_mask = (0xffull << 49 | 1ULL);
+
+static void __set_spte(u64 *sptep, u64 spte)
+{
+	set_64bit(sptep, spte);
+}
+
+static void mark_mmio_spte(u64 *sptep, u64 gfn, unsigned access)
+{
+	access &= ACC_WRITE_MASK | ACC_USER_MASK;
+
+	__set_spte(sptep, shadow_mmio_mask | access | gfn << PAGE_SHIFT);
+}
+
+static bool is_mmio_spte(u64 spte)
+{
+	return (spte & shadow_mmio_mask) == shadow_mmio_mask;
+}
+
+static gfn_t get_mmio_spte_gfn(u64 spte)
+{
+	return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
+}
+
+static unsigned get_mmio_spte_access(u64 spte)
+{
+	return (spte & ~shadow_mmio_mask) & ~PAGE_MASK;
+}
+
+static bool set_mmio_spte(u64 *sptep, gfn_t gfn, pfn_t pfn, unsigned access)
+{
+	if (unlikely(is_mmio_pfn(pfn))) {
+		mark_mmio_spte(sptep, gfn, access);
+		return true;
+	}
+
+	return false;
+}

 static inline u64 rsvd_bits(int s, int e)
 {
@@ -203,6 +244,8 @@ void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte)
 {
 	shadow_trap_nonpresent_pte = trap_pte;
 	shadow_notrap_nonpresent_pte = notrap_pte;
+	if (trap_pte != notrap_pte)
+		mmio_pf = 0;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_nonpresent_ptes);

@@ -230,7 +273,8 @@ static int is_nx(struct kvm_vcpu *vcpu)
 static int is_shadow_present_pte(u64 pte)
 {
 	return pte != shadow_trap_nonpresent_pte
-		&& pte != shadow_notrap_nonpresent_pte;
+		&& pte != shadow_notrap_nonpresent_pte
+		&& !is_mmio_spte(pte);
 }

 static int is_large_pte(u64 pte)
@@ -269,11 +313,6 @@ static gfn_t pse36_gfn_delta(u32 gpte)
 	return (gpte & PT32_DIR_PSE36_MASK) << shift;
 }

-static void __set_spte(u64 *sptep, u64 spte)
-{
-	set_64bit(sptep, spte);
-}
-
 static u64 __xchg_spte(u64 *sptep, u64 new_spte)
 {
 #ifdef CONFIG_X86_64
@@ -1972,6 +2011,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	u64 spte, entry = *sptep;
 	int ret = 0;

+	if (set_mmio_spte(sptep, gfn, pfn, pte_access))
+		return 0;
+
 	/*
 	 * We don't set the accessed bit, since we sometimes want to see
 	 * whether the guest actually used the pte (in order to detect
@@ -2098,6 +2140,9 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			kvm_mmu_flush_tlb(vcpu);
 	}

+	if (unlikely(is_mmio_spte(*sptep) && emulate))
+		*emulate = 1;
+
 	pgprintk("%s: setting spte %llx\n", __func__, *sptep);
 	pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
 		 is_large_pte(*sptep)? "2MB" : "4kB",
@@ -2324,7 +2369,10 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,

 static bool mmu_invalid_pfn(pfn_t pfn)
 {
-	return unlikely(is_invalid_pfn(pfn) || is_mmio_pfn(pfn));
+	if (unlikely(!mmio_pf && is_mmio_pfn(pfn)))
+		return true;
+
+	return unlikely(is_invalid_pfn(pfn));
 }

 static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
@@ -2340,8 +2388,10 @@
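The mmio spte layout used by the helpers in this patch can be checked with a small round trip in userspace. The sketch below copies the mask and the shift arithmetic from the patch; `make_mmio_spte()` is a hypothetical value-returning variant of mark_mmio_spte(), and the `PAGE_SHIFT` and `ACC_*` values are assumptions (4 KiB pages, with the write and user bits in the usual x86 page table positions).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: 4 KiB pages, x86 writable (bit 1) and user (bit 2) bits. */
#define PAGE_SHIFT	12
#define PAGE_MASK	(~((1ull << PAGE_SHIFT) - 1))
#define ACC_WRITE_MASK	(1u << 1)
#define ACC_USER_MASK	(1u << 2)

/* Mask from the patch: eight high bits starting at bit 49, plus bit 0. */
static const uint64_t shadow_mmio_mask = 0xffull << 49 | 1ull;

/* Hypothetical variant of mark_mmio_spte() that returns the encoded spte. */
static uint64_t make_mmio_spte(uint64_t gfn, unsigned access)
{
	access &= ACC_WRITE_MASK | ACC_USER_MASK;	/* keep only W and U */
	return shadow_mmio_mask | access | gfn << PAGE_SHIFT;
}

/* An spte is an mmio spte only if every mask bit is set. */
static bool is_mmio_spte(uint64_t spte)
{
	return (spte & shadow_mmio_mask) == shadow_mmio_mask;
}

/* The gfn lives in the bits above the page offset, minus the mask bits. */
static uint64_t get_mmio_spte_gfn(uint64_t spte)
{
	return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
}

/* The access bits live in the page-offset bits, minus the mask bits. */
static unsigned get_mmio_spte_access(uint64_t spte)
{
	return (unsigned)((spte & ~shadow_mmio_mask) & ~PAGE_MASK);
}
```

Because the gfn shares the word with the mask, the round trip in this sketch only holds for gfns that fit below bit 49 once shifted, that is, gfns smaller than 2^37.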
[PATCH 15/15] KVM: MMU: trace mmio page fault
Add tracepoints to trace mmio page fault

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
 arch/x86/kvm/mmu.c         |    4 +++
 arch/x86/kvm/mmutrace.h    |   48
 arch/x86/kvm/x86.c         |    5 +++-
 include/trace/events/kvm.h |   24 ++
 4 files changed, 80 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 227cf10..aff8f52 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -207,6 +207,7 @@ static void mark_mmio_spte(u64 *sptep, u64 gfn, unsigned access)
 {
 	access &= ACC_WRITE_MASK | ACC_USER_MASK;

+	trace_mark_mmio_spte(sptep, gfn, access);
 	__set_spte(sptep, shadow_mmio_mask | access | gfn << PAGE_SHIFT);
 }

@@ -1752,6 +1753,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 		free_mmu_pages_unlock_parts(invalid_list);
 		sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
 		list_del_init(invalid_list);
+		trace_kvm_mmu_delay_free_pages(sp);
 		call_rcu(&sp->rcu, free_invalid_pages_rcu);
 		return;
 	}
@@ -2765,6 +2767,8 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr,

 		if (direct)
 			addr = 0;
+
+		trace_handle_mmio_page_fault(addr, gfn, access);
 		vcpu_cache_mmio_info(vcpu, addr, gfn, access);
 		return 1;
 	}
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index b60b4fd..eed67f3 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -196,6 +196,54 @@ DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_prepare_zap_page,
 	TP_ARGS(sp)
 );

+DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_delay_free_pages,
+	TP_PROTO(struct kvm_mmu_page *sp),
+
+	TP_ARGS(sp)
+);
+
+TRACE_EVENT(
+	mark_mmio_spte,
+	TP_PROTO(u64 *sptep, gfn_t gfn, unsigned access),
+	TP_ARGS(sptep, gfn, access),
+
+	TP_STRUCT__entry(
+		__field(void *, sptep)
+		__field(gfn_t, gfn)
+		__field(unsigned, access)
+	),
+
+	TP_fast_assign(
+		__entry->sptep = sptep;
+		__entry->gfn = gfn;
+		__entry->access = access;
+	),
+
+	TP_printk("sptep:%p gfn %llx access %x", __entry->sptep, __entry->gfn,
+		  __entry->access)
+);
+
+TRACE_EVENT(
+	handle_mmio_page_fault,
+	TP_PROTO(u64 addr, gfn_t gfn, unsigned access),
+	TP_ARGS(addr, gfn, access),
+
+	TP_STRUCT__entry(
+		__field(u64, addr)
+		__field(gfn_t, gfn)
+		__field(unsigned, access)
+	),
+
+	TP_fast_assign(
+		__entry->addr = addr;
+		__entry->gfn = gfn;
+		__entry->access = access;
+	),
+
+	TP_printk("addr:%llx gfn %llx access %x", __entry->addr, __entry->gfn,
+		  __entry->access)
+);
+
 TRACE_EVENT(
 	kvm_mmu_audit,
 	TP_PROTO(struct kvm_vcpu *vcpu, int audit_point),
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a136181..c75f845 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3914,6 +3914,7 @@ static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
 				 vcpu->arch.access)) {
 		*gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
 					(gva & (PAGE_SIZE - 1));
+		trace_vcpu_match_mmio(gva, *gpa, write, false);
 		return 1;
 	}

@@ -3929,8 +3930,10 @@ static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
 	if ((*gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
 		return 1;

-	if (vcpu_match_mmio_gpa(vcpu, *gpa))
+	if (vcpu_match_mmio_gpa(vcpu, *gpa)) {
+		trace_vcpu_match_mmio(gva, *gpa, write, true);
 		return 1;
+	}

 	return 0;
 }
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 46e3cd8..571e972 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -306,6 +306,30 @@ TRACE_EVENT(

 #endif

+TRACE_EVENT(
+	vcpu_match_mmio,
+	TP_PROTO(gva_t gva, gpa_t gpa, bool write, bool gpa_match),
+	TP_ARGS(gva, gpa, write, gpa_match),
+
+	TP_STRUCT__entry(
+		__field(gva_t, gva)
+		__field(gpa_t, gpa)
+		__field(bool, write)
+		__field(bool, gpa_match)
+		),
+
+	TP_fast_assign(
+		__entry->gva = gva;
+		__entry->gpa = gpa;
+		__entry->write = write;
+		__entry->gpa_match = gpa_match
+		),
+
+	TP_printk("gva %#lx gpa %#llx %s %s", __entry->gva, __entry->gpa,
+		  __entry->write ? "Write" : "Read",
+		  __entry->gpa_match ? "GPA" : "GVA")
+);
+
 #endif /* _TRACE_KVM_MAIN_H */

 /* This part must be outside protection */
--
1.7.4.4
[PATCH v2] virtio-spec: Fix wrong bit number of device status
qemu-kvm/hw/virtio_config.h:
#define VIRTIO_CONFIG_S_ACKNOWLEDGE     1
#define VIRTIO_CONFIG_S_DRIVER          2
#define VIRTIO_CONFIG_S_DRIVER_OK       4
#define VIRTIO_CONFIG_S_FAILED          0x80

virtio-spec:
ACKNOWLEDGE(1) :
DRIVER(2) :
DRIVER_OK(3) :
FAILED(128):

The spec refers to bit numbers while the headers use absolute values, so the
two are not consistent. It should be 'FAILED(8)': 2^(8-1) = 128.

Changes from V1:
- Fix wrong patch body

Signed-off-by: Amos Kong <ak...@redhat.com>
---
 virtio-spec.lyx |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 448af76..1fc3e59 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -1552,7 +1552,7 @@ FAILED
 \begin_inset space ~
 \end_inset

-(128) Indicates that something went wrong in the guest, and it has given
+(8) Indicates that something went wrong in the guest, and it has given
  up on the device.
  This could be an internal error, or the driver didn't like the device for
  some reason, or even a fatal error during device operation.
--
1.7.1
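The relation between the two notations in this patch is value = 2^(n-1) for a 1-based bit number n. A minimal sketch that checks it against the four header values quoted above follows; `status_bit_to_value()` is a hypothetical helper, not part of QEMU or the spec.

```c
#include <assert.h>

/*
 * Convert a 1-based bit number, as written in the spec text, into the
 * absolute value used by the header: value = 2^(n-1).
 */
static unsigned status_bit_to_value(unsigned bit_number)
{
	/* The device status field is a single byte, so only bits 1..8 exist. */
	assert(bit_number >= 1 && bit_number <= 8);
	return 1u << (bit_number - 1);
}
```

With this mapping, ACKNOWLEDGE(1), DRIVER(2) and DRIVER_OK(3) give 1, 2 and 4, and FAILED gives 0x80 only when it is written as FAILED(8), which is the change the patch makes.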
Re: KVM induced panic on 2.6.38[2367] 2.6.39
On 07/06/11 04:22, Eric Dumazet wrote:
Could you please try latest linux-2.6 tree ? We fixed many networking bugs that could explain your crash.

No good I'm afraid.

[ 543.040056] =
[ 543.040136] BUG ip_dst_cache: Padding overwritten. 0x8803e4217ffe-0x8803e4217fff
[ 543.040194] -
[ 543.040198]
[ 543.040298] INFO: Slab 0xea000d9e74d0 objects=25 used=25 fp=0x (null) flags=0x80004081
[ 543.040364] Pid: 4576, comm: kworker/1:2 Not tainted 3.0.0-rc2 #1
[ 543.040415] Call Trace:
[ 543.040472] [810b9c1d] ? slab_err+0xad/0xd0
[ 543.040528] [8102e034] ? check_preempt_wakeup+0xa4/0x160
[ 543.040595] [810ba206] ? slab_pad_check+0x126/0x170
[ 543.040650] [8133045b] ? dst_destroy+0x8b/0x110
[ 543.040701] [810ba29a] ? check_slab+0x4a/0xc0
[ 543.040753] [810baf2d] ? free_debug_processing+0x2d/0x250
[ 543.040808] [810bb27b] ? __slab_free+0x12b/0x140
[ 543.040862] [810bbe99] ? kmem_cache_free+0x99/0xa0
[ 543.040915] [8133045b] ? dst_destroy+0x8b/0x110
[ 543.040967] [813307f6] ? dst_gc_task+0x196/0x1f0
[ 543.041021] [8104e954] ? queue_delayed_work_on+0x154/0x160
[ 543.041081] [813066fe] ? do_dbs_timer+0x20e/0x3d0
[ 543.041133] [81330660] ? dst_alloc+0x180/0x180
[ 543.041187] [8104f28b] ? process_one_work+0xfb/0x3b0
[ 543.041242] [8104f964] ? worker_thread+0x144/0x3d0
[ 543.041296] [8102cc10] ? __wake_up_common+0x50/0x80
[ 543.041678] [8104f820] ? rescuer_thread+0x2e0/0x2e0
[ 543.041729] [8104f820] ? rescuer_thread+0x2e0/0x2e0
[ 543.041782] [81053436] ? kthread+0x96/0xa0
[ 543.041835] [813e1d14] ? kernel_thread_helper+0x4/0x10
[ 543.041890] [810533a0] ? kthread_worker_fn+0x120/0x120
[ 543.041944] [813e1d10] ? gs_change+0xb/0xb
[ 543.041993] Padding 0x8803e4217f40: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.042718] Padding 0x8803e4217f50: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.043433] Padding 0x8803e4217f60: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.044155] Padding 0x8803e4217f70: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.044866] Padding 0x8803e4217f80: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.045590] Padding 0x8803e4217f90: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.046311] Padding 0x8803e4217fa0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.047034] Padding 0x8803e4217fb0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.047755] Padding 0x8803e4217fc0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.048474] Padding 0x8803e4217fd0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.049203] Padding 0x8803e4217fe0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.049909] Padding 0x8803e4217ff0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 00 00 ZZ..
[ 543.050021] FIX ip_dst_cache: Restoring 0x8803e4217f40-0x8803e4217fff=0x5a
[ 543.050021]

Dropped -mm, Hugh and Andrea from CC as this does not appear to be mm or ksm related.

I'll pare down the firewall and see if I can make it break easier with a smaller test set.
Re: KVM induced panic on 2.6.38[2367] 2.6.39
On 07.06.2011 05:33, Brad Campbell wrote:
On 07/06/11 04:10, Bart De Schuymer wrote:

Hi Brad,

This has probably nothing to do with ebtables, so please rmmod in case it's loaded. A few questions I didn't directly see an answer to in the threads I scanned... I'm assuming you actually use the bridging firewall functionality. So, what iptables modules do you use? Can you reduce your iptables rules to a core that triggers the bug? Or does it get triggered even with an empty set of firewall rules? Are you using a stock .35 kernel or is it patched? Is this something I can trigger on a poor guy's laptop or does it require specialized hardware (I'm catching up on qemu/kvm...)?

Not specialised hardware as such, I've just not been able to reproduce it outside of this specific operating scenario.

The last similar problem we've had was related to the 32/64 bit compat code. Are you running 32 bit userspace on a 64 bit kernel?

I can't trigger it with empty firewall rules as it relies on a DNAT to occur. If I try it directly to the internal IP address (as I have to without netfilter loaded) then of course nothing fails. It's a pain in the bum as a fault, but it's one I can easily reproduce as long as I use the same set of circumstances.

I'll try using 3.0-rc2 (current git) tonight, and if I can reproduce it on that then I'll attempt to pare down the IPTABLES rules to a bare minimum.

It is nothing to do with ebtables as I don't compile it. I'm not really sure about bridging firewall functionality. I just use a couple of hand coded bash scripts to set the tables up.

From one of your previous mails:

# CONFIG_BRIDGE_NF_EBTABLES is not set

How about CONFIG_BRIDGE_NETFILTER?
Re: KVM induced panic on 2.6.38[2367] 2.6.39
On Tuesday 07 June 2011 at 21:27 +0800, Brad Campbell wrote:
On 07/06/11 04:22, Eric Dumazet wrote:
Could you please try latest linux-2.6 tree ? We fixed many networking bugs that could explain your crash.

No good I'm afraid.

[ 543.040056] =
[ 543.040136] BUG ip_dst_cache: Padding overwritten. 0x8803e4217ffe-0x8803e4217fff
[ 543.040194]

That's pretty strange: these are the last two bytes of a page, set to 0x0000 (a 16 bit value).

There is no way a dst field could actually sit at this location (it's padding), since a dst is a bit less than 256 bytes (0xe8), and each entry is aligned on a 64 byte address.

grep dst /proc/slabinfo
ip_dst_cache 32823 62944 256 32 2 : tunables 0 0 0 : slabdata 1967 1967 0

sizeof(struct rtable)=0xe8

-
[ 543.040198]
[ 543.040298] INFO: Slab 0xea000d9e74d0 objects=25 used=25 fp=0x (null) flags=0x80004081
[ 543.040364] Pid: 4576, comm: kworker/1:2 Not tainted 3.0.0-rc2 #1
[ 543.040415] Call Trace:
[ 543.040472] [810b9c1d] ? slab_err+0xad/0xd0
[ 543.040528] [8102e034] ? check_preempt_wakeup+0xa4/0x160
[ 543.040595] [810ba206] ? slab_pad_check+0x126/0x170
[ 543.040650] [8133045b] ? dst_destroy+0x8b/0x110
[ 543.040701] [810ba29a] ? check_slab+0x4a/0xc0
[ 543.040753] [810baf2d] ? free_debug_processing+0x2d/0x250
[ 543.040808] [810bb27b] ? __slab_free+0x12b/0x140
[ 543.040862] [810bbe99] ? kmem_cache_free+0x99/0xa0
[ 543.040915] [8133045b] ? dst_destroy+0x8b/0x110
[ 543.040967] [813307f6] ? dst_gc_task+0x196/0x1f0
[ 543.041021] [8104e954] ? queue_delayed_work_on+0x154/0x160
[ 543.041081] [813066fe] ? do_dbs_timer+0x20e/0x3d0
[ 543.041133] [81330660] ? dst_alloc+0x180/0x180
[ 543.041187] [8104f28b] ? process_one_work+0xfb/0x3b0
[ 543.041242] [8104f964] ? worker_thread+0x144/0x3d0
[ 543.041296] [8102cc10] ? __wake_up_common+0x50/0x80
[ 543.041678] [8104f820] ? rescuer_thread+0x2e0/0x2e0
[ 543.041729] [8104f820] ? rescuer_thread+0x2e0/0x2e0
[ 543.041782] [81053436] ? kthread+0x96/0xa0
[ 543.041835] [813e1d14] ? kernel_thread_helper+0x4/0x10
[ 543.041890] [810533a0] ? kthread_worker_fn+0x120/0x120
[ 543.041944] [813e1d10] ? gs_change+0xb/0xb
[ 543.041993] Padding 0x8803e4217f40: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.042718] Padding 0x8803e4217f50: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.043433] Padding 0x8803e4217f60: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.044155] Padding 0x8803e4217f70: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.044866] Padding 0x8803e4217f80: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.045590] Padding 0x8803e4217f90: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.046311] Padding 0x8803e4217fa0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.047034] Padding 0x8803e4217fb0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.047755] Padding 0x8803e4217fc0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.048474] Padding 0x8803e4217fd0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.049203] Padding 0x8803e4217fe0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 543.049909] Padding 0x8803e4217ff0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 00 00 ZZ..
[ 543.050021] FIX ip_dst_cache: Restoring 0x8803e4217f40-0x8803e4217fff=0x5a
[ 543.050021]

Dropped -mm, Hugh and Andrea from CC as this does not appear to be mm or ksm related.

I'll pare down the firewall and see if I can make it break easier with a smaller test set.

Hmm, not sure now :(

Could you reproduce another bug please ?
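The arithmetic behind the "last two bytes of a page" remark can be checked directly from the numbers in the report. The sketch below assumes 4 KiB pages and uses the addresses exactly as they are printed in this archive (only the low 12 bits matter for the page offset); `page_offset()`, `span_bytes()` and `objs_total()` are hypothetical helpers.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u	/* assumption: 4 KiB pages, as on x86 */

/* Offset of an address within its page. */
static unsigned page_offset(uint64_t addr)
{
	return (unsigned)(addr & (PAGE_SIZE - 1));
}

/* Number of bytes in the inclusive address range [first, last]. */
static uint64_t span_bytes(uint64_t first, uint64_t last)
{
	return last - first + 1;
}

/* Total objects for a cache, from the slab count and objects per slab. */
static unsigned objs_total(unsigned slabs, unsigned objs_per_slab)
{
	return slabs * objs_per_slab;
}
```

The overwritten range 0x...7ffe-0x...7fff sits at page offsets 0xffe and 0xfff, the padding reported by SLUB covers 0xc0 = 192 bytes, and the quoted slabinfo line is self-consistent (1967 slabs of 32 objects give 62944 objects). A struct rtable of 0xe8 = 232 bytes fits inside a 256 byte object, which is why the damaged bytes cannot belong to a dst entry.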
Re: Seeing DMAR errors after multiple load/unload with SR-IOV
* padmanabh ratnakar (pratnaka...@gmail.com) wrote: On Tue, Jun 7, 2011 at 4:04 AM, Chris Wright chr...@sous-sol.org wrote: * Alex Williamson (alex.william...@redhat.com) wrote: On Mon, 2011-06-06 at 14:39 +0530, padmanabh ratnakar wrote: Hi, I am using linux kernel 2.6.39. I have a IBM x3650 M3 system. I have used following boot options - intel_iommu=on iommu=pt I was loading/unloading my NIC driver(be2net) with num_vfs=7. After some iterations I get following DMAR errors - Jun 4 03:50:20 rhel6 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0. Jun 4 03:50:20 rhel6 kernel: Do you have a strange power saving mode enabled? Jun 4 03:50:20 rhel6 kernel: Dazed and confused, but trying to continue Jun 4 03:50:20 rhel6 kernel: DRHD: handling fault status reg 2 Jun 4 03:50:20 rhel6 kernel: DMAR:[DMA Read] Request device [1a:00.2] fault addr 78077000 Jun 4 03:50:20 rhel6 kernel: DMAR:[fault reason 02] Present bit in context entry is clear I was trying to debug this. I dont understand iommu code much. The physical address belongs the printed PCI function and there should not have been an error. I am unable to see pci_dev(pdev) of VFs getting removed from si_domain-devices list(intel-iommu.c) when driver gets unloaded calling pci_disable_sriov() freeing VF pdevs. Looks like issue happens when when freed pdev is allocated again and as it is already in list, required initializations dont happen. I dont know if my understanding is correct. Can anyone point me to what the issue may be? Yes, that's correct. The (now replaced) check identity_mapping() will succeed when the pci_dev is recycled (it's freed, but never removed from the list, this is an issue with passtrhough mode and device creation/desctruction). This false match happens w/ a brand new pci_dev which still has default 32bit DMA mask, so it is removed from pt domain. 
During removal domain_remove_one_dev_info() test that matches only on bus/devfn (now also segment) will match despite the fact that the info->pdev != pdev->dev.archdata.iommu. Then...Oops Typically devices are removed from the domain via drivers/pci/intel-iommu.c:device_notifier(), which is called as the device is unbound from the driver. However, this seems to get skipped when running in passthrough mode, so I'm not sure where that's supposed to occur. Does it happen w/o passthrough? I had tried without passthrough on RHEL 6.1 GA kernel. Was seeing hangs and panics. Will check if non passthrough mode works on latest kernel. If you blacklist the driver then a create/delete may do similar (haven't tested that idea). Also note that some intel-iommu fixes have rolled into 3.0.0-rc2, you might want to update and see if anything is better there. Thanks, The change in identity_mapping() means we won't demote to 32-bit DMA (drop out of pt domain), so I don't think we'll see the same issue. For testing I had made a hack in 2.6.39 kernel which will prevent demoting to 32bit DMA mask and thereby prevent calling of domain_remove_one_dev_info() for the specific VF device I was using and it had worked. So as you said I may not hit the issue in latest kernel. Will try that. I think we still leak the list entry though. Bottom line is that we need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications. We happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy. thanks, -chris
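The stale-entry problem discussed above can be illustrated with a small userspace model. This is only a sketch: `model_dev`, `tracked_list` and the `tracked_*` helpers are hypothetical names that do not exist in intel-iommu.c, and reusing one struct's storage stands in for the allocator handing a freed pci_dev's address to a new device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the device list; these names are
 * illustrative and do not exist in intel-iommu.c. */
#define MAX_TRACKED 16

struct model_dev {
    unsigned char bus;
    unsigned char devfn;
};

struct tracked_list {
    const struct model_dev *entries[MAX_TRACKED];
    size_t count;
};

/* Record a device, as happens when it joins the identity-mapped domain. */
static bool tracked_add(struct tracked_list *l, const struct model_dev *d)
{
    if (l->count == MAX_TRACKED)
        return false;
    l->entries[l->count++] = d;
    return true;
}

/* Pointer comparison only, like the list walk described in the thread.
 * A recycled address therefore matches a stale entry. */
static bool tracked_contains(const struct tracked_list *l,
                             const struct model_dev *d)
{
    for (size_t i = 0; i < l->count; i++)
        if (l->entries[i] == d)
            return true;
    return false;
}

/* What a DEL_DEVICE handler would need to do: drop the entry on removal. */
static bool tracked_del(struct tracked_list *l, const struct model_dev *d)
{
    for (size_t i = 0; i < l->count; i++) {
        if (l->entries[i] == d) {
            l->entries[i] = l->entries[--l->count];
            return true;
        }
    }
    return false;
}
```

If a device is added, its storage is reused for a different device without `tracked_del()` being called, then `tracked_contains()` still reports a match for the new device. That is the false match described in the thread; calling `tracked_del()` at removal time makes it go away.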
virtio scsi host draft specification, v3
Hi all, after some preliminary discussion on the QEMU mailing list, I present a draft specification for a virtio-based SCSI host (controller, HBA, you name it). The virtio SCSI host is the basis of an alternative storage stack for KVM. This stack would overcome several limitations of the current solution, virtio-blk: 1) scalability limitations: virtio-blk-over-PCI puts a strong upper limit on the number of devices that can be added to a guest. Common configurations have a limit of ~30 devices. While this can be worked around by implementing a PCI-to-PCI bridge, or by using multifunction virtio-blk devices, these solutions either have not been implemented yet, or introduce management restrictions. On the other hand, the SCSI architecture is well known for its scalability and virtio-scsi supports advanced feature such as multiqueueing. 2) limited flexibility: virtio-blk does not support all possible storage scenarios. For example, it does not allow SCSI passthrough or persistent reservations. In principle, virtio-scsi provides anything that the underlying SCSI target (be it physical storage, iSCSI or the in-kernel target) supports. 3) limited extensibility: over the time, many features have been added to virtio-blk. Each such change requires modifications to the virtio specification, to the guest drivers, and to the device model in the host. The virtio-scsi spec has been written to follow SAM conventions, and exposing new features to the guest will only require changes to the host's SCSI target implementation. Comments are welcome. Paolo --- 8 --- Virtio SCSI Host Device Spec The virtio SCSI host device groups together one or more simple virtual devices (ie. disk), and allows communicating to these devices using the SCSI protocol. An instance of the device represents a SCSI host with possibly many buses, targets and LUN attached. 
The virtio SCSI device services two kinds of requests: - command requests for a logical unit; - task management functions related to a logical unit, target or command. The device is also able to send out notifications about added and removed logical units. v1: First public version v2: Merged all virtqueues into one, removed separate TARGET fields v3: Added configuration information and reworked descriptor structure. Added back multiqueue on Avi's request, while still leaving TARGET fields out. Added dummy event and clarified some aspects of the event protocol. First version sent to a wider audience (linux-kernel and virtio lists). Configuration - Subsystem Device ID TBD Virtqueues 0:controlq 1:eventq 2..n:request queues Feature bits VIRTIO_SCSI_F_INOUT (0) - Whether a single request can include both read-only and write-only data buffers. Device configuration layout struct virtio_scsi_config { u32 num_queues; u32 event_info_size; u32 sense_size; u32 cdb_size; } num_queues is the total number of virtqueues exposed by the device. The driver is free to use only one request queue, or it can use more to achieve better performance. event_info_size is the maximum size that the device will fill for buffers that the driver places in the eventq. The driver should always put buffers at least of this size. sense_size is the maximum size of the sense data that the device will write. The default value is written by the device and will always be 96, but the driver can modify it. cdb_size is the maximum size of the CBD that the driver will write. The default value is written by the device and will always be 32, but the driver can likewise modify it. Device initialization - The initialization routine should first of all discover the device's virtqueues. The driver should then place at least a buffer in the eventq. Buffers returned by the device on the eventq may be referred to as events in the rest of the document. 
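The configuration layout and queue numbering described above can be sketched in C as follows. The struct fields come from the draft; the `SKETCH_*` constants and the two helper functions are hypothetical names added for illustration, and the assumption that the number of request queues equals `num_queues` minus the control and event queues is one reading of the draft, not something it states outright.

```c
#include <stdint.h>

/* Layout as given in the draft, using fixed-width C types. */
struct virtio_scsi_config {
    uint32_t num_queues;      /* total virtqueues exposed by the device */
    uint32_t event_info_size; /* max bytes the device fills per event buffer */
    uint32_t sense_size;      /* max sense data size, default 96 */
    uint32_t cdb_size;        /* max CDB size, default 32 */
};

/* Hypothetical names; the draft only gives the numeric values. */
#define SKETCH_FIRST_REQUEST_QUEUE 2u  /* 0 is controlq, 1 is eventq */
#define SKETCH_DEFAULT_SENSE_SIZE  96u
#define SKETCH_DEFAULT_CDB_SIZE    32u

/* Assumption: request queues are whatever remains after controlq and eventq. */
static uint32_t request_queue_count(const struct virtio_scsi_config *cfg)
{
    if (cfg->num_queues <= SKETCH_FIRST_REQUEST_QUEUE)
        return 0;
    return cfg->num_queues - SKETCH_FIRST_REQUEST_QUEUE;
}

/* Spread requests over the request queues; returns 0 if there are none. */
static uint32_t pick_request_queue(const struct virtio_scsi_config *cfg,
                                   uint32_t hint)
{
    uint32_t n = request_queue_count(cfg);

    if (n == 0)
        return 0;
    return SKETCH_FIRST_REQUEST_QUEUE + hint % n;
}
```

A driver that only wants one request queue would always use index 2; the modulo in `pick_request_queue()` is just one possible way to use more of them.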
The driver can immediately issue requests (for example, INQUIRY or REPORT LUNS) or task management functions (for example, I_T RESET). Device operation: request queues The driver queues requests to an arbitrary request queue, and they are used by the device on that same queue. Requests have the following format: struct virtio_scsi_req_cmd { u8 lun[8]; u64 id; u8 task_attr; u8 prio; u8 crn; char cdb[cdb_size]; char dataout[]; u8 sense[sense_size]; u32 sense_len; u32 residual; u16 status_qualifier; u8 status; u8 response; char datain[]; }; /* command-specific response values */ #define VIRTIO_SCSI_S_OK 0 #define VIRTIO_SCSI_S_UNDERRUN1 #define VIRTIO_SCSI_S_ABORTED 2 #define
Re: Seeing DMAR errors after multiple load/unload with SR-IOV
On Tue, 2011-06-07 at 06:38 -0700, Chris Wright wrote:
> I think we still leak the list entry though. Bottom line is that we
> need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications. We
> happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy.

Yeah, keeping a list of possible stale 'pci_dev' pointers is stupid. We should figure out the matching DMAR unit directly from the ACPI table at ADD_DEVICE time, and store it in pdev->archdata.iommu. I saw patches which were going in that direction...

--
dwmw2
Re: KVM induced panic on 2.6.38[2367] 2.6.39
On 07/06/11 21:30, Patrick McHardy wrote: On 07.06.2011 05:33, Brad Campbell wrote: On 07/06/11 04:10, Bart De Schuymer wrote: Hi Brad, This has probably nothing to do with ebtables, so please rmmod in case it's loaded. A few questions I didn't directly see an answer to in the threads I scanned... I'm assuming you actually use the bridging firewall functionality. So, what iptables modules do you use? Can you reduce your iptables rules to a core that triggers the bug? Or does it get triggered even with an empty set of firewall rules? Are you using a stock .35 kernel or is it patched? Is this something I can trigger on a poor guy's laptop or does it require specialized hardware (I'm catching up on qemu/kvm...)? Not specialised hardware as such, I've just not been able to reproduce it outside of this specific operating scenario. The last similar problem we've had was related to the 32/64 bit compat code. Are you running 32 bit userspace on a 64 bit kernel? No, 32 bit Guest OS, but a completely 64 bit userspace on a 64 bit kernel. Userspace is current Debian Stable. Kernel is Vanilla and qemu-kvm is current git I can't trigger it with empty firewall rules as it relies on a DNAT to occur. If I try it directly to the internal IP address (as I have to without netfilter loaded) then of course nothing fails. It's a pain in the bum as a fault, but it's one I can easily reproduce as long as I use the same set of circumstances. I'll try using 3.0-rc2 (current git) tonight, and if I can reproduce it on that then I'll attempt to pare down the IPTABLES rules to a bare minimum. It is nothing to do with ebtables as I don't compile it. I'm not really sure about bridging firewall functionality. I just use a couple of hand coded bash scripts to set the tables up. From one of your previous mails: # CONFIG_BRIDGE_NF_EBTABLES is not set How about CONFIG_BRIDGE_NETFILTER? It was compiled in. With the following table set I was able to reproduce the problem on 3.0-rc2. 
Replaced my IP with xxx.xxx.xxx.xxx, but otherwise unmodified root@srv:~# iptables-save # Generated by iptables-save v1.4.10 on Tue Jun 7 22:11:30 2011 *filter :INPUT ACCEPT [978:107619] :FORWARD ACCEPT [142:7068] :OUTPUT ACCEPT [1659:291870] -A INPUT -i ppp0 -m state --state RELATED,ESTABLISHED -j ACCEPT -A INPUT ! -i ppp0 -m state --state NEW -j ACCEPT -A INPUT -i ppp0 -j DROP COMMIT # Completed on Tue Jun 7 22:11:30 2011 # Generated by iptables-save v1.4.10 on Tue Jun 7 22:11:30 2011 *nat :PREROUTING ACCEPT [813:49170] :INPUT ACCEPT [91:7090] :OUTPUT ACCEPT [267:20731] :POSTROUTING ACCEPT [296:22281] -A PREROUTING -d xxx.xxx.xxx.xxx/32 ! -i ppp0 -p tcp -m tcp --dport 443 -j DNAT --to-destination 192.168.253.198 COMMIT # Completed on Tue Jun 7 22:11:30 2011 # Generated by iptables-save v1.4.10 on Tue Jun 7 22:11:30 2011 *mangle :PREROUTING ACCEPT [2729:274392] :INPUT ACCEPT [2508:262976] :FORWARD ACCEPT [142:7068] :OUTPUT ACCEPT [1674:293701] :POSTROUTING ACCEPT [2131:346411] -A FORWARD -o ppp0 -p tcp -m tcp --tcp-flags SYN,RST SYN -m tcpmss --mss 1400:1536 -j TCPMSS --clamp-mss-to-pmtu COMMIT # Completed on Tue Jun 7 22:11:30 2011 I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access the address the way I was doing it, so that's a no-go for me. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeing DMAR errors after multiple load/unload with SR-IOV
* David Woodhouse (dw...@infradead.org) wrote:
> On Tue, 2011-06-07 at 06:38 -0700, Chris Wright wrote:
> > I think we still leak the list entry though. Bottom line is that we
> > need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications. We
> > happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy.
>
> Yeah, keeping a list of possible stale 'pci_dev' pointers is stupid. We
> should figure out the matching DMAR unit directly from the ACPI table at
> ADD_DEVICE time, and store it in pdev->archdata.iommu. I saw patches
> which were going in that direction...

Cool, where are they? I'm working on something similar, and missed them.

thanks,
-chris
[PATCH] kvm tools, ui: Add simple keyboard support to SDL UI
This patch wires up hw/i8042.c to the SDL UI for simple guest keyboard support. Cc: Cyrill Gorcunov gorcu...@gmail.com Cc: Ingo Molnar mi...@elte.hu Cc: John Floren j...@jfloren.net Cc: Sasha Levin levinsasha...@gmail.com Signed-off-by: Pekka Enberg penb...@kernel.org --- tools/kvm/kvm-run.c |1 + tools/kvm/ui/sdl.c | 76 +++ 2 files changed, 77 insertions(+), 0 deletions(-) diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c index 8398287..b688ef7 100644 --- a/tools/kvm/kvm-run.c +++ b/tools/kvm/kvm-run.c @@ -643,6 +643,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) } if (sdl) { + kbd__init(kvm); if (fb) sdl__init(fb); } diff --git a/tools/kvm/ui/sdl.c b/tools/kvm/ui/sdl.c index bc69ed9..878df1d 100644 --- a/tools/kvm/ui/sdl.c +++ b/tools/kvm/ui/sdl.c @@ -1,6 +1,7 @@ #include kvm/sdl.h #include kvm/framebuffer.h +#include kvm/i8042.h #include kvm/util.h #include SDL/SDL.h @@ -13,6 +14,63 @@ static void sdl__write(struct framebuffer *fb, u64 addr, u8 *data, u32 len) memcpy(fb-mem[addr - fb-mem_addr], data, len); } +static u8 keymap[255] = { + [10]= 0x16, /* 1 */ + [11]= 0x1e, /* 2 */ + [12]= 0x26, /* 3 */ + [13]= 0x25, /* 4 */ + [14]= 0x27, /* 5 */ + [15]= 0x36, /* 6 */ + [16]= 0x3d, /* 7 */ + [17]= 0x3e, /* 8 */ + [18]= 0x46, /* 9 */ + [19]= 0x45, /* 9 */ + + [22]= 0x66, /* backspace */ + + [24]= 0x15, /* q */ + [25]= 0x1d, /* w */ + [26]= 0x24, /* e */ + [27]= 0x2d, /* r */ + [28]= 0x2c, /* t */ + [29]= 0x35, /* y */ + [30]= 0x3c, /* u */ + [31]= 0x43, /* i */ + [32]= 0x44, /* o */ + [33]= 0x4d, /* p */ + + [36]= 0x5a, /* enter */ + + [38]= 0x1c, /* a */ + [39]= 0x1b, /* s */ + [40]= 0x23, /* d */ + [41]= 0x2b, /* f */ + [42]= 0x34, /* g */ + [43]= 0x33, /* h */ + [44]= 0x3b, /* j */ + [45]= 0x42, /* k */ + [46]= 0x4b, /* l */ + + [50]= 0x12, /* left shift */ + + [52]= 0x1a, /* z */ + [53]= 0x22, /* x */ + [54]= 0x21, /* c */ + [55]= 0x2a, /* v */ + [56]= 0x32, /* b */ + [57]= 0x31, /* n */ + [58]= 0x3a, /* m */ + + [61]= 0x4e, /* - */ + 
[62]= 0x59, /* right shift */ + [65]= 0x29, /* space */ +}; + +static u8 to_code(u8 scancode) +{ + return keymap[scancode]; +} + static void *sdl__thread(void *p) { Uint32 rmask, gmask, bmask, amask; @@ -43,12 +101,30 @@ static void *sdl__thread(void *p) for (;;) { SDL_BlitSurface(guest_screen, NULL, screen, NULL); SDL_UpdateRect(screen, 0, 0, 0, 0); + while (SDL_PollEvent(ev)) { switch (ev.type) { + case SDL_KEYDOWN: { + u8 code = to_code(ev.key.keysym.scancode); + if (code) + kbd_queue(code); + else + pr_warning(key '%d' not found in keymap, ev.key.keysym.scancode); + break; + } + case SDL_KEYUP: { + u8 code = to_code(ev.key.keysym.scancode); + if (code) { + kbd_queue(0xf0); + kbd_queue(code); + } + break; + } case SDL_QUIT: goto exit; } } + SDL_Delay(1000 / FRAME_RATE); } exit: -- 1.7.0.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
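The press/release handling in the patch above can be sketched as a standalone function. This is a minimal sketch under stated assumptions: `sketch_keymap` and `encode_key_event()` are hypothetical names, only three entries of the patch's table are copied, and the table is sized 256 here (rather than 255 as in the patch) so that every possible u8 index is in bounds.

```c
#include <stddef.h>
#include <stdint.h>

/* A few entries copied from the patch's keymap; 256 slots so that any
 * 8-bit index is valid. Unlisted entries are 0, meaning "unmapped". */
static const uint8_t sketch_keymap[256] = {
    [24] = 0x15, /* q */
    [36] = 0x5a, /* enter */
    [65] = 0x29, /* space */
};

/*
 * Translate one key event into the bytes queued to the emulated keyboard:
 * a press sends the code itself, a release sends 0xf0 followed by the code.
 * Returns the number of bytes written to out, or 0 for an unmapped key.
 */
static size_t encode_key_event(uint8_t scancode, int pressed, uint8_t out[2])
{
    uint8_t code = sketch_keymap[scancode];

    if (!code)
        return 0;
    if (pressed) {
        out[0] = code;
        return 1;
    }
    out[0] = 0xf0;
    out[1] = code;
    return 2;
}
```

Pressing and releasing "q" in this model yields 0x15 followed by 0xf0 0x15, which mirrors the `kbd_queue()` calls in the SDL_KEYDOWN and SDL_KEYUP cases of the patch.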
Re: Seeing DMAR errors after multiple load/unload with SR-IOV
On Tue, 2011-06-07 at 08:10 -0700, Chris Wright wrote:
> * David Woodhouse (dw...@infradead.org) wrote:
> > On Tue, 2011-06-07 at 06:38 -0700, Chris Wright wrote:
> > > I think we still leak the list entry though. Bottom line is that we
> > > need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications. We
> > > happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy.
> >
> > Yeah, keeping a list of possible stale 'pci_dev' pointers is stupid. We
> > should figure out the matching DMAR unit directly from the ACPI table at
> > ADD_DEVICE time, and store it in pdev->archdata.iommu. I saw patches
> > which were going in that direction...
>
> Cool, where are they? I'm working on something similar, and missed them.

[PATCH] pci, dmar: Update dmar units devices list during hotplug

Alex was working on it.

--
dwmw2
Re: KVM induced panic on 2.6.38[2367] 2.6.39
On 07.06.2011 16:40, Brad Campbell wrote: On 07/06/11 21:30, Patrick McHardy wrote: On 07.06.2011 05:33, Brad Campbell wrote: On 07/06/11 04:10, Bart De Schuymer wrote: Hi Brad, This has probably nothing to do with ebtables, so please rmmod in case it's loaded. A few questions I didn't directly see an answer to in the threads I scanned... I'm assuming you actually use the bridging firewall functionality. So, what iptables modules do you use? Can you reduce your iptables rules to a core that triggers the bug? Or does it get triggered even with an empty set of firewall rules? Are you using a stock .35 kernel or is it patched? Is this something I can trigger on a poor guy's laptop or does it require specialized hardware (I'm catching up on qemu/kvm...)? Not specialised hardware as such, I've just not been able to reproduce it outside of this specific operating scenario. The last similar problem we've had was related to the 32/64 bit compat code. Are you running 32 bit userspace on a 64 bit kernel? No, 32 bit Guest OS, but a completely 64 bit userspace on a 64 bit kernel. Userspace is current Debian Stable. Kernel is Vanilla and qemu-kvm is current git I can't trigger it with empty firewall rules as it relies on a DNAT to occur. If I try it directly to the internal IP address (as I have to without netfilter loaded) then of course nothing fails. It's a pain in the bum as a fault, but it's one I can easily reproduce as long as I use the same set of circumstances. I'll try using 3.0-rc2 (current git) tonight, and if I can reproduce it on that then I'll attempt to pare down the IPTABLES rules to a bare minimum. It is nothing to do with ebtables as I don't compile it. I'm not really sure about bridging firewall functionality. I just use a couple of hand coded bash scripts to set the tables up. From one of your previous mails: # CONFIG_BRIDGE_NF_EBTABLES is not set How about CONFIG_BRIDGE_NETFILTER? It was compiled in. 
With the following table set I was able to reproduce the problem on 3.0-rc2. Replaced my IP with xxx.xxx.xxx.xxx, but otherwise unmodified Which kernel was the last version without this problem? root@srv:~# iptables-save # Generated by iptables-save v1.4.10 on Tue Jun 7 22:11:30 2011 *filter :INPUT ACCEPT [978:107619] :FORWARD ACCEPT [142:7068] :OUTPUT ACCEPT [1659:291870] -A INPUT -i ppp0 -m state --state RELATED,ESTABLISHED -j ACCEPT -A INPUT ! -i ppp0 -m state --state NEW -j ACCEPT -A INPUT -i ppp0 -j DROP COMMIT # Completed on Tue Jun 7 22:11:30 2011 # Generated by iptables-save v1.4.10 on Tue Jun 7 22:11:30 2011 *nat :PREROUTING ACCEPT [813:49170] :INPUT ACCEPT [91:7090] :OUTPUT ACCEPT [267:20731] :POSTROUTING ACCEPT [296:22281] -A PREROUTING -d xxx.xxx.xxx.xxx/32 ! -i ppp0 -p tcp -m tcp --dport 443 -j DNAT --to-destination 192.168.253.198 COMMIT # Completed on Tue Jun 7 22:11:30 2011 # Generated by iptables-save v1.4.10 on Tue Jun 7 22:11:30 2011 *mangle :PREROUTING ACCEPT [2729:274392] :INPUT ACCEPT [2508:262976] :FORWARD ACCEPT [142:7068] :OUTPUT ACCEPT [1674:293701] :POSTROUTING ACCEPT [2131:346411] -A FORWARD -o ppp0 -p tcp -m tcp --tcp-flags SYN,RST SYN -m tcpmss --mss 1400:1536 -j TCPMSS --clamp-mss-to-pmtu COMMIT # Completed on Tue Jun 7 22:11:30 2011 The main suspects would be NAT and TCPMSS. Did you also try whether the crash occurs with only one of these these rules? I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access the address the way I was doing it, so that's a no-go for me. That's really weird since you're apparently not using any bridge netfilter features. It shouldn't have any effect besides changing at which point ip_tables is invoked. How are your network devices configured (specifically any bridges)? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeing DMAR errors after multiple load/unload with SR-IOV
* David Woodhouse (dw...@infradead.org) wrote:
> On Tue, 2011-06-07 at 08:10 -0700, Chris Wright wrote:
> > * David Woodhouse (dw...@infradead.org) wrote:
> > > On Tue, 2011-06-07 at 06:38 -0700, Chris Wright wrote:
> > > > I think we still leak the list entry though. Bottom line is that we
> > > > need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications. We
> > > > happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy.
> > >
> > > Yeah, keeping a list of possible stale 'pci_dev' pointers is stupid. We
> > > should figure out the matching DMAR unit directly from the ACPI table at
> > > ADD_DEVICE time, and store it in pdev->archdata.iommu. I saw patches
> > > which were going in that direction...
> >
> > Cool, where are they? I'm working on something similar, and missed them.
>
> [PATCH] pci, dmar: Update dmar units devices list during hotplug

Oh yeah, thanks for the reminder.

thanks,
-chris
Re: Seeing DMAR errors after multiple load/unload with SR-IOV
On Tue, 2011-06-07 at 16:33 +0100, David Woodhouse wrote:
> On Tue, 2011-06-07 at 08:10 -0700, Chris Wright wrote:
> > * David Woodhouse (dw...@infradead.org) wrote:
> > > On Tue, 2011-06-07 at 06:38 -0700, Chris Wright wrote:
> > > > I think we still leak the list entry though. Bottom line is that we
> > > > need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications. We
> > > > happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy.
> > >
> > > Yeah, keeping a list of possible stale 'pci_dev' pointers is stupid. We
> > > should figure out the matching DMAR unit directly from the ACPI table at
> > > ADD_DEVICE time, and store it in pdev->archdata.iommu. I saw patches
> > > which were going in that direction...
> >
> > Cool, where are they? I'm working on something similar, and missed them.
>
> [PATCH] pci, dmar: Update dmar units devices list during hotplug
>
> Alex was working on it.

Nope, I had a wip patch that did an on-the-fly lookup, that I handed off to Yinghai, but it didn't actually work. That's when the suggestion was made to do it at hotplug, but I'm not pursuing that right now, maybe Yinghai is? Thanks,

Alex
Re: [PATCHv2 RFC 4/4] Revert virtio: make add_buf return capacity remaining:
On Thu, Jun 02, 2011 at 06:43:25PM +0300, Michael S. Tsirkin wrote:
> This reverts commit 3c1b27d5043086a485f8526353ae9fe37bfa1065.
> The only user was virtio_net, and it switched to
> min_capacity instead.
>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

It turns out another place in virtio_net: receive buf processing - relies on the old behaviour:

try_fill_recv:
	do {
		if (vi->mergeable_rx_bufs)
			err = add_recvbuf_mergeable(vi, gfp);
		else if (vi->big_packets)
			err = add_recvbuf_big(vi, gfp);
		else
			err = add_recvbuf_small(vi, gfp);

		oom = err == -ENOMEM;
		if (err < 0)
			break;
		++vi->num;
	} while (err > 0);

The point is to avoid allocating a buf if the ring is out of space and we are sure add_buf will fail.

It works well for mergeable buffers and for big packets if we are not OOM. small packets and oom will do extra get_page/put_page calls (but maybe we don't care).

So this is RX, I intend to drop it from this patchset and focus on the TX side for starters.

> ---
>  drivers/virtio/virtio_ring.c |    2 +-
>  include/linux/virtio.h       |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 23422f1..a6c21eb 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -233,7 +233,7 @@ add_head:
>  	pr_debug("Added buffer head %i to %p\n", head, vq);
>  	END_USE(vq);
>
> -	return vq->num_free;
> +	return 0;
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_add_buf_gfp);
>
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 209220d..63c4908 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -34,7 +34,7 @@ struct virtqueue {
>   *	in_num: the number of sg which are writable (after readable ones)
>   *	data: the token identifying the buffer.
>   *	gfp: how to do memory allocations (if necessary).
> - *      Returns remaining capacity of queue (sg segments) or a negative error.
> + *      Returns 0 on success or a negative error.
>   * virtqueue_kick: update after add_buf
>   *	vq: the struct virtqueue
>   *	After one or more add_buf calls, invoke this to kick the other side.
> --
> 1.7.5.53.gc233e
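To see why the receive fill loop depends on the old return value, here is a small userspace model. It is only a sketch: `model_vq`, `add_buf_old()`, `add_buf_new()` and `fill_ring()` are hypothetical stand-ins, each buffer is assumed to use one ring slot, and the loop shape is taken from the try_fill_recv excerpt quoted in the message.

```c
#include <errno.h>

/* Hypothetical model of a virtqueue: only the free-slot count matters. */
struct model_vq {
    unsigned int num_free;
};

/* Old semantics: return the remaining capacity, or a negative error. */
static int add_buf_old(struct model_vq *vq)
{
    if (vq->num_free == 0)
        return -ENOSPC;
    vq->num_free--;
    return (int)vq->num_free;
}

/* Semantics after the revert: return 0 on success, or a negative error. */
static int add_buf_new(struct model_vq *vq)
{
    if (vq->num_free == 0)
        return -ENOSPC;
    vq->num_free--;
    return 0;
}

/* Same loop shape as try_fill_recv: keep going while add_buf reports room. */
static unsigned int fill_ring(struct model_vq *vq,
                              int (*add_buf)(struct model_vq *))
{
    unsigned int added = 0;
    int err;

    do {
        err = add_buf(vq);
        if (err < 0)
            break;
        ++added;
    } while (err > 0);

    return added;
}
```

With four free slots, the old return value lets the loop fill all four and stop without a failed attempt. With the new return value the `while (err > 0)` condition is false after the first buffer, so the loop adds one buffer and leaves three slots unused, which is the dependency pointed out in the message.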
Re: [PATCHv2 RFC 3/4] virtio_net: limit xmit polling
On Thu, Jun 02, 2011 at 06:43:17PM +0300, Michael S. Tsirkin wrote:
> Current code might introduce a lot of latency variation
> if there are many pending bufs at the time we
> attempt to transmit a new one. This is bad for
> real-time applications and can't be good for TCP either.
>
> Free up just enough to both clean up all buffers
> eventually and to be able to xmit the next packet.
>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

I've been testing this patch and it seems to work fine so far. The following fixups are needed to make it build though:

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b25db1c..77cdf34 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -529,11 +529,8 @@ static bool free_old_xmit_skb(struct virtnet_info *vi)
  * virtqueue_add_buf will succeed. */
 static bool free_xmit_capacity(struct virtnet_info *vi)
 {
-	struct sk_buff *skb;
-	unsigned int len;
-
 	while (virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2)
-		if (unlikely(!free_old_xmit_skb))
+		if (unlikely(!free_old_xmit_skb(vi)))
 			return false;
 	return true;
 }
@@ -628,7 +625,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	 * Doing this after kick means there's a chance we'll free
 	 * the skb we have just sent, which is hot in cache. */
 	for (i = 0; i < 2; i++)
-		free_old_xmit_skb(v);
+		free_old_xmit_skb(vi);

 	if (likely(free_xmit_capacity(vi)))
 		return NETDEV_TX_OK;
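The "free just enough" idea behind free_xmit_capacity() can be modelled in userspace as below. This is a sketch under assumptions: `model_txq`, `SLOTS_PER_BUF` and the two functions are hypothetical, and each reclaimed buffer is assumed to give back a fixed number of ring slots, which the real driver does not guarantee.

```c
#include <stdbool.h>

/* Assumption for the model: every completed buffer frees two ring slots. */
#define SLOTS_PER_BUF 2u

/* Hypothetical model of the transmit queue state. */
struct model_txq {
    unsigned int capacity;  /* ring slots currently free */
    unsigned int completed; /* sent buffers not yet reclaimed */
};

/* Reclaim one completed buffer; false when there is nothing to reclaim. */
static bool free_old_xmit(struct model_txq *q)
{
    if (q->completed == 0)
        return false;
    q->completed--;
    q->capacity += SLOTS_PER_BUF;
    return true;
}

/* Reclaim only until the next packet is guaranteed to fit. */
static bool free_xmit_capacity(struct model_txq *q, unsigned int needed)
{
    while (q->capacity < needed)
        if (!free_old_xmit(q))
            return false;
    return true;
}
```

The model also shows why the first fixup matters. Without the call parentheses, `!free_old_xmit_skb` tests the function's address, which is never null, so the `return false` branch could never be taken.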
Re: [PATCHv2 RFC 0/4] virtio and vhost-net capacity handling
On Thu, Jun 02, 2011 at 06:42:35PM +0300, Michael S. Tsirkin wrote: OK, here's a new attempt to use the new capacity api. I also added more comments to clarify the logic. Hope this is more readable. Let me know pls. This is on top of the patches applied by Rusty. Warning: untested. Posting now to give people chance to comment on the API. OK, this seems to have survived some testing so far, after I dropped patch 4 and fixed build for patch 3 (build fixup patch sent in reply to the original). I'll be mostly offline until Sunday, would appreciate testing reports. git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git virtio-net-xmit-polling-v2 git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git virtio-net-event-idx-v3 Thanks! Changes from v1: - fix comment in patch 2 to correct confusion noted by Rusty - rewrite patch 3 along the lines suggested by Rusty note: it's not exactly the same but I hope it's close enough, the main difference is that mine does limited polling even in the unlikely xmit failure case. - added a patch to not return capacity from add_buf it always looked like a weird hack Michael S. Tsirkin (4): virtio_ring: add capacity check API virtio_net: fix tx capacity checks using new API virtio_net: limit xmit polling Revert virtio: make add_buf return capacity remaining: drivers/net/virtio_net.c | 111 ++ drivers/virtio/virtio_ring.c | 10 +++- include/linux/virtio.h |7 ++- 3 files changed, 84 insertions(+), 44 deletions(-) -- 1.7.5.53.gc233e -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM induced panic on 2.6.38[2367] 2.6.39
On 7/06/2011 16:40, Brad Campbell wrote:
> On 07/06/11 21:30, Patrick McHardy wrote:
> > On 07.06.2011 05:33, Brad Campbell wrote:
> > > On 07/06/11 04:10, Bart De Schuymer wrote:
> > > > Hi Brad,
> > > >
> > > > This has probably nothing to do with ebtables, so please rmmod in
> > > > case it's loaded. A few questions I didn't directly see an answer
> > > > to in the threads I scanned... I'm assuming you actually use the
> > > > bridging firewall functionality. So, what iptables modules do you
> > > > use? Can you reduce your iptables rules to a core that triggers
> > > > the bug? Or does it get triggered even with an empty set of
> > > > firewall rules? Are you using a stock .35 kernel or is it patched?
> > > > Is this something I can trigger on a poor guy's laptop or does it
> > > > require specialized hardware (I'm catching up on qemu/kvm...)?
> > >
> > > Not specialised hardware as such, I've just not been able to
> > > reproduce it outside of this specific operating scenario.
> >
> > The last similar problem we've had was related to the 32/64 bit compat
> > code. Are you running 32 bit userspace on a 64 bit kernel?
>
> No, 32 bit Guest OS, but a completely 64 bit userspace on a 64 bit
> kernel. Userspace is current Debian Stable. Kernel is Vanilla and
> qemu-kvm is current git

If the bug is easily triggered with your guest os, then you could try to capture the traffic with wireshark (or something else) in a configuration that doesn't crash your system. Save the traffic in a pcap file. Then you can see if resending that traffic in the vulnerable configuration triggers the bug (I don't know if something in Windows exists, but tcpreplay should work for Linux). Once you have such a capture, chances are the bug is even easily reproducible by us (unless it's hardware-specific). Success isn't guaranteed, but I think it's worth a shot...

cheers,
Bart

--
Bart De Schuymer
www.artinalgorithms.be
Re: KVM induced panic on 2.6.38[2367] 2.6.39
On Tuesday 07 June 2011 at 17:35 +0200, Patrick McHardy wrote:
> The main suspects would be NAT and TCPMSS. Did you also try whether
> the crash occurs with only one of these rules?
>
> > I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer
> > access the address the way I was doing it, so that's a no-go for me.
>
> That's really weird since you're apparently not using any bridge
> netfilter features. It shouldn't have any effect besides changing
> at which point ip_tables is invoked. How are your network devices
> configured (specifically any bridges)?

Something in the kernel does

	u16 *ptr = addr (given by kmalloc())
	ptr[-1] = 0;

Could be an off-by-one error in a memmove()/memcpy() or loop...

I can't see a network issue here.

I checked arch/x86/lib/memmove_64.S and it seems fine.
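The kind of underrun suspected above, and how poisoned padding exposes it, can be reproduced in a userspace model. This is a sketch only: the function names are hypothetical, the 0x5a fill value is taken from the padding dump earlier in the thread, and placing the padding directly in front of the object is a simplification of how the slab allocator actually lays out memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill value seen in the padding dump earlier in the thread. */
#define POISON_BYTE 0x5a

/* Fill a padding region with the poison pattern. */
static void poison_padding(uint8_t *pad, size_t len)
{
    memset(pad, POISON_BYTE, len);
}

/* Return the offset of the first overwritten padding byte, or -1 if intact. */
static ptrdiff_t first_corrupt_byte(const uint8_t *pad, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (pad[i] != POISON_BYTE)
            return (ptrdiff_t)i;
    return -1;
}

/* Models `u16 *ptr = obj; ptr[-1] = 0;`: zero the two bytes before obj. */
static void write_u16_before(uint8_t *obj)
{
    memset(obj - 2, 0, 2);
}
```

With 16 bytes of padding in front of an object, the stray write clears exactly the last two padding bytes, the same shape as the trailing `00 00` in the reported dump.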
Re: [PATCH] pci-assign: Do not reset the device unless the kernel supports it
On Tue, 2011-06-07 at 10:14 +0200, Jan Kiszka wrote:
> On 2011-06-07 10:06, Avi Kivity wrote:
> > On 06/07/2011 01:04 AM, Jan Kiszka wrote:
> > > On 2011-06-06 23:48, Alex Williamson wrote:
> > > > On Mon, 2011-06-06 at 23:30 +0200, Jan Kiszka wrote:
> > > > > From: Jan Kiszka <jan.kis...@siemens.com>
> > > > >
> > > > > At least kernels 2.6.38 and 2.6.39 do not properly support issuing
> > > > > a reset on an assigned device and corrupt its config space. Prevent
> > > > > this by checking for a host kernel with the required support,
> > > > > tagged by the to-be-introduced KVM_CAP_DEVICE_RESET.
> > > >
> > > > Wouldn't it be easier just to revert ed78661f in 2.6.39 stable? I
> > > > guess we don't have an option to do that for .38 since stable is
> > > > done there, but there are also some intel-iommu breakages that won't
> > > > make stable for that release. It seems like the userspace invoked
> > > > reset resolves known, demonstrable issues of devices continuing to
> > > > DMA into guest memory while ed78661f is mostly a theoretical change.
> > >
> > > Easier would be this patch. But I don't mind reverting the problematic
> > > commit in 39, whatever is preferred. We should just resolve the issue
> > > finally.
> >
> > Kernel problems should be solved in the kernel (with exceptions of
> > course, but don't see the need here).
>
> Then please file a revert for stable ASAP.

How's this? For stable only of course. Thanks,

Alex

Revert KVM: Save/restore state of assigned PCI device

From: Alex Williamson <alex.william...@redhat.com>

This reverts ed78661f2614d3c9f69c23e280db3bafdabdf5bb as it assumes the saved PCI state will remain valid for the entire length of time that it is attached to a guest. This fails when userspace makes use of the pci-sysfs reset interface, which invalidates the saved device state, leaving nothing to be restored after the device is reset on de-assignment. This leaves the device in an unusable state.

3.0.0 will add an interface for KVM to save the PCI state in a buffer unaffected by other callers of pci_reset_function(), but the most appropriate stable fix seems to be reverting this change since the original assumption about the device saved state persisting is incorrect.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

 virt/kvm/assigned-dev.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index ae72ae6..e3f1235 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -197,8 +197,7 @@ static void kvm_free_assigned_device(struct kvm *kvm,
 {
 	kvm_free_assigned_irq(kvm, assigned_dev);

-	__pci_reset_function(assigned_dev->dev);
-	pci_restore_state(assigned_dev->dev);
+	pci_reset_function(assigned_dev->dev);

 	pci_release_regions(assigned_dev->dev);
 	pci_disable_device(assigned_dev->dev);
@@ -515,7 +514,6 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 	}

 	pci_reset_function(dev);
-	pci_save_state(dev);

 	match->assigned_dev_id = assigned_dev->assigned_dev_id;
 	match->host_segnr = assigned_dev->segnr;
@@ -546,7 +544,6 @@ out:
 	mutex_unlock(&kvm->lock);
 	return r;
 out_list_del:
-	pci_restore_state(dev);
 	list_del(&match->list);
 	pci_release_regions(dev);
 out_disable:
Re: [Qemu-devel] [PATCH v3] Add an isa device for SGA
On 05/16/2011 01:45 PM, Glauber Costa wrote: This patch adds a dummy legacy ISA device whose responsibility is to deploy sgabios, an option rom for a serial graphics adapter. The proposal is that this device is always-on when -nographics, but can otherwise be enable in any setup when -device sga is used. [v2: suggestions on qdev by Markus ] [v3: cleanups and documentation, per list suggestions ] Signed-off-by: Glauber Costaglom...@redhat.com Applied. But I'd like to figure out what to do about sgabios.bin. I think we should ship a copy. Regards, Anthony Liguori --- Makefile.target |2 +- hw/pc.c |9 hw/sga.c| 56 +++ 3 files changed, 66 insertions(+), 1 deletions(-) create mode 100644 hw/sga.c diff --git a/Makefile.target b/Makefile.target index fdbdc6c..004ea7e 100644 --- a/Makefile.target +++ b/Makefile.target @@ -224,7 +224,7 @@ obj-$(CONFIG_KVM) += ivshmem.o # Hardware support obj-i386-y += vga.o obj-i386-y += mc146818rtc.o i8259.o pc.o -obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o +obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o obj-i386-y += vmport.o obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o obj-i386-y += extboot.o diff --git a/hw/pc.c b/hw/pc.c index 8d351ba..5a8e00a 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -1096,6 +1096,15 @@ void pc_vga_init(PCIBus *pci_bus) isa_vga_init(); } } + +/* + * sga does not suppress normal vga output. So a machine can have both a + * vga card and sga manually enabled. Output will be seen on both. + * For nographic case, sga is enabled at all times + */ +if (display_type == DT_NOGRAPHIC) { +isa_create_simple(sga); +} } static void cpu_request_exit(void *opaque, int irq, int level) diff --git a/hw/sga.c b/hw/sga.c new file mode 100644 index 000..7ef750a --- /dev/null +++ b/hw/sga.c @@ -0,0 +1,56 @@ +/* + * QEMU dummy ISA device for loading sgabios option rom. + * + * Copyright (c) 2011 Glauber Costa, Red Hat Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + * + * sgabios code originally available at code.google.com/p/sgabios + * + */ +#include pci.h +#include pc.h +#include loader.h +#include sysemu.h + +#define SGABIOS_FILENAME sgabios.bin + +typedef struct ISAGAState { +ISADevice dev; +} ISASGAState; + +static int isa_cirrus_vga_initfn(ISADevice *dev) +{ +rom_add_vga(SGABIOS_FILENAME); +return 0; +} + +static ISADeviceInfo sga_info = { +.qdev.name= sga, +.qdev.desc= Serial Graphics Adapter, +.qdev.size= sizeof(ISASGAState), +.init = isa_cirrus_vga_initfn, +}; + +static void sga_register(void) +{ + isa_qdev_register(sga_info); +} + +device_init(sga_register); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 4/5] kvm tools: Get rid of spaces in ld script
Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com --- tools/kvm/bios/rom.ld.S |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.git/tools/kvm/bios/rom.ld.S === --- linux-2.6.git.orig/tools/kvm/bios/rom.ld.S +++ linux-2.6.git/tools/kvm/bios/rom.ld.S @@ -11,7 +11,7 @@ PHDRS { } SECTIONS { - . = 0; - .text : { *(.text) } :text = 0x9090 + . = 0; + .text : { *(.text) } :text = 0x9090 }
[patch 5/5] kvm tools: Reform bios make rules
Put bios code into bios.s and adjust makefile rules accordingly. It's more natural than bios-rom.S (which is now simply a container over real bios code). Also improve bios deps in Makefile. Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com --- tools/kvm/Makefile| 29 +++- tools/kvm/bios/bios-rom.S | 95 +++--- tools/kvm/bios/bios.S | 95 ++ tools/kvm/bios/gen-offsets.sh |3 - 4 files changed, 115 insertions(+), 107 deletions(-) Index: linux-2.6.git/tools/kvm/Makefile === --- linux-2.6.git.orig/tools/kvm/Makefile +++ linux-2.6.git/tools/kvm/Makefile @@ -82,7 +82,7 @@ DEPS := $(patsubst %.o,%.d,$(OBJS)) # Exclude BIOS object files from header dependencies. OBJS += bios.o -OBJS += bios/bios.o +OBJS += bios/bios-rom.o LIBS += -lrt LIBS += -lpthread @@ -165,20 +165,27 @@ BIOS_CFLAGS += -m32 BIOS_CFLAGS += -march=i386 BIOS_CFLAGS += -mregparm=3 -bios.o: bios/bios-rom.bin -bios/bios.o: bios/bios.S bios/bios-rom.bin - $(E) CC $@ - $(Q) $(CC) -c $(CFLAGS) bios/bios.S -o bios/bios.o - -bios/bios-rom.bin: bios/bios-rom.S bios/e820.c - $(E) CC $@ +bios.o: bios/bios.bin bios/bios-rom.h + +bios/bios.bin.elf: bios/bios.S bios/e820.c bios/int10.c bios/rom.ld.S + $(E) CC bios/e820.o $(Q) $(CC) -include code16gcc.h $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/e820.c -o bios/e820.o + $(E) CC bios/int10.o $(Q) $(CC) -include code16gcc.h $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/int10.c -o bios/int10.o - $(Q) $(CC) $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/bios-rom.S -o bios/bios-rom.o + $(E) CC bios/bios.o + $(Q) $(CC) $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/bios.S -o bios/bios.o $(E) LD $@ - $(Q) ld -T bios/rom.ld.S -o bios/bios-rom.bin.elf bios/bios-rom.o bios/e820.o bios/int10.o + $(Q) ld -T bios/rom.ld.S -o bios/bios.bin.elf bios/bios.o bios/e820.o bios/int10.o + +bios/bios.bin: bios/bios.bin.elf $(E) OBJCOPY $@ - $(Q) objcopy -O binary -j .text bios/bios-rom.bin.elf bios/bios-rom.bin + $(Q) objcopy -O binary -j .text bios/bios.bin.elf bios/bios.bin + +bios/bios-rom.o: bios/bios-rom.S bios/bios.bin 
bios/bios-rom.h + $(E) CC $@ + $(Q) $(CC) -c $(CFLAGS) bios/bios-rom.S -o bios/bios-rom.o + +bios/bios-rom.h: bios/bios.bin.elf $(E) NM $@ $(Q) cd bios sh gen-offsets.sh bios-rom.h cd .. Index: linux-2.6.git/tools/kvm/bios/bios-rom.S === --- linux-2.6.git.orig/tools/kvm/bios/bios-rom.S +++ linux-2.6.git/tools/kvm/bios/bios-rom.S @@ -1,89 +1,12 @@ -/* - * Our pretty trivial BIOS emulation - */ - -#include kvm/bios.h #include kvm/assembly.h .org 0 - .code16gcc - -#include macro.S - -/* - * fake interrupt handler, nothing can be faster ever - */ -ENTRY(bios_intfake) - IRET -ENTRY_END(bios_intfake) - -/* - * int 10 - video - service - */ -ENTRY(bios_int10) - pushw %fs - pushl %es - pushl %edi - pushl %esi - pushl %ebp - pushl %esp - pushl %edx - pushl %ecx - pushl %ebx - pushl %eax - - movl%esp, %eax - /* this is way easier than doing it in assembly */ - /* just push all the regs and jump to a C handler */ - callint10_handler - - popl%eax - popl%ebx - popl%ecx - popl%edx - popl%esp - popl%ebp - popl%esi - popl%edi - popl%es - popw%fs - - IRET -ENTRY_END(bios_int10) - -#define EFLAGS_CF (1 0) - -ENTRY(bios_int15) - cmp $0xE820, %eax - jne 1f - - pushw %fs - - pushl %edx - pushl %ecx - pushl %edi - pushl %ebx - pushl %eax - - movl%esp, %eax # it's bioscall case - calle820_query_map - - popl%eax - popl%ebx - popl%edi - popl%ecx - popl%edx - - popw%fs - - /* Clear CF */ - andl$~EFLAGS_CF, 0x4(%esp) -1: - IRET -ENTRY_END(bios_int15) - -GLOBAL(__locals) - -#include local.S - -END(__locals) +#ifdef CONFIG_X86_64 + .code64 +#else + .code32 +#endif + +GLOBAL(bios_rom) + .incbin bios/bios.bin +END(bios_rom) Index: linux-2.6.git/tools/kvm/bios/bios.S === --- linux-2.6.git.orig/tools/kvm/bios/bios.S +++ linux-2.6.git/tools/kvm/bios/bios.S @@ -1,12 +1,89 @@ +/* + * Our pretty trivial BIOS emulation + */ + +#include kvm/bios.h #include kvm/assembly.h .org 0 -#ifdef CONFIG_X86_64 - .code64 -#else - .code32 -#endif - -GLOBAL(bios_rom) - .incbin bios/bios-rom.bin
[patch 1/5] kvm tools: Options parser to handle hex numbers
Some kernel parameters are convenient if passed in hex form so our options parser should handle even such form of input. Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com --- tools/kvm/util/parse-options.c | 102 - 1 file changed, 82 insertions(+), 20 deletions(-) Index: linux-2.6.git/tools/kvm/util/parse-options.c === --- linux-2.6.git.orig/tools/kvm/util/parse-options.c +++ linux-2.6.git/tools/kvm/util/parse-options.c @@ -39,6 +39,84 @@ static int get_arg(struct parse_opt_ctx_ return 0; } +#define numvalue(c)\ + ((c) = 'a' ? (c) - 'a' + 10 : \ +(c) = 'A' ? (c) - 'A' + 10 : (c) - '0') + +static u64 readhex(const char *str, bool *error) +{ + char *pos = strchr(str, 'x') + 1; + u64 res = 0; + + while (*pos) { + unsigned int v = numvalue(*pos); + if (v 16) { + *error = true; + return 0; + } + + res = (res * 16) + v; + pos++; + } + + *error = false; + return res; +} + +static int readnum(const struct option *opt, int flags, + const char *str, char **end) +{ + if (strchr(str, 'x')) { + bool error; + u64 value; + + value = readhex(str, error); + if (error) + goto enotnum; + + switch (opt-type) { + case OPTION_INTEGER: + *(int *)opt-value = value; + break; + case OPTION_UINTEGER: + *(unsigned int *)opt-value = value; + break; + case OPTION_LONG: + *(long *)opt-value = value; + break; + case OPTION_U64: + *(u64 *)opt-value = value; + break; + default: + goto invcall; + } + } else { + switch (opt-type) { + case OPTION_INTEGER: + *(int *)opt-value = strtol(str, end, 10); + break; + case OPTION_UINTEGER: + *(unsigned int *)opt-value = strtol(str, end, 10); + break; + case OPTION_LONG: + *(long *)opt-value = strtol(str, end, 10); + break; + case OPTION_U64: + *(u64 *)opt-value = strtoull(str, end, 10); + break; + default: + goto invcall; + } + } + + return 0; + +enotnum: + return opterror(opt, expects a numerical value, flags); +invcall: + return opterror(opt, invalid numeric conversion, flags); +} + static int get_value(struct parse_opt_ctx_t *p, const struct option *opt, int 
flags) { @@ -131,11 +209,7 @@ static int get_value(struct parse_opt_ct } if (get_arg(p, opt, flags, arg)) return -1; - *(int *)opt-value = strtol(arg, (char **)s, 10); - if (*s) - return opterror(opt, expects a numerical value, - flags); - return 0; + return readnum(opt, flags, arg, (char **)s); case OPTION_UINTEGER: if (unset) { @@ -148,11 +222,7 @@ static int get_value(struct parse_opt_ct } if (get_arg(p, opt, flags, arg)) return -1; - *(unsigned int *)opt-value = strtol(arg, (char **)s, 10); - if (*s) - return opterror(opt, - expects a numerical value, flags); - return 0; + return readnum(opt, flags, arg, (char **)s); case OPTION_LONG: if (unset) { @@ -165,11 +235,7 @@ static int get_value(struct parse_opt_ct } if (get_arg(p, opt, flags, arg)) return -1; - *(long *)opt-value = strtol(arg, (char **)s, 10); - if (*s) - return opterror(opt, - expects a numerical value, flags); - return 0; + return readnum(opt, flags, arg, (char **)s); case OPTION_U64: if (unset) { @@ -182,11 +248,7 @@ static int get_value(struct parse_opt_ct } if (get_arg(p, opt, flags, arg)) return -1; - *(u64 *)opt-value = strtoull(arg, (char **)s, 10); - if (*s) - return opterror(opt, -
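A note on the hex handling in this patch: readnum() decides between hex and decimal with strchr(str, 'x'), which matches an 'x' anywhere in the argument. One possible alternative, sketched here purely as an assumption and not as what the patch does, is to let strtoull() consume the 0x prefix and then check the end pointer. The helper name parse_u64 is hypothetical and does not exist in tools/kvm.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the posted patch): parse a decimal
 * or 0x-prefixed hexadecimal string into *out.
 * Returns 0 on success, -1 on empty input, a leading sign or space,
 * trailing garbage, or overflow.
 */
int parse_u64(const char *str, uint64_t *out)
{
	unsigned long long v;
	char *end;
	int base;

	/* Require a leading digit so that "-1" or " 12" are rejected. */
	if (str == NULL || !isdigit((unsigned char)str[0]))
		return -1;

	/* Base 16 only when the string starts with 0x or 0X. */
	base = (str[0] == '0' && (str[1] == 'x' || str[1] == 'X')) ? 16 : 10;

	errno = 0;
	v = strtoull(str, &end, base);

	/* Reject overflow, no digits consumed, or leftover characters. */
	if (errno == ERANGE || end == str || *end != '\0')
		return -1;

	*out = v;
	return 0;
}
```

With this sketch, "0x312" and "786" both yield the value 786, while "0x1g" and "12abc" are reported as errors instead of being partially parsed.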
[patch 3/5] kvm tools: Delete dangling cursor from int10
No one uses it anymore. Also clean up the comment on int10; the int10_handler routine does all the hard work.

Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com
---
 tools/kvm/bios/bios-rom.S |   14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

Index: linux-2.6.git/tools/kvm/bios/bios-rom.S
===================================================================
--- linux-2.6.git.orig/tools/kvm/bios/bios-rom.S
+++ linux-2.6.git/tools/kvm/bios/bios-rom.S
@@ -18,13 +18,7 @@ ENTRY(bios_intfake)
 ENTRY_END(bios_intfake)
 
 /*
- * int 10 - video - write character and advance cursor (tty write)
- *	ah = 0eh
- *	al = character
- *	bh = display page (alpha modes)
- *	bl = foreground color (graphics modes)
- *
- * We ignore bx settings
+ * int 10 - video - service
  */
 ENTRY(bios_int10)
 	pushw	%fs
@@ -55,12 +49,6 @@ ENTRY(bios_int10)
 	popw	%fs
 
 	IRET
-
-
-/*
- * private IRQ data
- */
-cursor:	.long 0
 ENTRY_END(bios_int10)
 
 #define EFLAGS_CF	(1 << 0)
[patch 2/5] kvm tools: Introduce vidmode parameter
Usually this might be set by loader but since we're the loader lets allow to specify vesa mode as well. Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com --- tools/kvm/kvm-run.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) Index: linux-2.6.git/tools/kvm/kvm-run.c === --- linux-2.6.git.orig/tools/kvm/kvm-run.c +++ linux-2.6.git/tools/kvm/kvm-run.c @@ -80,6 +80,7 @@ extern int active_console; bool do_debug_print = false; static int nrcpus; +static int vidmode = 0x312; static const char * const run_usage[] = { kvm run [options] [kernel image], @@ -139,6 +140,10 @@ static const struct option options[] = { OPT_STRING('\0', tapscript, script, Script path, Assign a script to process created tap device), + OPT_GROUP(BIOS options:), + OPT_INTEGER('\0', vidmode, vidmode, + Video mode), + OPT_GROUP(Debug options:), OPT_BOOLEAN('\0', debug, do_debug_print, Enable debug messages), @@ -434,7 +439,6 @@ int kvm_cmd_run(int argc, const char **a struct framebuffer *fb = NULL; unsigned int nr_online_cpus; int exit_code = 0; - u16 vidmode = 0; int max_cpus; char *hi; int i; @@ -541,12 +545,10 @@ int kvm_cmd_run(int argc, const char **a memset(real_cmdline, 0, sizeof(real_cmdline)); strcpy(real_cmdline, notsc noapic noacpi pci=conf1); - if (vnc || sdl) { + if (vnc || sdl) strcat(real_cmdline, video=vesafb console=tty0); - vidmode = 0x312; - } else { + else strcat(real_cmdline, console=ttyS0 earlyprintk=serial); - } strcat(real_cmdline, ); if (kernel_cmdline) strlcat(real_cmdline, kernel_cmdline, sizeof(real_cmdline)); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 0/5] kvm tools: A few fixes
Nothing serious, please review. Thanks. Cyrill
Re: [patch 2/5] kvm tools: Introduce vidmode parameter
On Tue, 7 Jun 2011, Cyrill Gorcunov wrote:
Usually this might be set by loader but since we're the loader lets allow to specify vesa mode as well. Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com

This patch causes 'make check' to go crazy and print out a bunch of these:

Warning: Ignoring MMIO write at d0031f40 (length 4)
Warning: Ignoring MMIO write at d0031f44 (length 4)
Warning: Ignoring MMIO write at d0031f48 (length 4)
Warning: Ignoring MMIO write at d0031f4c (length 4)
Warning: Ignoring MMIO write at d0031f50 (length 4)
Warning: Ignoring MMIO write at d0031f54 (length 4)
Warning: Ignoring MMIO write at d0031f58 (length 4)
Warning: Ignoring MMIO write at d0031f5c (length 4)
Warning: Ignoring MMIO write at d0031f60 (length 4)
Warning: Ignoring MMIO write at d0031f64 (length 4)
Warning: Ignoring MMIO write at d0031f68 (length 4)
Warning: Ignoring MMIO write at d0031f6c (length 4)
Warning: Ignoring MMIO write at d0031f70 (length 4)
Warning: Ignoring MMIO write at d0031f74 (length 4)
Warning: Ignoring MMIO write at d0031f78 (length 4)
Warning: Ignoring MMIO write at d0031f7c (length 4)
Re: [patch 2/5] kvm tools: Introduce vidmode parameter
On Tue, Jun 07, 2011 at 10:53:28PM +0300, Pekka Enberg wrote: On Tue, 7 Jun 2011, Cyrill Gorcunov wrote: Usually this might be set by loader but since we're the loader lets allow to specify vesa mode as well. Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com This patch causes 'make check' to go crazy and print out bunch of these: Warning: Ignoring MMIO write at d0031f40 (length 4) Hmm, weird... Cyrill
Re: [patch 2/5] kvm tools: Introduce vidmode parameter
On Tue, Jun 07, 2011 at 10:53:28PM +0300, Pekka Enberg wrote: On Tue, 7 Jun 2011, Cyrill Gorcunov wrote: Usually this might be set by loader but since we're the loader lets allow to specify vesa mode as well. Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com This patch causes 'make check' to go crazy and print out bunch of these: Pekka, are you sure it's because of _this_ particular patch? Cyrill
Re: [patch 2/5] kvm tools: Introduce vidmode parameter
On Wed, Jun 08, 2011 at 12:10:30AM +0400, Cyrill Gorcunov wrote: On Tue, Jun 07, 2011 at 10:53:28PM +0300, Pekka Enberg wrote: On Tue, 7 Jun 2011, Cyrill Gorcunov wrote: Usually this might be set by loader but since we're the loader lets allow to specify vesa mode as well. Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com This patch causes 'make check' to go crazy and print out bunch of these: Pekka, are you sure it's because of _this_ particular patch? Cyrill This one should do the trick, cant say I like it, we probably need some default values from options parser, ie to extend it. Cyrill --- kvm tools: Introduce vidmode parmeter v2 Usually this might be set by loader but since we're the loader lets allow to specify vesa mode as well. v2: Pekka spotted the default value was being compromised, so revert it back and set only if specified. Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com --- tools/kvm/kvm-run.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) Index: linux-2.6.git/tools/kvm/kvm-run.c === --- linux-2.6.git.orig/tools/kvm/kvm-run.c +++ linux-2.6.git/tools/kvm/kvm-run.c @@ -80,6 +80,7 @@ extern int active_console; bool do_debug_print = false; static int nrcpus; +static int vidmode = -1; static const char * const run_usage[] = { kvm run [options] [kernel image], @@ -139,6 +140,10 @@ static const struct option options[] = { OPT_STRING('\0', tapscript, script, Script path, Assign a script to process created tap device), + OPT_GROUP(BIOS options:), + OPT_INTEGER('\0', vidmode, vidmode, + Video mode), + OPT_GROUP(Debug options:), OPT_BOOLEAN('\0', debug, do_debug_print, Enable debug messages), @@ -434,7 +439,6 @@ int kvm_cmd_run(int argc, const char **a struct framebuffer *fb = NULL; unsigned int nr_online_cpus; int exit_code = 0; - u16 vidmode = 0; int max_cpus; char *hi; int i; @@ -539,14 +543,22 @@ int kvm_cmd_run(int argc, const char **a kvm-nrcpus = nrcpus; + /* +* vidmode should be either specified +* either set by default +*/ + if (vnc || sdl) { 
+ if (vidmode == -1) + vidmode = 0x312; + } else + vidmode = 0; + memset(real_cmdline, 0, sizeof(real_cmdline)); strcpy(real_cmdline, notsc noapic noacpi pci=conf1); if (vnc || sdl) { strcat(real_cmdline, video=vesafb console=tty0); - vidmode = 0x312; - } else { + } else strcat(real_cmdline, console=ttyS0 earlyprintk=serial); - } strcat(real_cmdline, ); if (kernel_cmdline) strlcat(real_cmdline, kernel_cmdline, sizeof(real_cmdline)); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
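The v2 logic above uses -1 as an "option not given" sentinel so that a user-supplied mode survives and the default only applies in graphical mode. A minimal sketch of that decision in isolation follows; the function name resolve_vidmode and the two macros are hypothetical and introduced only for illustration, they are not in kvm-run.c.

```c
#include <stdbool.h>

#define VIDMODE_UNSET	(-1)	/* sentinel: --vidmode was not passed */
#define VIDMODE_DEFAULT	0x312	/* default VESA mode used by the patch */

/*
 * Hypothetical helper mirroring the v2 patch logic:
 *  - without vnc/sdl the mode is forced to 0,
 *  - with vnc/sdl an explicit mode wins, otherwise the default is used.
 */
int resolve_vidmode(int requested, bool graphical)
{
	if (!graphical)
		return 0;

	if (requested == VIDMODE_UNSET)
		return VIDMODE_DEFAULT;

	return requested;
}
```

This keeps the serial-console case at vidmode 0, which is the behaviour the v1 patch changed by initialising the static variable to 0x312.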
[PATCH] KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK
KVM has an ioctl to define which signal mask should be used while running inside VCPU_RUN. At least for big endian systems, this mask is different on 32-bit and 64-bit systems (though the size is identical). Add a compat wrapper that converts the mask to whatever the kernel accepts, allowing 32-bit kvm user space to set signal masks. This patch fixes qemu with --enable-io-thread on ppc64 hosts when running 32-bit user land. Signed-off-by: Alexander Graf ag...@suse.de --- kernel/compat.c |1 + virt/kvm/kvm_main.c | 50 +- 2 files changed, 50 insertions(+), 1 deletions(-) diff --git a/kernel/compat.c b/kernel/compat.c index 9214dcd..506e176 100644 --- a/kernel/compat.c +++ b/kernel/compat.c @@ -882,6 +882,7 @@ sigset_from_compat (sigset_t *set, compat_sigset_t *compat) case 1: set-sig[0] = compat-sig[0] | (((long)compat-sig[1]) 32 ); } } +EXPORT_SYMBOL_GPL(sigset_from_compat); asmlinkage long compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese, diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f78ddb8..f03db82 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -84,6 +84,8 @@ struct dentry *kvm_debugfs_dir; static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl, unsigned long arg); +static long kvm_vcpu_compat_ioctl(struct file *file, unsigned int ioctl, + unsigned long arg); static int hardware_enable_all(void); static void hardware_disable_all(void); @@ -1585,7 +1587,9 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp) static struct file_operations kvm_vcpu_fops = { .release= kvm_vcpu_release, .unlocked_ioctl = kvm_vcpu_ioctl, - .compat_ioctl = kvm_vcpu_ioctl, +#ifdef CONFIG_COMPAT + .compat_ioctl = kvm_vcpu_compat_ioctl, +#endif .mmap = kvm_vcpu_mmap, .llseek = noop_llseek, }; @@ -1874,6 +1878,50 @@ out: return r; } +#ifdef CONFIG_COMPAT +static long kvm_vcpu_compat_ioctl(struct file *filp, + unsigned int ioctl, unsigned long arg) +{ + struct kvm_vcpu *vcpu = filp-private_data; + void __user *argp = (void 
__user *)arg; + int r; + + if (vcpu-kvm-mm != current-mm) + return -EIO; + + switch (ioctl) { + case KVM_SET_SIGNAL_MASK: { + struct kvm_signal_mask __user *sigmask_arg = argp; + struct kvm_signal_mask kvm_sigmask; + compat_sigset_t csigset; + sigset_t sigset; + + if (argp) { + r = -EFAULT; + if (copy_from_user(kvm_sigmask, argp, + sizeof kvm_sigmask)) + goto out; + r = -EINVAL; + if (kvm_sigmask.len != sizeof csigset) + goto out; + r = -EFAULT; + if (copy_from_user(csigset, sigmask_arg-sigset, + sizeof csigset)) + goto out; + } + sigset_from_compat(sigset, csigset); + r = kvm_vcpu_ioctl_set_sigmask(vcpu, sigset); + break; + } + default: + r = kvm_vcpu_ioctl(filp, ioctl, arg); + } + +out: + return r; +} +#endif + static long kvm_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
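To illustrate why the mask needs converting even though the size is identical: a 32-bit task stores the 64 signal bits as two 32-bit words, while a 64-bit kernel stores them as one 64-bit word. On a big-endian host, copying the bytes unchanged would place the first 32-bit word in the high half of the 64-bit word. The following is a userspace model of the conversion only; the struct and function names are hypothetical stand-ins, not the kernel's compat_sigset_t, sigset_t, or sigset_from_compat().

```c
#include <stdint.h>

#define MODEL_NSIG	64	/* number of signals modelled here */

/* Layout assumed for a 32-bit task: two 32-bit words. */
struct model_compat_sigset {
	uint32_t sig[MODEL_NSIG / 32];
};

/* Layout assumed for a 64-bit kernel: one 64-bit word. */
struct model_sigset {
	uint64_t sig[MODEL_NSIG / 64];
};

/*
 * Combine the two 32-bit words by value, independent of host byte
 * order: word 0 carries signals 1..32, word 1 carries signals 33..64.
 */
void model_sigset_from_compat(struct model_sigset *set,
			      const struct model_compat_sigset *compat)
{
	set->sig[0] = (uint64_t)compat->sig[0] |
		      ((uint64_t)compat->sig[1] << 32);
}
```

Because the combination is done with shifts rather than a byte copy, the result is the same on little-endian and big-endian hosts.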
Re: [Qemu-devel] [PATCH v3] Add an isa device for SGA
On 06/07/2011 04:17 PM, Anthony Liguori wrote: On 05/16/2011 01:45 PM, Glauber Costa wrote: This patch adds a dummy legacy ISA device whose responsibility is to deploy sgabios, an option rom for a serial graphics adapter. The proposal is that this device is always-on when -nographics, but can otherwise be enable in any setup when -device sga is used. [v2: suggestions on qdev by Markus ] [v3: cleanups and documentation, per list suggestions ] Signed-off-by: Glauber Costaglom...@redhat.com Applied. But I'd like to figure out what to do about sgabios.bin. I think we should ship a copy. Agree. Regards, Anthony Liguori --- Makefile.target | 2 +- hw/pc.c | 9 hw/sga.c | 56 +++ 3 files changed, 66 insertions(+), 1 deletions(-) create mode 100644 hw/sga.c diff --git a/Makefile.target b/Makefile.target index fdbdc6c..004ea7e 100644 --- a/Makefile.target +++ b/Makefile.target @@ -224,7 +224,7 @@ obj-$(CONFIG_KVM) += ivshmem.o # Hardware support obj-i386-y += vga.o obj-i386-y += mc146818rtc.o i8259.o pc.o -obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o +obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o obj-i386-y += vmport.o obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o obj-i386-y += extboot.o diff --git a/hw/pc.c b/hw/pc.c index 8d351ba..5a8e00a 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -1096,6 +1096,15 @@ void pc_vga_init(PCIBus *pci_bus) isa_vga_init(); } } + + /* + * sga does not suppress normal vga output. So a machine can have both a + * vga card and sga manually enabled. Output will be seen on both. + * For nographic case, sga is enabled at all times + */ + if (display_type == DT_NOGRAPHIC) { + isa_create_simple(sga); + } } static void cpu_request_exit(void *opaque, int irq, int level) diff --git a/hw/sga.c b/hw/sga.c new file mode 100644 index 000..7ef750a --- /dev/null +++ b/hw/sga.c @@ -0,0 +1,56 @@ +/* + * QEMU dummy ISA device for loading sgabios option rom. + * + * Copyright (c) 2011 Glauber Costa, Red Hat Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + * + * sgabios code originally available at code.google.com/p/sgabios + * + */ +#include pci.h +#include pc.h +#include loader.h +#include sysemu.h + +#define SGABIOS_FILENAME sgabios.bin + +typedef struct ISAGAState { + ISADevice dev; +} ISASGAState; + +static int isa_cirrus_vga_initfn(ISADevice *dev) +{ + rom_add_vga(SGABIOS_FILENAME); + return 0; +} + +static ISADeviceInfo sga_info = { + .qdev.name = sga, + .qdev.desc = Serial Graphics Adapter, + .qdev.size = sizeof(ISASGAState), + .init = isa_cirrus_vga_initfn, +}; + +static void sga_register(void) +{ + isa_qdev_register(sga_info); +} + +device_init(sga_register); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK
On Tuesday 07 June 2011 22:25:15 Alexander Graf wrote:
+static long kvm_vcpu_compat_ioctl(struct file *filp,
+				  unsigned int ioctl, unsigned long arg)
+{
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = (void __user *)arg;

Converting a compat user argument into a pointer should use the compat_ptr() function to do the right thing on s390. Otherwise your patch looks good.

Arnd
Re: KVM induced panic on 2.6.38[2367] 2.6.39
On 07.06.2011 20:31, Eric Dumazet wrote: On Tuesday, 07 June 2011 at 17:35 +0200, Patrick McHardy wrote: The main suspects would be NAT and TCPMSS. Did you also try whether the crash occurs with only one of these rules? I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access the address the way I was doing it, so that's a no-go for me. That's really weird since you're apparently not using any bridge netfilter features. It shouldn't have any effect besides changing at which point ip_tables is invoked. How are your network devices configured (specifically any bridges)? Something in the kernel does: u16 *ptr = addr; (addr given by kmalloc()) ptr[-1] = 0; Could be an off-by-one error in a memmove()/memcpy() or loop... I can't see a network issue here. So far me neither, but netfilter appears to trigger the bug. I checked arch/x86/lib/memmove_64.S and it seems fine. I was thinking it might be a missing skb_make_writable() combined with vhost_net specifics in the netfilter code (TCPMSS and NAT are both suspect), but was unable to find anything. I also went through the dst_metrics() conversion to see whether anything could cause problems with the bridge fake_rttable, but also nothing so far.
Re: KVM induced panic on 2.6.38[2367] & 2.6.39
On 08/06/11 02:04, Bart De Schuymer wrote:

If the bug is easily triggered with your guest os, then you could try to capture the traffic with wireshark (or something else) in a configuration that doesn't crash your system. Save the traffic in a pcap file. Then you can see if resending that traffic in the vulnerable configuration triggers the bug (I don't know if something in Windows exists, but tcpreplay should work for Linux). Once you have such a capture, chances are the bug is even easily reproducible by us (unless it's hardware-specific). Success isn't guaranteed, but I think it's worth a shot...

The issue with this is I don't have a configuration that does not crash the system. This only happens under the specific circumstance that traffic from one VM is being DNAT'd to the other. If I disable CONFIG_BRIDGE_NETFILTER, or I leave out the DNAT, then I can't replicate the problem as I don't seem to be able to get the packets to go where I want them to go.

Let me try and explain it a little more clearly with made up IP addresses to illustrate the problem.

I have VM A (1.1.1.2) and VM B (1.1.1.3) on br1 (1.1.1.1).
I have a public IP on ppp0 (2.2.2.2).

VM B can talk to VM A using its host address (1.1.1.2) and there is no problem.

The DNAT says anything destined for ppp0 that is on port 443 and coming from anywhere other than ppp0 (i.e. inside the network) is to be DNAT'd to 1.1.1.2.

So VM B (1.1.1.3) tries to connect to ppp0 (2.2.2.2) on port 443, and this is redirected to VM A on 1.1.1.2.

Only under this specific circumstance does the problem occur. I can get VM B (1.1.1.3) to talk directly to VM A (1.1.1.2) all day long and there is no problem; it's only when VM B tries to talk to ppp0 that there is an issue (and it happens within seconds of the initial connection).

All these tests have been performed with VM B being a Windows XP guest. Tonight I'll try it with a Linux guest and see if I can make it happen. If that works I might be able to come up with some reproducible test case for you. I have a desktop machine that has Intel VT extensions, so I'll work toward making a portable test case.
Re: KVM induced panic on 2.6.38[2367] & 2.6.39
On 08/06/11 06:57, Patrick McHardy wrote:
On 07.06.2011 20:31, Eric Dumazet wrote:
On Tuesday 07 June 2011 at 17:35 +0200, Patrick McHardy wrote:

The main suspects would be NAT and TCPMSS. Did you also try whether the crash occurs with only one of these rules?

I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access the address the way I was doing it, so that's a no-go for me.

That's really weird since you're apparently not using any bridge netfilter features. It shouldn't have any effect besides changing at which point ip_tables is invoked. How are your network devices configured (specifically any bridges)?

Something in the kernel does

	u16 *ptr = addr (given by kmalloc())
	ptr[-1] = 0;

Could be an off-by-one error in a memmove()/memcpy() or loop...

I can't see a network issue here.

So far me neither, but netfilter appears to trigger the bug.

Would it help if I tried some older kernels? This issue only surfaced for me recently as I only installed the VM's in question about 12 weeks ago and have only just started really using them in anger. I could try reproducing it on progressively older kernels to see if I can find one that works and then bisect from there.
Re: KVM induced panic on 2.6.38[2367] & 2.6.39
On 07/06/11 23:35, Patrick McHardy wrote:

The main suspects would be NAT and TCPMSS. Did you also try whether the crash occurs with only one of these rules?

To be honest I'm having trouble finding where TCPMSS is actually set in that ruleset. This is a production machine so I can only take it down after about 9PM at night. I'll have another crack at it tonight.

I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access the address the way I was doing it, so that's a no-go for me.

That's really weird since you're apparently not using any bridge netfilter features. It shouldn't have any effect besides changing at which point ip_tables is invoked. How are your network devices configured (specifically any bridges)?

I have one bridge with all my virtual machines on it. In this particular instance the packets leave VM A destined for the IP address of ppp0 (the external interface). This is intercepted by the DNAT PREROUTING rule above and shunted back to VM B. The VM's are on br1 and the external address is ppp0.

Without CONFIG_BRIDGE_NETFILTER compiled in I can see the traffic entering and leaving VM B with tcpdump, but the packets never seem to get back to VM A.

VM A is XP 32 bit, VM B is Linux. I have some other Linux VM's, so I'll do some more testing tonight between those to see where the packets are going without CONFIG_BRIDGE_NETFILTER set.
[PATCH] KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK
KVM has an ioctl to define which signal mask should be used while running inside VCPU_RUN. At least for big endian systems, this mask is different on 32-bit and 64-bit systems (though the size is identical).

Add a compat wrapper that converts the mask to whatever the kernel accepts, allowing 32-bit kvm user space to set signal masks.

This patch fixes qemu with --enable-io-thread on ppc64 hosts when running 32-bit user land.

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 -> v2:

  - use compat_ptr
  - only declare compat call with CONFIG_COMPAT
---
 kernel/compat.c     |    1 +
 virt/kvm/kvm_main.c |   52 ++-
 2 files changed, 52 insertions(+), 1 deletions(-)

diff --git a/kernel/compat.c b/kernel/compat.c
index 9214dcd..506e176 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -882,6 +882,7 @@ sigset_from_compat (sigset_t *set, compat_sigset_t *compat)
 	case 1: set->sig[0] = compat->sig[0] | (((long)compat->sig[1]) << 32 );
 	}
 }
+EXPORT_SYMBOL_GPL(sigset_from_compat);
 
 asmlinkage long
 compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f78ddb8..04dfce9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -84,6 +84,10 @@ struct dentry *kvm_debugfs_dir;
 
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
 			   unsigned long arg);
+#ifdef CONFIG_COMPAT
+static long kvm_vcpu_compat_ioctl(struct file *file, unsigned int ioctl,
+				  unsigned long arg);
+#endif
 static int hardware_enable_all(void);
 static void hardware_disable_all(void);
 
@@ -1585,7 +1589,9 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp)
 static struct file_operations kvm_vcpu_fops = {
 	.release        = kvm_vcpu_release,
 	.unlocked_ioctl = kvm_vcpu_ioctl,
-	.compat_ioctl   = kvm_vcpu_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl   = kvm_vcpu_compat_ioctl,
+#endif
 	.mmap           = kvm_vcpu_mmap,
 	.llseek		= noop_llseek,
 };
@@ -1874,6 +1880,50 @@ out:
 	return r;
 }
 
+#ifdef CONFIG_COMPAT
+static long kvm_vcpu_compat_ioctl(struct file *filp,
+				  unsigned int ioctl, unsigned long arg)
+{
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = compat_ptr(arg);
+	int r;
+
+	if (vcpu->kvm->mm != current->mm)
+		return -EIO;
+
+	switch (ioctl) {
+	case KVM_SET_SIGNAL_MASK: {
+		struct kvm_signal_mask __user *sigmask_arg = argp;
+		struct kvm_signal_mask kvm_sigmask;
+		compat_sigset_t csigset;
+		sigset_t sigset;
+
+		if (argp) {
+			r = -EFAULT;
+			if (copy_from_user(&kvm_sigmask, argp,
+					   sizeof kvm_sigmask))
+				goto out;
+			r = -EINVAL;
+			if (kvm_sigmask.len != sizeof csigset)
+				goto out;
+			r = -EFAULT;
+			if (copy_from_user(&csigset, sigmask_arg->sigset,
+					   sizeof csigset))
+				goto out;
+		}
+		sigset_from_compat(&sigset, &csigset);
+		r = kvm_vcpu_ioctl_set_sigmask(vcpu, &sigset);
+		break;
+	}
+	default:
+		r = kvm_vcpu_ioctl(filp, ioctl, arg);
+	}
+
+out:
+	return r;
+}
+#endif
+
 static long kvm_vm_ioctl(struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
--
1.6.0.2
Re: [PATCHv2 RFC 4/4] Revert virtio: make add_buf return capacity remaining:
On Tue, 7 Jun 2011 18:54:57 +0300, Michael S. Tsirkin m...@redhat.com wrote:
On Thu, Jun 02, 2011 at 06:43:25PM +0300, Michael S. Tsirkin wrote:

This reverts commit 3c1b27d5043086a485f8526353ae9fe37bfa1065. The only user was virtio_net, and it switched to min_capacity instead.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

It turns out another place in virtio_net: receive buf processing - relies on the old behaviour:

try_fill_recv:
	do {
		if (vi->mergeable_rx_bufs)
			err = add_recvbuf_mergeable(vi, gfp);
		else if (vi->big_packets)
			err = add_recvbuf_big(vi, gfp);
		else
			err = add_recvbuf_small(vi, gfp);

		oom = err == -ENOMEM;
		if (err < 0)
			break;
		++vi->num;
	} while (err > 0);

The point is to avoid allocating a buf if the ring is out of space and we are sure add_buf will fail.

It works well for mergeable buffers and for big packets if we are not OOM. small packets and oom will do extra get_page/put_page calls (but maybe we don't care).

So this is RX, I intend to drop it from this patchset and focus on the TX side for starters.

We could do some hack where we get the capacity, and estimate how many packets we need to fill it, then try to do that many.

I say hack, because knowing whether we're doing indirect buffers is a layering violation. But that's life when you're trying to do microoptimizations.

Cheers,
Rusty.
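The "get the capacity and estimate" idea can be sketched as plain arithmetic. This is a hypothetical illustration, not virtio_net code: `estimate_fill_count()` and its parameters are invented for the example, and it assumes that a buffer posted through an indirect descriptor occupies exactly one ring slot while a direct buffer occupies `descs_per_buf` slots.

```c
#include <assert.h>

/* Hypothetical helper: how many receive buffers fit into the free ring
 * space. Needing to know 'indirect' here is exactly the layering
 * violation mentioned above. */
static unsigned int estimate_fill_count(unsigned int free_descs,
					unsigned int descs_per_buf,
					int indirect)
{
	if (indirect)
		return free_descs;	/* assumption: one slot per buffer */

	assert(descs_per_buf > 0);
	return free_descs / descs_per_buf;
}
```

A caller would then try to add that many buffers in one pass instead of relying on the return value of each add to learn the remaining capacity.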
Re: [PATCH v2] virtio-spec: Fix wrong bit number of device status
On Tue, 7 Jun 2011 21:09:42 +0800, Amos Kong ak...@redhat.com wrote:

qemu-kvm/hw/virtio_config.h:
#define VIRTIO_CONFIG_S_ACKNOWLEDGE     1
#define VIRTIO_CONFIG_S_DRIVER          2
#define VIRTIO_CONFIG_S_DRIVER_OK       4
#define VIRTIO_CONFIG_S_FAILED          0x80

virtio-spec:
ACKNOWLEDGE(1) :
DRIVER(2) :
DRIVER_OK(3) :
FAILED(128):

The spec refers to bit numbers and the headers use absolute numbers, so they are not consistent. It should be 'FAILED(8)': 2^(8-1) = 128.

Changes from V1:
- Fix wrong patch body

Signed-off-by: Amos Kong ak...@redhat.com

Thanks, applied!

Rusty.
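The relation between the spec's 1-based bit numbers and the header's absolute values is a single shift. A minimal sketch, where `status_bit_to_mask()` is a hypothetical helper name and the 8-bit width of the status field is taken from the values quoted above:

```c
#include <assert.h>

/* Map a 1-based bit number, as the spec lists them, to the absolute
 * value used by the header: mask = 2^(bit_number - 1). */
static unsigned int status_bit_to_mask(unsigned int bit_number)
{
	assert(bit_number >= 1 && bit_number <= 8);
	return 1u << (bit_number - 1);
}
```

Under this mapping ACKNOWLEDGE(1), DRIVER(2) and DRIVER_OK(3) give 1, 2 and 4, and only FAILED(8) gives 0x80, which is why FAILED(128) in the spec was the odd one out.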
Re: [PATCH 0/15] KVM: optimize for MMIO handled
On Tue, 07 Jun 2011 20:58:06 +0800 Xiao Guangrong xiaoguangr...@cn.fujitsu.com wrote:

The performance test result:

Netperf (TCP_RR):
===
ept is enabled:

        Before    After
1st     709.58    734.60
2nd     715.40    723.75
3rd     713.45    724.22

ept=0 bypass_guest_pf=0:

        Before    After
1st     706.10    709.63
2nd     709.38    715.80
3rd     695.90    710.70

Under what conditions does TCP_RR perform so badly?

On a 1Gbps network, directly connecting two Intel servers, I got 20 times better results before. Even when I used a KVM guest as the netperf client, I got more than 10 times better results.

Could you tell me a bit more about the details of your test?

Kernbench (do not redirect output to /dev/null)
==
ept is enabled:

        Before       After
1st     2m34.749s    2m33.482s
2nd     2m34.651s    2m33.161s
3rd     2m34.543s    2m34.271s

ept=0 bypass_guest_pf=0:

        Before       After
1st     4m43.467s    4m41.873s
2nd     4m45.225s    4m41.668s
3rd     4m47.029s    4m40.128s
Re: [PATCH 05/15] KVM: MMU: optimize to handle dirty bit
On 06/07/2011 09:01 PM, Xiao Guangrong wrote:

If the dirty bit is not set, we can make the pte access read-only to avoid handling the dirty bit everywhere.

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index b0c8184..67971da 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -106,6 +106,9 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
 	unsigned access;
 
 	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
+	if (!is_dirty_gpte(gpte))
+		access &= ~ACC_WRITE_MASK;
+

Sorry, it can break something if the gpte is not on the last level and the dirty bit is set later. The patch below should fix it; I'll merge it into the next version.

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 4287dc8..6ceb5fd 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -101,12 +101,13 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return (ret != orig_pte);
 }
 
-static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
+static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte,
+				   bool last)
 {
 	unsigned access;
 
 	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
-	if (!is_dirty_gpte(gpte))
+	if (last && !is_dirty_gpte(gpte))
 		access &= ~ACC_WRITE_MASK;
 
 #if PTTYPE == 64
@@ -230,8 +231,6 @@ walk:
 			pte |= PT_ACCESSED_MASK;
 		}
 
-		pte_access = pt_access & FNAME(gpte_access)(vcpu, pte);
-
 		walker->ptes[walker->level - 1] = pte;
 
 		if ((walker->level == PT_PAGE_TABLE_LEVEL) ||
@@ -266,7 +265,7 @@ walk:
 			break;
 		}
 
-		pt_access = pte_access;
+		pt_access = FNAME(gpte_access)(vcpu, pte, false);
 		--walker->level;
 	}
 
@@ -290,6 +289,7 @@ walk:
 		walker->ptes[walker->level - 1] = pte;
 	}
 
+	pte_access = pt_access & FNAME(gpte_access)(vcpu, pte, true);
 	walker->pt_access = pt_access;
 	walker->pte_access = pte_access;
 	pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
@@ -369,7 +369,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		return;
 
 	pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
-	pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+	pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte, true);
 	pfn = gfn_to_pfn_atomic(vcpu->kvm, gpte_to_gfn(gpte));
 	if (mmu_invalid_pfn(pfn)) {
 		kvm_release_pfn_clean(pfn);
@@ -444,7 +444,8 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
 			continue;
 
-		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte,
+								  true);
 		gfn = gpte_to_gfn(gpte);
 		pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
 				      pte_access & ACC_WRITE_MASK);
@@ -790,7 +791,7 @@ static bool FNAME(sync_mmio_spte)(struct kvm_vcpu *vcpu,
 	if (unlikely(is_mmio_spte(*sptep))) {
 		gfn_t gfn = gpte_to_gfn(gpte);
 		unsigned access = sp->role.access & FNAME(gpte_access)(vcpu,
-								       gpte);
+								gpte, true);
 
 		if (gfn != get_mmio_spte_gfn(*sptep)) {
 			__set_spte(sptep, shadow_trap_nonpresent_pte);
@@ -868,7 +869,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 		}
 
 		nr_present++;
-		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte,
+								  true);
 		host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;
 
 		set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
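The effect of the new `last` argument can be checked with a stand-alone model of the access computation. This is a simplified sketch, not the kernel function: `gpte_access_model()` is a hypothetical name, the NX handling under `PTTYPE == 64` is left out, and the bit positions are the usual x86 page-table bits (writable = bit 1, user = bit 2, dirty = bit 6).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_WRITABLE_MASK	(1ULL << 1)
#define PT_USER_MASK		(1ULL << 2)
#define PT_DIRTY_MASK		(1ULL << 6)

#define ACC_EXEC_MASK		1u
#define ACC_WRITE_MASK		2u
#define ACC_USER_MASK		4u

/* The access bits deliberately share positions with the pte bits. */
static_assert(ACC_WRITE_MASK == PT_WRITABLE_MASK, "write bit must line up");
static_assert(ACC_USER_MASK == PT_USER_MASK, "user bit must line up");

/* Write access is dropped for a clean gpte only when it is the last-level
 * entry; intermediate levels keep their write permission. */
static unsigned gpte_access_model(uint64_t gpte, bool last)
{
	unsigned access;

	access = (unsigned)(gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) |
		 ACC_EXEC_MASK;
	if (last && !(gpte & PT_DIRTY_MASK))
		access &= ~ACC_WRITE_MASK;

	return access;
}
```

For a writable, user, clean gpte the model returns exec|user when `last` is true and exec|write|user when `last` is false, so a non-leaf entry no longer loses write access just because its dirty bit is clear.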
Re: [PATCH 0/15] KVM: optimize for MMIO handled
On 06/08/2011 11:11 AM, Takuya Yoshikawa wrote:
On Tue, 07 Jun 2011 20:58:06 +0800 Xiao Guangrong xiaoguangr...@cn.fujitsu.com wrote:

The performance test result:

Netperf (TCP_RR):
===
ept is enabled:

        Before    After
1st     709.58    734.60
2nd     715.40    723.75
3rd     713.45    724.22

ept=0 bypass_guest_pf=0:

        Before    After
1st     706.10    709.63
2nd     709.38    715.80
3rd     695.90    710.70

Under what conditions does TCP_RR perform so badly?

On a 1Gbps network, directly connecting two Intel servers, I got 20 times better results before. Even when I used a KVM guest as the netperf client, I got more than 10 times better results.

Um, which case did you test? ept=1, or ept=0 bypass_guest_pf=0, or both?

Could you tell me a bit more about the details of your test?

Sure. The KVM guest is the client; it uses an e1000 NIC and a NAT network to connect to the netperf server. The bandwidth of our network is 100M.
Guest to host communication in kvm
Hello,

I am trying to understand the kvm code. I am writing simple code in which I want to send a message or notification from the guest to the host (qemu-kvm). I thought of implementing a hypercall that is invoked on some condition and then handled in qemu-kvm, but I don't understand how to handle it in qemu-kvm. Or is there a better way to do this?

Please help me. Thanks in advance.

Thanks,
Vaibhav
Re: [PATCH 0/15] KVM: optimize for MMIO handled
On 06/08/2011 11:25 AM, Xiao Guangrong wrote:
On 06/08/2011 11:11 AM, Takuya Yoshikawa wrote:
On Tue, 07 Jun 2011 20:58:06 +0800 Xiao Guangrong xiaoguangr...@cn.fujitsu.com wrote:

The performance test result:

Netperf (TCP_RR):
===
ept is enabled:

        Before    After
1st     709.58    734.60
2nd     715.40    723.75
3rd     713.45    724.22

ept=0 bypass_guest_pf=0:

        Before    After
1st     706.10    709.63
2nd     709.38    715.80
3rd     695.90    710.70

Under what conditions does TCP_RR perform so badly?

On a 1Gbps network, directly connecting two Intel servers, I got 20 times better results before. Even when I used a KVM guest as the netperf client, I got more than 10 times better results.

Um, which case did you test? ept=1, or ept=0 bypass_guest_pf=0, or both?

Could you tell me a bit more about the details of your test?

Sure. The KVM guest is the client; it uses an e1000 NIC and a NAT network to connect to the netperf server. The bandwidth of our network is 100M.

And this is my test script:

#!/bin/sh

echo 3 > /proc/sys/vm/drop_caches
./netperf -H $HOST_NAME -p $PORT -t TCP_RR -l 60
Re: [PATCH 0/15] KVM: optimize for MMIO handled
On Wed, 08 Jun 2011 11:32:12 +0800 Xiao Guangrong xiaoguangr...@cn.fujitsu.com wrote:
On 06/08/2011 11:25 AM, Xiao Guangrong wrote:
On 06/08/2011 11:11 AM, Takuya Yoshikawa wrote:
On Tue, 07 Jun 2011 20:58:06 +0800 Xiao Guangrong xiaoguangr...@cn.fujitsu.com wrote:

The performance test result:

Netperf (TCP_RR):
===
ept is enabled:

        Before    After
1st     709.58    734.60
2nd     715.40    723.75
3rd     713.45    724.22

ept=0 bypass_guest_pf=0:

        Before    After
1st     706.10    709.63
2nd     709.38    715.80
3rd     695.90    710.70

Under what conditions does TCP_RR perform so badly?

On a 1Gbps network, directly connecting two Intel servers, I got 20 times better results before. Even when I used a KVM guest as the netperf client, I got more than 10 times better results.

Um, which case did you test? ept=1, or ept=0 bypass_guest_pf=0, or both?

ept = 1 only.

Could you tell me a bit more about the details of your test?

Sure. The KVM guest is the client; it uses an e1000 NIC and a NAT network to connect to the netperf server. The bandwidth of our network is 100M.

I see the reason, thank you! I used virtio-net and you used e1000. You are using e1000 to see the MMIO performance change, right?

Takuya

And this is my test script:

#!/bin/sh

echo 3 > /proc/sys/vm/drop_caches
./netperf -H $HOST_NAME -p $PORT -t TCP_RR -l 60
Re: KVM induced panic on 2.6.38[2367] & 2.6.39
On Wednesday 08 June 2011 at 08:18 +0800, Brad Campbell wrote:
On 08/06/11 06:57, Patrick McHardy wrote:
On 07.06.2011 20:31, Eric Dumazet wrote:
On Tuesday 07 June 2011 at 17:35 +0200, Patrick McHardy wrote:

The main suspects would be NAT and TCPMSS. Did you also try whether the crash occurs with only one of these rules?

I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access the address the way I was doing it, so that's a no-go for me.

That's really weird since you're apparently not using any bridge netfilter features. It shouldn't have any effect besides changing at which point ip_tables is invoked. How are your network devices configured (specifically any bridges)?

Something in the kernel does

	u16 *ptr = addr (given by kmalloc())
	ptr[-1] = 0;

Could be an off-by-one error in a memmove()/memcpy() or loop...

I can't see a network issue here.

So far me neither, but netfilter appears to trigger the bug.

Would it help if I tried some older kernels? This issue only surfaced for me recently as I only installed the VM's in question about 12 weeks ago and have only just started really using them in anger. I could try reproducing it on progressively older kernels to see if I can find one that works and then bisect from there.

Well, a bisection definitely should help, but needs a lot of time in your case.

Could you try the following patch, because this is the 'usual suspect' I had yesterday:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 46cbd28..9f548f9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -792,6 +792,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
 	}
 
+#if 0
 	if (fastpath &&
 	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
 		memmove(skb->head + size, skb_shinfo(skb),
@@ -802,7 +803,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		off = nhead;
 		goto adjust_others;
 	}
-
+#endif
 	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
 	if (!data)
 		goto nodata;
Re: [PATCH 0/15] KVM: optimize for MMIO handled
On 06/08/2011 11:47 AM, Takuya Yoshikawa wrote:

Sure. The KVM guest is the client; it uses an e1000 NIC and a NAT network to connect to the netperf server. The bandwidth of our network is 100M.

I see the reason, thank you! I used virtio-net and you used e1000. You are using e1000 to see the MMIO performance change, right?

Hi Takuya,

Please apply my fix patch when you test it again, thanks! :-)
(http://www.spinics.net/lists/kvm/msg56017.html)

Just now, in order to confirm the performance result, I tested it again without using our office network (since there are so many boxes on it): I just booted two guests, one running the netperf server and one running the netperf client, both using e1000 and a NAT network.

I'll test the performance of virtio-net!

This is the result:

ept = 1:

Before patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1182.27
16384  87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1185.84
16384  87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1181.58
16384  87380

After patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1205.65
16384  87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1216.06
16384  87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1215.70
16384  87380

ept = 0, bypass_guest_pf=0:

Before patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1169.70
16384  87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1160.82
16384  87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1168.01
16384  87380

After patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1266.28
16384  87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1268.16

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
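To put the ept = 1 numbers above on one scale, the three transaction rates per case can be averaged and compared. The sketch below is only arithmetic on the figures quoted in this message; `mean_rate()` and `percent_gain()` are hypothetical helper names. The ept = 0 case is left out because its last run is cut off in this copy of the message.

```c
#include <assert.h>

/* Transaction rates (per second) copied from the ept = 1 runs above. */
static const double ept1_before[] = { 1182.27, 1185.84, 1181.58 };
static const double ept1_after[]  = { 1205.65, 1216.06, 1215.70 };

/* Arithmetic mean of n samples. */
static double mean_rate(const double *v, int n)
{
	double sum = 0.0;

	assert(v != 0 && n > 0);
	for (int i = 0; i < n; i++)
		sum += v[i];
	return sum / n;
}

/* Relative change from 'before' to 'after', in percent. */
static double percent_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

With these inputs the means are about 1183.23 and 1212.47 transactions per second, a gain of roughly 2.5% for the ept = 1 case.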
[PATCH] KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK
KVM has an ioctl to define which signal mask should be used while running inside VCPU_RUN. At least for big endian systems, this mask is different on 32-bit and 64-bit systems (though the size is identical).

Add a compat wrapper that converts the mask to whatever the kernel accepts, allowing 32-bit kvm user space to set signal masks.

This patch fixes qemu with --enable-io-thread on ppc64 hosts when running 32-bit user land.

Signed-off-by: Alexander Graf ag...@suse.de
---
 kernel/compat.c     |    1 +
 virt/kvm/kvm_main.c |   50 +-
 2 files changed, 50 insertions(+), 1 deletions(-)

diff --git a/kernel/compat.c b/kernel/compat.c
index 9214dcd..506e176 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -882,6 +882,7 @@ sigset_from_compat (sigset_t *set, compat_sigset_t *compat)
 	case 1: set->sig[0] = compat->sig[0] | (((long)compat->sig[1]) << 32 );
 	}
 }
+EXPORT_SYMBOL_GPL(sigset_from_compat);
 
 asmlinkage long
 compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f78ddb8..f03db82 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -84,6 +84,8 @@ struct dentry *kvm_debugfs_dir;
 
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
 			   unsigned long arg);
+static long kvm_vcpu_compat_ioctl(struct file *file, unsigned int ioctl,
+				  unsigned long arg);
 static int hardware_enable_all(void);
 static void hardware_disable_all(void);
 
@@ -1585,7 +1587,9 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp)
 static struct file_operations kvm_vcpu_fops = {
 	.release        = kvm_vcpu_release,
 	.unlocked_ioctl = kvm_vcpu_ioctl,
-	.compat_ioctl   = kvm_vcpu_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl   = kvm_vcpu_compat_ioctl,
+#endif
 	.mmap           = kvm_vcpu_mmap,
 	.llseek		= noop_llseek,
 };
@@ -1874,6 +1878,50 @@ out:
 	return r;
 }
 
+#ifdef CONFIG_COMPAT
+static long kvm_vcpu_compat_ioctl(struct file *filp,
+				  unsigned int ioctl, unsigned long arg)
+{
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = (void __user *)arg;
+	int r;
+
+	if (vcpu->kvm->mm != current->mm)
+		return -EIO;
+
+	switch (ioctl) {
+	case KVM_SET_SIGNAL_MASK: {
+		struct kvm_signal_mask __user *sigmask_arg = argp;
+		struct kvm_signal_mask kvm_sigmask;
+		compat_sigset_t csigset;
+		sigset_t sigset;
+
+		if (argp) {
+			r = -EFAULT;
+			if (copy_from_user(&kvm_sigmask, argp,
+					   sizeof kvm_sigmask))
+				goto out;
+			r = -EINVAL;
+			if (kvm_sigmask.len != sizeof csigset)
+				goto out;
+			r = -EFAULT;
+			if (copy_from_user(&csigset, sigmask_arg->sigset,
+					   sizeof csigset))
+				goto out;
+		}
+		sigset_from_compat(&sigset, &csigset);
+		r = kvm_vcpu_ioctl_set_sigmask(vcpu, &sigset);
+		break;
+	}
+	default:
+		r = kvm_vcpu_ioctl(filp, ioctl, arg);
+	}
+
+out:
+	return r;
+}
+#endif
+
 static long kvm_vm_ioctl(struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
--
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK
On Tuesday 07 June 2011 22:25:15 Alexander Graf wrote:

+static long kvm_vcpu_compat_ioctl(struct file *filp,
+				  unsigned int ioctl, unsigned long arg)
+{
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = (void __user *)arg;

Converting a compat user argument into a pointer should use the compat_ptr() function to do the right thing on s390.

Otherwise your patch looks good.

	Arnd
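The s390 remark can be illustrated with a user-space model. This is a sketch under assumptions rather than the kernel helper itself: the function names are hypothetical, and the model relies on 31-bit s390 user space treating the top bit of a 32-bit pointer value as an addressing-mode flag rather than an address bit, so the conversion has to mask it off where a plain cast would keep it.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a plain cast: every bit of the 32-bit value is kept. */
static uint64_t plain_cast_model(uint32_t uptr)
{
	return (uint64_t)uptr;
}

/* Model of an s390-style conversion: bit 31 is dropped (assumption: it is
 * the addressing-mode flag, so only the low 31 bits form the address). */
static uint64_t compat_ptr_s390_model(uint32_t uptr)
{
	uint64_t addr = (uint64_t)(uptr & 0x7fffffffu);

	assert(addr <= 0x7fffffffu);
	return addr;
}
```

For a value such as 0x80001000 the plain cast yields 0x80001000 while the masked conversion yields 0x1000, which is the difference the review comment is pointing at.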