Re: Planning the merge of KVM/arm64
On Tue, Jun 04, 2013 at 10:57:32PM -0700, Christoffer Dall wrote: On 4 June 2013 09:37, Gleb Natapov g...@redhat.com wrote: On Tue, Jun 04, 2013 at 05:51:41PM +0200, Paolo Bonzini wrote: Il 04/06/2013 17:43, Christoffer Dall ha scritto: Hi Paolo, I don't think this is an issue. Gleb and Marcelo for example pulled RMK's stable tree for my KVM/ARM updates for the 3.10 merge window and that wasn't an issue. If Linus pulls the kvm/next tree first the diffstat should be similar and everything clean enough, no? Catalin has previously expressed his wish to upstream the kvm/arm64 patches directly through him given the churn in a completely new architecture and he wants to make sure that everything looks right. It's a pretty clean implementation with quite few dependencies and merging as a working series should be a priority instead of the Kconfig hack, imho. Ok, let's see what Gleb says. I have no objection to merge arm64 kvm trough Catalin if it mean less churn for everyone. That's what we did with arm and mips. Arm64 kvm has a dependency on kvm.git next though, so how Catalin make sure that everything looks right? Will he merge kvm.git/next to arm64 tree? Yes, that was the idea. Everything in kvm/next is considered stable, right? Right. Catalin should wait for kvm.git to be pulled by Linus next merge windows before sending his pull request then. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] vfio: fix crash on rmmod
devtmpfs_delete_node() calls devnode() callback with mode==NULL but vfio still tries to write there. The patch fixes this. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- Steps to reproduce on freshly booted system with no devices given to VFIO: modprobe vfio rmmod vfio_iommu_spapr_tce rmmod vfio --- drivers/vfio/vfio.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 523c121..259ad28 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -1360,7 +1360,7 @@ static const struct file_operations vfio_device_fops = { */ static char *vfio_devnode(struct device *dev, umode_t *mode) { - if (MINOR(dev-devt) == 0) + if (mode (MINOR(dev-devt) == 0)) *mode = S_IRUGO | S_IWUGO; return kasprintf(GFP_KERNEL, vfio/%s, dev_name(dev)); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4 v3] KVM: PPC: IOMMU in-kernel handling
Ben, ping! :) This series has tiny fixes (capability and ioctl numbers, changed documentation, compile errors in some configuration). More details are in the commit messages. Rebased on v3.10-rc4. Alexey Kardashevskiy (4): KVM: PPC: Add support for multiple-TCE hcalls powerpc: Prepare to support kernel handling of IOMMU map/unmap KVM: PPC: Add support for IOMMU in-kernel handling KVM: PPC: Add hugepage support for IOMMU in-kernel handling Documentation/virtual/kvm/api.txt| 45 +++ arch/powerpc/include/asm/kvm_host.h |7 + arch/powerpc/include/asm/kvm_ppc.h | 40 ++- arch/powerpc/include/asm/pgtable-ppc64.h |4 + arch/powerpc/include/uapi/asm/kvm.h |7 + arch/powerpc/kvm/book3s_64_vio.c | 398 - arch/powerpc/kvm/book3s_64_vio_hv.c | 471 -- arch/powerpc/kvm/book3s_hv.c | 39 +++ arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 + arch/powerpc/kvm/book3s_pr_papr.c| 37 ++- arch/powerpc/kvm/powerpc.c | 15 + arch/powerpc/mm/init_64.c| 77 - include/uapi/linux/kvm.h |3 + 13 files changed, 1121 insertions(+), 28 deletions(-) -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls
This adds real mode handlers for the H_PUT_TCE_INDIRECT and H_STUFF_TCE hypercalls for QEMU emulated devices such as IBMVIO devices or emulated PCI. These calls allow adding multiple entries (up to 512) into the TCE table in one call which saves time on transition to/from real mode. This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs (copied from user and verified) before writing the whole list into the TCE table. This cache will be utilized more in the upcoming VFIO/IOMMU support to continue TCE list processing in the virtual mode in the case if the real mode handler failed for some reason. This adds a guest physical to host real address converter and calls the existing H_PUT_TCE handler. The converting function is going to be fully utilized by upcoming VFIO supporting patches. This also implements the KVM_CAP_PPC_MULTITCE capability, so in order to support the functionality of this patch, QEMU needs to query for this capability and set the hcall-multi-tce hypertas property only if the capability is present, otherwise there will be serious performance degradation. Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Signed-off-by: Paul Mackerras pau...@samba.org --- Changelog: 2013/06/05: * fixed mistype about IBMVIO in the commit message * updated doc and moved it to another section * changed capability number 2013/05/21: * added kvm_vcpu_arch::tce_tmp * removed cleanup if put_indirect failed, instead we do not even start writing to TCE table if we cannot get TCEs from the user and they are invalid * kvmppc_emulated_h_put_tce is split to kvmppc_emulated_put_tce and kvmppc_emulated_validate_tce (for the previous item) * fixed bug with failthrough for H_IPI * removed all get_user() from real mode handlers * kvmppc_lookup_pte() added (instead of making lookup_linux_pte public) --- Documentation/virtual/kvm/api.txt | 17 ++ arch/powerpc/include/asm/kvm_host.h |2 + arch/powerpc/include/asm/kvm_ppc.h | 16 +- arch/powerpc/kvm/book3s_64_vio.c| 118 ++ arch/powerpc/kvm/book3s_64_vio_hv.c | 266 +++ arch/powerpc/kvm/book3s_hv.c| 39 + arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 + arch/powerpc/kvm/book3s_pr_papr.c | 37 - arch/powerpc/kvm/powerpc.c |3 + include/uapi/linux/kvm.h|1 + 10 files changed, 473 insertions(+), 32 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 5f91eda..6c082ff 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2362,6 +2362,23 @@ calls by the guest for that service will be passed to userspace to be handled. +4.83 KVM_CAP_PPC_MULTITCE + +Capability: KVM_CAP_PPC_MULTITCE +Architectures: ppc +Type: vm + +This capability tells the guest that multiple TCE entry add/remove hypercalls +handling is supported by the kernel. This significanly accelerates DMA +operations for PPC KVM guests. + +Unlike other capabilities in this section, this one does not have an ioctl. +Instead, when the capability is present, the H_PUT_TCE_INDIRECT and +H_STUFF_TCE hypercalls are to be handled in the host kernel and not passed to +the guest. Othwerwise it might be better for the guest to continue using H_PUT_TCE +hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present). + + 5. The kvm_run structure diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index af326cd..85d8f26 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -609,6 +609,8 @@ struct kvm_vcpu_arch { spinlock_t tbacct_lock; u64 busy_stolen; u64 busy_preempt; + + unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hall */ #endif }; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index a5287fe..e852921b 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -133,8 +133,20 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu); extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, struct kvm_create_spapr_tce *args); -extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, -unsigned long ioba, unsigned long tce); +extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table( + struct kvm_vcpu *vcpu, unsigned long liobn); +extern long kvmppc_emulated_validate_tce(unsigned long tce); +extern void kvmppc_emulated_put_tce(struct kvmppc_spapr_tce_table *tt, + unsigned long ioba, unsigned long tce); +extern long kvmppc_virtmode_h_put_tce(struct kvm_vcpu *vcpu, + unsigned long liobn, unsigned long ioba, + unsigned long tce); +extern long kvmppc_virtmode_h_put_tce_indirect(struct
[PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests without passing them to QEMU, which should save time on switching to QEMU and back. Both real and virtual modes are supported - whenever the kernel fails to handle TCE request, it passes it to the virtual mode. If it the virtual mode handlers fail, then the request is passed to the user mode, for example, to QEMU. This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables in-kernel handling of IOMMU map/unmap. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Signed-off-by: Paul Mackerras pau...@samba.org --- Changes: 2013/06/05: * changed capability number * changed ioctl number * update the doc article number 2013/05/20: * removed get_user() from real mode handlers * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there translated TCEs, tries realmode_get_page() on those and if it fails, it passes control over the virtual mode handler which tries to finish the request handling * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit on a page * The only reason to pass the request to user mode now is when the user mode did not register TCE table in the kernel, in all other cases the virtual mode handler is expected to do the job --- Documentation/virtual/kvm/api.txt | 28 + arch/powerpc/include/asm/kvm_host.h |3 + arch/powerpc/include/asm/kvm_ppc.h |2 + arch/powerpc/include/uapi/asm/kvm.h |7 ++ arch/powerpc/kvm/book3s_64_vio.c| 198 ++- arch/powerpc/kvm/book3s_64_vio_hv.c | 193 +- arch/powerpc/kvm/powerpc.c | 12 +++ include/uapi/linux/kvm.h|2 + 8 files changed, 439 insertions(+), 6 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 6c082ff..e962e3b 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2379,6 +2379,34 @@ the guest. Othwerwise it might be better for the guest to continue using H_PUT_T hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present). +4.84 KVM_CREATE_SPAPR_TCE_IOMMU + +Capability: KVM_CAP_SPAPR_TCE_IOMMU +Architectures: powerpc +Type: vm ioctl +Parameters: struct kvm_create_spapr_tce_iommu (in) +Returns: 0 on success, -1 on error + +This creates a link between IOMMU group and a hardware TCE (translation +control entry) table. This link lets the host kernel know what IOMMU +group (i.e. TCE table) to use for the LIOBN number passed with +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls. + +/* for KVM_CAP_SPAPR_TCE_IOMMU */ +struct kvm_create_spapr_tce_iommu { + __u64 liobn; + __u32 iommu_id; + __u32 flags; +}; + +No flag is supported at the moment. + +When the guest issues TCE call on a liobn for which a TCE table has been +registered, the kernel will handle it in real mode, updating the hardware +TCE table. TCE table calls for other liobns will cause a vm exit and must +be handled by userspace. + + 5. The kvm_run structure diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 85d8f26..ac0e2fe 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -180,6 +180,7 @@ struct kvmppc_spapr_tce_table { struct kvm *kvm; u64 liobn; u32 window_size; + struct iommu_group *grp;/* used for IOMMU groups */ struct page *pages[0]; }; @@ -611,6 +612,8 @@ struct kvm_vcpu_arch { u64 busy_preempt; unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hall */ + unsigned long tce_tmp_num; /* Number of handled TCEs in the cache */ + unsigned long tce_reason; /* The reason of switching to the virtmode */ #endif }; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index e852921b..934e01d 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -133,6 +133,8 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu); extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, struct kvm_create_spapr_tce *args); +extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm, + struct kvm_create_spapr_tce_iommu *args); extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table( struct kvm_vcpu *vcpu, unsigned long liobn); extern long kvmppc_emulated_validate_tce(unsigned long tce); diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 0fb1a6e..cf82af4 100644 ---
[PATCH 2/4] powerpc: Prepare to support kernel handling of IOMMU map/unmap
The current VFIO-on-POWER implementation supports only user mode driven mapping, i.e. QEMU is sending requests to map/unmap pages. However this approach is really slow, so we want to move that to KVM. Since H_PUT_TCE can be extremely performance sensitive (especially with network adapters where each packet needs to be mapped/unmapped) we chose to implement that as a fast hypercall directly in real mode (processor still in the guest context but MMU off). To be able to do that, we need to provide some facilities to access the struct page count within that real mode environment as things like the sparsemem vmemmap mappings aren't accessible. This adds an API to increment/decrement page counter as get_user_pages API used for user mode mapping does not work in the real mode. CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Reviewed-by: Paul Mackerras pau...@samba.org Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Paul Mackerras pau...@samba.org --- Changes: 2013-05-20: * PageTail() is replaced by PageCompound() in order to have the same checks for whether the page is huge in realmode_get_page() and realmode_put_page() --- arch/powerpc/include/asm/pgtable-ppc64.h |4 ++ arch/powerpc/mm/init_64.c| 77 +- 2 files changed, 80 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h index e3d55f6f..7b46e5f 100644 --- a/arch/powerpc/include/asm/pgtable-ppc64.h +++ b/arch/powerpc/include/asm/pgtable-ppc64.h @@ -376,6 +376,10 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, } #endif /* !CONFIG_HUGETLB_PAGE */ +struct page *realmode_pfn_to_page(unsigned long pfn); +int realmode_get_page(struct page *page); +int realmode_put_page(struct page *page); + #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */ diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c index a90b9c4..ce3d8d4 100644 --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -297,5 +297,80 @@ void vmemmap_free(unsigned long start, unsigned long end) { } -#endif /* CONFIG_SPARSEMEM_VMEMMAP */ +/* + * We do not have access to the sparsemem vmemmap, so we fallback to + * walking the list of sparsemem blocks which we already maintain for + * the sake of crashdump. In the long run, we might want to maintain + * a tree if performance of that linear walk becomes a problem. + * + * Any of realmode_ functions can fail due to: + * 1) As real sparsemem blocks do not lay in RAM continously (they + * are in virtual address space which is not available in the real mode), + * the requested page struct can be split between blocks so get_page/put_page + * may fail. + * 2) When huge pages are used, the get_page/put_page API will fail + * in real mode as the linked addresses in the page struct are virtual + * too. + * When 1) or 2) takes place, the API returns an error code to cause + * an exit to kernel virtual mode where the operation will be completed. + */ +struct page *realmode_pfn_to_page(unsigned long pfn) +{ + struct vmemmap_backing *vmem_back; + struct page *page; + unsigned long page_size = 1 mmu_psize_defs[mmu_vmemmap_psize].shift; + unsigned long pg_va = (unsigned long) pfn_to_page(pfn); + + for (vmem_back = vmemmap_list; vmem_back; vmem_back = vmem_back-list) { + if (pg_va vmem_back-virt_addr) + continue; + /* Check that page struct is not split between real pages */ + if ((pg_va + sizeof(struct page)) + (vmem_back-virt_addr + page_size)) + return NULL; + + page = (struct page *) (vmem_back-phys + pg_va - + vmem_back-virt_addr); + return page; + } + + return NULL; +} +EXPORT_SYMBOL_GPL(realmode_pfn_to_page); + +#elif defined(CONFIG_FLATMEM) + +struct page *realmode_pfn_to_page(unsigned long pfn) +{ + struct page *page = pfn_to_page(pfn); + return page; +} +EXPORT_SYMBOL_GPL(realmode_pfn_to_page); + +#endif /* CONFIG_SPARSEMEM_VMEMMAP/CONFIG_FLATMEM */ + +#if defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_FLATMEM) +int realmode_get_page(struct page *page) +{ + if (PageCompound(page)) + return -EAGAIN; + + get_page(page); + + return 0; +} +EXPORT_SYMBOL_GPL(realmode_get_page); + +int realmode_put_page(struct page *page) +{ + if (PageCompound(page)) + return -EAGAIN; + + if (!atomic_add_unless(page-_count, -1, 1)) + return -EAGAIN; + + return 0; +} +EXPORT_SYMBOL_GPL(realmode_put_page); +#endif -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at
[PATCH 4/4] KVM: PPC: Add hugepage support for IOMMU in-kernel handling
This adds special support for huge pages (16MB). The reference counting cannot be easily done for such pages in real mode (when MMU is off) so we added a list of huge pages. It is populated in virtual mode and get_page is called just once per a huge page. Real mode handlers check if the requested page is huge and in the list, then no reference counting is done, otherwise an exit to virtual mode happens. The list is released at KVM exit. At the moment the fastest card available for tests uses up to 9 huge pages so walking through this list is not very expensive. However this can change and we may want to optimize this. Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Signed-off-by: Paul Mackerras pau...@samba.org --- Changes: 2013/06/05: * fixed compile error when CONFIG_IOMMU_API=n 2013/05/20: * the real mode handler now searches for a huge page by gpa (used to be pte) * the virtual mode handler prints warning if it is called twice for the same huge page as the real mode handler is expected to fail just once - when a huge page is not in the list yet. * the huge page is refcounted twice - when added to the hugepage list and when used in the virtual mode hcall handler (can be optimized but it will make the patch less nice). --- arch/powerpc/include/asm/kvm_host.h |2 + arch/powerpc/include/asm/kvm_ppc.h | 22 + arch/powerpc/kvm/book3s_64_vio.c| 88 +-- arch/powerpc/kvm/book3s_64_vio_hv.c | 40 ++-- 4 files changed, 146 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index ac0e2fe..4fc0865 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -181,6 +181,8 @@ struct kvmppc_spapr_tce_table { u64 liobn; u32 window_size; struct iommu_group *grp;/* used for IOMMU groups */ + struct list_head hugepages; /* used for IOMMU groups */ + spinlock_t hugepages_lock; /* used for IOMMU groups */ struct page *pages[0]; }; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 934e01d..9054df0 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -149,6 +149,28 @@ extern long kvmppc_virtmode_h_put_tce_indirect(struct kvm_vcpu *vcpu, extern long kvmppc_virtmode_h_stuff_tce(struct kvm_vcpu *vcpu, unsigned long liobn, unsigned long ioba, unsigned long tce_value, unsigned long npages); + +/* + * The KVM guest can be backed with 16MB pages (qemu switch + * -mem-path /var/lib/hugetlbfs/global/pagesize-16MB/). + * In this case, we cannot do page counting from the real mode + * as the compound pages are used - they are linked in a list + * with pointers as virtual addresses which are inaccessible + * in real mode. + * + * The code below keeps a 16MB pages list and uses page struct + * in real mode if it is already locked in RAM and inserted into + * the list or switches to the virtual mode where it can be + * handled in a usual manner. + */ +struct kvmppc_iommu_hugepage { + struct list_head list; + pte_t pte; /* Huge page PTE */ + unsigned long gpa; /* Guest physical address */ + struct page *page; /* page struct of the very first subpage */ + unsigned long size; /* Huge page size (always 16MB at the moment) */ +}; + extern long kvm_vm_ioctl_allocate_rma(struct kvm *kvm, struct kvm_allocate_rma *rma); extern struct kvmppc_linear_info *kvm_alloc_rma(void); diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c index ffb4698..9e2ba4d 100644 --- a/arch/powerpc/kvm/book3s_64_vio.c +++ b/arch/powerpc/kvm/book3s_64_vio.c @@ -45,6 +45,71 @@ #define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64)) #define ERROR_ADDR ((void *)~(unsigned long)0x0) +#ifdef CONFIG_IOMMU_API +/* Adds a new huge page descriptor to the list */ +static long kvmppc_iommu_hugepage_try_add( + struct kvmppc_spapr_tce_table *tt, + pte_t pte, unsigned long hva, unsigned long gpa, + unsigned long pg_size) +{ + long ret = 0; + struct kvmppc_iommu_hugepage *hp; + struct page *p; + + spin_lock(tt-hugepages_lock); + list_for_each_entry(hp, tt-hugepages, list) { + if (hp-pte == pte) + goto unlock_exit; + } + + hva = hva ~(pg_size - 1); + ret = get_user_pages_fast(hva, 1, true/*write*/, p); + if ((ret != 1) || !p) { + ret = -EFAULT; + goto unlock_exit; + } + ret = 0; + + hp = kzalloc(sizeof(*hp), GFP_KERNEL); + if (!hp) { + ret = -ENOMEM; + goto unlock_exit; + } + + hp-page = p; + hp-pte = pte; + hp-gpa = gpa ~(pg_size - 1); +
Re: Intercepting task switches in svm/vmx with tdp enabled
On Wed, Jun 05, 2013 at 12:51:29AM -0500, Leo Prasath wrote: Hi, I am interested in intercepting task switches in vmx/svm in 64 bit mode with ept/npt enabled. However, I am not seeing the exit code due to task switch ( 9 for vmx and 125 for svm ) in the list of vm exits that I see in a typical guest run. I do not think task switch exit means what you think it means. This is not OS context switches, but some x86 cpu concept of task that can be switched by using HW mechanism. No modern OS uses it. Actually in 64 bit mode it does not exists at all. I log the vm exit codes in the x86/svm.c:handle_exit method for svm and x86/vmx.c:vmx_handle_exit for vmx. Any pointers regarding this is very much appreciated. On a related note, does cr3 write interception approximate task switch interception ? Depending on how OS works. For Linux it is probably true (if cr3 value changes). ( I was able to intercept cr3 writes with svm while npt was enabled. but with vmx, I could intercept cr3 writes only with ept disabled ) Thanks, Leo Looking through the manuals, svm has a control bit in VMCS for enabling / disabling task switch interception while vmx does not seem to have such a control bit. Again, this is not task switch you are looking for. - Excerpts from the manuals : Intel -- Exit reason #9 indicates a vm exit due to task switch. Vol. 3C 24-9 : Some instructions cause VM exits regardless of the settings of the processor-based VM-execution controls (see Section 25.1.2), as do task switches (see Section 25.2). Vol. 3C 25-6 : Task switches. Task switches are not allowed in VMX non-root operation. Any attempt to effect a task switch in VMX non-root operation causes a VM exit. See Section 25.4.2 AMD --- Intercept code to look for is: 7Dh VMEXIT_TASK_SWITCH task switch 15.14 AMD64 Technology Miscellaneous Intercepts : The SVM architecture includes intercepts to handle task switches, processor freezes due to FERR, and shutdown operations. Task switches can modify several resources that a VMM may want to protect (CR3, EFLAGS, LDT). However, instead of checking various intercepts (e.g., CR3 Write, LDTR Write) individually, task switches check only a single intercept bit. Page 581 : Layout of VMCB says Byte offset 00Ch : bit 29 Intercept task switches. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of multibyte NOP in emulation mode
On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li wrote: Add multibyte NOP test case to kvm-unit-tests. This case can test one of bugs when booting RHEL5.9 64-bit. Adding the test to x86/realmode.c will be much easier. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/emulator.c | 33 + 1 file changed, 33 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 96576e5..f26c70f 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem) report(test, *mem == 0x8400); } +static void test_nopl(uint64_t *mem, uint8_t *insn_page, + uint8_t *alt_insn_page, void *insn_ram) +{ +ulong *cr3 = (ulong *)read_cr3(); + +// Pad with RET instructions +memset(insn_page, 0xc3, 4096); +memset(alt_insn_page, 0xc3, 4096); +// Place a trapping instruction in the page to trigger a VMEXIT +insn_page[0] = 0x89; // mov %eax, (%rax) +insn_page[1] = 0x00; +insn_page[2] = 0x90; // nop +// Place nopl 0x0(%eax) in alt_insn_page for emulator to execuate +alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX] +alt_insn_page[1] = 0x1f; +alt_insn_page[2] = 0x00; + +// Load the code TLB with insn_page, but point the page tables at +// alt_insn_page (and keep the data TLB clear, for AMD decode assist). +// This will make the CPU trap on the insn_page instruction but the +// hypervisor will see alt_insn_page. +install_page(cr3, virt_to_phys(insn_page), insn_ram); +// Load code TLB +invlpg(insn_ram); +asm volatile(call *%0 : : r(insn_ram + 3)); +// Trap, let hypervisor emulate at alt_insn_page +install_page(cr3, virt_to_phys(alt_insn_page), insn_ram); +asm volatile(call *%0 : : r(insn_ram), a(mem)); +report(nopl, 1); +} + int main() { void *mem; @@ -964,6 +995,8 @@ int main() test_string_io_mmio(mem); + test_nopl(mem, insn_page, alt_insn_page, insn_ram); + printf(\nSUMMARY: %d tests, %d failures\n, tests, fails); return fails ? 1 : 0; } -- 1.7.9.5 -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of multibyte NOP in emulation mode
Yes, that should be the point. x86/realmode.c is always running in emulation mode. I added the testing here there but no error occurred. I cannot find the reason. The code is as follows added to x86/realmode.c static void test_nopl(void) { MK_INSN(nopl, .byte 0x0f, 0x1f, 0x00\n\r); exec_in_big_real_mode(insn_nopl); report(nopl, 0, 1); } and I objdump from realmode.flat is as follows: 6458 insn_code_nopl: 6458: 0f 1f 00nopl (%eax) But there cause no error when executing this insn. Why? On Wed, Jun 5, 2013 at 2:42 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li wrote: Add multibyte NOP test case to kvm-unit-tests. This case can test one of bugs when booting RHEL5.9 64-bit. Adding the test to x86/realmode.c will be much easier. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/emulator.c | 33 + 1 file changed, 33 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 96576e5..f26c70f 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem) report(test, *mem == 0x8400); } +static void test_nopl(uint64_t *mem, uint8_t *insn_page, + uint8_t *alt_insn_page, void *insn_ram) +{ +ulong *cr3 = (ulong *)read_cr3(); + +// Pad with RET instructions +memset(insn_page, 0xc3, 4096); +memset(alt_insn_page, 0xc3, 4096); +// Place a trapping instruction in the page to trigger a VMEXIT +insn_page[0] = 0x89; // mov %eax, (%rax) +insn_page[1] = 0x00; +insn_page[2] = 0x90; // nop +// Place nopl 0x0(%eax) in alt_insn_page for emulator to execuate +alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX] +alt_insn_page[1] = 0x1f; +alt_insn_page[2] = 0x00; + +// Load the code TLB with insn_page, but point the page tables at +// alt_insn_page (and keep the data TLB clear, for AMD decode assist). +// This will make the CPU trap on the insn_page instruction but the +// hypervisor will see alt_insn_page. +install_page(cr3, virt_to_phys(insn_page), insn_ram); +// Load code TLB +invlpg(insn_ram); +asm volatile(call *%0 : : r(insn_ram + 3)); +// Trap, let hypervisor emulate at alt_insn_page +install_page(cr3, virt_to_phys(alt_insn_page), insn_ram); +asm volatile(call *%0 : : r(insn_ram), a(mem)); +report(nopl, 1); +} + int main() { void *mem; @@ -964,6 +995,8 @@ int main() test_string_io_mmio(mem); + test_nopl(mem, insn_page, alt_insn_page, insn_ram); + printf(\nSUMMARY: %d tests, %d failures\n, tests, fails); return fails ? 1 : 0; } -- 1.7.9.5 -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC PATCH 0/6] KVM: PPC: Book3E: AltiVec support
-Original Message- From: Wood Scott-B07421 Sent: Wednesday, June 05, 2013 12:39 AM To: Caraman Mihai Claudiu-B02008 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; Alexander Graf Subject: Re: [RFC PATCH 0/6] KVM: PPC: Book3E: AltiVec support On 06/03/2013 03:54:22 PM, Mihai Caraman wrote: Mihai Caraman (6): KVM: PPC: Book3E: Fix AltiVec interrupt numbers and build breakage KVM: PPC: Book3E: Refactor SPE_FP exit handling KVM: PPC: Book3E: Rename IRQPRIO names to accommodate ALTIVEC KVM: PPC: Book3E: Add AltiVec support KVM: PPC: Book3E: Add ONE_REG AltiVec support KVM: PPC: Book3E: Enhance FPU laziness arch/powerpc/include/asm/kvm_asm.h| 16 ++- arch/powerpc/kvm/booke.c | 189 arch/powerpc/kvm/booke.h |4 +- arch/powerpc/kvm/bookehv_interrupts.S |8 +- arch/powerpc/kvm/e500.c | 10 +- arch/powerpc/kvm/e500_emulate.c |8 +- arch/powerpc/kvm/e500mc.c | 10 ++- 7 files changed, 199 insertions(+), 46 deletions(-) This looks like a bit much for 3.10 (certainly, subject lines like refactor and enhance and add support aren't going to make Linus happy given that we're past rc4) so I think we should apply http://patchwork.ozlabs.org/patch/242896/ for 3.10. Then for 3.11, revert it after applying this patchset. Why not 1/6 plus e6500 removal? -Mike -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC PATCH 2/6] KVM: PPC: Book3E: Refactor SPE_FP exit handling
+ /* +* The interrupt is shared, KVM support for the featured unit +* is detected at run-time. +*/ This is a decent comment for the changelog, but for the code itself it seems fairly obvious if you look at the definition of kvmppc_supports_spe(). I will move it to change log. + bool handled = false; + + if (kvmppc_supports_spe()) { +#ifdef CONFIG_SPE + if (cpu_has_feature(CPU_FTR_SPE)) Didn't you already check this using kvmppc_supports_spe()? It makes sense with the next patch. It handles the improbable case of having CONFIG_ALTIVEC and CONFIG_SPE defined: if (kvmppc_supports_altivec() || kvmppc_supports_spe()). case BOOKE_INTERRUPT_SPE_FP_ROUND: +#ifdef CONFIG_SPE kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_ROUND); r = RESUME_GUEST; break; Why not use kvmppc_supports_spe() here, for consistency? I added cpu_has_feature(CPU_FTR_SPE) for the case specified above, but here SPE_FP_ROUND is not shared with ALTIVEC. CONFIG_SPE is used in other places in KVM without this check, shouldn't be all or nothing? -Mike -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: By my count, net still has 7 feature bits left, so I don't think the feature bits are likely to be a limitation in the next 6 months? Yes but you wanted a generic transport feature bit for flexible SG layout. Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then? If we need it within 6 months, I'd rather do that: this feature would then be assumed for 1.0, and reserved. We may do that for other features, too (the committee will have to review). MMIO is a bigger problem. Linux guests are happy with it: does it break the Windows drivers? Thanks, Rusty. You mean make BAR0 an MMIO BAR? Yes, it would break current windows guests. Further, as long as we use same address to notify all queues, we would also need to decode the instruction on x86 and that's measureably slower than PIO. We could go back to discussing hypercall use for notifications, but that has its own set of issues... We might have something ready to deploy in 3 months, but realistically, I'd rather have it ready and tested outside the main git tree(s) and push it once the standard is committed. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] vhost_net: clear msg.control for non-zerocopy case during tx
When we decide not use zero-copy, msg.control should be set to NULL otherwise macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs wrongly. Bug were introduced by commit cedb9bdce099206290a2bdd02ce47a7b253b6a84 (vhost-net: skip head management if no outstanding). This solves the following warnings: WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]() Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun] CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566 Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011 a0198323 88007c9ebd08 81796b73 88007c9ebd48 8103d66b 7b773e20 8800779f 8800779f43f0 8800779f8418 015c 0062 88007c9ebd58 Call Trace: [81796b73] dump_stack+0x19/0x1e [8103d66b] warn_slowpath_common+0x6b/0xa0 [8103d6b5] warn_slowpath_null+0x15/0x20 [a0197627] handle_tx+0x477/0x4b0 [vhost_net] [a0197690] handle_tx_kick+0x10/0x20 [vhost_net] [a019541e] vhost_worker+0xfe/0x1a0 [vhost_net] [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [81061f46] kthread+0xc6/0xd0 [81061e80] ? kthread_freezable_should_stop+0x70/0x70 [817a1aec] ret_from_fork+0x7c/0xb0 [81061e80] ? kthread_freezable_should_stop+0x70/0x70 Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/vhost/net.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 2b51e23..b07d96b 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -436,7 +436,8 @@ static void handle_tx(struct vhost_net *net) kref_get(ubufs-kref); } nvq-upend_idx = (nvq-upend_idx + 1) % UIO_MAXIOV; - } + } else + msg.msg_control = NULL; /* TODO: Check specific error and bomb out unless ENOBUFS? */ err = sock-ops-sendmsg(NULL, sock, msg, len); if (unlikely(err 0)) { -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of multibyte NOP in emulation mode
On Wed, Jun 05, 2013 at 03:00:33PM +0800, 李春奇 Arthur Chunqi Li wrote: Yes, that should be the point. x86/realmode.c is always running in emulation mode. I added the testing here there but no error occurred. I cannot find the reason. The code is as follows added to x86/realmode.c static void test_nopl(void) { MK_INSN(nopl, .byte 0x0f, 0x1f, 0x00\n\r); exec_in_big_real_mode(insn_nopl); report(nopl, 0, 1); } and I objdump from realmode.flat is as follows: 6458 insn_code_nopl: 6458: 0f 1f 00nopl (%eax) But there cause no error when executing this insn. Why? Because you probably use cpu that supports unrestricted mode or use AMD processor. Can you try loading kvm-intel with unrestricted_guest=0 option? On Wed, Jun 5, 2013 at 2:42 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li wrote: Add multibyte NOP test case to kvm-unit-tests. This case can test one of bugs when booting RHEL5.9 64-bit. Adding the test to x86/realmode.c will be much easier. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/emulator.c | 33 + 1 file changed, 33 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 96576e5..f26c70f 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem) report(test, *mem == 0x8400); } +static void test_nopl(uint64_t *mem, uint8_t *insn_page, + uint8_t *alt_insn_page, void *insn_ram) +{ +ulong *cr3 = (ulong *)read_cr3(); + +// Pad with RET instructions +memset(insn_page, 0xc3, 4096); +memset(alt_insn_page, 0xc3, 4096); +// Place a trapping instruction in the page to trigger a VMEXIT +insn_page[0] = 0x89; // mov %eax, (%rax) +insn_page[1] = 0x00; +insn_page[2] = 0x90; // nop +// Place nopl 0x0(%eax) in alt_insn_page for emulator to execuate +alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX] +alt_insn_page[1] = 0x1f; +alt_insn_page[2] = 0x00; + +// Load the code TLB with insn_page, but point the page tables at +// alt_insn_page (and keep the data TLB clear, for AMD decode assist). +// This will make the CPU trap on the insn_page instruction but the +// hypervisor will see alt_insn_page. +install_page(cr3, virt_to_phys(insn_page), insn_ram); +// Load code TLB +invlpg(insn_ram); +asm volatile(call *%0 : : r(insn_ram + 3)); +// Trap, let hypervisor emulate at alt_insn_page +install_page(cr3, virt_to_phys(alt_insn_page), insn_ram); +asm volatile(call *%0 : : r(insn_ram), a(mem)); +report(nopl, 1); +} + int main() { void *mem; @@ -964,6 +995,8 @@ int main() test_string_io_mmio(mem); + test_nopl(mem, insn_page, alt_insn_page, insn_ram); + printf(\nSUMMARY: %d tests, %d failures\n, tests, fails); return fails ? 1 : 0; } -- 1.7.9.5 -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/2] kvm: x86: Emulate MSR_PLATFORM_INFO
On Tue, Jun 04, 2013 at 12:02:16PM -0400, Bandan Das wrote: These patches add an emulated MSR_PLATFORM_INFO that kvm guests can read as described in section 14.3.2.4 of the Intel SDM. The relevant changes and details are in [2/2]; [1/2] makes vendor_intel generic. There are atleat two known applications that fail to run because of this MSR missing - Sandra and vTune. So I really want Intel opinion on this. Right now it is impossible to implement the MSR correctly in the face of migration (may be with tsc scaling it will be possible) and while it is unimplemented if application tries to use it it fails, but if we will implement it application will just produce incorrect result without any means for user to detect it. v2: Addressed suggested changes Bandan Das (2): kvm: make vendor_intel a generic function kvm: x86: emulate MSR_PLATFORM_INFO arch/x86/include/asm/kvm_emulate.h| 13 -- arch/x86/include/asm/kvm_host.h | 20 +++ arch/x86/include/uapi/asm/msr-index.h | 2 ++ arch/x86/kvm/cpuid.c | 19 ++ arch/x86/kvm/cpuid.h | 16 arch/x86/kvm/emulate.c| 16 +++- arch/x86/kvm/x86.c| 48 +++ 7 files changed, 109 insertions(+), 25 deletions(-) -- 1.8.1.4 -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost_net: clear msg.control for non-zerocopy case during tx
On Wed, Jun 05, 2013 at 03:40:46PM +0800, Jason Wang wrote: When we decide not use zero-copy, msg.control should be set to NULL otherwise macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs wrongly. Bug were introduced by commit cedb9bdce099206290a2bdd02ce47a7b253b6a84 (vhost-net: skip head management if no outstanding). This solves the following warnings: WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]() Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun] CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566 Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011 a0198323 88007c9ebd08 81796b73 88007c9ebd48 8103d66b 7b773e20 8800779f 8800779f43f0 8800779f8418 015c 0062 88007c9ebd58 Call Trace: [81796b73] dump_stack+0x19/0x1e [8103d66b] warn_slowpath_common+0x6b/0xa0 [8103d6b5] warn_slowpath_null+0x15/0x20 [a0197627] handle_tx+0x477/0x4b0 [vhost_net] [a0197690] handle_tx_kick+0x10/0x20 [vhost_net] [a019541e] vhost_worker+0xfe/0x1a0 [vhost_net] [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [81061f46] kthread+0xc6/0xd0 [81061e80] ? kthread_freezable_should_stop+0x70/0x70 [817a1aec] ret_from_fork+0x7c/0xb0 [81061e80] ? kthread_freezable_should_stop+0x70/0x70 Signed-off-by: Jason Wang jasow...@redhat.com Good catch. Acked-by: Michael S. Tsirkin m...@redhat.com This needs to go into stable as well. --- drivers/vhost/net.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 2b51e23..b07d96b 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -436,7 +436,8 @@ static void handle_tx(struct vhost_net *net) kref_get(ubufs-kref); } nvq-upend_idx = (nvq-upend_idx + 1) % UIO_MAXIOV; - } + } else + msg.msg_control = NULL; /* TODO: Check specific error and bomb out unless ENOBUFS? */ err = sock-ops-sendmsg(NULL, sock, msg, len); if (unlikely(err 0)) { -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [nVMX w/ Haswell] KVM unit-tests in L1 - eventinj test fails trying to send NMI
Adding Jan, Jun, to see if they have any inputs here. /kashyap On Tue, Jun 4, 2013 at 6:14 PM, Kashyap Chamarthy kashyap...@gmail.com wrote: Heya, So, I invoked this in L1 with: === [test@foo kvm-unit-tests]$ time qemu-system-x86_64 -enable-kvm -device pc-testdev -serial stdio -nographic -no-user-config -nodefaults -device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel ./x86/eventinj.flat | tee /var/tmp/eventinj-test.txt enabling apic paging enabled cr0 = 80010011 cr3 = 7fff000 cr4 = 20 Try to divide by 0 DE isr running divider is 0 Result is 150 DE exception: PASS Try int 3 BP isr running After int 3 BP exception: PASS Try send vec 33 to itself irq1 running After vec 33 to itself vec 33: PASS Try int $33 irq1 running After int $33 int $33: PASS Try send vec 32 and 33 to itself irq1 running irq0 running After vec 32 and 33 to itself vec 32/33: PASS Try send vec 32 and int $33 irq1 running irq0 running After vec 32 and int $33 vec 32/int $33: PASS Try send vec 33 and 62 and mask one with TPR irq1 running After 33/62 TPR test TPR: PASS irq0 running Try send NMI to itself After NMI to itself NMI: FAIL Try int 33 with shadowed stack irq1 running After int 33 with shadowed stack int 33 with shadowed stack: PASS summary: 9 tests, 1 failures real0m0.647s user0m0.164s sys 0m0.146s [test@foo kvm-unit-tests]$ === Any hints on further debugging this ? Other info: -- - L1's qemu-kvm CLI === # ps -ef | grep -i qemu qemu 5455 1 94 Jun02 ?1-07:14:29 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name regular-guest -S -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu Haswell,+vmx -m 10240 -smp 4,sockets=4,cores=1,threads=1 -uuid 4ed9ac0b-7f72-dfcf-68b3-e6fe2ac588b2 -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/regular-guest.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/test/vmimages/regular-guest.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:80:c1:34,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 root 12255 5419 0 08:41 pts/200:00:00 grep --color=auto -i qemu === - Setup details -- https://github.com/kashyapc/nvmx-haswell/blob/master/SETUP-nVMX.rst /kashyap -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC PATCH 6/6] KVM: PPC: Book3E: Enhance FPU laziness
-Original Message- From: Wood Scott-B07421 Sent: Wednesday, June 05, 2013 1:54 AM To: Caraman Mihai Claudiu-B02008 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; Caraman Mihai Claudiu-B02008 Subject: Re: [RFC PATCH 6/6] KVM: PPC: Book3E: Enhance FPU laziness On 06/03/2013 03:54:28 PM, Mihai Caraman wrote: Adopt AltiVec approach to increase laziness by calling kvmppc_load_guest_fp() just before returning to guest instaed of each sched in. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com If you did this *before* adding Altivec it would have saved a question in an earlier patch. :-) I kept asking myself about the order and in the end I decided that this is an improvement originated from AltiVec work. FPU may be further cleaned up (get rid of active state, etc). --- arch/powerpc/kvm/booke.c |1 + arch/powerpc/kvm/e500mc.c |2 -- 2 files changed, 1 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 019496d..5382238 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1258,6 +1258,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, } else { kvmppc_lazy_ee_enable(); kvmppc_load_guest_altivec(vcpu); + kvmppc_load_guest_fp(vcpu); } } You should probably do these before kvmppc_lazy_ee_enable(). Why? I wanted to look like part of lightweight_exit. Actually, I don't think this is a good idea at all. As I understand it, you're not supposed to take kernel ownersship of floating point in non-atomic context, because an interrupt could itself call enable_kernel_fp(). So lightweight_exit isn't executed in atomic context? Will be lazyee fixes including kvmppc_fix_ee_before_entry() in 3.10? 64-bit Book3E KVM is unreliable without them. Should we disable e5500 too for 3.10? Do you have benchmarks showing it's even worthwhile? No yet but isn't this the whole idea of FPU/AltiVec laziness that the kernel is struggling to achieve? Without it when returning to host (if another process got unit ownership in handle_exit) we restore and save the unit state for nothing. This multiplies when the ownership goes back and forth between handle_exit and other processes. Do you have in mid a specific AltiVec benchmark? I have a stress application that do consecutive vector assignments which proved to be very good at finding state corruptions. If I combine this application on the host with a guest that stays longer on handle_exit (running a tlb or emulation intensive application) I might have good data to support my case. -Mike -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of multibyte NOP in emulation mode
Yes, I load kvm-intel with unrestricted_guest=0 and the emulator runs well. I will give another test case in x86/realmode.c later. BTW, what is the action when a 64-bit instruction executes in x86/realmode.c? Should I add 64-bit insn tests only in x86/emulator.c? On Wed, Jun 5, 2013 at 4:27 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 03:00:33PM +0800, 李春奇 Arthur Chunqi Li wrote: Yes, that should be the point. x86/realmode.c is always running in emulation mode. I added the testing here there but no error occurred. I cannot find the reason. The code is as follows added to x86/realmode.c static void test_nopl(void) { MK_INSN(nopl, .byte 0x0f, 0x1f, 0x00\n\r); exec_in_big_real_mode(insn_nopl); report(nopl, 0, 1); } and I objdump from realmode.flat is as follows: 6458 insn_code_nopl: 6458: 0f 1f 00nopl (%eax) But there cause no error when executing this insn. Why? Because you probably use cpu that supports unrestricted mode or use AMD processor. Can you try loading kvm-intel with unrestricted_guest=0 option? On Wed, Jun 5, 2013 at 2:42 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li wrote: Add multibyte NOP test case to kvm-unit-tests. This case can test one of bugs when booting RHEL5.9 64-bit. Adding the test to x86/realmode.c will be much easier. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/emulator.c | 33 + 1 file changed, 33 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 96576e5..f26c70f 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem) report(test, *mem == 0x8400); } +static void test_nopl(uint64_t *mem, uint8_t *insn_page, + uint8_t *alt_insn_page, void *insn_ram) +{ +ulong *cr3 = (ulong *)read_cr3(); + +// Pad with RET instructions +memset(insn_page, 0xc3, 4096); +memset(alt_insn_page, 0xc3, 4096); +// Place a trapping instruction in the page to trigger a VMEXIT +insn_page[0] = 0x89; // mov %eax, (%rax) +insn_page[1] = 0x00; +insn_page[2] = 0x90; // nop +// Place nopl 0x0(%eax) in alt_insn_page for emulator to execuate +alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX] +alt_insn_page[1] = 0x1f; +alt_insn_page[2] = 0x00; + +// Load the code TLB with insn_page, but point the page tables at +// alt_insn_page (and keep the data TLB clear, for AMD decode assist). +// This will make the CPU trap on the insn_page instruction but the +// hypervisor will see alt_insn_page. +install_page(cr3, virt_to_phys(insn_page), insn_ram); +// Load code TLB +invlpg(insn_ram); +asm volatile(call *%0 : : r(insn_ram + 3)); +// Trap, let hypervisor emulate at alt_insn_page +install_page(cr3, virt_to_phys(alt_insn_page), insn_ram); +asm volatile(call *%0 : : r(insn_ram), a(mem)); +report(nopl, 1); +} + int main() { void *mem; @@ -964,6 +995,8 @@ int main() test_string_io_mmio(mem); + test_nopl(mem, insn_page, alt_insn_page, insn_ram); + printf(\nSUMMARY: %d tests, %d failures\n, tests, fails); return fails ? 1 : 0; } -- 1.7.9.5 -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC PATCH 4/6] KVM: PPC: Book3E: Add AltiVec support
+ * Simulate AltiVec unavailable fault to load guest state + * from thread to AltiVec unit. + * It requires to be called with preemption disabled. + */ +static inline void kvmppc_load_guest_altivec(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_ALTIVEC + if (cpu_has_feature(CPU_FTR_ALTIVEC)) { + if (!(current-thread.regs-msr MSR_VEC)) { + load_up_altivec(NULL); + current-thread.regs-msr |= MSR_VEC; + } + } +#endif Why not use kvmppc_supports_altivec()? In fact, there's nothing KVM-specific about these functions... I will do so, I had this code before kvmppc_supports_altivec() :) static inline bool kvmppc_supports_spe(void) { #ifdef CONFIG_SPE @@ -947,7 +1016,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, */ bool handled = false; - if (kvmppc_supports_spe()) { + if (kvmppc_supports_altivec() || kvmppc_supports_spe()) { #ifdef CONFIG_SPE if (cpu_has_feature(CPU_FTR_SPE)) if (vcpu-arch.shared-msr MSR_SPE) { The distinction between how you're handling SPE and Altivec here doesn't really have anything to do with SPE versus Altivec -- it's PR-mode versus HV-mode. I was mislead by MSR_SPE bit, we should rename it as MSR_SPV. -Mike -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of multibyte NOP in emulation mode
On Wed, Jun 05, 2013 at 05:23:18PM +0800, 李春奇 Arthur Chunqi Li wrote: Yes, I load kvm-intel with unrestricted_guest=0 and the emulator runs well. I will give another test case in x86/realmode.c later. The test fails for me on CPU without unrestricted guest support. This means you either test on fixed kernel or unrestricted_guest=0 is broken. BTW, what is the action when a 64-bit instruction executes in x86/realmode.c? Should I add 64-bit insn tests only in x86/emulator.c? Yes, 64-bit or 32-bit instructions should be added to x86/emulator.c. On Wed, Jun 5, 2013 at 4:27 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 03:00:33PM +0800, 李春奇 Arthur Chunqi Li wrote: Yes, that should be the point. x86/realmode.c is always running in emulation mode. I added the testing here there but no error occurred. I cannot find the reason. The code is as follows added to x86/realmode.c static void test_nopl(void) { MK_INSN(nopl, .byte 0x0f, 0x1f, 0x00\n\r); exec_in_big_real_mode(insn_nopl); report(nopl, 0, 1); } and I objdump from realmode.flat is as follows: 6458 insn_code_nopl: 6458: 0f 1f 00nopl (%eax) But there cause no error when executing this insn. Why? Because you probably use cpu that supports unrestricted mode or use AMD processor. Can you try loading kvm-intel with unrestricted_guest=0 option? On Wed, Jun 5, 2013 at 2:42 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li wrote: Add multibyte NOP test case to kvm-unit-tests. This case can test one of bugs when booting RHEL5.9 64-bit. Adding the test to x86/realmode.c will be much easier. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/emulator.c | 33 + 1 file changed, 33 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 96576e5..f26c70f 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem) report(test, *mem == 0x8400); } +static void test_nopl(uint64_t *mem, uint8_t *insn_page, + uint8_t *alt_insn_page, void *insn_ram) +{ +ulong *cr3 = (ulong *)read_cr3(); + +// Pad with RET instructions +memset(insn_page, 0xc3, 4096); +memset(alt_insn_page, 0xc3, 4096); +// Place a trapping instruction in the page to trigger a VMEXIT +insn_page[0] = 0x89; // mov %eax, (%rax) +insn_page[1] = 0x00; +insn_page[2] = 0x90; // nop +// Place nopl 0x0(%eax) in alt_insn_page for emulator to execuate +alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX] +alt_insn_page[1] = 0x1f; +alt_insn_page[2] = 0x00; + +// Load the code TLB with insn_page, but point the page tables at +// alt_insn_page (and keep the data TLB clear, for AMD decode assist). +// This will make the CPU trap on the insn_page instruction but the +// hypervisor will see alt_insn_page. +install_page(cr3, virt_to_phys(insn_page), insn_ram); +// Load code TLB +invlpg(insn_ram); +asm volatile(call *%0 : : r(insn_ram + 3)); +// Trap, let hypervisor emulate at alt_insn_page +install_page(cr3, virt_to_phys(alt_insn_page), insn_ram); +asm volatile(call *%0 : : r(insn_ram), a(mem)); +report(nopl, 1); +} + int main() { void *mem; @@ -964,6 +995,8 @@ int main() test_string_io_mmio(mem); + test_nopl(mem, insn_page, alt_insn_page, insn_ram); + printf(\nSUMMARY: %d tests, %d failures\n, tests, fails); return fails ? 1 : 0; } -- 1.7.9.5 -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Planning the merge of KVM/arm64
On Wed, Jun 05, 2013 at 07:01:05AM +0100, Gleb Natapov wrote: On Tue, Jun 04, 2013 at 10:57:32PM -0700, Christoffer Dall wrote: On 4 June 2013 09:37, Gleb Natapov g...@redhat.com wrote: On Tue, Jun 04, 2013 at 05:51:41PM +0200, Paolo Bonzini wrote: Il 04/06/2013 17:43, Christoffer Dall ha scritto: Hi Paolo, I don't think this is an issue. Gleb and Marcelo for example pulled RMK's stable tree for my KVM/ARM updates for the 3.10 merge window and that wasn't an issue. If Linus pulls the kvm/next tree first the diffstat should be similar and everything clean enough, no? Catalin has previously expressed his wish to upstream the kvm/arm64 patches directly through him given the churn in a completely new architecture and he wants to make sure that everything looks right. It's a pretty clean implementation with quite few dependencies and merging as a working series should be a priority instead of the Kconfig hack, imho. Ok, let's see what Gleb says. I have no objection to merge arm64 kvm trough Catalin if it mean less churn for everyone. That's what we did with arm and mips. Arm64 kvm has a dependency on kvm.git next though, so how Catalin make sure that everything looks right? Will he merge kvm.git/next to arm64 tree? Yes, that was the idea. Everything in kvm/next is considered stable, right? Right. Catalin should wait for kvm.git to be pulled by Linus next merge windows before sending his pull request then. I think it's better if I push the bulk of the arm64 KVM branch but without Kconfig patch enabling it. This branch would be based on mainline rather than kvm/next. Once your code goes in mainline, I'll just push the Kconfig entry (for bisection reasons, it could be after -rc1). This would keep the pull-request diffstat cleaner. As we discussed some time ago, after the core arm64 KVM is merged you will use the same workflow as for arm (merge via the kvm tree). Thanks. -- Catalin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Test case of emulating multibyte NOP
Add multibyte NOP test case to kvm-unit-tests. This version adds test cases into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/realmode.c | 24 1 file changed, 24 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 981be08..e6e48c9 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1504,6 +1504,29 @@ static void test_fninit(void) report(fninit, 0, fsw == 0 (fcw 0x103f) == 0x003f); } +static void test_nopl(void) +{ +MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop +MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop +MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop +MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop +MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop +MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 bytes nop +MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); // 7 bytes nop +MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 8 bytes nop +MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 9 bytes nop +exec_in_big_real_mode(insn_nopl1); +exec_in_big_real_mode(insn_nopl2); +exec_in_big_real_mode(insn_nopl3); +exec_in_big_real_mode(insn_nopl4); +exec_in_big_real_mode(insn_nopl5); +exec_in_big_real_mode(insn_nopl6); +exec_in_big_real_mode(insn_nopl7); +exec_in_big_real_mode(insn_nopl8); +exec_in_big_real_mode(insn_nopl9); +report(nopl, 0, 1); +} + void realmode_start(void) { test_null(); @@ -1548,6 +1571,7 @@ void realmode_start(void) test_xlat(); test_salc(); test_fninit(); + test_nopl(); exit(0); } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of multibyte NOP in emulation mode
I mean after adding unrestricted_guest=0, the error is reproduced. Sorry for confused expression. I have committed another patch in x86/realmode.c. On Wed, Jun 5, 2013 at 5:28 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 05:23:18PM +0800, 李春奇 Arthur Chunqi Li wrote: Yes, I load kvm-intel with unrestricted_guest=0 and the emulator runs well. I will give another test case in x86/realmode.c later. The test fails for me on CPU without unrestricted guest support. This means you either test on fixed kernel or unrestricted_guest=0 is broken. BTW, what is the action when a 64-bit instruction executes in x86/realmode.c? Should I add 64-bit insn tests only in x86/emulator.c? Yes, 64-bit or 32-bit instructions should be added to x86/emulator.c. On Wed, Jun 5, 2013 at 4:27 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 03:00:33PM +0800, 李春奇 Arthur Chunqi Li wrote: Yes, that should be the point. x86/realmode.c is always running in emulation mode. I added the testing here there but no error occurred. I cannot find the reason. The code is as follows added to x86/realmode.c static void test_nopl(void) { MK_INSN(nopl, .byte 0x0f, 0x1f, 0x00\n\r); exec_in_big_real_mode(insn_nopl); report(nopl, 0, 1); } and I objdump from realmode.flat is as follows: 6458 insn_code_nopl: 6458: 0f 1f 00nopl (%eax) But there cause no error when executing this insn. Why? Because you probably use cpu that supports unrestricted mode or use AMD processor. Can you try loading kvm-intel with unrestricted_guest=0 option? On Wed, Jun 5, 2013 at 2:42 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li wrote: Add multibyte NOP test case to kvm-unit-tests. This case can test one of bugs when booting RHEL5.9 64-bit. Adding the test to x86/realmode.c will be much easier. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/emulator.c | 33 + 1 file changed, 33 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 96576e5..f26c70f 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem) report(test, *mem == 0x8400); } +static void test_nopl(uint64_t *mem, uint8_t *insn_page, + uint8_t *alt_insn_page, void *insn_ram) +{ +ulong *cr3 = (ulong *)read_cr3(); + +// Pad with RET instructions +memset(insn_page, 0xc3, 4096); +memset(alt_insn_page, 0xc3, 4096); +// Place a trapping instruction in the page to trigger a VMEXIT +insn_page[0] = 0x89; // mov %eax, (%rax) +insn_page[1] = 0x00; +insn_page[2] = 0x90; // nop +// Place nopl 0x0(%eax) in alt_insn_page for emulator to execuate +alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX] +alt_insn_page[1] = 0x1f; +alt_insn_page[2] = 0x00; + +// Load the code TLB with insn_page, but point the page tables at +// alt_insn_page (and keep the data TLB clear, for AMD decode assist). +// This will make the CPU trap on the insn_page instruction but the +// hypervisor will see alt_insn_page. +install_page(cr3, virt_to_phys(insn_page), insn_ram); +// Load code TLB +invlpg(insn_ram); +asm volatile(call *%0 : : r(insn_ram + 3)); +// Trap, let hypervisor emulate at alt_insn_page +install_page(cr3, virt_to_phys(alt_insn_page), insn_ram); +asm volatile(call *%0 : : r(insn_ram), a(mem)); +report(nopl, 1); +} + int main() { void *mem; @@ -964,6 +995,8 @@ int main() test_string_io_mmio(mem); + test_nopl(mem, insn_page, alt_insn_page, insn_ram); + printf(\nSUMMARY: %d tests, %d failures\n, tests, fails); return fails ? 1 : 0; } -- 1.7.9.5 -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of multibyte NOP in emulation mode
On Wed, Jun 05, 2013 at 05:46:31PM +0800, 李春奇 Arthur Chunqi Li wrote: I mean after adding unrestricted_guest=0, the error is reproduced. Ah, OK. unrestricted_guest=0 works then :) Sorry for confused expression. I have committed another patch in x86/realmode.c. On Wed, Jun 5, 2013 at 5:28 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 05:23:18PM +0800, 李春奇 Arthur Chunqi Li wrote: Yes, I load kvm-intel with unrestricted_guest=0 and the emulator runs well. I will give another test case in x86/realmode.c later. The test fails for me on CPU without unrestricted guest support. This means you either test on fixed kernel or unrestricted_guest=0 is broken. BTW, what is the action when a 64-bit instruction executes in x86/realmode.c? Should I add 64-bit insn tests only in x86/emulator.c? Yes, 64-bit or 32-bit instructions should be added to x86/emulator.c. On Wed, Jun 5, 2013 at 4:27 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 03:00:33PM +0800, 李春奇 Arthur Chunqi Li wrote: Yes, that should be the point. x86/realmode.c is always running in emulation mode. I added the testing here there but no error occurred. I cannot find the reason. The code is as follows added to x86/realmode.c static void test_nopl(void) { MK_INSN(nopl, .byte 0x0f, 0x1f, 0x00\n\r); exec_in_big_real_mode(insn_nopl); report(nopl, 0, 1); } and I objdump from realmode.flat is as follows: 6458 insn_code_nopl: 6458: 0f 1f 00nopl (%eax) But there cause no error when executing this insn. Why? Because you probably use cpu that supports unrestricted mode or use AMD processor. Can you try loading kvm-intel with unrestricted_guest=0 option? On Wed, Jun 5, 2013 at 2:42 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li wrote: Add multibyte NOP test case to kvm-unit-tests. This case can test one of bugs when booting RHEL5.9 64-bit. Adding the test to x86/realmode.c will be much easier. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/emulator.c | 33 + 1 file changed, 33 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 96576e5..f26c70f 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem) report(test, *mem == 0x8400); } +static void test_nopl(uint64_t *mem, uint8_t *insn_page, + uint8_t *alt_insn_page, void *insn_ram) +{ +ulong *cr3 = (ulong *)read_cr3(); + +// Pad with RET instructions +memset(insn_page, 0xc3, 4096); +memset(alt_insn_page, 0xc3, 4096); +// Place a trapping instruction in the page to trigger a VMEXIT +insn_page[0] = 0x89; // mov %eax, (%rax) +insn_page[1] = 0x00; +insn_page[2] = 0x90; // nop +// Place nopl 0x0(%eax) in alt_insn_page for emulator to execuate +alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX] +alt_insn_page[1] = 0x1f; +alt_insn_page[2] = 0x00; + +// Load the code TLB with insn_page, but point the page tables at +// alt_insn_page (and keep the data TLB clear, for AMD decode assist). +// This will make the CPU trap on the insn_page instruction but the +// hypervisor will see alt_insn_page. +install_page(cr3, virt_to_phys(insn_page), insn_ram); +// Load code TLB +invlpg(insn_ram); +asm volatile(call *%0 : : r(insn_ram + 3)); +// Trap, let hypervisor emulate at alt_insn_page +install_page(cr3, virt_to_phys(alt_insn_page), insn_ram); +asm volatile(call *%0 : : r(insn_ram), a(mem)); +report(nopl, 1); +} + int main() { void *mem; @@ -964,6 +995,8 @@ int main() test_string_io_mmio(mem); + test_nopl(mem, insn_page, alt_insn_page, insn_ram); + printf(\nSUMMARY: %d tests, %d failures\n, tests, fails); return fails ? 1 : 0; } -- 1.7.9.5 -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at
Re: [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages
On Tue, Jun 04, 2013 at 10:26:01PM -0300, Marcelo Tosatti wrote: On Fri, May 31, 2013 at 08:36:19AM +0800, Xiao Guangrong wrote: Hi Gleb, Paolo, Marcelo, I have putted the potential controversial patches to the latter that are patch 8 ~ 10, patch 11 depends on patch 9. Other patches are fully reviewed, I think its are ready for being merged. If not luck enough, further discussion is needed, could you please apply that patches first? :) Thank you in advance! snip Looks good to me. I'll take it as Reviewed-by for the entire series :) -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages
On Fri, May 31, 2013 at 08:36:19AM +0800, Xiao Guangrong wrote: Hi Gleb, Paolo, Marcelo, I have putted the potential controversial patches to the latter that are patch 8 ~ 10, patch 11 depends on patch 9. Other patches are fully reviewed, I think its are ready for being merged. If not luck enough, further discussion is needed, could you please apply that patches first? :) Thank you in advance! Applied all of them to queue. Thanks! -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] KVM: add kvm_para_available to asm-generic/kvm_para.h
On Tue, Jun 04, 2013 at 12:33:19PM +0100, James Hogan wrote: On 4 June 2013 10:05, Gleb Natapov g...@redhat.com wrote: On Wed, May 22, 2013 at 12:29:22PM +0100, James Hogan wrote: According to include/uapi/linux/kvm_para.h architectures should define kvm_para_available, so add an implementation to asm-generic/kvm_para.h which just returns false. What is this fixing? The only user of kvm_para_available() that can benefit from this is in sound/pci/intel8x0.c, but I do not see follow up patch to use it there. It was an unintentional config with mips + kvm + intel8x0 that hit it (I think I accidentally based my mips config off an x86_64 config). Kind of equivalent to a randconfig build failure I suppose. Yes, I see. I will queue the fix. Thanks! -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 04:49:22PM +0930, Rusty Russell wrote: Michael S. Tsirkin m...@redhat.com writes: By my count, net still has 7 feature bits left, so I don't think the feature bits are likely to be a limitation in the next 6 months? Yes but you wanted a generic transport feature bit for flexible SG layout. Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then? If we need it within 6 months, I'd rather do that: this feature would then be assumed for 1.0, and reserved. We may do that for other features, too (the committee will have to review). MMIO is a bigger problem. Linux guests are happy with it: does it break the Windows drivers? Thanks, Rusty. You mean make BAR0 an MMIO BAR? Yes, it would break current windows guests. Further, as long as we use same address to notify all queues, we would also need to decode the instruction on x86 and that's measureably slower than PIO. We could go back to discussing hypercall use for notifications, but that has its own set of issues... We might have something ready to deploy in 3 months, but realistically, I'd rather have it ready and tested outside the main git tree(s) and push it once the standard is committed. Cheers, Rusty. We have working code already. We don't need 3 months out of tree, this gets us no progress. To make it ready to deploy we need it upstream and people testing it. I think the issue is the layout change, you don't want the new config layout before we get standartization rolling, right? Okay how about this small patch to the linux guest (completely untested, just to give you the idea): diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c index a7ce730..0e34862 100644 --- a/drivers/virtio/virtio_pci.c +++ b/drivers/virtio/virtio_pci.c @@ -675,6 +675,33 @@ static void virtio_pci_release_dev(struct device *_d) */ } +/* Map a BAR. But carefully: make sure we don't overlap the MSI-X table */ +static void __iomem * virtio_pci_iomap(struct pci_dev *pci_dev, int bar) +{ + int msix_cap = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX); + if (msix_cap) { + u32 offset; + u8 bir; + pci_read_config_dword(pci_dev, msix_cap + PCI_MSIX_TABLE, + offset); + bir = (u8)(offset PCI_MSIX_TABLE_BIR); + offset = PCI_MSIX_TABLE_OFFSET; + /* Spec says table offset is in a 4K page all by itself */ + if (bir == bar offset 4096) + return NULL; + + pci_read_config_dword(pci_dev, msix_cap + PCI_MSIX_PBA, + offset); + bir = (u8)(offset PCI_MSIX_PBA_BIR); + offset = PCI_MSIX_PBA_OFFSET; + /* Spec says table offset is in a 4K page all by itself. */ + if (bir == bar offset 4096) + return NULL; + } + /* 4K is enough for all devices at the moment. */ + return pci_iomap(pci_dev, 0, 4096); +} + /* the PCI probing function */ static int virtio_pci_probe(struct pci_dev *pci_dev, const struct pci_device_id *id) @@ -716,7 +743,10 @@ static int virtio_pci_probe(struct pci_dev *pci_dev, if (err) goto out_enable_device; - vp_dev-ioaddr = pci_iomap(pci_dev, 0, 0); + vp_dev-ioaddr = virtio_pci_iomap(pci_dev, 0); + /* Failed to map BAR0? Try with BAR1. */ + if (vp_dev-ioaddr == NULL) { + vp_dev-ioaddr = virtio_pci_iomap(pci_dev, 1); if (vp_dev-ioaddr == NULL) { err = -ENOMEM; goto out_req_regions; In other words: put a copy of IO config at start of MMIO BAR1, making sure we don't overlap the MSIX table that's there. This does not break windows guests, and makes us compliant to the PCI express spec. Is this small enough to start going immediately, without waiting 3+ months? I really think it's a good idea to put something like this in the field: we might discover more issues around MMIO and we'll address them in the new config layout. This way we don't get two changes: new layout and switch to MMIO all at once. If you like this, I can make the appropriate spec and qemu changes in a couple of days. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vhost kernel BUG at /build/linux/mm/slub.c:3352!
On Tue, Jun 04, 2013 at 09:50:59PM +0300, Tommi Rantala wrote: Hello, Hit this right after killing trinity with Ctrl-C. Was fuzzing v3.10-rc4-0-gd683b96 in a qemu virtual machine as the root user. Tommi Thanks a lot for the report. If found some bugs when looking at this: I think they were introduced by 2839400f8fe28ce216eeeba3fb97bdf90977f7ad though I don't exactly see how ctrl-c can trigger this. I'll work on patches - is this reproducible at all? [29175] Random reseed: 3970521611 [29175] Random reseed: 202886419 [29175] Random reseed: 2930978521 [179904.099501] binder: 29175:2539 ioctl 4010630e fff returned -22 [29175] Random reseed: 2776471322 [29175] Random reseed: 3086119361 child 2606 exiting [29175] Bailing main loop. Exit reason: ctrl-c [179906.393060] [ cut here ] [179906.396341] kernel BUG at /build/linux/mm/slub.c:3352! [179906.399693] invalid opcode: [#1] SMP DEBUG_PAGEALLOC [179906.403272] CPU: 0 PID: 29175 Comm: trinity-main Not tainted 3.10.0-rc4 #1 [179906.407692] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [179906.411475] task: 8800b69e47c0 ti: 880092f2e000 task.ti: 880092f2e000 [179906.416305] RIP: 0010:[81225255] [81225255] kfree+0x155/0x2c0 [179906.421462] RSP: :880092f2fdb0 EFLAGS: 00010246 [179906.424983] RAX: 0100 RBX: 88009e588000 RCX: [179906.429746] RDX: 8800b69e47c0 RSI: 000a0004 RDI: 88009e588000 [179906.434499] RBP: 880092f2fdd8 R08: 0001 R09: [179906.439226] R10: R11: 0001 R12: [179906.443835] R13: ea0002796200 R14: 8800b9a960f8 R15: 8800ba06f6a0 [179906.448470] FS: 7f04cd25c700() GS:8800bf60() knlGS: [179906.453857] CS: 0010 DS: ES: CR0: 80050033 [179906.456956] CR2: 7f98e29d8f50 CR3: 9294a000 CR4: 06f0 [179906.460558] DR0: DR1: DR2: [179906.464059] DR3: DR6: 0ff0 DR7: 0400 [179906.467617] Stack: [179906.468704] 88001a7c 8800b9a960f8 [179906.472638] 8800ba06f6a0 880092f2fdf0 81c1c6df 88001a7c [179906.476583] 880092f2fe18 81c1c771 8800b69718c0 0008 [179906.480377] Call Trace: [179906.481636] [81c1c6df] vhost_net_vq_reset+0x7f/0xb0 [179906.484611] [81c1c771] vhost_net_release+0x61/0xb0 [179906.487481] [8123237a] __fput+0x12a/0x230 [179906.489968] [81232489] fput+0x9/0x10 [179906.492422] [8113a79e] task_work_run+0xae/0xf0 [179906.495169] [811172bc] do_exit+0x44c/0xb40 [179906.497789] [822a24d8] ? retint_swapgs+0x13/0x1b [179906.500652] [81117a74] do_group_exit+0x84/0xd0 [179906.503348] [81117ad2] SyS_exit_group+0x12/0x20 [179906.506146] [822a2e29] system_call_fastpath+0x16/0x1b [179906.509147] Code: 49 c1 ed 0c 49 c1 e5 06 49 01 c5 49 8b 45 00 f6 c4 80 74 0a 4d 8b 6d 30 66 0f 1f 44 00 00 49 8b 45 00 a8 80 75 28 f6 c4 c0 75 02 0f 0b 49 8b 45 00 31 f6 f6 c4 40 74 04 41 8b 75 68 4c 89 ef e8 [179906.522213] RIP [81225255] kfree+0x155/0x2c0 [179906.524937] RSP 880092f2fdb0 [179906.575627] ---[ end trace 3d4ce10faaa29990 ]--- [179906.577103] Fixing recursive fault but reboot is needed! [29174] Watchdog exiting -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021
On 19.05.2013 14:32, Gleb Natapov wrote: On Sun, May 19, 2013 at 02:00:31AM +0100, Ben Hutchings wrote: Dear KVM maintainers, it appears that there is a gap in x86 emulation, at least on a 32-bit host. Stefan found this when running GRML, a live distribution which can be downloaded from: http://download.grml.org/grml32-full_2013.02.iso. His original reported is at http://bugs.debian.org/707257. Can you verify with latest linux.git HEAD? It works for me there on 64bit. There were a lot of problems fixed in this area in 3.9/3.10 time frame, so it would be helpful if you'll test 32bit before I install one myself. Kernel version 3.9.4-1 (linux-image-3.9-1-686-pae) made things worse. The virtual machine tries to boot the kernel, but stops after a few seconds and the kern.log shows: kernel: [13851.000412] kvm [7482]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x virtual machine was started with: qemu-system-i386 -machine accel=kvm -m 512 -cdrom grml32-full_2013.02.iso qemu-system-x86: 1.5.0+dfsg-3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vhost kernel BUG at /build/linux/mm/slub.c:3352!
2013/6/5 Michael S. Tsirkin m...@redhat.com: On Tue, Jun 04, 2013 at 09:50:59PM +0300, Tommi Rantala wrote: Hello, Hit this right after killing trinity with Ctrl-C. Was fuzzing v3.10-rc4-0-gd683b96 in a qemu virtual machine as the root user. Tommi Thanks a lot for the report. If found some bugs when looking at this: I think they were introduced by 2839400f8fe28ce216eeeba3fb97bdf90977f7ad though I don't exactly see how ctrl-c can trigger this. I'll work on patches - is this reproducible at all? Thanks, glad to hear that the report was useful. Yes, I did reproduce this quite quickly yesterday with trinity, but did not dig any deeper into what was going on. Tommi -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vhost kernel BUG at /build/linux/mm/slub.c:3352!
On Wed, Jun 05, 2013 at 03:06:33PM +0300, Tommi Rantala wrote: 2013/6/5 Michael S. Tsirkin m...@redhat.com: On Tue, Jun 04, 2013 at 09:50:59PM +0300, Tommi Rantala wrote: Hello, Hit this right after killing trinity with Ctrl-C. Was fuzzing v3.10-rc4-0-gd683b96 in a qemu virtual machine as the root user. Tommi Thanks a lot for the report. If found some bugs when looking at this: I think they were introduced by 2839400f8fe28ce216eeeba3fb97bdf90977f7ad though I don't exactly see how ctrl-c can trigger this. I'll work on patches - is this reproducible at all? Thanks, glad to hear that the report was useful. Yes, I did reproduce this quite quickly yesterday with trinity, but did not dig any deeper into what was going on. Tommi Great, I'll post patches and we can see if it's fixed. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021
On Wed, Jun 05, 2013 at 01:57:25PM +0200, Stefan Pietsch wrote: On 19.05.2013 14:32, Gleb Natapov wrote: On Sun, May 19, 2013 at 02:00:31AM +0100, Ben Hutchings wrote: Dear KVM maintainers, it appears that there is a gap in x86 emulation, at least on a 32-bit host. Stefan found this when running GRML, a live distribution which can be downloaded from: http://download.grml.org/grml32-full_2013.02.iso. His original reported is at http://bugs.debian.org/707257. Can you verify with latest linux.git HEAD? It works for me there on 64bit. There were a lot of problems fixed in this area in 3.9/3.10 time frame, so it would be helpful if you'll test 32bit before I install one myself. Kernel version 3.9.4-1 (linux-image-3.9-1-686-pae) made things worse. The virtual machine tries to boot the kernel, but stops after a few seconds and the kern.log shows: At what point does it stop? kernel: [13851.000412] kvm [7482]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x That's harmless. virtual machine was started with: qemu-system-i386 -machine accel=kvm -m 512 -cdrom grml32-full_2013.02.iso qemu-system-x86: 1.5.0+dfsg-3 -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021
On 05.06.2013 14:10, Gleb Natapov wrote: On Wed, Jun 05, 2013 at 01:57:25PM +0200, Stefan Pietsch wrote: On 19.05.2013 14:32, Gleb Natapov wrote: On Sun, May 19, 2013 at 02:00:31AM +0100, Ben Hutchings wrote: Dear KVM maintainers, it appears that there is a gap in x86 emulation, at least on a 32-bit host. Stefan found this when running GRML, a live distribution which can be downloaded from: http://download.grml.org/grml32-full_2013.02.iso. His original reported is at http://bugs.debian.org/707257. Can you verify with latest linux.git HEAD? It works for me there on 64bit. There were a lot of problems fixed in this area in 3.9/3.10 time frame, so it would be helpful if you'll test 32bit before I install one myself. Kernel version 3.9.4-1 (linux-image-3.9-1-686-pae) made things worse. The virtual machine tries to boot the kernel, but stops after a few seconds and the kern.log shows: At what point does it stop? The machine stops at: Performance Events: Broken PMU hardware detected, using software events only. Failed to access perfctr msr (MSR c1 is 0) Enabling APIC mode: Flat. Using 1 I/O APICs -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Planning the merge of KVM/arm64
On Wed, Jun 05, 2013 at 10:31:46AM +0100, Catalin Marinas wrote: On Wed, Jun 05, 2013 at 07:01:05AM +0100, Gleb Natapov wrote: On Tue, Jun 04, 2013 at 10:57:32PM -0700, Christoffer Dall wrote: On 4 June 2013 09:37, Gleb Natapov g...@redhat.com wrote: On Tue, Jun 04, 2013 at 05:51:41PM +0200, Paolo Bonzini wrote: Il 04/06/2013 17:43, Christoffer Dall ha scritto: Hi Paolo, I don't think this is an issue. Gleb and Marcelo for example pulled RMK's stable tree for my KVM/ARM updates for the 3.10 merge window and that wasn't an issue. If Linus pulls the kvm/next tree first the diffstat should be similar and everything clean enough, no? Catalin has previously expressed his wish to upstream the kvm/arm64 patches directly through him given the churn in a completely new architecture and he wants to make sure that everything looks right. It's a pretty clean implementation with quite few dependencies and merging as a working series should be a priority instead of the Kconfig hack, imho. Ok, let's see what Gleb says. I have no objection to merge arm64 kvm trough Catalin if it mean less churn for everyone. That's what we did with arm and mips. Arm64 kvm has a dependency on kvm.git next though, so how Catalin make sure that everything looks right? Will he merge kvm.git/next to arm64 tree? Yes, that was the idea. Everything in kvm/next is considered stable, right? Right. Catalin should wait for kvm.git to be pulled by Linus next merge windows before sending his pull request then. I think it's better if I push the bulk of the arm64 KVM branch but without Kconfig patch enabling it. This branch would be based on mainline rather than kvm/next. Once your code goes in mainline, I'll just push the Kconfig entry (for bisection reasons, it could be after -rc1). This would keep the pull-request diffstat cleaner. If there will be no non trivial conflicts between your tree and kvm/next it should be OK too. As we discussed some time ago, after the core arm64 KVM is merged you will use the same workflow as for arm (merge via the kvm tree). Thanks. -- Catalin -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote: You mean make BAR0 an MMIO BAR? Yes, it would break current windows guests. Further, as long as we use same address to notify all queues, we would also need to decode the instruction on x86 and that's measureably slower than PIO. We could go back to discussing hypercall use for notifications, but that has its own set of issues... So... does violating the PCI-e spec really matter? Is it preventing any guest from working properly? I don't think we should rush an ABI breakage if the only benefit is claiming spec compliance. Regards, Anthony Liguori -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Planning the merge of KVM/arm64
On 05/06/13 13:57, Gleb Natapov wrote: On Wed, Jun 05, 2013 at 10:31:46AM +0100, Catalin Marinas wrote: On Wed, Jun 05, 2013 at 07:01:05AM +0100, Gleb Natapov wrote: On Tue, Jun 04, 2013 at 10:57:32PM -0700, Christoffer Dall wrote: On 4 June 2013 09:37, Gleb Natapov g...@redhat.com wrote: On Tue, Jun 04, 2013 at 05:51:41PM +0200, Paolo Bonzini wrote: Il 04/06/2013 17:43, Christoffer Dall ha scritto: Hi Paolo, I don't think this is an issue. Gleb and Marcelo for example pulled RMK's stable tree for my KVM/ARM updates for the 3.10 merge window and that wasn't an issue. If Linus pulls the kvm/next tree first the diffstat should be similar and everything clean enough, no? Catalin has previously expressed his wish to upstream the kvm/arm64 patches directly through him given the churn in a completely new architecture and he wants to make sure that everything looks right. It's a pretty clean implementation with quite few dependencies and merging as a working series should be a priority instead of the Kconfig hack, imho. Ok, let's see what Gleb says. I have no objection to merge arm64 kvm trough Catalin if it mean less churn for everyone. That's what we did with arm and mips. Arm64 kvm has a dependency on kvm.git next though, so how Catalin make sure that everything looks right? Will he merge kvm.git/next to arm64 tree? Yes, that was the idea. Everything in kvm/next is considered stable, right? Right. Catalin should wait for kvm.git to be pulled by Linus next merge windows before sending his pull request then. I think it's better if I push the bulk of the arm64 KVM branch but without Kconfig patch enabling it. This branch would be based on mainline rather than kvm/next. Once your code goes in mainline, I'll just push the Kconfig entry (for bisection reasons, it could be after -rc1). This would keep the pull-request diffstat cleaner. If there will be no non trivial conflicts between your tree and kvm/next it should be OK too. In order to make sure no userspace ABI breakage occur during the merge, can you please make sure that the following values are reserved: - Capability KVM_CAP_ARM_EL1_32BIT, 93 - ONE_REG architecture KVM_REG_ARM64, 0x6000ULL So far, nothing clashes with it in kvm/next, but I'd like to be 100% sure... Thanks, M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] vhost: Make local function static
$ make C=1 M=drivers/vhost drivers/vhost/net.c:168:5: warning: symbol 'vhost_net_set_ubuf_info' was not declared. Should it be static? drivers/vhost/net.c:194:6: warning: symbol 'vhost_net_vq_reset' was not declared. Should it be static? drivers/vhost/scsi.c:219:6: warning: symbol 'tcm_vhost_done_inflight' was not declared. Should it be static? Signed-off-by: Asias He as...@redhat.com --- drivers/vhost/net.c | 4 ++-- drivers/vhost/scsi.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 05bdc3c..f800901 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -165,7 +165,7 @@ static void vhost_net_clear_ubuf_info(struct vhost_net *n) } } -int vhost_net_set_ubuf_info(struct vhost_net *n) +static int vhost_net_set_ubuf_info(struct vhost_net *n) { bool zcopy; int i; @@ -191,7 +191,7 @@ err: return -ENOMEM; } -void vhost_net_vq_reset(struct vhost_net *n) +static void vhost_net_vq_reset(struct vhost_net *n) { int i; diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index de6d817..35ab0ce 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -216,7 +216,7 @@ static int iov_num_pages(struct iovec *iov) ((unsigned long)iov-iov_base PAGE_MASK)) PAGE_SHIFT; } -void tcm_vhost_done_inflight(struct kref *kref) +static void tcm_vhost_done_inflight(struct kref *kref) { struct vhost_scsi_inflight *inflight; -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of emulating multibyte NOP
On Wed, Jun 05, 2013 at 05:41:41PM +0800, 李春奇 Arthur Chunqi Li wrote: Add multibyte NOP test case to kvm-unit-tests. This version adds test cases into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit. The patch is mangled. Lines are wrapped, tabs are replaced with spaces. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/realmode.c | 24 1 file changed, 24 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 981be08..e6e48c9 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1504,6 +1504,29 @@ static void test_fninit(void) report(fninit, 0, fsw == 0 (fcw 0x103f) == 0x003f); } +static void test_nopl(void) +{ +MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop +MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop +MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop +MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop +MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop +MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 bytes nop +MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); // 7 bytes nop +MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 8 bytes nop +MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 9 bytes nop +exec_in_big_real_mode(insn_nopl1); +exec_in_big_real_mode(insn_nopl2); +exec_in_big_real_mode(insn_nopl3); +exec_in_big_real_mode(insn_nopl4); +exec_in_big_real_mode(insn_nopl5); +exec_in_big_real_mode(insn_nopl6); +exec_in_big_real_mode(insn_nopl7); +exec_in_big_real_mode(insn_nopl8); +exec_in_big_real_mode(insn_nopl9); +report(nopl, 0, 1); +} + void realmode_start(void) { test_null(); @@ -1548,6 +1571,7 @@ void realmode_start(void) test_xlat(); test_salc(); test_fninit(); + test_nopl(); exit(0); } -- 1.7.9.5 -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input
On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote: kvm_add_routing_entry makes an attempt to zero-initialize any new routing entry. However, it fails to initialize padding within the u field of the structure kvm_irq_routing_entry. Other functions like kvm_irqchip_update_msi_route also fail to initialize the padding field in kvm_irq_routing_entry. While mostly harmless, this would prevent us from reusing these fields for something useful in the future. The fact that kernel does not check them for zero value is what will prevents us from doing so. It's better to just make sure all input is initialized. Once it is, we can also drop complex field by field assignment and just do the simple *a = *b to update a route entry. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- kvm-all.c | 19 +++ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 405480e..f119ce1 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s, } n = s-irq_routes-nr++; new = s-irq_routes-entries[n]; -memset(new, 0, sizeof(*new)); -new-gsi = entry-gsi; -new-type = entry-type; -new-flags = entry-flags; -new-u = entry-u; + +*new = *entry; set_gsi(s, entry-gsi); @@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s, continue; } -entry-type = new_entry-type; -entry-flags = new_entry-flags; -entry-u = new_entry-u; +*entry = *new_entry; kvm_irqchip_commit_routes(s); @@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s, void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin) { -struct kvm_irq_routing_entry e; +struct kvm_irq_routing_entry e = {}; assert(pin s-gsi_count); @@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) return virq; } -route = g_malloc(sizeof(KVMMSIRoute)); +route = g_malloc0(sizeof(KVMMSIRoute)); route-kroute.gsi = virq; route-kroute.type = KVM_IRQ_ROUTING_MSI; route-kroute.flags = 0; @@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; int virq; if (!kvm_gsi_routing_enabled()) { @@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; if (!kvm_irqchip_in_kernel()) { return -ENOSYS; -- MST -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost_net: clear msg.control for non-zerocopy case during tx
Hello. On 05-06-2013 11:40, Jason Wang wrote: When we decide not use zero-copy, msg.control should be set to NULL otherwise macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs wrongly. Bug were introduced by commit cedb9bdce099206290a2bdd02ce47a7b253b6a84 (vhost-net: skip head management if no outstanding). This solves the following warnings: WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]() Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun] CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566 Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011 a0198323 88007c9ebd08 81796b73 88007c9ebd48 8103d66b 7b773e20 8800779f 8800779f43f0 8800779f8418 015c 0062 88007c9ebd58 Call Trace: [81796b73] dump_stack+0x19/0x1e [8103d66b] warn_slowpath_common+0x6b/0xa0 [8103d6b5] warn_slowpath_null+0x15/0x20 [a0197627] handle_tx+0x477/0x4b0 [vhost_net] [a0197690] handle_tx_kick+0x10/0x20 [vhost_net] [a019541e] vhost_worker+0xfe/0x1a0 [vhost_net] [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [81061f46] kthread+0xc6/0xd0 [81061e80] ? kthread_freezable_should_stop+0x70/0x70 [817a1aec] ret_from_fork+0x7c/0xb0 [81061e80] ? kthread_freezable_should_stop+0x70/0x70 Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/vhost/net.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 2b51e23..b07d96b 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -436,7 +436,8 @@ static void handle_tx(struct vhost_net *net) kref_get(ubufs-kref); } nvq-upend_idx = (nvq-upend_idx + 1) % UIO_MAXIOV; - } + } else You have to use {} on the *else* branch if you have it of the *if* branch (and vice versa), according to Documentation/CodingStyle. + msg.msg_control = NULL; WBR, Sergei -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input
On Wed, Jun 05, 2013 at 04:00:20PM +0300, Gleb Natapov wrote: On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote: kvm_add_routing_entry makes an attempt to zero-initialize any new routing entry. However, it fails to initialize padding within the u field of the structure kvm_irq_routing_entry. Other functions like kvm_irqchip_update_msi_route also fail to initialize the padding field in kvm_irq_routing_entry. While mostly harmless, this would prevent us from reusing these fields for something useful in the future. The fact that kernel does not check them for zero value is what will prevents us from doing so. Well we can not change kernel now (it would break userspace) but we can start zeroing everything in userspace. Also, checkers like coverity might get confused by this. Finally, simpler assignment and comparison make it worth it, don't they? It's better to just make sure all input is initialized. Once it is, we can also drop complex field by field assignment and just do the simple *a = *b to update a route entry. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- kvm-all.c | 19 +++ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 405480e..f119ce1 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s, } n = s-irq_routes-nr++; new = s-irq_routes-entries[n]; -memset(new, 0, sizeof(*new)); -new-gsi = entry-gsi; -new-type = entry-type; -new-flags = entry-flags; -new-u = entry-u; + +*new = *entry; set_gsi(s, entry-gsi); @@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s, continue; } -entry-type = new_entry-type; -entry-flags = new_entry-flags; -entry-u = new_entry-u; +*entry = *new_entry; kvm_irqchip_commit_routes(s); @@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s, void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin) { -struct kvm_irq_routing_entry e; +struct kvm_irq_routing_entry e = {}; assert(pin s-gsi_count); @@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) return virq; } -route = g_malloc(sizeof(KVMMSIRoute)); +route = g_malloc0(sizeof(KVMMSIRoute)); route-kroute.gsi = virq; route-kroute.type = KVM_IRQ_ROUTING_MSI; route-kroute.flags = 0; @@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; int virq; if (!kvm_gsi_routing_enabled()) { @@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; if (!kvm_irqchip_in_kernel()) { return -ENOSYS; -- MST -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input
On Wed, Jun 05, 2013 at 05:02:07PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 05, 2013 at 04:00:20PM +0300, Gleb Natapov wrote: On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote: kvm_add_routing_entry makes an attempt to zero-initialize any new routing entry. However, it fails to initialize padding within the u field of the structure kvm_irq_routing_entry. Other functions like kvm_irqchip_update_msi_route also fail to initialize the padding field in kvm_irq_routing_entry. While mostly harmless, this would prevent us from reusing these fields for something useful in the future. The fact that kernel does not check them for zero value is what will prevents us from doing so. Well we can not change kernel now (it would break userspace) but we can start zeroing everything in userspace. Also, checkers like coverity might get confused by this. Finally, simpler assignment and comparison make it worth it, don't they? I am not at all against the patch! Just pointing out a mistake in the commit message. It's better to just make sure all input is initialized. Once it is, we can also drop complex field by field assignment and just do the simple *a = *b to update a route entry. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- kvm-all.c | 19 +++ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 405480e..f119ce1 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s, } n = s-irq_routes-nr++; new = s-irq_routes-entries[n]; -memset(new, 0, sizeof(*new)); -new-gsi = entry-gsi; -new-type = entry-type; -new-flags = entry-flags; -new-u = entry-u; + +*new = *entry; set_gsi(s, entry-gsi); @@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s, continue; } -entry-type = new_entry-type; -entry-flags = new_entry-flags; -entry-u = new_entry-u; +*entry = *new_entry; kvm_irqchip_commit_routes(s); @@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s, void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin) { -struct kvm_irq_routing_entry e; +struct kvm_irq_routing_entry e = {}; assert(pin s-gsi_count); @@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) return virq; } -route = g_malloc(sizeof(KVMMSIRoute)); +route = g_malloc0(sizeof(KVMMSIRoute)); route-kroute.gsi = virq; route-kroute.type = KVM_IRQ_ROUTING_MSI; route-kroute.flags = 0; @@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; int virq; if (!kvm_gsi_routing_enabled()) { @@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; if (!kvm_irqchip_in_kernel()) { return -ENOSYS; -- MST -- Gleb. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote: You mean make BAR0 an MMIO BAR? Yes, it would break current windows guests. Further, as long as we use same address to notify all queues, we would also need to decode the instruction on x86 and that's measureably slower than PIO. We could go back to discussing hypercall use for notifications, but that has its own set of issues... So... does violating the PCI-e spec really matter? Is it preventing any guest from working properly? Yes, absolutely, this wording in spec is not there without reason. Existing guests allocate io space for PCI express ports in multiples on 4K. Since each express device is behind such a port, this means at most 15 such devices can use IO ports in a system. That's why to make a pci express virtio device, we must allow MMIO and/or some other communication mechanism as the spec requires. That's on x86. Besides x86, there are achitectures where IO is unavailable or very slow. I don't think we should rush an ABI breakage if the only benefit is claiming spec compliance. Regards, Anthony Liguori Why do you bring this up? No one advocates any ABI breakage, I only suggest extensions. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input
On Wed, Jun 05, 2013 at 05:04:55PM +0300, Gleb Natapov wrote: On Wed, Jun 05, 2013 at 05:02:07PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 05, 2013 at 04:00:20PM +0300, Gleb Natapov wrote: On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote: kvm_add_routing_entry makes an attempt to zero-initialize any new routing entry. However, it fails to initialize padding within the u field of the structure kvm_irq_routing_entry. Other functions like kvm_irqchip_update_msi_route also fail to initialize the padding field in kvm_irq_routing_entry. While mostly harmless, this would prevent us from reusing these fields for something useful in the future. The fact that kernel does not check them for zero value is what will prevents us from doing so. Well we can not change kernel now (it would break userspace) but we can start zeroing everything in userspace. Also, checkers like coverity might get confused by this. Finally, simpler assignment and comparison make it worth it, don't they? I am not at all against the patch! Just pointing out a mistake in the commit message. I think we can agree both userspace not initializing them and kernel not checking it are mistakes? Anyway ... could you commit this tweaking the commit message in a way that you consider appropriate? Or want me to repost? It's better to just make sure all input is initialized. Once it is, we can also drop complex field by field assignment and just do the simple *a = *b to update a route entry. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- kvm-all.c | 19 +++ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 405480e..f119ce1 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s, } n = s-irq_routes-nr++; new = s-irq_routes-entries[n]; -memset(new, 0, sizeof(*new)); -new-gsi = entry-gsi; -new-type = entry-type; -new-flags = entry-flags; -new-u = entry-u; + +*new = *entry; set_gsi(s, entry-gsi); @@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s, continue; } -entry-type = new_entry-type; -entry-flags = new_entry-flags; -entry-u = new_entry-u; +*entry = *new_entry; kvm_irqchip_commit_routes(s); @@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s, void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin) { -struct kvm_irq_routing_entry e; +struct kvm_irq_routing_entry e = {}; assert(pin s-gsi_count); @@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) return virq; } -route = g_malloc(sizeof(KVMMSIRoute)); +route = g_malloc0(sizeof(KVMMSIRoute)); route-kroute.gsi = virq; route-kroute.type = KVM_IRQ_ROUTING_MSI; route-kroute.flags = 0; @@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; int virq; if (!kvm_gsi_routing_enabled()) { @@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; if (!kvm_irqchip_in_kernel()) { return -ENOSYS; -- MST -- Gleb. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input
On Wed, Jun 05, 2013 at 05:11:44PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 05, 2013 at 05:04:55PM +0300, Gleb Natapov wrote: On Wed, Jun 05, 2013 at 05:02:07PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 05, 2013 at 04:00:20PM +0300, Gleb Natapov wrote: On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote: kvm_add_routing_entry makes an attempt to zero-initialize any new routing entry. However, it fails to initialize padding within the u field of the structure kvm_irq_routing_entry. Other functions like kvm_irqchip_update_msi_route also fail to initialize the padding field in kvm_irq_routing_entry. While mostly harmless, this would prevent us from reusing these fields for something useful in the future. The fact that kernel does not check them for zero value is what will prevents us from doing so. Well we can not change kernel now (it would break userspace) but we can start zeroing everything in userspace. Also, checkers like coverity might get confused by this. Finally, simpler assignment and comparison make it worth it, don't they? I am not at all against the patch! Just pointing out a mistake in the commit message. I think we can agree both userspace not initializing them and kernel not checking it are mistakes? Mistake that cannot be fixed at this point. Anyway ... could you commit this tweaking the commit message in a way that you consider appropriate? Or want me to repost? No need to report. It's better to just make sure all input is initialized. Once it is, we can also drop complex field by field assignment and just do the simple *a = *b to update a route entry. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- kvm-all.c | 19 +++ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 405480e..f119ce1 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s, } n = s-irq_routes-nr++; new = s-irq_routes-entries[n]; -memset(new, 0, sizeof(*new)); -new-gsi = entry-gsi; -new-type = entry-type; -new-flags = entry-flags; -new-u = entry-u; + +*new = *entry; set_gsi(s, entry-gsi); @@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s, continue; } -entry-type = new_entry-type; -entry-flags = new_entry-flags; -entry-u = new_entry-u; +*entry = *new_entry; kvm_irqchip_commit_routes(s); @@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s, void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin) { -struct kvm_irq_routing_entry e; +struct kvm_irq_routing_entry e = {}; assert(pin s-gsi_count); @@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) return virq; } -route = g_malloc(sizeof(KVMMSIRoute)); +route = g_malloc0(sizeof(KVMMSIRoute)); route-kroute.gsi = virq; route-kroute.type = KVM_IRQ_ROUTING_MSI; route-kroute.flags = 0; @@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; int virq; if (!kvm_gsi_routing_enabled()) { @@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; if (!kvm_irqchip_in_kernel()) { return -ENOSYS; -- MST -- Gleb. -- Gleb. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input
On Wed, Jun 05, 2013 at 05:12:32PM +0300, Gleb Natapov wrote: On Wed, Jun 05, 2013 at 05:11:44PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 05, 2013 at 05:04:55PM +0300, Gleb Natapov wrote: On Wed, Jun 05, 2013 at 05:02:07PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 05, 2013 at 04:00:20PM +0300, Gleb Natapov wrote: On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote: kvm_add_routing_entry makes an attempt to zero-initialize any new routing entry. However, it fails to initialize padding within the u field of the structure kvm_irq_routing_entry. Other functions like kvm_irqchip_update_msi_route also fail to initialize the padding field in kvm_irq_routing_entry. While mostly harmless, this would prevent us from reusing these fields for something useful in the future. The fact that kernel does not check them for zero value is what will prevents us from doing so. Well we can not change kernel now (it would break userspace) but we can start zeroing everything in userspace. Also, checkers like coverity might get confused by this. Finally, simpler assignment and comparison make it worth it, don't they? I am not at all against the patch! Just pointing out a mistake in the commit message. I think we can agree both userspace not initializing them and kernel not checking it are mistakes? Mistake that cannot be fixed at this point. Anyway ... could you commit this tweaking the commit message in a way that you consider appropriate? Or want me to repost? No need to report. Thanks. It's better to just make sure all input is initialized. Once it is, we can also drop complex field by field assignment and just do the simple *a = *b to update a route entry. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- kvm-all.c | 19 +++ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 405480e..f119ce1 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s, } n = s-irq_routes-nr++; new = s-irq_routes-entries[n]; -memset(new, 0, sizeof(*new)); -new-gsi = entry-gsi; -new-type = entry-type; -new-flags = entry-flags; -new-u = entry-u; + +*new = *entry; set_gsi(s, entry-gsi); @@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s, continue; } -entry-type = new_entry-type; -entry-flags = new_entry-flags; -entry-u = new_entry-u; +*entry = *new_entry; kvm_irqchip_commit_routes(s); @@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s, void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin) { -struct kvm_irq_routing_entry e; +struct kvm_irq_routing_entry e = {}; assert(pin s-gsi_count); @@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) return virq; } -route = g_malloc(sizeof(KVMMSIRoute)); +route = g_malloc0(sizeof(KVMMSIRoute)); route-kroute.gsi = virq; route-kroute.type = KVM_IRQ_ROUTING_MSI; route-kroute.flags = 0; @@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; int virq; if (!kvm_gsi_routing_enabled()) { @@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; if (!kvm_irqchip_in_kernel()) { return -ENOSYS; -- MST -- Gleb. -- Gleb. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input
On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote: kvm_add_routing_entry makes an attempt to zero-initialize any new routing entry. However, it fails to initialize padding within the u field of the structure kvm_irq_routing_entry. Other functions like kvm_irqchip_update_msi_route also fail to initialize the padding field in kvm_irq_routing_entry. While mostly harmless, this would prevent us from reusing these fields for something useful in the future. It's better to just make sure all input is initialized. Once it is, we can also drop complex field by field assignment and just do the simple *a = *b to update a route entry. Signed-off-by: Michael S. Tsirkin m...@redhat.com Applied, thanks. --- kvm-all.c | 19 +++ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 405480e..f119ce1 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s, } n = s-irq_routes-nr++; new = s-irq_routes-entries[n]; -memset(new, 0, sizeof(*new)); -new-gsi = entry-gsi; -new-type = entry-type; -new-flags = entry-flags; -new-u = entry-u; + +*new = *entry; set_gsi(s, entry-gsi); @@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s, continue; } -entry-type = new_entry-type; -entry-flags = new_entry-flags; -entry-u = new_entry-u; +*entry = *new_entry; kvm_irqchip_commit_routes(s); @@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s, void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin) { -struct kvm_irq_routing_entry e; +struct kvm_irq_routing_entry e = {}; assert(pin s-gsi_count); @@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) return virq; } -route = g_malloc(sizeof(KVMMSIRoute)); +route = g_malloc0(sizeof(KVMMSIRoute)); route-kroute.gsi = virq; route-kroute.type = KVM_IRQ_ROUTING_MSI; route-kroute.flags = 0; @@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg) int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; int virq; if (!kvm_gsi_routing_enabled()) { @@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg) int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg) { -struct kvm_irq_routing_entry kroute; +struct kvm_irq_routing_entry kroute = {}; if (!kvm_irqchip_in_kernel()) { return -ENOSYS; -- MST -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of emulating multibyte NOP
Hi Gleb, I generate this mail by git send-email and I think the format is OK. This is my first try to commit a patch in open source community. Sorry for annoying you guys so much. Thanks, Arthur On Wed, Jun 5, 2013 at 10:54 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Add multibyte NOP test case to kvm-unit-tests. This version adds test cases into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/realmode.c | 24 1 file changed, 24 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 981be08..e103ca6 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1504,6 +1504,29 @@ static void test_fninit(void) report(fninit, 0, fsw == 0 (fcw 0x103f) == 0x003f); } +static void test_nopl(void) +{ + MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop + MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop + MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop + MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop + MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop + MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 bytes nop + MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); // 7 bytes nop + MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 8 bytes nop + MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 9 bytes nop + exec_in_big_real_mode(insn_nopl1); + exec_in_big_real_mode(insn_nopl2); + exec_in_big_real_mode(insn_nopl3); + exec_in_big_real_mode(insn_nopl4); + exec_in_big_real_mode(insn_nopl5); + exec_in_big_real_mode(insn_nopl6); + exec_in_big_real_mode(insn_nopl7); + exec_in_big_real_mode(insn_nopl8); + exec_in_big_real_mode(insn_nopl9); + report(nopl, 0, 1); +} + void realmode_start(void) { test_null(); @@ -1548,6 +1571,7 @@ void realmode_start(void) test_xlat(); test_salc(); test_fninit(); + test_nopl(); exit(0); } -- 1.7.9.5 -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote: You mean make BAR0 an MMIO BAR? Yes, it would break current windows guests. Further, as long as we use same address to notify all queues, we would also need to decode the instruction on x86 and that's measureably slower than PIO. We could go back to discussing hypercall use for notifications, but that has its own set of issues... So... does violating the PCI-e spec really matter? Is it preventing any guest from working properly? Yes, absolutely, this wording in spec is not there without reason. Existing guests allocate io space for PCI express ports in multiples on 4K. Since each express device is behind such a port, this means at most 15 such devices can use IO ports in a system. That's why to make a pci express virtio device, we must allow MMIO and/or some other communication mechanism as the spec requires. This is precisely why this is an ABI breaker. If you disable IO bars in the BIOS, than the interface that the OS sees will *not have an IO bar*. This *breaks existing guests*. Any time the programming interfaces changes on a PCI device, the revision ID and/or device ID must change. The spec is very clear about this. We cannot disable the IO BAR without changing revision ID/device ID. That's on x86. Besides x86, there are achitectures where IO is unavailable or very slow. I don't think we should rush an ABI breakage if the only benefit is claiming spec compliance. Regards, Anthony Liguori Why do you bring this up? No one advocates any ABI breakage, I only suggest extensions. It's an ABI breakage. You're claiming that the guests you tested handle the breakage reasonably but it is unquestionably an ABI breakage. Regards, Anthony Liguori -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vfio: fix crash on rmmod
On Wed, 2013-06-05 at 16:03 +1000, Alexey Kardashevskiy wrote: devtmpfs_delete_node() calls devnode() callback with mode==NULL but vfio still tries to write there. The patch fixes this. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- Oops. Applied. The mode change just went in for 3.10, so I'll get this in before the final rc. Thanks, Alex Steps to reproduce on freshly booted system with no devices given to VFIO: modprobe vfio rmmod vfio_iommu_spapr_tce rmmod vfio --- drivers/vfio/vfio.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 523c121..259ad28 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -1360,7 +1360,7 @@ static const struct file_operations vfio_device_fops = { */ static char *vfio_devnode(struct device *dev, umode_t *mode) { - if (MINOR(dev-devt) == 0) + if (mode (MINOR(dev-devt) == 0)) *mode = S_IRUGO | S_IWUGO; return kasprintf(GFP_KERNEL, vfio/%s, dev_name(dev)); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of emulating multibyte NOP
On Wed, Jun 05, 2013 at 10:56:54PM +0800, 李春奇 Arthur Chunqi Li wrote: Hi Gleb, I generate this mail by git send-email and I think the format is OK. But I have not received the email, only this your reply to it. This is my first try to commit a patch in open source community. Sorry for annoying you guys so much. Thanks, No problem. Arthur On Wed, Jun 5, 2013 at 10:54 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Add multibyte NOP test case to kvm-unit-tests. This version adds test cases into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/realmode.c | 24 1 file changed, 24 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 981be08..e103ca6 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1504,6 +1504,29 @@ static void test_fninit(void) report(fninit, 0, fsw == 0 (fcw 0x103f) == 0x003f); } +static void test_nopl(void) +{ + MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop + MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop + MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop + MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop + MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop + MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 bytes nop + MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); // 7 bytes nop + MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 8 bytes nop + MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 9 bytes nop + exec_in_big_real_mode(insn_nopl1); + exec_in_big_real_mode(insn_nopl2); + exec_in_big_real_mode(insn_nopl3); + exec_in_big_real_mode(insn_nopl4); + exec_in_big_real_mode(insn_nopl5); + exec_in_big_real_mode(insn_nopl6); + exec_in_big_real_mode(insn_nopl7); + exec_in_big_real_mode(insn_nopl8); + exec_in_big_real_mode(insn_nopl9); + report(nopl, 0, 1); +} + void realmode_start(void) { test_null(); @@ -1548,6 +1571,7 @@ void realmode_start(void) test_xlat(); test_salc(); test_fninit(); + test_nopl(); exit(0); } -- 1.7.9.5 -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of emulating multibyte NOP
On Wed, Jun 5, 2013 at 11:11 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 10:56:54PM +0800, 李春奇 Arthur Chunqi Li wrote: Hi Gleb, I generate this mail by git send-email and I think the format is OK. But I have not received the email, only this your reply to it. Maybe the initial mail is in your spam box. This is my first try to commit a patch in open source community. Sorry for annoying you guys so much. Thanks, No problem. Arthur On Wed, Jun 5, 2013 at 10:54 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Add multibyte NOP test case to kvm-unit-tests. This version adds test cases into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/realmode.c | 24 1 file changed, 24 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 981be08..e103ca6 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1504,6 +1504,29 @@ static void test_fninit(void) report(fninit, 0, fsw == 0 (fcw 0x103f) == 0x003f); } +static void test_nopl(void) +{ + MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop + MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop + MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop + MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop + MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop + MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 bytes nop + MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); // 7 bytes nop + MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 8 bytes nop + MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 9 bytes nop + exec_in_big_real_mode(insn_nopl1); + exec_in_big_real_mode(insn_nopl2); + exec_in_big_real_mode(insn_nopl3); + exec_in_big_real_mode(insn_nopl4); + exec_in_big_real_mode(insn_nopl5); + exec_in_big_real_mode(insn_nopl6); + exec_in_big_real_mode(insn_nopl7); + exec_in_big_real_mode(insn_nopl8); + exec_in_big_real_mode(insn_nopl9); + report(nopl, 0, 1); +} + void realmode_start(void) { test_null(); @@ -1548,6 +1571,7 @@ void realmode_start(void) test_xlat(); test_salc(); test_fninit(); + test_nopl(); exit(0); } -- 1.7.9.5 -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 10:08:37AM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote: You mean make BAR0 an MMIO BAR? Yes, it would break current windows guests. Further, as long as we use same address to notify all queues, we would also need to decode the instruction on x86 and that's measureably slower than PIO. We could go back to discussing hypercall use for notifications, but that has its own set of issues... So... does violating the PCI-e spec really matter? Is it preventing any guest from working properly? Yes, absolutely, this wording in spec is not there without reason. Existing guests allocate io space for PCI express ports in multiples on 4K. Since each express device is behind such a port, this means at most 15 such devices can use IO ports in a system. That's why to make a pci express virtio device, we must allow MMIO and/or some other communication mechanism as the spec requires. This is precisely why this is an ABI breaker. If you disable IO bars in the BIOS, than the interface that the OS sees will *not have an IO bar*. This *breaks existing guests*. Any time the programming interfaces changes on a PCI device, the revision ID and/or device ID must change. The spec is very clear about this. We cannot disable the IO BAR without changing revision ID/device ID. But it's a bios/PC issue. It's not a device issue. Anyway, let's put express aside. It's easy to create non-working setups with pci, today: - create 16 pci bridges - put one virtio device behind each boom Try it. I want to fix that. That's on x86. Besides x86, there are achitectures where IO is unavailable or very slow. I don't think we should rush an ABI breakage if the only benefit is claiming spec compliance. Regards, Anthony Liguori Why do you bring this up? No one advocates any ABI breakage, I only suggest extensions. It's an ABI breakage. You're claiming that the guests you tested handle the breakage reasonably but it is unquestionably an ABI breakage. Regards, Anthony Liguori Adding BAR is not an ABI breakage, do we agree on that? Disabling IO would be but I am not proposing disabling IO. Guests might disable IO. I propose a way to make virtio still work if they do. This is *fixing* things. Not breaking. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of emulating multibyte NOP
On Wed, Jun 5, 2013 at 11:17 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 11:13:37PM +0800, 李春奇 Arthur Chunqi Li wrote: On Wed, Jun 5, 2013 at 11:11 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 10:56:54PM +0800, 李春奇 Arthur Chunqi Li wrote: Hi Gleb, I generate this mail by git send-email and I think the format is OK. But I have not received the email, only this your reply to it. Maybe the initial mail is in your spam box. Doubt it. I do not see it in mailing list archive either: http://news.gmane.org/gmane.comp.emulators.kvm.devel What git command line you've used to sent the email? git send-email --to kvm@vger.kernel.org --cc pbonz...@redhat.com --cc g...@redhat.com 0001-Test-case-of-emulating-multibyte-NOP.patch Should I need to add some smtp options? Arthur This is my first try to commit a patch in open source community. Sorry for annoying you guys so much. Thanks, No problem. Arthur On Wed, Jun 5, 2013 at 10:54 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Add multibyte NOP test case to kvm-unit-tests. This version adds test cases into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/realmode.c | 24 1 file changed, 24 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 981be08..e103ca6 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1504,6 +1504,29 @@ static void test_fninit(void) report(fninit, 0, fsw == 0 (fcw 0x103f) == 0x003f); } +static void test_nopl(void) +{ + MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop + MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop + MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop + MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop + MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop + MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 bytes nop + MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); // 7 bytes nop + MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 8 bytes nop + MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 9 bytes nop + exec_in_big_real_mode(insn_nopl1); + exec_in_big_real_mode(insn_nopl2); + exec_in_big_real_mode(insn_nopl3); + exec_in_big_real_mode(insn_nopl4); + exec_in_big_real_mode(insn_nopl5); + exec_in_big_real_mode(insn_nopl6); + exec_in_big_real_mode(insn_nopl7); + exec_in_big_real_mode(insn_nopl8); + exec_in_big_real_mode(insn_nopl9); + report(nopl, 0, 1); +} + void realmode_start(void) { test_null(); @@ -1548,6 +1571,7 @@ void realmode_start(void) test_xlat(); test_salc(); test_fninit(); + test_nopl(); exit(0); } -- 1.7.9.5 -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of emulating multibyte NOP
On Wed, Jun 05, 2013 at 11:13:37PM +0800, 李春奇 Arthur Chunqi Li wrote: On Wed, Jun 5, 2013 at 11:11 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 10:56:54PM +0800, 李春奇 Arthur Chunqi Li wrote: Hi Gleb, I generate this mail by git send-email and I think the format is OK. But I have not received the email, only this your reply to it. Maybe the initial mail is in your spam box. Doubt it. I do not see it in mailing list archive either: http://news.gmane.org/gmane.comp.emulators.kvm.devel What git command line you've used to sent the email? This is my first try to commit a patch in open source community. Sorry for annoying you guys so much. Thanks, No problem. Arthur On Wed, Jun 5, 2013 at 10:54 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Add multibyte NOP test case to kvm-unit-tests. This version adds test cases into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/realmode.c | 24 1 file changed, 24 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 981be08..e103ca6 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1504,6 +1504,29 @@ static void test_fninit(void) report(fninit, 0, fsw == 0 (fcw 0x103f) == 0x003f); } +static void test_nopl(void) +{ + MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop + MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop + MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop + MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop + MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop + MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 bytes nop + MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); // 7 bytes nop + MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 8 bytes nop + MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 9 bytes nop + exec_in_big_real_mode(insn_nopl1); + exec_in_big_real_mode(insn_nopl2); + exec_in_big_real_mode(insn_nopl3); + exec_in_big_real_mode(insn_nopl4); + exec_in_big_real_mode(insn_nopl5); + exec_in_big_real_mode(insn_nopl6); + exec_in_big_real_mode(insn_nopl7); + exec_in_big_real_mode(insn_nopl8); + exec_in_big_real_mode(insn_nopl9); + report(nopl, 0, 1); +} + void realmode_start(void) { test_null(); @@ -1548,6 +1571,7 @@ void realmode_start(void) test_xlat(); test_salc(); test_fninit(); + test_nopl(); exit(0); } -- 1.7.9.5 -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of emulating multibyte NOP
On Wed, Jun 05, 2013 at 11:22:19PM +0800, 李春奇 Arthur Chunqi Li wrote: On Wed, Jun 5, 2013 at 11:17 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 11:13:37PM +0800, 李春奇 Arthur Chunqi Li wrote: On Wed, Jun 5, 2013 at 11:11 PM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 10:56:54PM +0800, 李春奇 Arthur Chunqi Li wrote: Hi Gleb, I generate this mail by git send-email and I think the format is OK. But I have not received the email, only this your reply to it. Maybe the initial mail is in your spam box. Doubt it. I do not see it in mailing list archive either: http://news.gmane.org/gmane.comp.emulators.kvm.devel What git command line you've used to sent the email? git send-email --to kvm@vger.kernel.org --cc pbonz...@redhat.com --cc g...@redhat.com 0001-Test-case-of-emulating-multibyte-NOP.patch Should I need to add some smtp options? You need to configure smtp server. Just like you need to do that for any other email application. Arthur This is my first try to commit a patch in open source community. Sorry for annoying you guys so much. Thanks, No problem. Arthur On Wed, Jun 5, 2013 at 10:54 PM, Arthur Chunqi Li yzt...@gmail.com wrote: Add multibyte NOP test case to kvm-unit-tests. This version adds test cases into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/realmode.c | 24 1 file changed, 24 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 981be08..e103ca6 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1504,6 +1504,29 @@ static void test_fninit(void) report(fninit, 0, fsw == 0 (fcw 0x103f) == 0x003f); } +static void test_nopl(void) +{ + MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop + MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop + MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop + MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop + MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop + MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 bytes nop + MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); // 7 bytes nop + MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 8 bytes nop + MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 9 bytes nop + exec_in_big_real_mode(insn_nopl1); + exec_in_big_real_mode(insn_nopl2); + exec_in_big_real_mode(insn_nopl3); + exec_in_big_real_mode(insn_nopl4); + exec_in_big_real_mode(insn_nopl5); + exec_in_big_real_mode(insn_nopl6); + exec_in_big_real_mode(insn_nopl7); + exec_in_big_real_mode(insn_nopl8); + exec_in_big_real_mode(insn_nopl9); + report(nopl, 0, 1); +} + void realmode_start(void) { test_null(); @@ -1548,6 +1571,7 @@ void realmode_start(void) test_xlat(); test_salc(); test_fninit(); + test_nopl(); exit(0); } -- 1.7.9.5 -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China -- Gleb. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 10:08:37AM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote: You mean make BAR0 an MMIO BAR? Yes, it would break current windows guests. Further, as long as we use same address to notify all queues, we would also need to decode the instruction on x86 and that's measureably slower than PIO. We could go back to discussing hypercall use for notifications, but that has its own set of issues... So... does violating the PCI-e spec really matter? Is it preventing any guest from working properly? Yes, absolutely, this wording in spec is not there without reason. Existing guests allocate io space for PCI express ports in multiples on 4K. Since each express device is behind such a port, this means at most 15 such devices can use IO ports in a system. That's why to make a pci express virtio device, we must allow MMIO and/or some other communication mechanism as the spec requires. This is precisely why this is an ABI breaker. If you disable IO bars in the BIOS, than the interface that the OS sees will *not have an IO bar*. This *breaks existing guests*. Any time the programming interfaces changes on a PCI device, the revision ID and/or device ID must change. The spec is very clear about this. We cannot disable the IO BAR without changing revision ID/device ID. But it's a bios/PC issue. It's not a device issue. Anyway, let's put express aside. It's easy to create non-working setups with pci, today: - create 16 pci bridges - put one virtio device behind each boom Try it. I want to fix that. That's on x86. Besides x86, there are achitectures where IO is unavailable or very slow. I don't think we should rush an ABI breakage if the only benefit is claiming spec compliance. Regards, Anthony Liguori Why do you bring this up? No one advocates any ABI breakage, I only suggest extensions. It's an ABI breakage. You're claiming that the guests you tested handle the breakage reasonably but it is unquestionably an ABI breakage. Regards, Anthony Liguori Adding BAR is not an ABI breakage, do we agree on that? Disabling IO would be but I am not proposing disabling IO. Guests might disable IO. Look, it's very simple. If the failure in the guest is that BAR0 mapping fails because the device is enabled but the BAR is disabled, then you've broken the ABI. And what's worse is that this isn't for an obscure scenario (like having 15 PCI bridges) but for something that would become the standard scenario (using a PCI-e bus). We need to either bump the revision ID or the device ID if we do this. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PULL] vhost: cleanups and fixes
On Thu, May 02, 2013 at 12:49:42PM -0700, Linus Torvalds wrote: On Thu, May 2, 2013 at 12:33 PM, Michael S. Tsirkin m...@redhat.com wrote: I prefer not rebasing, Good. will play with git to see why does request-pull get me a wrong diffstat and how to trick it into doing the right thing. Maybe merging my branch into master will do this. No, don't do an unnecessary merge just to get the diffstat right. git pull-request ends up assuming that there are no back-merges, and that you have a uniquely defined single shared merge base. That allows pull-request to just generate the diff directly from that merge base, without actually trying to do the merge itself (which may have conflicts etc). But because git pull-request doesn't actually *do* the merge, it means that it will fail to give the correct diffstat if the tree is complicated and has multiple merge bases, and it can't really figure what the original shared state was before the development. This is just one reason I do *not* want to see back-merges. They make history harder to read not just for humans. You can either ignore the problem (I'll see the real diffstat when I actually do the merge), or you can do a test-merge yourself (that you do *not* then push out in the development branch - keep it in a temporary branch for your own edification or just delete it after doing the merge, and don't do development on it!) In this case, it's an indirect back-merge: you back-merged a commit from the target tree that I have now merged, so it has become a back-merge. I'm not sure why you did that - if you needed to start from that state, it would actually have been better to just start at that state instead of merging it. OK I'm in that situation again. I have some vhost-net patches that depend on changes in tip. But I also have a vhost-next branch with unrelated changes, that I started from -rc3. Previously I would just merge tip into vhost-next, then everyone's happy, but it will create an implicit back-merge again, won't it? So what should I do? Sorry about being dense. But whatever. You can get the diffstat by using your merge as the base, so git diff -M --stat --summary bc7562355fda.. in your branch should get the right result without any merges etc.. But please do send me a proper pull request. Linus -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Test case of emulating multibyte NOP
Add multibyte NOP test case to kvm-unit-tests. This version adds test cases into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/realmode.c | 24 1 file changed, 24 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 981be08..e103ca6 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1504,6 +1504,29 @@ static void test_fninit(void) report(fninit, 0, fsw == 0 (fcw 0x103f) == 0x003f); } +static void test_nopl(void) +{ + MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop + MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop + MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop + MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop + MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop + MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 bytes nop + MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); // 7 bytes nop + MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 8 bytes nop + MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 9 bytes nop + exec_in_big_real_mode(insn_nopl1); + exec_in_big_real_mode(insn_nopl2); + exec_in_big_real_mode(insn_nopl3); + exec_in_big_real_mode(insn_nopl4); + exec_in_big_real_mode(insn_nopl5); + exec_in_big_real_mode(insn_nopl6); + exec_in_big_real_mode(insn_nopl7); + exec_in_big_real_mode(insn_nopl8); + exec_in_big_real_mode(insn_nopl9); + report(nopl, 0, 1); +} + void realmode_start(void) { test_null(); @@ -1548,6 +1571,7 @@ void realmode_start(void) test_xlat(); test_salc(); test_fninit(); + test_nopl(); exit(0); } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Test case of emulating multibyte NOP
From: Arthur Chunqi Li yzt...@gmail.com Add multibyte NOP test case to kvm-unit-tests. This version adds test cases into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/realmode.c | 24 1 file changed, 24 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 981be08..e103ca6 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1504,6 +1504,29 @@ static void test_fninit(void) report(fninit, 0, fsw == 0 (fcw 0x103f) == 0x003f); } +static void test_nopl(void) +{ + MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop + MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop + MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop + MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop + MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop + MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 bytes nop + MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); // 7 bytes nop + MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 8 bytes nop + MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 9 bytes nop + exec_in_big_real_mode(insn_nopl1); + exec_in_big_real_mode(insn_nopl2); + exec_in_big_real_mode(insn_nopl3); + exec_in_big_real_mode(insn_nopl4); + exec_in_big_real_mode(insn_nopl5); + exec_in_big_real_mode(insn_nopl6); + exec_in_big_real_mode(insn_nopl7); + exec_in_big_real_mode(insn_nopl8); + exec_in_big_real_mode(insn_nopl9); + report(nopl, 0, 1); +} + void realmode_start(void) { test_null(); @@ -1548,6 +1571,7 @@ void realmode_start(void) test_xlat(); test_salc(); test_fninit(); + test_nopl(); exit(0); } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of emulating multibyte NOP
This time the email is perfect :) On Thu, Jun 06, 2013 at 12:02:52AM +0800, Arthur Chunqi Li wrote: Add multibyte NOP test case to kvm-unit-tests. This version adds test cases into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/realmode.c | 24 1 file changed, 24 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 981be08..e103ca6 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1504,6 +1504,29 @@ static void test_fninit(void) report(fninit, 0, fsw == 0 (fcw 0x103f) == 0x003f); } +static void test_nopl(void) +{ + MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop + MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop + MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop + MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop But all nops below that are not supported in 16 bit mode. You can disassemble realmode.elf in 16bit node (objdump -z -d -mi8086 x86/realmode.elf) and check yourself. Lets not complicate things for now and test only those that are easy to test. + MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop + MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 bytes nop + MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); // 7 bytes nop + MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 8 bytes nop + MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 9 bytes nop + exec_in_big_real_mode(insn_nopl1); + exec_in_big_real_mode(insn_nopl2); + exec_in_big_real_mode(insn_nopl3); + exec_in_big_real_mode(insn_nopl4); + exec_in_big_real_mode(insn_nopl5); + exec_in_big_real_mode(insn_nopl6); + exec_in_big_real_mode(insn_nopl7); + exec_in_big_real_mode(insn_nopl8); + exec_in_big_real_mode(insn_nopl9); + report(nopl, 0, 1); +} + void realmode_start(void) { test_null(); @@ -1548,6 +1571,7 @@ void realmode_start(void) test_xlat(); test_salc(); test_fninit(); + test_nopl(); exit(0); } -- 1.7.9.5 -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Test case of emulating multibyte NOP
On Thu, Jun 6, 2013 at 12:13 AM, Gleb Natapov g...@redhat.com wrote: This time the email is perfect :) On Thu, Jun 06, 2013 at 12:02:52AM +0800, Arthur Chunqi Li wrote: Add multibyte NOP test case to kvm-unit-tests. This version adds test cases into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/realmode.c | 24 1 file changed, 24 insertions(+) diff --git a/x86/realmode.c b/x86/realmode.c index 981be08..e103ca6 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -1504,6 +1504,29 @@ static void test_fninit(void) report(fninit, 0, fsw == 0 (fcw 0x103f) == 0x003f); } +static void test_nopl(void) +{ + MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop + MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop + MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop + MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop But all nops below that are not supported in 16 bit mode. You can disassemble realmode.elf in 16bit node (objdump -z -d -mi8086 x86/realmode.elf) and check yourself. Lets not complicate things for now and test only those that are easy to test. Yes. But what if a 7-bytes nop runs in 16bit mode? Just the same as https://bugzilla.redhat.com/show_bug.cgi?id=967652 DR6=0ff0 DR7=0400 EFER=0500 Code=00 00 e9 50 ff ff ff 00 00 00 00 85 d2 74 20 45 31 c0 31 c9 0f 1f 80 00 00 00 00 0f b6 04 31 41 83 c0 01 88 04 39 48 83 c1 01 41 39 d0 75 ec 48 89 f8 The error code is 0f 1f 80 00 00 00 00, which is a 7-bytes nop. Will the emulator runs well in that case when booting RHEL5.9 64-bit? Arthur + MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop + MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 bytes nop + MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); // 7 bytes nop + MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 8 bytes nop + MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n\r); // 9 bytes nop + exec_in_big_real_mode(insn_nopl1); + exec_in_big_real_mode(insn_nopl2); + exec_in_big_real_mode(insn_nopl3); + exec_in_big_real_mode(insn_nopl4); + exec_in_big_real_mode(insn_nopl5); + exec_in_big_real_mode(insn_nopl6); + exec_in_big_real_mode(insn_nopl7); + exec_in_big_real_mode(insn_nopl8); + exec_in_big_real_mode(insn_nopl9); + report(nopl, 0, 1); +} + void realmode_start(void) { test_null(); @@ -1548,6 +1571,7 @@ void realmode_start(void) test_xlat(); test_salc(); test_fninit(); + test_nopl(); exit(0); } -- 1.7.9.5 -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/6] KVM: PPC: Book3E: AltiVec support
On 06/05/2013 02:10:07 AM, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: Wood Scott-B07421 Sent: Wednesday, June 05, 2013 12:39 AM To: Caraman Mihai Claudiu-B02008 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; Alexander Graf Subject: Re: [RFC PATCH 0/6] KVM: PPC: Book3E: AltiVec support On 06/03/2013 03:54:22 PM, Mihai Caraman wrote: Mihai Caraman (6): KVM: PPC: Book3E: Fix AltiVec interrupt numbers and build breakage KVM: PPC: Book3E: Refactor SPE_FP exit handling KVM: PPC: Book3E: Rename IRQPRIO names to accommodate ALTIVEC KVM: PPC: Book3E: Add AltiVec support KVM: PPC: Book3E: Add ONE_REG AltiVec support KVM: PPC: Book3E: Enhance FPU laziness arch/powerpc/include/asm/kvm_asm.h| 16 ++- arch/powerpc/kvm/booke.c | 189 arch/powerpc/kvm/booke.h |4 +- arch/powerpc/kvm/bookehv_interrupts.S |8 +- arch/powerpc/kvm/e500.c | 10 +- arch/powerpc/kvm/e500_emulate.c |8 +- arch/powerpc/kvm/e500mc.c | 10 ++- 7 files changed, 199 insertions(+), 46 deletions(-) This looks like a bit much for 3.10 (certainly, subject lines like refactor and enhance and add support aren't going to make Linus happy given that we're past rc4) so I think we should apply http://patchwork.ozlabs.org/patch/242896/ for 3.10. Then for 3.11, revert it after applying this patchset. Why not 1/6 plus e6500 removal? 1/6 is not a bugfix. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 10:08:37AM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote: You mean make BAR0 an MMIO BAR? Yes, it would break current windows guests. Further, as long as we use same address to notify all queues, we would also need to decode the instruction on x86 and that's measureably slower than PIO. We could go back to discussing hypercall use for notifications, but that has its own set of issues... So... does violating the PCI-e spec really matter? Is it preventing any guest from working properly? Yes, absolutely, this wording in spec is not there without reason. Existing guests allocate io space for PCI express ports in multiples on 4K. Since each express device is behind such a port, this means at most 15 such devices can use IO ports in a system. That's why to make a pci express virtio device, we must allow MMIO and/or some other communication mechanism as the spec requires. This is precisely why this is an ABI breaker. If you disable IO bars in the BIOS, than the interface that the OS sees will *not have an IO bar*. This *breaks existing guests*. Any time the programming interfaces changes on a PCI device, the revision ID and/or device ID must change. The spec is very clear about this. We cannot disable the IO BAR without changing revision ID/device ID. But it's a bios/PC issue. It's not a device issue. Anyway, let's put express aside. It's easy to create non-working setups with pci, today: - create 16 pci bridges - put one virtio device behind each boom Try it. I want to fix that. That's on x86. Besides x86, there are achitectures where IO is unavailable or very slow. I don't think we should rush an ABI breakage if the only benefit is claiming spec compliance. Regards, Anthony Liguori Why do you bring this up? No one advocates any ABI breakage, I only suggest extensions. It's an ABI breakage. You're claiming that the guests you tested handle the breakage reasonably but it is unquestionably an ABI breakage. Regards, Anthony Liguori Adding BAR is not an ABI breakage, do we agree on that? Disabling IO would be but I am not proposing disabling IO. Guests might disable IO. Look, it's very simple. If the failure in the guest is that BAR0 mapping fails because the device is enabled but the BAR is disabled, There's no such thing as device is enabled and neither BAR is disabled in PCI spec. Spec says IO and memory can be enabled/disabled, separately. PCI Express spec says devices should work without IO. So modern guests will assume it's ok to work without IO, and it will become more common in the future. then you've broken the ABI. No. It means that the ABI is broken. Guests can disable IO *today* and when they do things don't work. And what's worse is that this isn't for an obscure scenario (like having 15 PCI bridges) but for something that would become the standard scenario (using a PCI-e bus). We need to either bump the revision ID or the device ID if we do this. Regards, Anthony Liguori We only need to do it if we do a change that breaks guests. Please find a guest that is broken by the patches. You won't find any. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote: Look, it's very simple. We only need to do it if we do a change that breaks guests. Please find a guest that is broken by the patches. You won't find any. I think the problem in this whole discussion is that we're talking past each other. Here is my understanding: 1) PCI-e says that you must be able to disable IO bars and still have a functioning device. 2) It says (1) because you must size IO bars to 4096 which means that practically speaking, once you enable a dozen or so PIO bars, you run out of PIO space (16 * 4k == 64k and not all that space can be used). virtio-pci uses a IO bars exclusively today. Existing guest drivers assume that there is an IO bar that contains the virtio-pci registers. So let's consider the following scenarios: QEMU of today: 1) qemu -drive file=ubuntu-13.04.img,if=virtio This works today. Does adding an MMIO bar at BAR1 break this? Certainly not if the device is behind a PCI bus... But are we going to put devices behind a PCI-e bus by default? Are we going to ask the user to choose whether devices are put behind a legacy bus or the express bus? What happens if we put the device behind a PCI-e bus by default? Well, it can still work. That is, until we do something like this: 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng -device virtio-balloon.. Such that we have more than a dozen or so devices. This works perfectly fine today. It works fine because we've designed virtio to make sure it works fine. Quoting the spec: Configuration space is generally used for rarely-changing or initialization-time parameters. But it is a limited resource, so it might be better to use a virtqueue to update configuration information (the network device does this for filtering, otherwise the table in the config space could potentially be very large). In fact, we can have 100s of PCI devices today without running out of IO space because we're so careful about this. So if we switch to using PCI-e by default *and* we keep virtio-pci without modifying the device IDs, then very frequently we are going to break existing guests because the drivers they already have no longer work. A few virtio-serial channels, a few block devices, a couple of network adapters, the balloon and RNG driver, and we hit the IO space limit pretty damn quickly so this is not a contrived scenario at all. I would expect that we frequently run into this if we don't address this problem. So we have a few options: 1) Punt all of this complexity to libvirt et al and watch people make the wrong decisions about when to use PCI-e. This will become yet another example of KVM being too hard to configure. 2) Enable PCI-e by default and just force people to upgrade their drivers. 3) Don't use PCI-e by default but still add BAR1 to virtio-pci 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give it a new device/vendor ID. Continue to use virtio-pci for existing devices potentially adding virtio-{net,blk,...}-pcie variants for people that care to use them. I think 1 == 2 == 3 and I view 2 as an ABI breaker. libvirt does like policy so they're going to make a simple decision and always use the same bus by default. I suspect if we made PCI the default, they might just always set the PCI-e flag just because. There are hundreds of thousands if not millions of guests with existing virtio-pci drivers. Forcing them to upgrade better have an extremely good justification. I think 4 is the best path forward. It's better for users (guests continue to work as they always have). There's less confusion about enabling PCI-e support--you must ask for the virtio-pcie variant and you must have a virtio-pcie driver. It's easy to explain. It also maps to what regular hardware does. I highly doubt that there are any real PCI cards that made the shift from PCI to PCI-e without bumping at least a revision ID. It also means we don't need to play games about sometimes enabling IO bars and sometimes not. Regards, Anthony Liguori -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 2/6] KVM: PPC: Book3E: Refactor SPE_FP exit handling
On 06/05/2013 02:29:47 AM, Caraman Mihai Claudiu-B02008 wrote: case BOOKE_INTERRUPT_SPE_FP_ROUND: +#ifdef CONFIG_SPE kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_ROUND); r = RESUME_GUEST; break; Why not use kvmppc_supports_spe() here, for consistency? I added cpu_has_feature(CPU_FTR_SPE) for the case specified above, but here SPE_FP_ROUND is not shared with ALTIVEC. CONFIG_SPE is used in other places in KVM without this check, shouldn't be all or nothing? I'd rather it be consistent, at least between handling one exception and another. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Error while running android as guest ---FastModels
On Wed, Jun 05, 2013 at 08:31:13PM +0200, Mai Daftedar wrote: Dear All, After following the Android on Fast Models pdf. I managed to successfully run android as the host and when running the following command to run android as a guest: ./qemu-system-arm \ -vnc :0 \ -k en-us \ -enable-kvm \ -serial stdio \ -kernel zImage \ -m 512 -M vexpress-a15 -cpu cortex-a15 \ -drive file=./android.img,id=virtio-blk,if=none \ -device virtio-blk,drive=virtio-blk,transport=virtio-mmio.0 \ -append virtio_mmio.device=1M@0x4e00:74:0 init=/init console=ttyAMA0 mem=512M root=/dev/vda rw The following errors appear: *device virtio-blk, drive=virtio-blk, transport=virtio-mmio.0: No 'PCI' bus found for device virtio-blk-pci* Any suggestions on what I am missing out Sounds like you're using the wrong version of QEMU. For virtio to work you need to use the rather old kvm-arm-virtio branch in my tree (it may be replicated elsewhere too). There is ongoing work to get this support updated to newer QEMU implementations. -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote: Look, it's very simple. We only need to do it if we do a change that breaks guests. Please find a guest that is broken by the patches. You won't find any. I think the problem in this whole discussion is that we're talking past each other. Here is my understanding: 1) PCI-e says that you must be able to disable IO bars and still have a functioning device. 2) It says (1) because you must size IO bars to 4096 which means that practically speaking, once you enable a dozen or so PIO bars, you run out of PIO space (16 * 4k == 64k and not all that space can be used). Let me add 3 other issues which I mentioned and you seem to miss: 3) architectures which don't have fast access to IO ports, exist virtio does not work there ATM 4) setups with many PCI bridges exist and have the same issue as PCI express. virtio does not work there ATM 5) On x86, even with nested page tables, firmware only decodes the page address on an invalid PTE, not the data. You need to emulate the guest to get at the data. Without nested page tables, we have to do page table walk and emulate to get both address and data. Since this is how MMIO is implemented in kvm on x86, MMIO is much slower than PIO (with nested page tables by a factor of 2, did not test without). virtio-pci uses a IO bars exclusively today. Existing guest drivers assume that there is an IO bar that contains the virtio-pci registers. So let's consider the following scenarios: QEMU of today: 1) qemu -drive file=ubuntu-13.04.img,if=virtio This works today. Does adding an MMIO bar at BAR1 break this? Certainly not if the device is behind a PCI bus... But are we going to put devices behind a PCI-e bus by default? Are we going to ask the user to choose whether devices are put behind a legacy bus or the express bus? What happens if we put the device behind a PCI-e bus by default? Well, it can still work. That is, until we do something like this: 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng -device virtio-balloon.. Such that we have more than a dozen or so devices. This works perfectly fine today. It works fine because we've designed virtio to make sure it works fine. Quoting the spec: Configuration space is generally used for rarely-changing or initialization-time parameters. But it is a limited resource, so it might be better to use a virtqueue to update configuration information (the network device does this for filtering, otherwise the table in the config space could potentially be very large). In fact, we can have 100s of PCI devices today without running out of IO space because we're so careful about this. So if we switch to using PCI-e by default *and* we keep virtio-pci without modifying the device IDs, then very frequently we are going to break existing guests because the drivers they already have no longer work. A few virtio-serial channels, a few block devices, a couple of network adapters, the balloon and RNG driver, and we hit the IO space limit pretty damn quickly so this is not a contrived scenario at all. I would expect that we frequently run into this if we don't address this problem. So we have a few options: 1) Punt all of this complexity to libvirt et al and watch people make the wrong decisions about when to use PCI-e. This will become yet another example of KVM being too hard to configure. 2) Enable PCI-e by default and just force people to upgrade their drivers. 3) Don't use PCI-e by default but still add BAR1 to virtio-pci 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), We can't do this - it will hurt performance. give it a new device/vendor ID. Continue to use virtio-pci for existing devices potentially adding virtio-{net,blk,...}-pcie variants for people that care to use them. I think 1 == 2 == 3 and I view 2 as an ABI breaker. Why do you think 2 == 3? 2 changes default behaviour. 3 does not. libvirt does like policy so they're going to make a simple decision and always use the same bus by default. I suspect if we made PCI the default, they might just always set the PCI-e flag just because. This sounds very strange. But let's assume you are right for the sake of the argument ... There are hundreds of thousands if not millions of guests with existing virtio-pci drivers. Forcing them to upgrade better have an extremely good justification. I think 4 is the best path forward. It's better for users (guests continue to work as they always have). There's less confusion about enabling PCI-e support--you must ask for the virtio-pcie variant and you must have a virtio-pcie driver. It's easy to explain. I don't think how this changes the situation.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 10:43:17PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote: Look, it's very simple. We only need to do it if we do a change that breaks guests. Please find a guest that is broken by the patches. You won't find any. I think the problem in this whole discussion is that we're talking past each other. Here is my understanding: 1) PCI-e says that you must be able to disable IO bars and still have a functioning device. 2) It says (1) because you must size IO bars to 4096 which means that practically speaking, once you enable a dozen or so PIO bars, you run out of PIO space (16 * 4k == 64k and not all that space can be used). Let me add 3 other issues which I mentioned and you seem to miss: 3) architectures which don't have fast access to IO ports, exist virtio does not work there ATM 4) setups with many PCI bridges exist and have the same issue as PCI express. virtio does not work there ATM 5) On x86, even with nested page tables, firmware only decodes the page address on an invalid PTE, not the data. You need to emulate the guest to get at the data. Without nested page tables, we have to do page table walk and emulate to get both address and data. Since this is how MMIO is implemented in kvm on x86, MMIO is much slower than PIO (with nested page tables by a factor of 2, did not test without). Oh I forgot: 6) access to MMIO BARs is painful in the BIOS environment so BIOS would typically need to enable IO for the boot device. virtio-pci uses a IO bars exclusively today. Existing guest drivers assume that there is an IO bar that contains the virtio-pci registers. So let's consider the following scenarios: QEMU of today: 1) qemu -drive file=ubuntu-13.04.img,if=virtio This works today. Does adding an MMIO bar at BAR1 break this? Certainly not if the device is behind a PCI bus... But are we going to put devices behind a PCI-e bus by default? Are we going to ask the user to choose whether devices are put behind a legacy bus or the express bus? What happens if we put the device behind a PCI-e bus by default? Well, it can still work. That is, until we do something like this: 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng -device virtio-balloon.. Such that we have more than a dozen or so devices. This works perfectly fine today. It works fine because we've designed virtio to make sure it works fine. Quoting the spec: Configuration space is generally used for rarely-changing or initialization-time parameters. But it is a limited resource, so it might be better to use a virtqueue to update configuration information (the network device does this for filtering, otherwise the table in the config space could potentially be very large). In fact, we can have 100s of PCI devices today without running out of IO space because we're so careful about this. So if we switch to using PCI-e by default *and* we keep virtio-pci without modifying the device IDs, then very frequently we are going to break existing guests because the drivers they already have no longer work. A few virtio-serial channels, a few block devices, a couple of network adapters, the balloon and RNG driver, and we hit the IO space limit pretty damn quickly so this is not a contrived scenario at all. I would expect that we frequently run into this if we don't address this problem. So we have a few options: 1) Punt all of this complexity to libvirt et al and watch people make the wrong decisions about when to use PCI-e. This will become yet another example of KVM being too hard to configure. 2) Enable PCI-e by default and just force people to upgrade their drivers. 3) Don't use PCI-e by default but still add BAR1 to virtio-pci 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), We can't do this - it will hurt performance. give it a new device/vendor ID. Continue to use virtio-pci for existing devices potentially adding virtio-{net,blk,...}-pcie variants for people that care to use them. I think 1 == 2 == 3 and I view 2 as an ABI breaker. Why do you think 2 == 3? 2 changes default behaviour. 3 does not. libvirt does like policy so they're going to make a simple decision and always use the same bus by default. I suspect if we made PCI the default, they might just always set the PCI-e flag just because. This sounds very strange. But let's assume you are right for the sake of the argument ... There are hundreds of thousands if not millions of guests with existing virtio-pci drivers. Forcing them to upgrade better have an extremely good
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote: Look, it's very simple. We only need to do it if we do a change that breaks guests. Please find a guest that is broken by the patches. You won't find any. I think the problem in this whole discussion is that we're talking past each other. Here is my understanding: 1) PCI-e says that you must be able to disable IO bars and still have a functioning device. 2) It says (1) because you must size IO bars to 4096 which means that practically speaking, once you enable a dozen or so PIO bars, you run out of PIO space (16 * 4k == 64k and not all that space can be used). virtio-pci uses a IO bars exclusively today. Existing guest drivers assume that there is an IO bar that contains the virtio-pci registers. So let's consider the following scenarios: QEMU of today: 1) qemu -drive file=ubuntu-13.04.img,if=virtio This works today. Does adding an MMIO bar at BAR1 break this? Certainly not if the device is behind a PCI bus... But are we going to put devices behind a PCI-e bus by default? Are we going to ask the user to choose whether devices are put behind a legacy bus or the express bus? What happens if we put the device behind a PCI-e bus by default? Well, it can still work. That is, until we do something like this: 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng -device virtio-balloon.. Such that we have more than a dozen or so devices. This works perfectly fine today. It works fine because we've designed virtio to make sure it works fine. Quoting the spec: Configuration space is generally used for rarely-changing or initialization-time parameters. But it is a limited resource, so it might be better to use a virtqueue to update configuration information (the network device does this for filtering, otherwise the table in the config space could potentially be very large). In fact, we can have 100s of PCI devices today without running out of IO space because we're so careful about this. So if we switch to using PCI-e by default *and* we keep virtio-pci without modifying the device IDs, then very frequently we are going to break existing guests because the drivers they already have no longer work. A few virtio-serial channels, a few block devices, a couple of network adapters, the balloon and RNG driver, and we hit the IO space limit pretty damn quickly so this is not a contrived scenario at all. I would expect that we frequently run into this if we don't address this problem. So we have a few options: 1) Punt all of this complexity to libvirt et al and watch people make the wrong decisions about when to use PCI-e. This will become yet another example of KVM being too hard to configure. 2) Enable PCI-e by default and just force people to upgrade their drivers. 3) Don't use PCI-e by default but still add BAR1 to virtio-pci 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give it a new device/vendor ID. Continue to use virtio-pci for existing devices potentially adding virtio-{net,blk,...}-pcie variants for people that care to use them. For the record, with respect to PCI-e discussion, I have no problem with the idea of changing the device ID or revision id and asking guests to upgrade if they want to use a pcie device. That's not exactly 4 however. I see no reason to couple PCI-e with MMIO discussion, that's just one of the reasons to support MMIO. I think 1 == 2 == 3 and I view 2 as an ABI breaker. libvirt does like policy so they're going to make a simple decision and always use the same bus by default. I suspect if we made PCI the default, they might just always set the PCI-e flag just because. There are hundreds of thousands if not millions of guests with existing virtio-pci drivers. Forcing them to upgrade better have an extremely good justification. I think 4 is the best path forward. It's better for users (guests continue to work as they always have). There's less confusion about enabling PCI-e support--you must ask for the virtio-pcie variant and you must have a virtio-pcie driver. It's easy to explain. It also maps to what regular hardware does. I highly doubt that there are any real PCI cards that made the shift from PCI to PCI-e without bumping at least a revision ID. It also means we don't need to play games about sometimes enabling IO bars and sometimes not. Regards, Anthony Liguori -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote: Look, it's very simple. We only need to do it if we do a change that breaks guests. Please find a guest that is broken by the patches. You won't find any. I think the problem in this whole discussion is that we're talking past each other. Here is my understanding: 1) PCI-e says that you must be able to disable IO bars and still have a functioning device. 2) It says (1) because you must size IO bars to 4096 which means that practically speaking, once you enable a dozen or so PIO bars, you run out of PIO space (16 * 4k == 64k and not all that space can be used). Let me add 3 other issues which I mentioned and you seem to miss: 3) architectures which don't have fast access to IO ports, exist virtio does not work there ATM Which architectures have PCI but no IO ports? 4) setups with many PCI bridges exist and have the same issue as PCI express. virtio does not work there ATM This is not virtio specific. This is true for all devices that use IO. 5) On x86, even with nested page tables, firmware only decodes the page address on an invalid PTE, not the data. You need to emulate the guest to get at the data. Without nested page tables, we have to do page table walk and emulate to get both address and data. Since this is how MMIO is implemented in kvm on x86, MMIO is much slower than PIO (with nested page tables by a factor of 2, did not test without). Am well aware of this, this is why we use PIO. I fully agree with you that when we do MMIO, we should switch the notification mechanism to avoid encoding anything meaningful as data. virtio-pci uses a IO bars exclusively today. Existing guest drivers assume that there is an IO bar that contains the virtio-pci registers. So let's consider the following scenarios: QEMU of today: 1) qemu -drive file=ubuntu-13.04.img,if=virtio This works today. Does adding an MMIO bar at BAR1 break this? Certainly not if the device is behind a PCI bus... But are we going to put devices behind a PCI-e bus by default? Are we going to ask the user to choose whether devices are put behind a legacy bus or the express bus? What happens if we put the device behind a PCI-e bus by default? Well, it can still work. That is, until we do something like this: 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng -device virtio-balloon.. Such that we have more than a dozen or so devices. This works perfectly fine today. It works fine because we've designed virtio to make sure it works fine. Quoting the spec: Configuration space is generally used for rarely-changing or initialization-time parameters. But it is a limited resource, so it might be better to use a virtqueue to update configuration information (the network device does this for filtering, otherwise the table in the config space could potentially be very large). In fact, we can have 100s of PCI devices today without running out of IO space because we're so careful about this. So if we switch to using PCI-e by default *and* we keep virtio-pci without modifying the device IDs, then very frequently we are going to break existing guests because the drivers they already have no longer work. A few virtio-serial channels, a few block devices, a couple of network adapters, the balloon and RNG driver, and we hit the IO space limit pretty damn quickly so this is not a contrived scenario at all. I would expect that we frequently run into this if we don't address this problem. So we have a few options: 1) Punt all of this complexity to libvirt et al and watch people make the wrong decisions about when to use PCI-e. This will become yet another example of KVM being too hard to configure. 2) Enable PCI-e by default and just force people to upgrade their drivers. 3) Don't use PCI-e by default but still add BAR1 to virtio-pci 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), We can't do this - it will hurt performance. Can you explain? I thought the whole trick with separating out the virtqueue notification register was to regain the performance? give it a new device/vendor ID. Continue to use virtio-pci for existing devices potentially adding virtio-{net,blk,...}-pcie variants for people that care to use them. I think 1 == 2 == 3 and I view 2 as an ABI breaker. Why do you think 2 == 3? 2 changes default behaviour. 3 does not. It doesn't change the default behavior but then we're pushing the decision of when to use pci-e to the user. They have to understand that there can be subtle breakages because the virtio-pci driver may not work if they are using an old guest. libvirt does like policy so they're
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 10:43:17PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote: Look, it's very simple. We only need to do it if we do a change that breaks guests. Please find a guest that is broken by the patches. You won't find any. I think the problem in this whole discussion is that we're talking past each other. Here is my understanding: 1) PCI-e says that you must be able to disable IO bars and still have a functioning device. 2) It says (1) because you must size IO bars to 4096 which means that practically speaking, once you enable a dozen or so PIO bars, you run out of PIO space (16 * 4k == 64k and not all that space can be used). Let me add 3 other issues which I mentioned and you seem to miss: 3) architectures which don't have fast access to IO ports, exist virtio does not work there ATM 4) setups with many PCI bridges exist and have the same issue as PCI express. virtio does not work there ATM 5) On x86, even with nested page tables, firmware only decodes the page address on an invalid PTE, not the data. You need to emulate the guest to get at the data. Without nested page tables, we have to do page table walk and emulate to get both address and data. Since this is how MMIO is implemented in kvm on x86, MMIO is much slower than PIO (with nested page tables by a factor of 2, did not test without). Oh I forgot: 6) access to MMIO BARs is painful in the BIOS environment so BIOS would typically need to enable IO for the boot device. But if you want to boot from the 16th device, the BIOS needs to solve this problem anyway. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 6/6] KVM: PPC: Book3E: Enhance FPU laziness
On 06/05/2013 04:14:21 AM, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: Wood Scott-B07421 Sent: Wednesday, June 05, 2013 1:54 AM To: Caraman Mihai Claudiu-B02008 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; Caraman Mihai Claudiu-B02008 Subject: Re: [RFC PATCH 6/6] KVM: PPC: Book3E: Enhance FPU laziness On 06/03/2013 03:54:28 PM, Mihai Caraman wrote: Adopt AltiVec approach to increase laziness by calling kvmppc_load_guest_fp() just before returning to guest instaed of each sched in. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com If you did this *before* adding Altivec it would have saved a question in an earlier patch. :-) I kept asking myself about the order and in the end I decided that this is an improvement originated from AltiVec work. FPU may be further cleaned up (get rid of active state, etc). --- arch/powerpc/kvm/booke.c |1 + arch/powerpc/kvm/e500mc.c |2 -- 2 files changed, 1 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 019496d..5382238 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1258,6 +1258,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, } else { kvmppc_lazy_ee_enable(); kvmppc_load_guest_altivec(vcpu); + kvmppc_load_guest_fp(vcpu); } } You should probably do these before kvmppc_lazy_ee_enable(). Why? I wanted to look like part of lightweight_exit. We want to minimize the portion of the code that runs with interrupts disabled while telling tracers that interrupts are enabled. We want to minimize the C code run with lazy EE in an inconsistent state. The same applies to kvm_vcpu_run()... Actually, I don't think this is a good idea at all. As I understand it, you're not supposed to take kernel ownersship of floating point in non-atomic context, because an interrupt could itself call enable_kernel_fp(). So lightweight_exit isn't executed in atomic context? Ignore this, I misread what the patch was doing. I thought you were doing the opposite you did. :-P As such, this patch appears to fix the thing I was complaining about -- before, we could have taken an interrupt after kvmppc_core_vcpu_load(), and that interrupt could have claimed the floating point (unlikely with the kernel as is, but you never know what could happen in the future or out-of-tree...). Will be lazyee fixes including kvmppc_fix_ee_before_entry() in 3.10? 64-bit Book3E KVM is unreliable without them. Should we disable e5500 too for 3.10? I hope so... I meant to ask Gleb to take them while Alex was away, but I forgot about them. :-P Alex, are you back from vacation yet? -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] virtio_balloon: leak_balloon(): only tell host if we got pages deflated
The balloon_page_dequeue() function can return NULL. If it does for the first page being freed, then leak_balloon() will create a scatter list with len=0. Which in turn seems to generate an invalid virtio request. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- PS: I didn't get this in practice. I found it by code review. On the other hand, automatic-ballooning was able to put such invalid requests in the virtqueue and QEMU would explode... PPS: Very lightly tested drivers/virtio/virtio_balloon.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index bd3ae32..71af7b5 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ - tell_host(vb, vb-deflate_vq); + if (vb-num_pfns != 0) + tell_host(vb, vb-deflate_vq); mutex_unlock(vb-balloon_lock); release_pages_by_pfn(vb-pfns, vb-num_pfns); } -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote: Spec says IO and memory can be enabled/disabled, separately. PCI Express spec says devices should work without IO. For native endpoints. Currently virtio would be a legacy endpoint which is quite correct -- it is compatible with a legacy interface. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 03:42:57PM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote: Look, it's very simple. We only need to do it if we do a change that breaks guests. Please find a guest that is broken by the patches. You won't find any. I think the problem in this whole discussion is that we're talking past each other. Here is my understanding: 1) PCI-e says that you must be able to disable IO bars and still have a functioning device. 2) It says (1) because you must size IO bars to 4096 which means that practically speaking, once you enable a dozen or so PIO bars, you run out of PIO space (16 * 4k == 64k and not all that space can be used). Let me add 3 other issues which I mentioned and you seem to miss: 3) architectures which don't have fast access to IO ports, exist virtio does not work there ATM Which architectures have PCI but no IO ports? 4) setups with many PCI bridges exist and have the same issue as PCI express. virtio does not work there ATM This is not virtio specific. This is true for all devices that use IO. Absolutely. And you will find that modern devices make use of IO ports optional. 5) On x86, even with nested page tables, firmware only decodes the page address on an invalid PTE, not the data. You need to emulate the guest to get at the data. Without nested page tables, we have to do page table walk and emulate to get both address and data. Since this is how MMIO is implemented in kvm on x86, MMIO is much slower than PIO (with nested page tables by a factor of 2, did not test without). Am well aware of this, this is why we use PIO. I fully agree with you that when we do MMIO, we should switch the notification mechanism to avoid encoding anything meaningful as data. virtio-pci uses a IO bars exclusively today. Existing guest drivers assume that there is an IO bar that contains the virtio-pci registers. So let's consider the following scenarios: QEMU of today: 1) qemu -drive file=ubuntu-13.04.img,if=virtio This works today. Does adding an MMIO bar at BAR1 break this? Certainly not if the device is behind a PCI bus... But are we going to put devices behind a PCI-e bus by default? Are we going to ask the user to choose whether devices are put behind a legacy bus or the express bus? What happens if we put the device behind a PCI-e bus by default? Well, it can still work. That is, until we do something like this: 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng -device virtio-balloon.. Such that we have more than a dozen or so devices. This works perfectly fine today. It works fine because we've designed virtio to make sure it works fine. Quoting the spec: Configuration space is generally used for rarely-changing or initialization-time parameters. But it is a limited resource, so it might be better to use a virtqueue to update configuration information (the network device does this for filtering, otherwise the table in the config space could potentially be very large). In fact, we can have 100s of PCI devices today without running out of IO space because we're so careful about this. So if we switch to using PCI-e by default *and* we keep virtio-pci without modifying the device IDs, then very frequently we are going to break existing guests because the drivers they already have no longer work. A few virtio-serial channels, a few block devices, a couple of network adapters, the balloon and RNG driver, and we hit the IO space limit pretty damn quickly so this is not a contrived scenario at all. I would expect that we frequently run into this if we don't address this problem. So we have a few options: 1) Punt all of this complexity to libvirt et al and watch people make the wrong decisions about when to use PCI-e. This will become yet another example of KVM being too hard to configure. 2) Enable PCI-e by default and just force people to upgrade their drivers. 3) Don't use PCI-e by default but still add BAR1 to virtio-pci 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), We can't do this - it will hurt performance. Can you explain? I thought the whole trick with separating out the virtqueue notification register was to regain the performance? Yes but this trick only works well with NPT (it's still a bit slower than PIO but not so drastically). Without NPT you still need a page walk so it will be slow. give it a new device/vendor ID. Continue to use virtio-pci for existing devices potentially adding virtio-{net,blk,...}-pcie variants for people that care
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 06/05/2013 01:45 PM, Anthony Liguori wrote: But if you want to boot from the 16th device, the BIOS needs to solve this problem anyway. No. Just have the BIOS plumb the path to the device it wants to boot from. Problem solved. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 03:45:09PM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 10:43:17PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote: Look, it's very simple. We only need to do it if we do a change that breaks guests. Please find a guest that is broken by the patches. You won't find any. I think the problem in this whole discussion is that we're talking past each other. Here is my understanding: 1) PCI-e says that you must be able to disable IO bars and still have a functioning device. 2) It says (1) because you must size IO bars to 4096 which means that practically speaking, once you enable a dozen or so PIO bars, you run out of PIO space (16 * 4k == 64k and not all that space can be used). Let me add 3 other issues which I mentioned and you seem to miss: 3) architectures which don't have fast access to IO ports, exist virtio does not work there ATM 4) setups with many PCI bridges exist and have the same issue as PCI express. virtio does not work there ATM 5) On x86, even with nested page tables, firmware only decodes the page address on an invalid PTE, not the data. You need to emulate the guest to get at the data. Without nested page tables, we have to do page table walk and emulate to get both address and data. Since this is how MMIO is implemented in kvm on x86, MMIO is much slower than PIO (with nested page tables by a factor of 2, did not test without). Oh I forgot: 6) access to MMIO BARs is painful in the BIOS environment so BIOS would typically need to enable IO for the boot device. But if you want to boot from the 16th device, the BIOS needs to solve this problem anyway. Regards, Anthony Liguori But it's solvable: just enable devices one by one to boot from them. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, Jun 05, 2013 at 02:10:03PM -0700, H. Peter Anvin wrote: On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote: Spec says IO and memory can be enabled/disabled, separately. PCI Express spec says devices should work without IO. For native endpoints. Currently virtio would be a legacy endpoint which is quite correct -- it is compatible with a legacy interface. -hpa Yes. But it's not the spec wording that we care for most. It's supporting setups that have trouble using IO. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio_balloon: leak_balloon(): only tell host if we got pages deflated
On Wed, Jun 05, 2013 at 05:10:31PM -0400, Luiz Capitulino wrote: The balloon_page_dequeue() function can return NULL. If it does for the first page being freed, then leak_balloon() will create a scatter list with len=0. Which in turn seems to generate an invalid virtio request. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- PS: I didn't get this in practice. I found it by code review. On the other hand, automatic-ballooning was able to put such invalid requests in the virtqueue and QEMU would explode... Nice catch! The patch looks sane and replicates the check done at fill_balloon(). I think we also could use this P.S. as a commentary to let others aware of this scenario. Thanks Luiz! Acked-by: Rafael Aquini aqu...@redhat.com PPS: Very lightly tested drivers/virtio/virtio_balloon.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index bd3ae32..71af7b5 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ - tell_host(vb, vb-deflate_vq); + if (vb-num_pfns != 0) + tell_host(vb, vb-deflate_vq); mutex_unlock(vb-balloon_lock); release_pages_by_pfn(vb-pfns, vb-num_pfns); } -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
H. Peter Anvin h...@zytor.com writes: On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote: Spec says IO and memory can be enabled/disabled, separately. PCI Express spec says devices should work without IO. For native endpoints. Currently virtio would be a legacy endpoint which is quite correct -- it is compatible with a legacy interface. Do legacy endpoints also use 4k for BARs? If not, can't we use a new device id for native endpoints and call it a day? Legacy endpoints would continue using the existing BAR layout. Regards, Anthony Liguori -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: On Wed, Jun 05, 2013 at 03:42:57PM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: Can you explain? I thought the whole trick with separating out the virtqueue notification register was to regain the performance? Yes but this trick only works well with NPT (it's still a bit slower than PIO but not so drastically). Without NPT you still need a page walk so it will be slow. Do you mean NPT/EPT? If your concern is shadow paging, then I think you're concerned about hardware that is so slow to start with that it's not worth considering. It also maps to what regular hardware does. I highly doubt that there are any real PCI cards that made the shift from PCI to PCI-e without bumping at least a revision ID. Only because the chance it's 100% compatible on the software level is 0. It always has some hardware specific quirks. No such excuse here. It also means we don't need to play games about sometimes enabling IO bars and sometimes not. This last paragraph is wrong, it ignores the issues 3) to 5) I added above. If you do take them into account: - there are reasons to add MMIO BAR to PCI, even without PCI express So far, the only reason you've provided is it doesn't work on some architectures. Which architectures? PowerPC wants this. Existing PowerPC remaps PIO to MMAP so it works fine today. Future platforms may not do this but future platforms can use a different device. They certainly won't be able to use the existing drivers anyway. Ben, am I wrong here? - we won't be able to drop IO BAR from virtio An IO BAR is useless if it means we can't have more than 12 devices. It's not useless. A smart BIOS can enable devices one by one as it tries to boot from them. A smart BIOS can also use MMIO to program virtio. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 06/05/2013 02:50 PM, Anthony Liguori wrote: H. Peter Anvin h...@zytor.com writes: On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote: Spec says IO and memory can be enabled/disabled, separately. PCI Express spec says devices should work without IO. For native endpoints. Currently virtio would be a legacy endpoint which is quite correct -- it is compatible with a legacy interface. Do legacy endpoints also use 4k for BARs? There are no 4K BARs. In fact, I/O BARs are restricted by spec (there is no technical enforcement, however) to 256 bytes. The 4K come from the upstream bridge windows, which are only 4K granular (historic stuff from when bridges were assumed rare.) However, there can be multiple devices, functions, and BARs inside that window. The issue with PCIe is that each PCIe port is a bridge, so in reality there is only one real device per bus number. If not, can't we use a new device id for native endpoints and call it a day? Legacy endpoints would continue using the existing BAR layout. Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
H. Peter Anvin h...@zytor.com writes: On 06/05/2013 02:50 PM, Anthony Liguori wrote: H. Peter Anvin h...@zytor.com writes: On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote: Spec says IO and memory can be enabled/disabled, separately. PCI Express spec says devices should work without IO. For native endpoints. Currently virtio would be a legacy endpoint which is quite correct -- it is compatible with a legacy interface. Do legacy endpoints also use 4k for BARs? There are no 4K BARs. In fact, I/O BARs are restricted by spec (there is no technical enforcement, however) to 256 bytes. The 4K come from the upstream bridge windows, which are only 4K granular (historic stuff from when bridges were assumed rare.) However, there can be multiple devices, functions, and BARs inside that window. Got it. The issue with PCIe is that each PCIe port is a bridge, so in reality there is only one real device per bus number. If not, can't we use a new device id for native endpoints and call it a day? Legacy endpoints would continue using the existing BAR layout. Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option. What makes it so difficult to work with an MMIO bar for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight forward. Is there something special about PCI-e here? Regards, Anthony Liguori -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, 2013-06-05 at 16:53 -0500, Anthony Liguori wrote: Existing PowerPC remaps PIO to MMAP so it works fine today. Future platforms may not do this but future platforms can use a different device. They certainly won't be able to use the existing drivers anyway. Ben, am I wrong here? Well, we do remap PIO, though it's ugly and annoying so we'd rather avoid it alltogether :-) Our latest HW PHBs don't support PIO at all, so while we can still support it in the virtual ones in the guest for ever, it's becoming pretty clear that PIO is on the way out ... - we won't be able to drop IO BAR from virtio An IO BAR is useless if it means we can't have more than 12 devices. It's not useless. A smart BIOS can enable devices one by one as it tries to boot from them. A smart BIOS can also use MMIO to program virtio. Indeed :-) I see no reason why not providing both access path though. Have the PIO BAR there for compatibility/legacy/BIOS/x86 purposes and *also* have the MMIO window which I'd be happy to favor on power. We could even put somewhere in there a feature bit set by qemu to indicate whether it thinks PIO or MMIO is faster on a given platform if you really think that's worth it (I don't). Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Benjamin Herrenschmidt b...@kernel.crashing.org writes: On Wed, 2013-06-05 at 16:53 -0500, Anthony Liguori wrote: A smart BIOS can also use MMIO to program virtio. Indeed :-) I see no reason why not providing both access path though. Have the PIO BAR there for compatibility/legacy/BIOS/x86 purposes and *also* have the MMIO window which I'd be happy to favor on power. We could even put somewhere in there a feature bit set by qemu to indicate whether it thinks PIO or MMIO is faster on a given platform if you really think that's worth it (I don't). That's okay, but what I'm most concerned about is compatibility. A virtio PCI device that's a native endpoint needs to have a different device ID than one that is a legacy endpoint. The current drivers have no hope of working (well) with virtio PCI devices exposed as native endpoints. I don't care if the native PCI endpoint also has a PIO bar. But it seems silly (and confusing) to me to make that layout be the legacy layout verses a straight mirror of the new layout if we're already changing the device ID. In addition, it doesn't seem at all necessary to have an MMIO bar to the legacy device. If the reason you want MMIO is to avoid using PIO, then you break existing drivers because they assume PIO. If you are breaking existing drivers then you should change the device ID. If strictly speaking it's just that MMIO is a bit faster, I'm not sure that complexity is worth it without seeing performance numbers first. Regards, Anthony Liguori Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 06/05/2013 03:08 PM, Anthony Liguori wrote: Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option. What makes it so difficult to work with an MMIO bar for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight forward. Is there something special about PCI-e here? It's not tracking allocation. It is that accessing memory above 1 MiB is incredibly painful in the BIOS environment, which basically means MMIO is inaccessible. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio_balloon: leak_balloon(): only tell host if we got pages deflated
On Wed, 5 Jun 2013 18:24:49 -0300 Rafael Aquini aqu...@redhat.com wrote: On Wed, Jun 05, 2013 at 05:10:31PM -0400, Luiz Capitulino wrote: The balloon_page_dequeue() function can return NULL. If it does for the first page being freed, then leak_balloon() will create a scatter list with len=0. Which in turn seems to generate an invalid virtio request. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- PS: I didn't get this in practice. I found it by code review. On the other hand, automatic-ballooning was able to put such invalid requests in the virtqueue and QEMU would explode... Nice catch! The patch looks sane and replicates the check done at fill_balloon(). I think we also could use this P.S. as a commentary to let others aware of this scenario. Thanks Luiz! Want me to respin? Acked-by: Rafael Aquini aqu...@redhat.com Thanks for your review! PPS: Very lightly tested drivers/virtio/virtio_balloon.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index bd3ae32..71af7b5 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ - tell_host(vb, vb-deflate_vq); + if (vb-num_pfns != 0) + tell_host(vb, vb-deflate_vq); mutex_unlock(vb-balloon_lock); release_pages_by_pfn(vb-pfns, vb-num_pfns); } -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, 2013-06-05 at 17:53 -0500, Anthony Liguori wrote: If strictly speaking it's just that MMIO is a bit faster, I'm not sure that complexity is worth it without seeing performance numbers first. You mean PIO. I agree with all your points here. The only thing I saw as an option would be to add a PIO BAR that is an exact mirror of the MMIO BAR if and only if: - PIO is indeed significantly faster on platforms we care about and/or - There is significant simplification in things like BIOSes in using PIO over MMIO I personally don't care and would chose to use MMIO always including in firmware on ppc. Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio_balloon: leak_balloon(): only tell host if we got pages deflated
On Wed, Jun 05, 2013 at 07:08:44PM -0400, Luiz Capitulino wrote: On Wed, 5 Jun 2013 18:24:49 -0300 Rafael Aquini aqu...@redhat.com wrote: On Wed, Jun 05, 2013 at 05:10:31PM -0400, Luiz Capitulino wrote: The balloon_page_dequeue() function can return NULL. If it does for the first page being freed, then leak_balloon() will create a scatter list with len=0. Which in turn seems to generate an invalid virtio request. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- PS: I didn't get this in practice. I found it by code review. On the other hand, automatic-ballooning was able to put such invalid requests in the virtqueue and QEMU would explode... Nice catch! The patch looks sane and replicates the check done at fill_balloon(). I think we also could use this P.S. as a commentary to let others aware of this scenario. Thanks Luiz! Want me to respin? That would be great, indeed. I guess the commentary could also go for the same if case at fill_balloon(), assuming the tests are placed to prevent the same scenario you described at changelog. You can stick my Ack on it, if reposting. Cheers! -- Rafael -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
H. Peter Anvin h...@zytor.com writes: On 06/05/2013 03:08 PM, Anthony Liguori wrote: Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option. What makes it so difficult to work with an MMIO bar for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight forward. Is there something special about PCI-e here? It's not tracking allocation. It is that accessing memory above 1 MiB is incredibly painful in the BIOS environment, which basically means MMIO is inaccessible. Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Regards, Anthony Liguori -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] virtio_balloon: leak_balloon(): only tell host if we got pages deflated
The balloon_page_dequeue() function can return NULL. If it does for the first page being freed, then leak_balloon() will create a scatter list with len=0. Which in turn seems to generate an invalid virtio request. I didn't get this in practice, I found it by code review. On the other hand, such an invalid virtio request will cause errors in QEMU and fill_balloon() also performs the same check implemented by this commit. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com Acked-by: Rafael Aquini aqu...@redhat.com --- o v2 - Improve changelog drivers/virtio/virtio_balloon.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index bd3ae32..71af7b5 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ - tell_host(vb, vb-deflate_vq); + if (vb-num_pfns != 0) + tell_host(vb, vb-deflate_vq); mutex_unlock(vb-balloon_lock); release_pages_by_pfn(vb-pfns, vb-num_pfns); } -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Intercepting task switches in svm/vmx with tdp enabled
Thanks much for the reply. It seems Linux stopped using the hardware context switch mechanisms ( like far jmp ) since kernel version 2.2 ( per understanding linux kernel book ). For now, I am just going to use cr3 write interception to detect guest process context switches. ( on a related note however, with linux running in a single cpu guest vm, I see interceptions printing writes to cr3 with same value as the one that already is in the register - possibly threads or other scenarios ) Thanks, -Leo On Wed, Jun 5, 2013 at 1:16 AM, Gleb Natapov g...@redhat.com wrote: On Wed, Jun 05, 2013 at 12:51:29AM -0500, Leo Prasath wrote: Hi, I am interested in intercepting task switches in vmx/svm in 64 bit mode with ept/npt enabled. However, I am not seeing the exit code due to task switch ( 9 for vmx and 125 for svm ) in the list of vm exits that I see in a typical guest run. I do not think task switch exit means what you think it means. This is not OS context switches, but some x86 cpu concept of task that can be switched by using HW mechanism. No modern OS uses it. Actually in 64 bit mode it does not exists at all. I log the vm exit codes in the x86/svm.c:handle_exit method for svm and x86/vmx.c:vmx_handle_exit for vmx. Any pointers regarding this is very much appreciated. On a related note, does cr3 write interception approximate task switch interception ? Depending on how OS works. For Linux it is probably true (if cr3 value changes). ( I was able to intercept cr3 writes with svm while npt was enabled. but with vmx, I could intercept cr3 writes only with ept disabled ) Thanks, Leo Looking through the manuals, svm has a control bit in VMCS for enabling / disabling task switch interception while vmx does not seem to have such a control bit. Again, this is not task switch you are looking for. - Excerpts from the manuals : Intel -- Exit reason #9 indicates a vm exit due to task switch. Vol. 3C 24-9 : Some instructions cause VM exits regardless of the settings of the processor-based VM-execution controls (see Section 25.1.2), as do task switches (see Section 25.2). Vol. 3C 25-6 : Task switches. Task switches are not allowed in VMX non-root operation. Any attempt to effect a task switch in VMX non-root operation causes a VM exit. See Section 25.4.2 AMD --- Intercept code to look for is: 7Dh VMEXIT_TASK_SWITCH task switch 15.14 AMD64 Technology Miscellaneous Intercepts : The SVM architecture includes intercepts to handle task switches, processor freezes due to FERR, and shutdown operations. Task switches can modify several resources that a VMM may want to protect (CR3, EFLAGS, LDT). However, instead of checking various intercepts (e.g., CR3 Write, LDTR Write) individually, task switches check only a single intercept bit. Page 581 : Layout of VMCB says Byte offset 00Ch : bit 29 Intercept task switches. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost_net: clear msg.control for non-zerocopy case during tx
On 06/05/2013 09:44 PM, Sergei Shtylyov wrote: Hello. On 05-06-2013 11:40, Jason Wang wrote: When we decide not use zero-copy, msg.control should be set to NULL otherwise macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs wrongly. Bug were introduced by commit cedb9bdce099206290a2bdd02ce47a7b253b6a84 (vhost-net: skip head management if no outstanding). This solves the following warnings: WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]() Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun] CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566 Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011 a0198323 88007c9ebd08 81796b73 88007c9ebd48 8103d66b 7b773e20 8800779f 8800779f43f0 8800779f8418 015c 0062 88007c9ebd58 Call Trace: [81796b73] dump_stack+0x19/0x1e [8103d66b] warn_slowpath_common+0x6b/0xa0 [8103d6b5] warn_slowpath_null+0x15/0x20 [a0197627] handle_tx+0x477/0x4b0 [vhost_net] [a0197690] handle_tx_kick+0x10/0x20 [vhost_net] [a019541e] vhost_worker+0xfe/0x1a0 [vhost_net] [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [81061f46] kthread+0xc6/0xd0 [81061e80] ? kthread_freezable_should_stop+0x70/0x70 [817a1aec] ret_from_fork+0x7c/0xb0 [81061e80] ? kthread_freezable_should_stop+0x70/0x70 Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/vhost/net.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 2b51e23..b07d96b 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -436,7 +436,8 @@ static void handle_tx(struct vhost_net *net) kref_get(ubufs-kref); } nvq-upend_idx = (nvq-upend_idx + 1) % UIO_MAXIOV; -} +} else You have to use {} on the *else* branch if you have it of the *if* branch (and vice versa), according to Documentation/CodingStyle. checkpatch.pl didn't complain this, will send v2. Thanks +msg.msg_control = NULL; WBR, Sergei -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V2] vhost_net: clear msg.control for non-zerocopy case during tx
When we decide not use zero-copy, msg.control should be set to NULL otherwise macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs wrongly. Bug were introduced by commit cedb9bdce099206290a2bdd02ce47a7b253b6a84 (vhost-net: skip head management if no outstanding). This solves the following warnings: WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]() Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun] CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566 Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011 a0198323 88007c9ebd08 81796b73 88007c9ebd48 8103d66b 7b773e20 8800779f 8800779f43f0 8800779f8418 015c 0062 88007c9ebd58 Call Trace: [81796b73] dump_stack+0x19/0x1e [8103d66b] warn_slowpath_common+0x6b/0xa0 [8103d6b5] warn_slowpath_null+0x15/0x20 [a0197627] handle_tx+0x477/0x4b0 [vhost_net] [a0197690] handle_tx_kick+0x10/0x20 [vhost_net] [a019541e] vhost_worker+0xfe/0x1a0 [vhost_net] [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [81061f46] kthread+0xc6/0xd0 [81061e80] ? kthread_freezable_should_stop+0x70/0x70 [817a1aec] ret_from_fork+0x7c/0xb0 [81061e80] ? kthread_freezable_should_stop+0x70/0x70 Acked-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- The patch is needed for -stable. Changes from v1: - code style issue fix --- drivers/vhost/net.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 2b51e23..518622d 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -436,6 +436,8 @@ static void handle_tx(struct vhost_net *net) kref_get(ubufs-kref); } nvq-upend_idx = (nvq-upend_idx + 1) % UIO_MAXIOV; + } else { + msg.msg_control = NULL; } /* TODO: Check specific error and bomb out unless ENOBUFS? */ err = sock-ops-sendmsg(NULL, sock, msg, len); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Anthony Liguori aligu...@us.ibm.com writes: 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give it a new device/vendor ID. Continue to use virtio-pci for existing devices potentially adding virtio-{net,blk,...}-pcie variants for people that care to use them. Now you have a different compatibility problem; how do you know the guest supports the new virtio-pcie net? If you put a virtio-pci card behind a PCI-e bridge today, it's not compliant, but AFAICT it will Just Work. (Modulo the 16-dev limit). I've been assuming we'd avoid a flag day change; that devices would look like existing virtio-pci with capabilities indicating the new config layout. I think 4 is the best path forward. It's better for users (guests continue to work as they always have). There's less confusion about enabling PCI-e support--you must ask for the virtio-pcie variant and you must have a virtio-pcie driver. It's easy to explain. Removing both forward and backward compatibility is easy to explain, but I think it'll be harder to deploy. This is your area though, so perhaps I'm wrong. It also maps to what regular hardware does. I highly doubt that there are any real PCI cards that made the shift from PCI to PCI-e without bumping at least a revision ID. Noone expected the new cards to Just Work with old OSes: a new machine meant a new OS and new drivers. Hardware vendors like that. Since virtualization often involves legacy, our priorities might be different. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm-unit-tests: Add test case for accessing bpl via modr/m
Test access to %bpl via modr/m addressing mode. This case can test another bug in the boot of RHEL5.9 64-bit. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- x86/emulator.c | 41 + 1 file changed, 41 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 96576e5..3563971 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -901,6 +901,45 @@ static void test_simplealu(u32 *mem) report(test, *mem == 0x8400); } +static void test_bpl_modrm(uint64_t *mem, uint8_t *insn_page, + uint8_t *alt_insn_page, void *insn_ram) +{ + ulong *cr3 = (ulong *)read_cr3(); + uint16_t cx = 0; + + // Pad with RET instructions + memset(insn_page, 0xc3, 4096); + memset(alt_insn_page, 0xc3, 4096); + // Place a trapping instruction in the page to trigger a VMEXIT + insn_page[0] = 0x66; // mov $0x4321, %cx + insn_page[1] = 0xb9; + insn_page[2] = 0x21; + insn_page[3] = 0x43; + insn_page[4] = 0x89; // mov %eax, (%rax) + insn_page[5] = 0x00; + insn_page[6] = 0x90; // nop + // Place mov %cl, %bpl in alt_insn_page for emulator to execuate + // If emulator mistaken addressing %bpl, %cl may be moved to %ch + // %cx will be broken to 0x2121, not 0x4321 + alt_insn_page[4] = 0x40; + alt_insn_page[5] = 0x88; + alt_insn_page[6] = 0xcd; + + // Load the code TLB with insn_page, but point the page tables at + // alt_insn_page (and keep the data TLB clear, for AMD decode assist). + // This will make the CPU trap on the insn_page instruction but the + // hypervisor will see alt_insn_page. + install_page(cr3, virt_to_phys(insn_page), insn_ram); + // Load code TLB + invlpg(insn_ram); + asm volatile(call *%0 : : r(insn_ram+3)); + // Trap, let hypervisor emulate at alt_insn_page + install_page(cr3, virt_to_phys(alt_insn_page), insn_ram); + asm volatile(call *%0 : : r(insn_ram), a(mem)); + asm volatile(:=c(cx)); + report(access bpl in modr/m, cx == 0x4321); +} + int main() { void *mem; @@ -964,6 +1003,8 @@ int main() test_string_io_mmio(mem); + test_bpl_modrm(mem, insn_page, alt_insn_page, insn_ram); + printf(\nSUMMARY: %d tests, %d failures\n, tests, fails); return fails ? 1 : 0; } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html