date:20130605

Re: Planning the merge of KVM/arm64

2013-06-05 Thread Gleb Natapov

On Tue, Jun 04, 2013 at 10:57:32PM -0700, Christoffer Dall wrote:
 On 4 June 2013 09:37, Gleb Natapov g...@redhat.com wrote:
  On Tue, Jun 04, 2013 at 05:51:41PM +0200, Paolo Bonzini wrote:
  Il 04/06/2013 17:43, Christoffer Dall ha scritto:
   Hi Paolo,
  
   I don't think this is an issue. Gleb and Marcelo for example pulled
   RMK's stable tree for my KVM/ARM updates for the 3.10 merge window and
   that wasn't an issue.  If Linus pulls the kvm/next tree first the
   diffstat should be similar and everything clean enough, no?
  
   Catalin has previously expressed his wish to upstream the kvm/arm64
   patches directly through him given the churn in a completely new
   architecture and he wants to make sure that everything looks right.
  
   It's a pretty clean implementation with quite few dependencies and
   merging as a working series should be a priority instead of the
   Kconfig hack, imho.
 
  Ok, let's see what Gleb says.
 
  I have no objection to merge arm64 kvm trough Catalin if it mean less
  churn for everyone. That's what we did with arm and mips. Arm64 kvm
  has a dependency on kvm.git next though, so how Catalin make sure that
  everything looks right? Will he merge kvm.git/next to arm64 tree?
 
 Yes, that was the idea. Everything in kvm/next is considered stable, right?
 
Right. Catalin should wait for kvm.git to be pulled by Linus next merge
windows before sending his pull request then.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] vfio: fix crash on rmmod

2013-06-05 Thread Alexey Kardashevskiy

devtmpfs_delete_node() calls devnode() callback with mode==NULL but
vfio still tries to write there.

The patch fixes this.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

---

Steps to reproduce on freshly booted system with no devices given to VFIO:
modprobe vfio
rmmod vfio_iommu_spapr_tce
rmmod vfio
---
 drivers/vfio/vfio.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 523c121..259ad28 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1360,7 +1360,7 @@ static const struct file_operations vfio_device_fops = {
  */
 static char *vfio_devnode(struct device *dev, umode_t *mode)
 {
-   if (MINOR(dev-devt) == 0)
+   if (mode  (MINOR(dev-devt) == 0))
*mode = S_IRUGO | S_IWUGO;
 
return kasprintf(GFP_KERNEL, vfio/%s, dev_name(dev));
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/4 v3] KVM: PPC: IOMMU in-kernel handling

2013-06-05 Thread Alexey Kardashevskiy

Ben, ping! :)

This series has tiny fixes (capability and ioctl numbers,
changed documentation, compile errors in some configuration).
More details are in the commit messages.
Rebased on v3.10-rc4.


Alexey Kardashevskiy (4):
  KVM: PPC: Add support for multiple-TCE hcalls
  powerpc: Prepare to support kernel handling of IOMMU map/unmap
  KVM: PPC: Add support for IOMMU in-kernel handling
  KVM: PPC: Add hugepage support for IOMMU in-kernel handling

 Documentation/virtual/kvm/api.txt|   45 +++
 arch/powerpc/include/asm/kvm_host.h  |7 +
 arch/powerpc/include/asm/kvm_ppc.h   |   40 ++-
 arch/powerpc/include/asm/pgtable-ppc64.h |4 +
 arch/powerpc/include/uapi/asm/kvm.h  |7 +
 arch/powerpc/kvm/book3s_64_vio.c |  398 -
 arch/powerpc/kvm/book3s_64_vio_hv.c  |  471 --
 arch/powerpc/kvm/book3s_hv.c |   39 +++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |6 +
 arch/powerpc/kvm/book3s_pr_papr.c|   37 ++-
 arch/powerpc/kvm/powerpc.c   |   15 +
 arch/powerpc/mm/init_64.c|   77 -
 include/uapi/linux/kvm.h |3 +
 13 files changed, 1121 insertions(+), 28 deletions(-)

-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-05 Thread Alexey Kardashevskiy

This adds real mode handlers for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls for QEMU emulated devices such as IBMVIO
devices or emulated PCI.  These calls allow adding multiple entries
(up to 512) into the TCE table in one call which saves time on
transition to/from real mode.

This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs
(copied from user and verified) before writing the whole list into
the TCE table. This cache will be utilized more in the upcoming
VFIO/IOMMU support to continue TCE list processing in the virtual
mode in the case if the real mode handler failed for some reason.

This adds a guest physical to host real address converter
and calls the existing H_PUT_TCE handler. The converting function
is going to be fully utilized by upcoming VFIO supporting patches.

This also implements the KVM_CAP_PPC_MULTITCE capability,
so in order to support the functionality of this patch, QEMU
needs to query for this capability and set the hcall-multi-tce
hypertas property only if the capability is present, otherwise
there will be serious performance degradation.

Cc: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Signed-off-by: Paul Mackerras pau...@samba.org

---
Changelog:
2013/06/05:
* fixed mistype about IBMVIO in the commit message
* updated doc and moved it to another section
* changed capability number

2013/05/21:
* added kvm_vcpu_arch::tce_tmp
* removed cleanup if put_indirect failed, instead we do not even start
writing to TCE table if we cannot get TCEs from the user and they are
invalid
* kvmppc_emulated_h_put_tce is split to kvmppc_emulated_put_tce
and kvmppc_emulated_validate_tce (for the previous item)
* fixed bug with failthrough for H_IPI
* removed all get_user() from real mode handlers
* kvmppc_lookup_pte() added (instead of making lookup_linux_pte public)
---
 Documentation/virtual/kvm/api.txt   |   17 ++
 arch/powerpc/include/asm/kvm_host.h |2 +
 arch/powerpc/include/asm/kvm_ppc.h  |   16 +-
 arch/powerpc/kvm/book3s_64_vio.c|  118 ++
 arch/powerpc/kvm/book3s_64_vio_hv.c |  266 +++
 arch/powerpc/kvm/book3s_hv.c|   39 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +
 arch/powerpc/kvm/book3s_pr_papr.c   |   37 -
 arch/powerpc/kvm/powerpc.c  |3 +
 include/uapi/linux/kvm.h|1 +
 10 files changed, 473 insertions(+), 32 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 5f91eda..6c082ff 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2362,6 +2362,23 @@ calls by the guest for that service will be passed to 
userspace to be
 handled.
 
 
+4.83 KVM_CAP_PPC_MULTITCE
+
+Capability: KVM_CAP_PPC_MULTITCE
+Architectures: ppc
+Type: vm
+
+This capability tells the guest that multiple TCE entry add/remove hypercalls
+handling is supported by the kernel. This significanly accelerates DMA
+operations for PPC KVM guests.
+
+Unlike other capabilities in this section, this one does not have an ioctl.
+Instead, when the capability is present, the H_PUT_TCE_INDIRECT and
+H_STUFF_TCE hypercalls are to be handled in the host kernel and not passed to
+the guest. Othwerwise it might be better for the guest to continue using 
H_PUT_TCE
+hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present).
+
+
 5. The kvm_run structure
 
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index af326cd..85d8f26 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -609,6 +609,8 @@ struct kvm_vcpu_arch {
spinlock_t tbacct_lock;
u64 busy_stolen;
u64 busy_preempt;
+
+   unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hall */
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index a5287fe..e852921b 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -133,8 +133,20 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
-extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
-unsigned long ioba, unsigned long tce);
+extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
+   struct kvm_vcpu *vcpu, unsigned long liobn);
+extern long kvmppc_emulated_validate_tce(unsigned long tce);
+extern void kvmppc_emulated_put_tce(struct kvmppc_spapr_tce_table *tt,
+   unsigned long ioba, unsigned long tce);
+extern long kvmppc_virtmode_h_put_tce(struct kvm_vcpu *vcpu,
+   unsigned long liobn, unsigned long ioba,
+   unsigned long tce);
+extern long kvmppc_virtmode_h_put_tce_indirect(struct

[PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-05 Thread Alexey Kardashevskiy

This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests without passing them to QEMU, which should
save time on switching to QEMU and back.

Both real and virtual modes are supported - whenever the kernel
fails to handle TCE request, it passes it to the virtual mode.
If it the virtual mode handlers fail, then the request is passed
to the user mode, for example, to QEMU.

This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate
a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
in-kernel handling of IOMMU map/unmap.

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).

Cc: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Signed-off-by: Paul Mackerras pau...@samba.org

---

Changes:
2013/06/05:
* changed capability number
* changed ioctl number
* update the doc article number

2013/05/20:
* removed get_user() from real mode handlers
* kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
translated TCEs, tries realmode_get_page() on those and if it fails, it
passes control over the virtual mode handler which tries to finish
the request handling
* kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
on a page
* The only reason to pass the request to user mode now is when the user mode
did not register TCE table in the kernel, in all other cases the virtual mode
handler is expected to do the job
---
 Documentation/virtual/kvm/api.txt   |   28 +
 arch/powerpc/include/asm/kvm_host.h |3 +
 arch/powerpc/include/asm/kvm_ppc.h  |2 +
 arch/powerpc/include/uapi/asm/kvm.h |7 ++
 arch/powerpc/kvm/book3s_64_vio.c|  198 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c |  193 +-
 arch/powerpc/kvm/powerpc.c  |   12 +++
 include/uapi/linux/kvm.h|2 +
 8 files changed, 439 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 6c082ff..e962e3b 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2379,6 +2379,34 @@ the guest. Othwerwise it might be better for the guest 
to continue using H_PUT_T
 hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present).
 
 
+4.84 KVM_CREATE_SPAPR_TCE_IOMMU
+
+Capability: KVM_CAP_SPAPR_TCE_IOMMU
+Architectures: powerpc
+Type: vm ioctl
+Parameters: struct kvm_create_spapr_tce_iommu (in)
+Returns: 0 on success, -1 on error
+
+This creates a link between IOMMU group and a hardware TCE (translation
+control entry) table. This link lets the host kernel know what IOMMU
+group (i.e. TCE table) to use for the LIOBN number passed with
+H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
+
+/* for KVM_CAP_SPAPR_TCE_IOMMU */
+struct kvm_create_spapr_tce_iommu {
+   __u64 liobn;
+   __u32 iommu_id;
+   __u32 flags;
+};
+
+No flag is supported at the moment.
+
+When the guest issues TCE call on a liobn for which a TCE table has been
+registered, the kernel will handle it in real mode, updating the hardware
+TCE table. TCE table calls for other liobns will cause a vm exit and must
+be handled by userspace.
+
+
 5. The kvm_run structure
 
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 85d8f26..ac0e2fe 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -180,6 +180,7 @@ struct kvmppc_spapr_tce_table {
struct kvm *kvm;
u64 liobn;
u32 window_size;
+   struct iommu_group *grp;/* used for IOMMU groups */
struct page *pages[0];
 };
 
@@ -611,6 +612,8 @@ struct kvm_vcpu_arch {
u64 busy_preempt;
 
unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hall */
+   unsigned long tce_tmp_num; /* Number of handled TCEs in the cache */
+   unsigned long tce_reason;  /* The reason of switching to the virtmode */
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index e852921b..934e01d 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -133,6 +133,8 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
+extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
+   struct kvm_create_spapr_tce_iommu *args);
 extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
struct kvm_vcpu *vcpu, unsigned long liobn);
 extern long kvmppc_emulated_validate_tce(unsigned long tce);
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index 0fb1a6e..cf82af4 100644
---

[PATCH 2/4] powerpc: Prepare to support kernel handling of IOMMU map/unmap

2013-06-05 Thread Alexey Kardashevskiy

The current VFIO-on-POWER implementation supports only user mode
driven mapping, i.e. QEMU is sending requests to map/unmap pages.
However this approach is really slow, so we want to move that to KVM.
Since H_PUT_TCE can be extremely performance sensitive (especially with
network adapters where each packet needs to be mapped/unmapped) we chose
to implement that as a fast hypercall directly in real
mode (processor still in the guest context but MMU off).

To be able to do that, we need to provide some facilities to
access the struct page count within that real mode environment as things
like the sparsemem vmemmap mappings aren't accessible.

This adds an API to increment/decrement page counter as
get_user_pages API used for user mode mapping does not work
in the real mode.

CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Reviewed-by: Paul Mackerras pau...@samba.org
Cc: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Paul Mackerras pau...@samba.org

---

Changes:
2013-05-20:
* PageTail() is replaced by PageCompound() in order to have the same checks
for whether the page is huge in realmode_get_page() and realmode_put_page()
---
 arch/powerpc/include/asm/pgtable-ppc64.h |4 ++
 arch/powerpc/mm/init_64.c|   77 +-
 2 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
b/arch/powerpc/include/asm/pgtable-ppc64.h
index e3d55f6f..7b46e5f 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -376,6 +376,10 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t 
*pgdir, unsigned long ea,
 }
 #endif /* !CONFIG_HUGETLB_PAGE */
 
+struct page *realmode_pfn_to_page(unsigned long pfn);
+int realmode_get_page(struct page *page);
+int realmode_put_page(struct page *page);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index a90b9c4..ce3d8d4 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -297,5 +297,80 @@ void vmemmap_free(unsigned long start, unsigned long end)
 {
 }
 
-#endif /* CONFIG_SPARSEMEM_VMEMMAP */
+/*
+ * We do not have access to the sparsemem vmemmap, so we fallback to
+ * walking the list of sparsemem blocks which we already maintain for
+ * the sake of crashdump. In the long run, we might want to maintain
+ * a tree if performance of that linear walk becomes a problem.
+ *
+ * Any of realmode_ functions can fail due to:
+ * 1) As real sparsemem blocks do not lay in RAM continously (they
+ * are in virtual address space which is not available in the real mode),
+ * the requested page struct can be split between blocks so get_page/put_page
+ * may fail.
+ * 2) When huge pages are used, the get_page/put_page API will fail
+ * in real mode as the linked addresses in the page struct are virtual
+ * too.
+ * When 1) or 2) takes place, the API returns an error code to cause
+ * an exit to kernel virtual mode where the operation will be completed.
+ */
+struct page *realmode_pfn_to_page(unsigned long pfn)
+{
+   struct vmemmap_backing *vmem_back;
+   struct page *page;
+   unsigned long page_size = 1  mmu_psize_defs[mmu_vmemmap_psize].shift;
+   unsigned long pg_va = (unsigned long) pfn_to_page(pfn);
+
+   for (vmem_back = vmemmap_list; vmem_back; vmem_back = vmem_back-list) {
+   if (pg_va  vmem_back-virt_addr)
+   continue;
 
+   /* Check that page struct is not split between real pages */
+   if ((pg_va + sizeof(struct page)) 
+   (vmem_back-virt_addr + page_size))
+   return NULL;
+
+   page = (struct page *) (vmem_back-phys + pg_va -
+   vmem_back-virt_addr);
+   return page;
+   }
+
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
+
+#elif defined(CONFIG_FLATMEM)
+
+struct page *realmode_pfn_to_page(unsigned long pfn)
+{
+   struct page *page = pfn_to_page(pfn);
+   return page;
+}
+EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
+
+#endif /* CONFIG_SPARSEMEM_VMEMMAP/CONFIG_FLATMEM */
+
+#if defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_FLATMEM)
+int realmode_get_page(struct page *page)
+{
+   if (PageCompound(page))
+   return -EAGAIN;
+
+   get_page(page);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(realmode_get_page);
+
+int realmode_put_page(struct page *page)
+{
+   if (PageCompound(page))
+   return -EAGAIN;
+
+   if (!atomic_add_unless(page-_count, -1, 1))
+   return -EAGAIN;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(realmode_put_page);
+#endif
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at

[PATCH 4/4] KVM: PPC: Add hugepage support for IOMMU in-kernel handling

2013-06-05 Thread Alexey Kardashevskiy

This adds special support for huge pages (16MB).  The reference
counting cannot be easily done for such pages in real mode (when
MMU is off) so we added a list of huge pages.  It is populated in
virtual mode and get_page is called just once per a huge page.
Real mode handlers check if the requested page is huge and in the list,
then no reference counting is done, otherwise an exit to virtual mode
happens.  The list is released at KVM exit.  At the moment the fastest
card available for tests uses up to 9 huge pages so walking through this
list is not very expensive.  However this can change and we may want
to optimize this.

Cc: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Signed-off-by: Paul Mackerras pau...@samba.org

---

Changes:
2013/06/05:
* fixed compile error when CONFIG_IOMMU_API=n

2013/05/20:
* the real mode handler now searches for a huge page by gpa (used to be pte)
* the virtual mode handler prints warning if it is called twice for the same
huge page as the real mode handler is expected to fail just once - when a huge
page is not in the list yet.
* the huge page is refcounted twice - when added to the hugepage list and
when used in the virtual mode hcall handler (can be optimized but it will
make the patch less nice).
---
 arch/powerpc/include/asm/kvm_host.h |2 +
 arch/powerpc/include/asm/kvm_ppc.h  |   22 +
 arch/powerpc/kvm/book3s_64_vio.c|   88 +--
 arch/powerpc/kvm/book3s_64_vio_hv.c |   40 ++--
 4 files changed, 146 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index ac0e2fe..4fc0865 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -181,6 +181,8 @@ struct kvmppc_spapr_tce_table {
u64 liobn;
u32 window_size;
struct iommu_group *grp;/* used for IOMMU groups */
+   struct list_head hugepages; /* used for IOMMU groups */
+   spinlock_t hugepages_lock;  /* used for IOMMU groups */
struct page *pages[0];
 };
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 934e01d..9054df0 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -149,6 +149,28 @@ extern long kvmppc_virtmode_h_put_tce_indirect(struct 
kvm_vcpu *vcpu,
 extern long kvmppc_virtmode_h_stuff_tce(struct kvm_vcpu *vcpu,
unsigned long liobn, unsigned long ioba,
unsigned long tce_value, unsigned long npages);
+
+/*
+ * The KVM guest can be backed with 16MB pages (qemu switch
+ * -mem-path /var/lib/hugetlbfs/global/pagesize-16MB/).
+ * In this case, we cannot do page counting from the real mode
+ * as the compound pages are used - they are linked in a list
+ * with pointers as virtual addresses which are inaccessible
+ * in real mode.
+ *
+ * The code below keeps a 16MB pages list and uses page struct
+ * in real mode if it is already locked in RAM and inserted into
+ * the list or switches to the virtual mode where it can be
+ * handled in a usual manner.
+ */
+struct kvmppc_iommu_hugepage {
+   struct list_head list;
+   pte_t pte;  /* Huge page PTE */
+   unsigned long gpa;  /* Guest physical address */
+   struct page *page;  /* page struct of the very first subpage */
+   unsigned long size; /* Huge page size (always 16MB at the moment) */
+};
+
 extern long kvm_vm_ioctl_allocate_rma(struct kvm *kvm,
struct kvm_allocate_rma *rma);
 extern struct kvmppc_linear_info *kvm_alloc_rma(void);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index ffb4698..9e2ba4d 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -45,6 +45,71 @@
 #define TCES_PER_PAGE  (PAGE_SIZE / sizeof(u64))
 #define ERROR_ADDR  ((void *)~(unsigned long)0x0)
 
+#ifdef CONFIG_IOMMU_API
+/* Adds a new huge page descriptor to the list  */
+static long kvmppc_iommu_hugepage_try_add(
+   struct kvmppc_spapr_tce_table *tt,
+   pte_t pte, unsigned long hva, unsigned long gpa,
+   unsigned long pg_size)
+{
+   long ret = 0;
+   struct kvmppc_iommu_hugepage *hp;
+   struct page *p;
+
+   spin_lock(tt-hugepages_lock);
+   list_for_each_entry(hp, tt-hugepages, list) {
+   if (hp-pte == pte)
+   goto unlock_exit;
+   }
+
+   hva = hva  ~(pg_size - 1);
+   ret = get_user_pages_fast(hva, 1, true/*write*/, p);
+   if ((ret != 1) || !p) {
+   ret = -EFAULT;
+   goto unlock_exit;
+   }
+   ret = 0;
+
+   hp = kzalloc(sizeof(*hp), GFP_KERNEL);
+   if (!hp) {
+   ret = -ENOMEM;
+   goto unlock_exit;
+   }
+
+   hp-page = p;
+   hp-pte = pte;
+   hp-gpa = gpa  ~(pg_size - 1);
+

Re: Intercepting task switches in svm/vmx with tdp enabled

2013-06-05 Thread Gleb Natapov

On Wed, Jun 05, 2013 at 12:51:29AM -0500, Leo Prasath wrote:
 Hi,
 
 I am interested in intercepting task switches in vmx/svm in 64 bit
 mode with ept/npt enabled.
 However, I am not seeing the exit code due to task switch ( 9 for vmx
 and 125 for svm ) in the list of vm exits that I see in a typical
 guest run.
I do not think task switch exit means what you think it means. This is
not OS context switches, but some x86 cpu concept of task that can be
switched by using HW mechanism. No modern OS uses it. Actually in 64 bit
mode it does not exists at all.

 I log the vm exit codes in the x86/svm.c:handle_exit method for svm
 and x86/vmx.c:vmx_handle_exit for vmx.
 
 Any pointers regarding this is very much appreciated.
 
 On a related note, does cr3 write interception approximate task switch
 interception ?
Depending on how OS works. For Linux it is probably true (if cr3 value
changes).

 ( I was able to intercept cr3 writes with svm while npt was enabled.
 but with vmx, I could intercept cr3 writes only with ept disabled )
 
 Thanks,
 Leo
 
 Looking through the manuals, svm has a control bit in VMCS for
 enabling / disabling task switch interception while vmx does not seem
 to have such a control bit.
Again, this is not task switch you are looking for.

 -
 Excerpts from the manuals :
 
 Intel
 --
 
 Exit reason #9 indicates a vm exit due to task switch.
 
 Vol. 3C 24-9 : Some instructions cause VM exits regardless of the
 settings of the processor-based VM-execution controls (see Section
 25.1.2), as
 do task switches (see Section 25.2).
 
 Vol. 3C 25-6 : Task switches. Task switches are not allowed in VMX
 non-root operation. Any attempt to effect a task switch in VMX
 non-root operation causes a VM exit. See Section 25.4.2
 
 AMD
 ---
 
 Intercept code to look for is: 7Dh VMEXIT_TASK_SWITCH task switch
 
 15.14 AMD64 Technology Miscellaneous Intercepts : The SVM architecture
 includes intercepts to handle task switches, processor freezes due to
 FERR, and shutdown operations.
 Task switches can modify several resources that a VMM may want to
 protect (CR3, EFLAGS, LDT).  However, instead of checking various
 intercepts (e.g., CR3 Write, LDTR Write) individually, task switches
 check only a single intercept bit.
 
 Page 581 : Layout of VMCB says Byte offset 00Ch : bit 29 Intercept
 task switches.
 
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of multibyte NOP in emulation mode

2013-06-05 Thread Gleb Natapov

On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li wrote:
 Add multibyte NOP test case to kvm-unit-tests. This case can test one
 of bugs when booting RHEL5.9 64-bit.
 
Adding the test to x86/realmode.c will be much easier.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  x86/emulator.c |   33 +
  1 file changed, 33 insertions(+)
 
 diff --git a/x86/emulator.c b/x86/emulator.c
 index 96576e5..f26c70f 100644
 --- a/x86/emulator.c
 +++ b/x86/emulator.c
 @@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem)
  report(test, *mem == 0x8400);
  }
 
 +static void test_nopl(uint64_t *mem, uint8_t *insn_page,
 +   uint8_t *alt_insn_page, void *insn_ram)
 +{
 +ulong *cr3 = (ulong *)read_cr3();
 +
 +// Pad with RET instructions
 +memset(insn_page, 0xc3, 4096);
 +memset(alt_insn_page, 0xc3, 4096);
 +// Place a trapping instruction in the page to trigger a VMEXIT
 +insn_page[0] = 0x89; // mov %eax, (%rax)
 +insn_page[1] = 0x00;
 +insn_page[2] = 0x90; // nop
 +// Place nopl 0x0(%eax) in alt_insn_page for emulator to execuate
 +alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX]
 +alt_insn_page[1] = 0x1f;
 +alt_insn_page[2] = 0x00;
 +
 +// Load the code TLB with insn_page, but point the page tables at
 +// alt_insn_page (and keep the data TLB clear, for AMD decode assist).
 +// This will make the CPU trap on the insn_page instruction but the
 +// hypervisor will see alt_insn_page.
 +install_page(cr3, virt_to_phys(insn_page), insn_ram);
 +// Load code TLB
 +invlpg(insn_ram);
 +asm volatile(call *%0 : : r(insn_ram + 3));
 +// Trap, let hypervisor emulate at alt_insn_page
 +install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
 +asm volatile(call *%0 : : r(insn_ram), a(mem));
 +report(nopl, 1);
 +}
 +
  int main()
  {
   void *mem;
 @@ -964,6 +995,8 @@ int main()
 
   test_string_io_mmio(mem);
 
 + test_nopl(mem, insn_page, alt_insn_page, insn_ram);
 +
   printf(\nSUMMARY: %d tests, %d failures\n, tests, fails);
   return fails ? 1 : 0;
  }
 --
 1.7.9.5

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of multibyte NOP in emulation mode

2013-06-05 Thread 李春奇

Yes, that should be the point. x86/realmode.c is always running in
emulation mode. I added the testing here there but no error occurred.
I cannot find the reason.

The code is as follows added to x86/realmode.c
static void test_nopl(void)
{
MK_INSN(nopl, .byte 0x0f, 0x1f, 0x00\n\r);
exec_in_big_real_mode(insn_nopl);
report(nopl, 0, 1);
}

and I objdump from realmode.flat is as follows:
6458 insn_code_nopl:
6458:   0f 1f 00nopl   (%eax)

But there cause no error when executing this insn. Why?


On Wed, Jun 5, 2013 at 2:42 PM, Gleb Natapov g...@redhat.com wrote:

 On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li wrote:
  Add multibyte NOP test case to kvm-unit-tests. This case can test one
  of bugs when booting RHEL5.9 64-bit.
 
 Adding the test to x86/realmode.c will be much easier.

  Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
  ---
   x86/emulator.c |   33 +
   1 file changed, 33 insertions(+)
 
  diff --git a/x86/emulator.c b/x86/emulator.c
  index 96576e5..f26c70f 100644
  --- a/x86/emulator.c
  +++ b/x86/emulator.c
  @@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem)
   report(test, *mem == 0x8400);
   }
 
  +static void test_nopl(uint64_t *mem, uint8_t *insn_page,
  +   uint8_t *alt_insn_page, void *insn_ram)
  +{
  +ulong *cr3 = (ulong *)read_cr3();
  +
  +// Pad with RET instructions
  +memset(insn_page, 0xc3, 4096);
  +memset(alt_insn_page, 0xc3, 4096);
  +// Place a trapping instruction in the page to trigger a VMEXIT
  +insn_page[0] = 0x89; // mov %eax, (%rax)
  +insn_page[1] = 0x00;
  +insn_page[2] = 0x90; // nop
  +// Place nopl 0x0(%eax) in alt_insn_page for emulator to execuate
  +alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX]
  +alt_insn_page[1] = 0x1f;
  +alt_insn_page[2] = 0x00;
  +
  +// Load the code TLB with insn_page, but point the page tables at
  +// alt_insn_page (and keep the data TLB clear, for AMD decode assist).
  +// This will make the CPU trap on the insn_page instruction but the
  +// hypervisor will see alt_insn_page.
  +install_page(cr3, virt_to_phys(insn_page), insn_ram);
  +// Load code TLB
  +invlpg(insn_ram);
  +asm volatile(call *%0 : : r(insn_ram + 3));
  +// Trap, let hypervisor emulate at alt_insn_page
  +install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
  +asm volatile(call *%0 : : r(insn_ram), a(mem));
  +report(nopl, 1);
  +}
  +
   int main()
   {
void *mem;
  @@ -964,6 +995,8 @@ int main()
 
test_string_io_mmio(mem);
 
  + test_nopl(mem, insn_page, alt_insn_page, insn_ram);
  +
printf(\nSUMMARY: %d tests, %d failures\n, tests, fails);
return fails ? 1 : 0;
   }
  --
  1.7.9.5

 --
 Gleb.




--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [RFC PATCH 0/6] KVM: PPC: Book3E: AltiVec support

2013-06-05 Thread Caraman Mihai Claudiu-B02008

 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, June 05, 2013 12:39 AM
 To: Caraman Mihai Claudiu-B02008
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org; Alexander Graf
 Subject: Re: [RFC PATCH 0/6] KVM: PPC: Book3E: AltiVec support
 
 On 06/03/2013 03:54:22 PM, Mihai Caraman wrote:
  Mihai Caraman (6):
KVM: PPC: Book3E: Fix AltiVec interrupt numbers and build breakage
KVM: PPC: Book3E: Refactor SPE_FP exit handling
KVM: PPC: Book3E: Rename IRQPRIO names to accommodate ALTIVEC
KVM: PPC: Book3E: Add AltiVec support
KVM: PPC: Book3E: Add ONE_REG AltiVec support
KVM: PPC: Book3E: Enhance FPU laziness
 
   arch/powerpc/include/asm/kvm_asm.h|   16 ++-
   arch/powerpc/kvm/booke.c  |  189
  
   arch/powerpc/kvm/booke.h  |4 +-
   arch/powerpc/kvm/bookehv_interrupts.S |8 +-
   arch/powerpc/kvm/e500.c   |   10 +-
   arch/powerpc/kvm/e500_emulate.c   |8 +-
   arch/powerpc/kvm/e500mc.c |   10 ++-
   7 files changed, 199 insertions(+), 46 deletions(-)
 
 This looks like a bit much for 3.10 (certainly, subject lines like
 refactor and enhance and add support aren't going to make Linus
 happy given that we're past rc4) so I think we should apply
 http://patchwork.ozlabs.org/patch/242896/ for 3.10.  Then for 3.11,
 revert it after applying this patchset.
 

Why not 1/6 plus e6500 removal?

-Mike


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [RFC PATCH 2/6] KVM: PPC: Book3E: Refactor SPE_FP exit handling

2013-06-05 Thread Caraman Mihai Claudiu-B02008

  +   /*
  +* The interrupt is shared, KVM support for the
  featured unit
  +* is detected at run-time.
  +*/
 
 This is a decent comment for the changelog, but for the code itself it
 seems fairly obvious if you look at the definition of
 kvmppc_supports_spe().

I will move it to change log.

 
  +   bool handled = false;
  +
  +   if (kvmppc_supports_spe()) {
  +#ifdef CONFIG_SPE
  +   if (cpu_has_feature(CPU_FTR_SPE))
 
 Didn't you already check this using kvmppc_supports_spe()?

It makes sense with the next patch. It handles the improbable case of having
CONFIG_ALTIVEC and CONFIG_SPE defined:

if (kvmppc_supports_altivec() || kvmppc_supports_spe()).

 
  case BOOKE_INTERRUPT_SPE_FP_ROUND:
  +#ifdef CONFIG_SPE
  kvmppc_booke_queue_irqprio(vcpu,
  BOOKE_IRQPRIO_SPE_FP_ROUND);
  r = RESUME_GUEST;
  break;
 
 Why not use kvmppc_supports_spe() here, for consistency?

I added cpu_has_feature(CPU_FTR_SPE) for the case specified above, but here
SPE_FP_ROUND is not shared with ALTIVEC. CONFIG_SPE is used in other places
in KVM without this check, shouldn't be all or nothing?

-Mike




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Rusty Russell

Michael S. Tsirkin m...@redhat.com writes:
 By my count, net still has 7 feature bits left, so I don't think the
 feature bits are likely to be a limitation in the next 6 months?

 Yes but you wanted a generic transport feature bit
 for flexible SG layout.
 Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then?

If we need it within 6 months, I'd rather do that: this feature would
then be assumed for 1.0, and reserved.  We may do that for other
features, too (the committee will have to review).

 MMIO is a bigger problem.  Linux guests are happy with it: does it break
 the Windows drivers?
 
 Thanks,
 Rusty.

 You mean make BAR0 an MMIO BAR?
 Yes, it would break current windows guests.
 Further, as long as we use same address to notify all queues,
 we would also need to decode the instruction on x86 and that's
 measureably slower than PIO.
 We could go back to discussing hypercall use for notifications,
 but that has its own set of issues...

We might have something ready to deploy in 3 months, but realistically,
I'd rather have it ready and tested outside the main git tree(s) and
push it once the standard is committed.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] vhost_net: clear msg.control for non-zerocopy case during tx

2013-06-05 Thread Jason Wang

When we decide not use zero-copy, msg.control should be set to NULL otherwise
macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs
wrongly.

Bug were introduced by commit cedb9bdce099206290a2bdd02ce47a7b253b6a84
(vhost-net: skip head management if no outstanding).

This solves the following warnings:

WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]()
Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc 
openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun]
CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566
Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011
a0198323 88007c9ebd08 81796b73 88007c9ebd48
8103d66b 7b773e20 8800779f 8800779f43f0
8800779f8418 015c 0062 88007c9ebd58
Call Trace:
[81796b73] dump_stack+0x19/0x1e
[8103d66b] warn_slowpath_common+0x6b/0xa0
[8103d6b5] warn_slowpath_null+0x15/0x20
[a0197627] handle_tx+0x477/0x4b0 [vhost_net]
[a0197690] handle_tx_kick+0x10/0x20 [vhost_net]
[a019541e] vhost_worker+0xfe/0x1a0 [vhost_net]
[a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[81061f46] kthread+0xc6/0xd0
[81061e80] ? kthread_freezable_should_stop+0x70/0x70
[817a1aec] ret_from_fork+0x7c/0xb0
[81061e80] ? kthread_freezable_should_stop+0x70/0x70

Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/vhost/net.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 2b51e23..b07d96b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -436,7 +436,8 @@ static void handle_tx(struct vhost_net *net)
kref_get(ubufs-kref);
}
nvq-upend_idx = (nvq-upend_idx + 1) % UIO_MAXIOV;
-   }
+   } else
+   msg.msg_control = NULL;
/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock-ops-sendmsg(NULL, sock, msg, len);
if (unlikely(err  0)) {
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of multibyte NOP in emulation mode

2013-06-05 Thread Gleb Natapov

On Wed, Jun 05, 2013 at 03:00:33PM +0800, 李春奇 Arthur Chunqi Li wrote:
 Yes, that should be the point. x86/realmode.c is always running in
 emulation mode. I added the testing here there but no error occurred. I
 cannot find the reason.
 
 The code is as follows added to x86/realmode.c
 static void test_nopl(void)
 {
 MK_INSN(nopl, .byte 0x0f, 0x1f, 0x00\n\r);
 exec_in_big_real_mode(insn_nopl);
 report(nopl, 0, 1);
 }
 
 and I objdump from realmode.flat is as follows:
 6458 insn_code_nopl:
 6458:   0f 1f 00nopl   (%eax)
 
 But there cause no error when executing this insn. Why?
 
Because you probably use cpu that supports unrestricted mode or use AMD
processor. Can you try loading kvm-intel with unrestricted_guest=0
option?

 
 On Wed, Jun 5, 2013 at 2:42 PM, Gleb Natapov g...@redhat.com wrote:
 
  On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li wrote:
   Add multibyte NOP test case to kvm-unit-tests. This case can test one
   of bugs when booting RHEL5.9 64-bit.
  
  Adding the test to x86/realmode.c will be much easier.
 
   Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
   ---
x86/emulator.c |   33 +
1 file changed, 33 insertions(+)
  
   diff --git a/x86/emulator.c b/x86/emulator.c
   index 96576e5..f26c70f 100644
   --- a/x86/emulator.c
   +++ b/x86/emulator.c
   @@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem)
report(test, *mem == 0x8400);
}
  
   +static void test_nopl(uint64_t *mem, uint8_t *insn_page,
   +   uint8_t *alt_insn_page, void *insn_ram)
   +{
   +ulong *cr3 = (ulong *)read_cr3();
   +
   +// Pad with RET instructions
   +memset(insn_page, 0xc3, 4096);
   +memset(alt_insn_page, 0xc3, 4096);
   +// Place a trapping instruction in the page to trigger a VMEXIT
   +insn_page[0] = 0x89; // mov %eax, (%rax)
   +insn_page[1] = 0x00;
   +insn_page[2] = 0x90; // nop
   +// Place nopl 0x0(%eax) in alt_insn_page for emulator to execuate
   +alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX]
   +alt_insn_page[1] = 0x1f;
   +alt_insn_page[2] = 0x00;
   +
   +// Load the code TLB with insn_page, but point the page tables at
   +// alt_insn_page (and keep the data TLB clear, for AMD decode
  assist).
   +// This will make the CPU trap on the insn_page instruction but the
   +// hypervisor will see alt_insn_page.
   +install_page(cr3, virt_to_phys(insn_page), insn_ram);
   +// Load code TLB
   +invlpg(insn_ram);
   +asm volatile(call *%0 : : r(insn_ram + 3));
   +// Trap, let hypervisor emulate at alt_insn_page
   +install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
   +asm volatile(call *%0 : : r(insn_ram), a(mem));
   +report(nopl, 1);
   +}
   +
int main()
{
 void *mem;
   @@ -964,6 +995,8 @@ int main()
  
 test_string_io_mmio(mem);
  
   + test_nopl(mem, insn_page, alt_insn_page, insn_ram);
   +
 printf(\nSUMMARY: %d tests, %d failures\n, tests, fails);
 return fails ? 1 : 0;
}
   --
   1.7.9.5
 
  --
  Gleb.
 
 
 
 
 -- 
 Arthur Chunqi Li
 Department of Computer Science
 School of EECS
 Peking University
 Beijing, China

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 0/2] kvm: x86: Emulate MSR_PLATFORM_INFO

2013-06-05 Thread Gleb Natapov

On Tue, Jun 04, 2013 at 12:02:16PM -0400, Bandan Das wrote:
 These patches add an emulated MSR_PLATFORM_INFO that kvm guests
 can read as described in section 14.3.2.4 of the Intel SDM. 
 The relevant changes and details are in [2/2]; [1/2] makes vendor_intel
 generic. There are atleat two known applications that fail to run because
 of this MSR missing - Sandra and vTune.
So I really want Intel opinion on this. Right now it is impossible to
implement the MSR correctly in the face of migration (may be with tsc
scaling it will be possible) and while it is unimplemented if application
tries to use it it fails, but if we will implement it application will
just produce incorrect result without any means for user to detect it.

 
 v2: Addressed suggested changes
 
 Bandan Das (2):
   kvm: make vendor_intel a generic function
   kvm: x86: emulate MSR_PLATFORM_INFO
 
  arch/x86/include/asm/kvm_emulate.h| 13 --
  arch/x86/include/asm/kvm_host.h   | 20 +++
  arch/x86/include/uapi/asm/msr-index.h |  2 ++
  arch/x86/kvm/cpuid.c  | 19 ++
  arch/x86/kvm/cpuid.h  | 16 
  arch/x86/kvm/emulate.c| 16 +++-
  arch/x86/kvm/x86.c| 48 
 +++
  7 files changed, 109 insertions(+), 25 deletions(-)
 
 -- 
 1.8.1.4

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vhost_net: clear msg.control for non-zerocopy case during tx

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 03:40:46PM +0800, Jason Wang wrote:
 When we decide not use zero-copy, msg.control should be set to NULL otherwise
 macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs
 wrongly.
 
 Bug were introduced by commit cedb9bdce099206290a2bdd02ce47a7b253b6a84
 (vhost-net: skip head management if no outstanding).
 
 This solves the following warnings:
 
 WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]()
 Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc 
 openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun]
 CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566
 Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011
 a0198323 88007c9ebd08 81796b73 88007c9ebd48
 8103d66b 7b773e20 8800779f 8800779f43f0
 8800779f8418 015c 0062 88007c9ebd58
 Call Trace:
 [81796b73] dump_stack+0x19/0x1e
 [8103d66b] warn_slowpath_common+0x6b/0xa0
 [8103d6b5] warn_slowpath_null+0x15/0x20
 [a0197627] handle_tx+0x477/0x4b0 [vhost_net]
 [a0197690] handle_tx_kick+0x10/0x20 [vhost_net]
 [a019541e] vhost_worker+0xfe/0x1a0 [vhost_net]
 [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
 [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
 [81061f46] kthread+0xc6/0xd0
 [81061e80] ? kthread_freezable_should_stop+0x70/0x70
 [817a1aec] ret_from_fork+0x7c/0xb0
 [81061e80] ? kthread_freezable_should_stop+0x70/0x70
 
 Signed-off-by: Jason Wang jasow...@redhat.com

Good catch.

Acked-by: Michael S. Tsirkin m...@redhat.com

This needs to go into stable as well.

 ---
  drivers/vhost/net.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index 2b51e23..b07d96b 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -436,7 +436,8 @@ static void handle_tx(struct vhost_net *net)
   kref_get(ubufs-kref);
   }
   nvq-upend_idx = (nvq-upend_idx + 1) % UIO_MAXIOV;
 - }
 + } else
 + msg.msg_control = NULL;
   /* TODO: Check specific error and bomb out unless ENOBUFS? */
   err = sock-ops-sendmsg(NULL, sock, msg, len);
   if (unlikely(err  0)) {
 -- 
 1.7.1
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [nVMX w/ Haswell] KVM unit-tests in L1 - eventinj test fails trying to send NMI

2013-06-05 Thread Kashyap Chamarthy

Adding Jan, Jun, to see if they have any inputs here.

/kashyap

On Tue, Jun 4, 2013 at 6:14 PM, Kashyap Chamarthy kashyap...@gmail.com wrote:
 Heya,

 So, I invoked this in L1 with:
 ===
 [test@foo kvm-unit-tests]$ time qemu-system-x86_64 -enable-kvm -device
 pc-testdev -serial stdio -nographic -no-user-config -nodefaults
 -device
 isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel ./x86/eventinj.flat |
 tee /var/tmp/eventinj-test.txt
 enabling apic
 paging enabled
 cr0 = 80010011
 cr3 = 7fff000
 cr4 = 20
 Try to divide by 0
 DE isr running divider is 0
 Result is 150
 DE exception: PASS
 Try int 3
 BP isr running
 After int 3
 BP exception: PASS
 Try send vec 33 to itself
 irq1 running
 After vec 33 to itself
 vec 33: PASS
 Try int $33
 irq1 running
 After int $33
 int $33: PASS
 Try send vec 32 and 33 to itself
 irq1 running
 irq0 running
 After vec 32 and 33 to itself
 vec 32/33: PASS
 Try send vec 32 and int $33
 irq1 running
 irq0 running
 After vec 32 and int $33
 vec 32/int $33: PASS
 Try send vec 33 and 62 and mask one with TPR
 irq1 running
 After 33/62 TPR test
 TPR: PASS
 irq0 running
 Try send NMI to itself
 After NMI to itself
 NMI: FAIL
 Try int 33 with shadowed stack
 irq1 running
 After int 33 with shadowed stack
 int 33 with shadowed stack: PASS

 summary: 9 tests, 1 failures

 real0m0.647s
 user0m0.164s
 sys 0m0.146s
 [test@foo kvm-unit-tests]$
 ===

 Any hints on further debugging this ?


 Other info:
 --

 - L1's qemu-kvm CLI
 ===
 # ps -ef | grep -i qemu
 qemu  5455 1 94 Jun02 ?1-07:14:29
 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name regular-guest -S
 -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu Haswell,+vmx -m 10240
 -smp 4,sockets=4,cores=1,threads=1 -uuid
 4ed9ac0b-7f72-dfcf-68b3-e6fe2ac588b2 -nographic -no-user-config
 -nodefaults -chardev
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/regular-guest.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
 -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
 -drive 
 file=/home/test/vmimages/regular-guest.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
 -device 
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:80:c1:34,bus=pci.0,addr=0x3
 -chardev pty,id=charserial0 -device
 isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
 root 12255  5419  0 08:41 pts/200:00:00 grep --color=auto -i qemu
 ===

 - Setup details --
 https://github.com/kashyapc/nvmx-haswell/blob/master/SETUP-nVMX.rst

 /kashyap
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [RFC PATCH 6/6] KVM: PPC: Book3E: Enhance FPU laziness

2013-06-05 Thread Caraman Mihai Claudiu-B02008

 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, June 05, 2013 1:54 AM
 To: Caraman Mihai Claudiu-B02008
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org; Caraman Mihai Claudiu-B02008
 Subject: Re: [RFC PATCH 6/6] KVM: PPC: Book3E: Enhance FPU laziness

 On 06/03/2013 03:54:28 PM, Mihai Caraman wrote:
  Adopt AltiVec approach to increase laziness by calling
  kvmppc_load_guest_fp()
  just before returning to guest instaed of each sched in.

  Signed-off-by: Mihai Caraman mihai.cara...@freescale.com

 If you did this *before* adding Altivec it would have saved a question
 in an earlier patch. :-)

I kept asking myself about the order and in the end I decided that this is
an improvement originated from AltiVec work. FPU may be further cleaned up
(get rid of active state, etc). 

  ---
   arch/powerpc/kvm/booke.c  |1 +
   arch/powerpc/kvm/e500mc.c |2 --
   2 files changed, 1 insertions(+), 2 deletions(-)

  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
  index 019496d..5382238 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -1258,6 +1258,7 @@ int kvmppc_handle_exit(struct kvm_run *run,
  struct kvm_vcpu *vcpu,
  } else {
  kvmppc_lazy_ee_enable();
  kvmppc_load_guest_altivec(vcpu);
  +   kvmppc_load_guest_fp(vcpu);
  }
  }

 You should probably do these before kvmppc_lazy_ee_enable().

Why? I wanted to look like part of lightweight_exit.

 Actually, I don't think this is a good idea at all.  As I understand
 it, you're not supposed to take kernel ownersship of floating point in
 non-atomic context, because an interrupt could itself call
 enable_kernel_fp().

So lightweight_exit isn't executed in atomic context?

Will be lazyee fixes including  kvmppc_fix_ee_before_entry() in 3.10?
64-bit Book3E KVM is unreliable without them. Should we disable e5500 too
for 3.10?

 Do you have benchmarks showing it's even worthwhile?

No yet but isn't this the whole idea of FPU/AltiVec laziness that the kernel
is struggling to achieve? Without it when returning to host (if another process
got unit ownership in handle_exit) we restore and save the unit state for 
nothing.
This multiplies when the ownership goes back and forth between handle_exit and 
other
processes.

Do you have in mid a specific AltiVec benchmark? I have a stress application 
that do
consecutive vector assignments which proved to be very good at finding state
corruptions. If I combine this application on the host with a guest that stays 
longer
on handle_exit (running a tlb or emulation intensive application) I might have 
good
data to support my case.

-Mike

 -Scott

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of multibyte NOP in emulation mode

2013-06-05 Thread 李春奇

Yes, I load kvm-intel with unrestricted_guest=0 and the emulator runs
well. I will give another test case in x86/realmode.c later.

BTW, what is the action when a 64-bit instruction executes in
x86/realmode.c? Should I add 64-bit insn tests only in x86/emulator.c?

On Wed, Jun 5, 2013 at 4:27 PM, Gleb Natapov g...@redhat.com wrote:
 On Wed, Jun 05, 2013 at 03:00:33PM +0800, 李春奇 Arthur Chunqi Li wrote:
 Yes, that should be the point. x86/realmode.c is always running in
 emulation mode. I added the testing here there but no error occurred. I
 cannot find the reason.

 The code is as follows added to x86/realmode.c
 static void test_nopl(void)
 {
 MK_INSN(nopl, .byte 0x0f, 0x1f, 0x00\n\r);
 exec_in_big_real_mode(insn_nopl);
 report(nopl, 0, 1);
 }

 and I objdump from realmode.flat is as follows:
 6458 insn_code_nopl:
 6458:   0f 1f 00nopl   (%eax)

 But there cause no error when executing this insn. Why?

 Because you probably use cpu that supports unrestricted mode or use AMD
 processor. Can you try loading kvm-intel with unrestricted_guest=0
 option?


 On Wed, Jun 5, 2013 at 2:42 PM, Gleb Natapov g...@redhat.com wrote:

  On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li wrote:
   Add multibyte NOP test case to kvm-unit-tests. This case can test one
   of bugs when booting RHEL5.9 64-bit.
  
  Adding the test to x86/realmode.c will be much easier.
 
   Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
   ---
x86/emulator.c |   33 +
1 file changed, 33 insertions(+)
  
   diff --git a/x86/emulator.c b/x86/emulator.c
   index 96576e5..f26c70f 100644
   --- a/x86/emulator.c
   +++ b/x86/emulator.c
   @@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem)
report(test, *mem == 0x8400);
}
  
   +static void test_nopl(uint64_t *mem, uint8_t *insn_page,
   +   uint8_t *alt_insn_page, void *insn_ram)
   +{
   +ulong *cr3 = (ulong *)read_cr3();
   +
   +// Pad with RET instructions
   +memset(insn_page, 0xc3, 4096);
   +memset(alt_insn_page, 0xc3, 4096);
   +// Place a trapping instruction in the page to trigger a VMEXIT
   +insn_page[0] = 0x89; // mov %eax, (%rax)
   +insn_page[1] = 0x00;
   +insn_page[2] = 0x90; // nop
   +// Place nopl 0x0(%eax) in alt_insn_page for emulator to execuate
   +alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX]
   +alt_insn_page[1] = 0x1f;
   +alt_insn_page[2] = 0x00;
   +
   +// Load the code TLB with insn_page, but point the page tables at
   +// alt_insn_page (and keep the data TLB clear, for AMD decode
  assist).
   +// This will make the CPU trap on the insn_page instruction but the
   +// hypervisor will see alt_insn_page.
   +install_page(cr3, virt_to_phys(insn_page), insn_ram);
   +// Load code TLB
   +invlpg(insn_ram);
   +asm volatile(call *%0 : : r(insn_ram + 3));
   +// Trap, let hypervisor emulate at alt_insn_page
   +install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
   +asm volatile(call *%0 : : r(insn_ram), a(mem));
   +report(nopl, 1);
   +}
   +
int main()
{
 void *mem;
   @@ -964,6 +995,8 @@ int main()
  
 test_string_io_mmio(mem);
  
   + test_nopl(mem, insn_page, alt_insn_page, insn_ram);
   +
 printf(\nSUMMARY: %d tests, %d failures\n, tests, fails);
 return fails ? 1 : 0;
}
   --
   1.7.9.5
 
  --
  Gleb.
 



 --
 Arthur Chunqi Li
 Department of Computer Science
 School of EECS
 Peking University
 Beijing, China

 --
 Gleb.



-- 
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [RFC PATCH 4/6] KVM: PPC: Book3E: Add AltiVec support

2013-06-05 Thread Caraman Mihai Claudiu-B02008

  + * Simulate AltiVec unavailable fault to load guest state
  + * from thread to AltiVec unit.
  + * It requires to be called with preemption disabled.
  + */
  +static inline void kvmppc_load_guest_altivec(struct kvm_vcpu *vcpu)
  +{
  +#ifdef CONFIG_ALTIVEC
  +   if (cpu_has_feature(CPU_FTR_ALTIVEC)) {
  +   if (!(current-thread.regs-msr  MSR_VEC)) {
  +   load_up_altivec(NULL);
  +   current-thread.regs-msr |= MSR_VEC;
  +   }
  +   }
  +#endif
 
 Why not use kvmppc_supports_altivec()?  In fact, there's nothing
 KVM-specific about these functions...

I will do so, I had this code before kvmppc_supports_altivec() :)

   static inline bool kvmppc_supports_spe(void)
   {
   #ifdef CONFIG_SPE
  @@ -947,7 +1016,7 @@ int kvmppc_handle_exit(struct kvm_run *run,
  struct kvm_vcpu *vcpu,
   */
  bool handled = false;
 
  -   if (kvmppc_supports_spe()) {
  +   if (kvmppc_supports_altivec() || kvmppc_supports_spe())
  {
   #ifdef CONFIG_SPE
  if (cpu_has_feature(CPU_FTR_SPE))
  if (vcpu-arch.shared-msr  MSR_SPE) {
 
 The distinction between how you're handling SPE and Altivec here
 doesn't really have anything to do with SPE versus Altivec -- it's
 PR-mode versus HV-mode.

I was mislead by MSR_SPE bit, we should rename it as MSR_SPV.

-Mike


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of multibyte NOP in emulation mode

2013-06-05 Thread Gleb Natapov

On Wed, Jun 05, 2013 at 05:23:18PM +0800, 李春奇 Arthur Chunqi Li wrote:
 Yes, I load kvm-intel with unrestricted_guest=0 and the emulator runs
 well. I will give another test case in x86/realmode.c later.
 
The test fails for me on CPU without unrestricted guest support. This
means you either test on fixed kernel or unrestricted_guest=0 is broken.

 BTW, what is the action when a 64-bit instruction executes in
 x86/realmode.c? Should I add 64-bit insn tests only in x86/emulator.c?
 
Yes, 64-bit or 32-bit instructions should be added to x86/emulator.c.

 On Wed, Jun 5, 2013 at 4:27 PM, Gleb Natapov g...@redhat.com wrote:
  On Wed, Jun 05, 2013 at 03:00:33PM +0800, 李春奇 Arthur Chunqi Li wrote:
  Yes, that should be the point. x86/realmode.c is always running in
  emulation mode. I added the testing here there but no error occurred. I
  cannot find the reason.
 
  The code is as follows added to x86/realmode.c
  static void test_nopl(void)
  {
  MK_INSN(nopl, .byte 0x0f, 0x1f, 0x00\n\r);
  exec_in_big_real_mode(insn_nopl);
  report(nopl, 0, 1);
  }
 
  and I objdump from realmode.flat is as follows:
  6458 insn_code_nopl:
  6458:   0f 1f 00nopl   (%eax)
 
  But there cause no error when executing this insn. Why?
 
  Because you probably use cpu that supports unrestricted mode or use AMD
  processor. Can you try loading kvm-intel with unrestricted_guest=0
  option?
 
 
  On Wed, Jun 5, 2013 at 2:42 PM, Gleb Natapov g...@redhat.com wrote:
 
   On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li wrote:
Add multibyte NOP test case to kvm-unit-tests. This case can test one
of bugs when booting RHEL5.9 64-bit.
   
   Adding the test to x86/realmode.c will be much easier.
  
Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/emulator.c |   33 +
 1 file changed, 33 insertions(+)
   
diff --git a/x86/emulator.c b/x86/emulator.c
index 96576e5..f26c70f 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem)
 report(test, *mem == 0x8400);
 }
   
+static void test_nopl(uint64_t *mem, uint8_t *insn_page,
+   uint8_t *alt_insn_page, void *insn_ram)
+{
+ulong *cr3 = (ulong *)read_cr3();
+
+// Pad with RET instructions
+memset(insn_page, 0xc3, 4096);
+memset(alt_insn_page, 0xc3, 4096);
+// Place a trapping instruction in the page to trigger a VMEXIT
+insn_page[0] = 0x89; // mov %eax, (%rax)
+insn_page[1] = 0x00;
+insn_page[2] = 0x90; // nop
+// Place nopl 0x0(%eax) in alt_insn_page for emulator to execuate
+alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX]
+alt_insn_page[1] = 0x1f;
+alt_insn_page[2] = 0x00;
+
+// Load the code TLB with insn_page, but point the page tables at
+// alt_insn_page (and keep the data TLB clear, for AMD decode
   assist).
+// This will make the CPU trap on the insn_page instruction but 
the
+// hypervisor will see alt_insn_page.
+install_page(cr3, virt_to_phys(insn_page), insn_ram);
+// Load code TLB
+invlpg(insn_ram);
+asm volatile(call *%0 : : r(insn_ram + 3));
+// Trap, let hypervisor emulate at alt_insn_page
+install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
+asm volatile(call *%0 : : r(insn_ram), a(mem));
+report(nopl, 1);
+}
+
 int main()
 {
  void *mem;
@@ -964,6 +995,8 @@ int main()
   
  test_string_io_mmio(mem);
   
+ test_nopl(mem, insn_page, alt_insn_page, insn_ram);
+
  printf(\nSUMMARY: %d tests, %d failures\n, tests, fails);
  return fails ? 1 : 0;
 }
--
1.7.9.5
  
   --
   Gleb.
  
 
 
 
  --
  Arthur Chunqi Li
  Department of Computer Science
  School of EECS
  Peking University
  Beijing, China
 
  --
  Gleb.
 
 
 
 -- 
 Arthur Chunqi Li
 Department of Computer Science
 School of EECS
 Peking University
 Beijing, China

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Planning the merge of KVM/arm64

2013-06-05 Thread Catalin Marinas

On Wed, Jun 05, 2013 at 07:01:05AM +0100, Gleb Natapov wrote:
 On Tue, Jun 04, 2013 at 10:57:32PM -0700, Christoffer Dall wrote:
  On 4 June 2013 09:37, Gleb Natapov g...@redhat.com wrote:
   On Tue, Jun 04, 2013 at 05:51:41PM +0200, Paolo Bonzini wrote:
   Il 04/06/2013 17:43, Christoffer Dall ha scritto:
Hi Paolo,
   
I don't think this is an issue. Gleb and Marcelo for example pulled
RMK's stable tree for my KVM/ARM updates for the 3.10 merge window and
that wasn't an issue.  If Linus pulls the kvm/next tree first the
diffstat should be similar and everything clean enough, no?
   
Catalin has previously expressed his wish to upstream the kvm/arm64
patches directly through him given the churn in a completely new
architecture and he wants to make sure that everything looks right.
   
It's a pretty clean implementation with quite few dependencies and
merging as a working series should be a priority instead of the
Kconfig hack, imho.
  
   Ok, let's see what Gleb says.
  
   I have no objection to merge arm64 kvm trough Catalin if it mean less
   churn for everyone. That's what we did with arm and mips. Arm64 kvm
   has a dependency on kvm.git next though, so how Catalin make sure that
   everything looks right? Will he merge kvm.git/next to arm64 tree?
  
  Yes, that was the idea. Everything in kvm/next is considered stable, right?
  
 Right. Catalin should wait for kvm.git to be pulled by Linus next merge
 windows before sending his pull request then.

I think it's better if I push the bulk of the arm64 KVM branch but
without Kconfig patch enabling it. This branch would be based on
mainline rather than kvm/next. Once your code goes in mainline, I'll
just push the Kconfig entry (for bisection reasons, it could be after
-rc1). This would keep the pull-request diffstat cleaner.

As we discussed some time ago, after the core arm64 KVM is merged you
will use the same workflow as for arm (merge via the kvm tree).

Thanks.

-- 
Catalin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Test case of emulating multibyte NOP

2013-06-05 Thread 李春奇

Add multibyte NOP test case to kvm-unit-tests. This version adds test
cases into  x86/realmode.c. This can test one of bugs when booting
RHEL5.9 64-bit.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/realmode.c |   24 
 1 file changed, 24 insertions(+)

diff --git a/x86/realmode.c b/x86/realmode.c
index 981be08..e6e48c9 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -1504,6 +1504,29 @@ static void test_fninit(void)
  report(fninit, 0, fsw == 0  (fcw  0x103f) == 0x003f);
 }

+static void test_nopl(void)
+{
+MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop
+MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop
+MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop
+MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop
+MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); //
5 bytes nop
+MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00,
0x00\n\r); // 6 bytes nop
+MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00,
0x00\n\r); // 7 bytes nop
+MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00,
0x00, 0x00\n\r); // 8 bytes nop
+MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00,
0x00, 0x00, 0x00\n\r); // 9 bytes nop
+exec_in_big_real_mode(insn_nopl1);
+exec_in_big_real_mode(insn_nopl2);
+exec_in_big_real_mode(insn_nopl3);
+exec_in_big_real_mode(insn_nopl4);
+exec_in_big_real_mode(insn_nopl5);
+exec_in_big_real_mode(insn_nopl6);
+exec_in_big_real_mode(insn_nopl7);
+exec_in_big_real_mode(insn_nopl8);
+exec_in_big_real_mode(insn_nopl9);
+report(nopl, 0, 1);
+}
+
 void realmode_start(void)
 {
  test_null();
@@ -1548,6 +1571,7 @@ void realmode_start(void)
  test_xlat();
  test_salc();
  test_fninit();
+ test_nopl();

  exit(0);
 }
--
1.7.9.5
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of multibyte NOP in emulation mode

2013-06-05 Thread 李春奇

I mean after adding unrestricted_guest=0, the error is reproduced.
Sorry for confused expression. I have committed another patch in
x86/realmode.c.

On Wed, Jun 5, 2013 at 5:28 PM, Gleb Natapov g...@redhat.com wrote:
 On Wed, Jun 05, 2013 at 05:23:18PM +0800, 李春奇 Arthur Chunqi Li wrote:
 Yes, I load kvm-intel with unrestricted_guest=0 and the emulator runs
 well. I will give another test case in x86/realmode.c later.

 The test fails for me on CPU without unrestricted guest support. This
 means you either test on fixed kernel or unrestricted_guest=0 is broken.

 BTW, what is the action when a 64-bit instruction executes in
 x86/realmode.c? Should I add 64-bit insn tests only in x86/emulator.c?

 Yes, 64-bit or 32-bit instructions should be added to x86/emulator.c.

 On Wed, Jun 5, 2013 at 4:27 PM, Gleb Natapov g...@redhat.com wrote:
  On Wed, Jun 05, 2013 at 03:00:33PM +0800, 李春奇 Arthur Chunqi Li wrote:
  Yes, that should be the point. x86/realmode.c is always running in
  emulation mode. I added the testing here there but no error occurred. I
  cannot find the reason.
 
  The code is as follows added to x86/realmode.c
  static void test_nopl(void)
  {
  MK_INSN(nopl, .byte 0x0f, 0x1f, 0x00\n\r);
  exec_in_big_real_mode(insn_nopl);
  report(nopl, 0, 1);
  }
 
  and I objdump from realmode.flat is as follows:
  6458 insn_code_nopl:
  6458:   0f 1f 00nopl   (%eax)
 
  But there cause no error when executing this insn. Why?
 
  Because you probably use cpu that supports unrestricted mode or use AMD
  processor. Can you try loading kvm-intel with unrestricted_guest=0
  option?
 
 
  On Wed, Jun 5, 2013 at 2:42 PM, Gleb Natapov g...@redhat.com wrote:
 
   On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li wrote:
Add multibyte NOP test case to kvm-unit-tests. This case can test one
of bugs when booting RHEL5.9 64-bit.
   
   Adding the test to x86/realmode.c will be much easier.
  
Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/emulator.c |   33 +
 1 file changed, 33 insertions(+)
   
diff --git a/x86/emulator.c b/x86/emulator.c
index 96576e5..f26c70f 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem)
 report(test, *mem == 0x8400);
 }
   
+static void test_nopl(uint64_t *mem, uint8_t *insn_page,
+   uint8_t *alt_insn_page, void *insn_ram)
+{
+ulong *cr3 = (ulong *)read_cr3();
+
+// Pad with RET instructions
+memset(insn_page, 0xc3, 4096);
+memset(alt_insn_page, 0xc3, 4096);
+// Place a trapping instruction in the page to trigger a VMEXIT
+insn_page[0] = 0x89; // mov %eax, (%rax)
+insn_page[1] = 0x00;
+insn_page[2] = 0x90; // nop
+// Place nopl 0x0(%eax) in alt_insn_page for emulator to execuate
+alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX]
+alt_insn_page[1] = 0x1f;
+alt_insn_page[2] = 0x00;
+
+// Load the code TLB with insn_page, but point the page tables at
+// alt_insn_page (and keep the data TLB clear, for AMD decode
   assist).
+// This will make the CPU trap on the insn_page instruction but 
the
+// hypervisor will see alt_insn_page.
+install_page(cr3, virt_to_phys(insn_page), insn_ram);
+// Load code TLB
+invlpg(insn_ram);
+asm volatile(call *%0 : : r(insn_ram + 3));
+// Trap, let hypervisor emulate at alt_insn_page
+install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
+asm volatile(call *%0 : : r(insn_ram), a(mem));
+report(nopl, 1);
+}
+
 int main()
 {
  void *mem;
@@ -964,6 +995,8 @@ int main()
   
  test_string_io_mmio(mem);
   
+ test_nopl(mem, insn_page, alt_insn_page, insn_ram);
+
  printf(\nSUMMARY: %d tests, %d failures\n, tests, fails);
  return fails ? 1 : 0;
 }
--
1.7.9.5
  
   --
   Gleb.
  
 
 
 
  --
  Arthur Chunqi Li
  Department of Computer Science
  School of EECS
  Peking University
  Beijing, China
 
  --
  Gleb.



 --
 Arthur Chunqi Li
 Department of Computer Science
 School of EECS
 Peking University
 Beijing, China

 --
 Gleb.



-- 
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of multibyte NOP in emulation mode

2013-06-05 Thread Gleb Natapov

On Wed, Jun 05, 2013 at 05:46:31PM +0800, 李春奇 Arthur Chunqi Li wrote:
 I mean after adding unrestricted_guest=0, the error is reproduced.
Ah, OK. unrestricted_guest=0 works then :)

 Sorry for confused expression. I have committed another patch in
 x86/realmode.c.
 
 On Wed, Jun 5, 2013 at 5:28 PM, Gleb Natapov g...@redhat.com wrote:
  On Wed, Jun 05, 2013 at 05:23:18PM +0800, 李春奇 Arthur Chunqi Li wrote:
  Yes, I load kvm-intel with unrestricted_guest=0 and the emulator runs
  well. I will give another test case in x86/realmode.c later.
 
  The test fails for me on CPU without unrestricted guest support. This
  means you either test on fixed kernel or unrestricted_guest=0 is broken.
 
  BTW, what is the action when a 64-bit instruction executes in
  x86/realmode.c? Should I add 64-bit insn tests only in x86/emulator.c?
 
  Yes, 64-bit or 32-bit instructions should be added to x86/emulator.c.
 
  On Wed, Jun 5, 2013 at 4:27 PM, Gleb Natapov g...@redhat.com wrote:
   On Wed, Jun 05, 2013 at 03:00:33PM +0800, 李春奇 Arthur Chunqi Li wrote:
   Yes, that should be the point. x86/realmode.c is always running in
   emulation mode. I added the testing here there but no error occurred. I
   cannot find the reason.
  
   The code is as follows added to x86/realmode.c
   static void test_nopl(void)
   {
   MK_INSN(nopl, .byte 0x0f, 0x1f, 0x00\n\r);
   exec_in_big_real_mode(insn_nopl);
   report(nopl, 0, 1);
   }
  
   and I objdump from realmode.flat is as follows:
   6458 insn_code_nopl:
   6458:   0f 1f 00nopl   (%eax)
  
   But there cause no error when executing this insn. Why?
  
   Because you probably use cpu that supports unrestricted mode or use AMD
   processor. Can you try loading kvm-intel with unrestricted_guest=0
   option?
  
  
   On Wed, Jun 5, 2013 at 2:42 PM, Gleb Natapov g...@redhat.com wrote:
  
On Wed, Jun 05, 2013 at 10:16:46AM +0800, 李春奇 Arthur Chunqi Li 
wrote:
 Add multibyte NOP test case to kvm-unit-tests. This case can test 
 one
 of bugs when booting RHEL5.9 64-bit.

Adding the test to x86/realmode.c will be much easier.
   
 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  x86/emulator.c |   33 +
  1 file changed, 33 insertions(+)

 diff --git a/x86/emulator.c b/x86/emulator.c
 index 96576e5..f26c70f 100644
 --- a/x86/emulator.c
 +++ b/x86/emulator.c
 @@ -901,6 +901,37 @@ static void test_simplealu(u32 *mem)
  report(test, *mem == 0x8400);
  }

 +static void test_nopl(uint64_t *mem, uint8_t *insn_page,
 +   uint8_t *alt_insn_page, void *insn_ram)
 +{
 +ulong *cr3 = (ulong *)read_cr3();
 +
 +// Pad with RET instructions
 +memset(insn_page, 0xc3, 4096);
 +memset(alt_insn_page, 0xc3, 4096);
 +// Place a trapping instruction in the page to trigger a VMEXIT
 +insn_page[0] = 0x89; // mov %eax, (%rax)
 +insn_page[1] = 0x00;
 +insn_page[2] = 0x90; // nop
 +// Place nopl 0x0(%eax) in alt_insn_page for emulator to 
 execuate
 +alt_insn_page[0] = 0x0f; // nop DWORD ptr[EAX]
 +alt_insn_page[1] = 0x1f;
 +alt_insn_page[2] = 0x00;
 +
 +// Load the code TLB with insn_page, but point the page tables 
 at
 +// alt_insn_page (and keep the data TLB clear, for AMD decode
assist).
 +// This will make the CPU trap on the insn_page instruction 
 but the
 +// hypervisor will see alt_insn_page.
 +install_page(cr3, virt_to_phys(insn_page), insn_ram);
 +// Load code TLB
 +invlpg(insn_ram);
 +asm volatile(call *%0 : : r(insn_ram + 3));
 +// Trap, let hypervisor emulate at alt_insn_page
 +install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
 +asm volatile(call *%0 : : r(insn_ram), a(mem));
 +report(nopl, 1);
 +}
 +
  int main()
  {
   void *mem;
 @@ -964,6 +995,8 @@ int main()

   test_string_io_mmio(mem);

 + test_nopl(mem, insn_page, alt_insn_page, insn_ram);
 +
   printf(\nSUMMARY: %d tests, %d failures\n, tests, fails);
   return fails ? 1 : 0;
  }
 --
 1.7.9.5
   
--
Gleb.
   
  
  
  
   --
   Arthur Chunqi Li
   Department of Computer Science
   School of EECS
   Peking University
   Beijing, China
  
   --
   Gleb.
 
 
 
  --
  Arthur Chunqi Li
  Department of Computer Science
  School of EECS
  Peking University
  Beijing, China
 
  --
  Gleb.
 
 
 
 -- 
 Arthur Chunqi Li
 Department of Computer Science
 School of EECS
 Peking University
 Beijing, China

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at

Re: [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages

2013-06-05 Thread Gleb Natapov

On Tue, Jun 04, 2013 at 10:26:01PM -0300, Marcelo Tosatti wrote:
 On Fri, May 31, 2013 at 08:36:19AM +0800, Xiao Guangrong wrote:
  Hi Gleb, Paolo, Marcelo,
  
  I have putted the potential controversial patches to the latter that are
  patch 8 ~ 10, patch 11 depends on patch 9. Other patches are fully reviewed,
  I think its are ready for being merged. If not luck enough, further 
  discussion
  is needed, could you please apply that patches first? :)
  
  Thank you in advance!
 
 snip
 
 Looks good to me.
I'll take it as Reviewed-by for the entire series :)

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages

2013-06-05 Thread Gleb Natapov

On Fri, May 31, 2013 at 08:36:19AM +0800, Xiao Guangrong wrote:
 Hi Gleb, Paolo, Marcelo,
 
 I have putted the potential controversial patches to the latter that are
 patch 8 ~ 10, patch 11 depends on patch 9. Other patches are fully reviewed,
 I think its are ready for being merged. If not luck enough, further discussion
 is needed, could you please apply that patches first? :)
 
 Thank you in advance!
 
Applied all of them to queue. Thanks!

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] KVM: add kvm_para_available to asm-generic/kvm_para.h

2013-06-05 Thread Gleb Natapov

On Tue, Jun 04, 2013 at 12:33:19PM +0100, James Hogan wrote:
 On 4 June 2013 10:05, Gleb Natapov g...@redhat.com wrote:
  On Wed, May 22, 2013 at 12:29:22PM +0100, James Hogan wrote:
  According to include/uapi/linux/kvm_para.h architectures should define
  kvm_para_available, so add an implementation to asm-generic/kvm_para.h
  which just returns false.
 
  What is this fixing? The only user of kvm_para_available() that can
  benefit from this is in sound/pci/intel8x0.c, but I do not see follow up
  patch to use it there.
 
 It was an unintentional config with mips + kvm + intel8x0 that hit it
 (I think I accidentally based my mips config off an x86_64 config).
 Kind of equivalent to a randconfig build failure I suppose.
 
Yes, I see. I will queue the fix. Thanks!

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 04:49:22PM +0930, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  By my count, net still has 7 feature bits left, so I don't think the
  feature bits are likely to be a limitation in the next 6 months?
 
  Yes but you wanted a generic transport feature bit
  for flexible SG layout.
  Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then?
 
 If we need it within 6 months, I'd rather do that: this feature would
 then be assumed for 1.0, and reserved.  We may do that for other
 features, too (the committee will have to review).
 
  MMIO is a bigger problem.  Linux guests are happy with it: does it break
  the Windows drivers?
  
  Thanks,
  Rusty.
 
  You mean make BAR0 an MMIO BAR?
  Yes, it would break current windows guests.
  Further, as long as we use same address to notify all queues,
  we would also need to decode the instruction on x86 and that's
  measureably slower than PIO.
  We could go back to discussing hypercall use for notifications,
  but that has its own set of issues...
 
 We might have something ready to deploy in 3 months,
 but realistically,
 I'd rather have it ready and tested outside the main git tree(s) and
 push it once the standard is committed.
 
 Cheers,
 Rusty.


We have working code already. We don't need 3 months out of tree,
this gets us no progress. To make it ready to deploy
we need it upstream and people testing it.

I think the issue is the layout change, you don't want
the new config layout before we get standartization rolling, right?
Okay how about this small patch to the linux guest
(completely untested, just to give you the idea):


diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index a7ce730..0e34862 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -675,6 +675,33 @@ static void virtio_pci_release_dev(struct device *_d)
 */
 }
 
+/* Map a BAR. But carefully: make sure we don't overlap the MSI-X table */
+static void __iomem * virtio_pci_iomap(struct pci_dev *pci_dev, int bar)
+{
+   int msix_cap = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
+   if (msix_cap) {
+   u32 offset;
+   u8 bir;
+   pci_read_config_dword(pci_dev, msix_cap + PCI_MSIX_TABLE,
+ offset);
+   bir = (u8)(offset  PCI_MSIX_TABLE_BIR);
+   offset = PCI_MSIX_TABLE_OFFSET;
+   /* Spec says table offset is in a 4K page all by itself */
+   if (bir == bar  offset  4096)
+   return NULL;
+
+   pci_read_config_dword(pci_dev, msix_cap + PCI_MSIX_PBA,
+ offset);
+   bir = (u8)(offset  PCI_MSIX_PBA_BIR);
+   offset = PCI_MSIX_PBA_OFFSET;
+   /* Spec says table offset is in a 4K page all by itself. */
+   if (bir == bar  offset  4096)
+   return NULL;
+   }
+   /* 4K is enough for all devices at the moment. */
+   return pci_iomap(pci_dev, 0, 4096);
+}
+
 /* the PCI probing function */
 static int virtio_pci_probe(struct pci_dev *pci_dev,
const struct pci_device_id *id)
@@ -716,7 +743,10 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
if (err)
goto out_enable_device;
 
-   vp_dev-ioaddr = pci_iomap(pci_dev, 0, 0);
+   vp_dev-ioaddr = virtio_pci_iomap(pci_dev, 0);
+   /* Failed to map BAR0? Try with BAR1. */
+   if (vp_dev-ioaddr == NULL) {
+   vp_dev-ioaddr = virtio_pci_iomap(pci_dev, 1);
if (vp_dev-ioaddr == NULL) {
err = -ENOMEM;
goto out_req_regions;


In other words: put a copy of IO config at start of MMIO BAR1,
making sure we don't overlap the MSIX table that's there.

This does not break windows guests, and makes us compliant to the
PCI express spec.

Is this small enough to start going immediately, without waiting
3+ months?

I really think it's a good idea to put something like this
in the field: we might discover more issues around MMIO
and we'll address them in the new config layout.
This way we don't get two changes: new layout and switch to MMIO
all at once.

If you like this, I can make the appropriate spec and qemu changes in a
couple of days.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vhost kernel BUG at /build/linux/mm/slub.c:3352!

2013-06-05 Thread Michael S. Tsirkin

On Tue, Jun 04, 2013 at 09:50:59PM +0300, Tommi Rantala wrote:
 Hello,
 
 Hit this right after killing trinity with Ctrl-C. Was fuzzing
 v3.10-rc4-0-gd683b96 in a qemu virtual machine as the root user.
 
 Tommi

Thanks a lot for the report. If found some bugs when looking
at this: I think they were introduced by
2839400f8fe28ce216eeeba3fb97bdf90977f7ad
though I don't exactly see how ctrl-c can trigger this.
I'll work on patches - is this reproducible at all?

 [29175] Random reseed: 3970521611
 [29175] Random reseed: 202886419
 [29175] Random reseed: 2930978521
 [179904.099501] binder: 29175:2539 ioctl 4010630e fff returned -22
 [29175] Random reseed: 2776471322
 [29175] Random reseed: 3086119361
 child 2606 exiting
 [29175] Bailing main loop. Exit reason: ctrl-c
 [179906.393060] [ cut here ]
 [179906.396341] kernel BUG at /build/linux/mm/slub.c:3352!
 [179906.399693] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
 [179906.403272] CPU: 0 PID: 29175 Comm: trinity-main Not tainted 3.10.0-rc4 #1
 [179906.407692] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 [179906.411475] task: 8800b69e47c0 ti: 880092f2e000 task.ti:
 880092f2e000
 [179906.416305] RIP: 0010:[81225255]  [81225255]
 kfree+0x155/0x2c0
 [179906.421462] RSP: :880092f2fdb0  EFLAGS: 00010246
 [179906.424983] RAX: 0100 RBX: 88009e588000 RCX:
 
 [179906.429746] RDX: 8800b69e47c0 RSI: 000a0004 RDI:
 88009e588000
 [179906.434499] RBP: 880092f2fdd8 R08: 0001 R09:
 
 [179906.439226] R10:  R11: 0001 R12:
 
 [179906.443835] R13: ea0002796200 R14: 8800b9a960f8 R15:
 8800ba06f6a0
 [179906.448470] FS:  7f04cd25c700() GS:8800bf60()
 knlGS:
 [179906.453857] CS:  0010 DS:  ES:  CR0: 80050033
 [179906.456956] CR2: 7f98e29d8f50 CR3: 9294a000 CR4:
 06f0
 [179906.460558] DR0:  DR1:  DR2:
 
 [179906.464059] DR3:  DR6: 0ff0 DR7:
 0400
 [179906.467617] Stack:
 [179906.468704]  88001a7c  
 8800b9a960f8
 [179906.472638]  8800ba06f6a0 880092f2fdf0 81c1c6df
 88001a7c
 [179906.476583]  880092f2fe18 81c1c771 8800b69718c0
 0008
 [179906.480377] Call Trace:
 [179906.481636]  [81c1c6df] vhost_net_vq_reset+0x7f/0xb0
 [179906.484611]  [81c1c771] vhost_net_release+0x61/0xb0
 [179906.487481]  [8123237a] __fput+0x12a/0x230
 [179906.489968]  [81232489] fput+0x9/0x10
 [179906.492422]  [8113a79e] task_work_run+0xae/0xf0
 [179906.495169]  [811172bc] do_exit+0x44c/0xb40
 [179906.497789]  [822a24d8] ? retint_swapgs+0x13/0x1b
 [179906.500652]  [81117a74] do_group_exit+0x84/0xd0
 [179906.503348]  [81117ad2] SyS_exit_group+0x12/0x20
 [179906.506146]  [822a2e29] system_call_fastpath+0x16/0x1b
 [179906.509147] Code: 49 c1 ed 0c 49 c1 e5 06 49 01 c5 49 8b 45 00 f6
 c4 80 74 0a 4d 8b 6d 30 66 0f 1f 44 00 00 49 8b 45 00 a8 80 75 28 f6
 c4 c0 75 02 0f 0b 49 8b 45 00 31 f6 f6 c4 40 74 04 41 8b 75 68 4c 89
 ef e8
 [179906.522213] RIP  [81225255] kfree+0x155/0x2c0
 [179906.524937]  RSP 880092f2fdb0
 [179906.575627] ---[ end trace 3d4ce10faaa29990 ]---
 [179906.577103] Fixing recursive fault but reboot is needed!
 [29174] Watchdog exiting
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-05 Thread Stefan Pietsch

On 19.05.2013 14:32, Gleb Natapov wrote:
 On Sun, May 19, 2013 at 02:00:31AM +0100, Ben Hutchings wrote:
 Dear KVM maintainers, it appears that there is a gap in x86 emulation,
 at least on a 32-bit host.  Stefan found this when running GRML, a live
 distribution which can be downloaded from:
 http://download.grml.org/grml32-full_2013.02.iso.  His original
 reported is at http://bugs.debian.org/707257.

 Can you verify with latest linux.git HEAD? It works for me there on
 64bit. There were a lot of problems fixed in this area in 3.9/3.10 time frame,
 so it would be helpful if you'll test 32bit before I install one myself.


Kernel version 3.9.4-1 (linux-image-3.9-1-686-pae) made things worse.

The virtual machine tries to boot the kernel, but stops after a few
seconds and the kern.log shows:

kernel: [13851.000412] kvm [7482]: vcpu0 disabled perfctr wrmsr: 0xc1
data 0x


virtual machine was started with:
qemu-system-i386 -machine accel=kvm -m 512 -cdrom grml32-full_2013.02.iso

qemu-system-x86: 1.5.0+dfsg-3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vhost kernel BUG at /build/linux/mm/slub.c:3352!

2013-06-05 Thread Tommi Rantala

2013/6/5 Michael S. Tsirkin m...@redhat.com:
 On Tue, Jun 04, 2013 at 09:50:59PM +0300, Tommi Rantala wrote:
 Hello,

 Hit this right after killing trinity with Ctrl-C. Was fuzzing
 v3.10-rc4-0-gd683b96 in a qemu virtual machine as the root user.

 Tommi

 Thanks a lot for the report. If found some bugs when looking
 at this: I think they were introduced by
 2839400f8fe28ce216eeeba3fb97bdf90977f7ad
 though I don't exactly see how ctrl-c can trigger this.
 I'll work on patches - is this reproducible at all?

Thanks, glad to hear that the report was useful.

Yes, I did reproduce this quite quickly yesterday with trinity, but
did not dig any deeper into what was going on.

Tommi
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vhost kernel BUG at /build/linux/mm/slub.c:3352!

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 03:06:33PM +0300, Tommi Rantala wrote:
 2013/6/5 Michael S. Tsirkin m...@redhat.com:
  On Tue, Jun 04, 2013 at 09:50:59PM +0300, Tommi Rantala wrote:
  Hello,
 
  Hit this right after killing trinity with Ctrl-C. Was fuzzing
  v3.10-rc4-0-gd683b96 in a qemu virtual machine as the root user.
 
  Tommi
 
  Thanks a lot for the report. If found some bugs when looking
  at this: I think they were introduced by
  2839400f8fe28ce216eeeba3fb97bdf90977f7ad
  though I don't exactly see how ctrl-c can trigger this.
  I'll work on patches - is this reproducible at all?
 
 Thanks, glad to hear that the report was useful.
 
 Yes, I did reproduce this quite quickly yesterday with trinity, but
 did not dig any deeper into what was going on.
 
 Tommi

Great, I'll post patches and we can see if it's fixed.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-05 Thread Gleb Natapov

On Wed, Jun 05, 2013 at 01:57:25PM +0200, Stefan Pietsch wrote:
 On 19.05.2013 14:32, Gleb Natapov wrote:
  On Sun, May 19, 2013 at 02:00:31AM +0100, Ben Hutchings wrote:
  Dear KVM maintainers, it appears that there is a gap in x86 emulation,
  at least on a 32-bit host.  Stefan found this when running GRML, a live
  distribution which can be downloaded from:
  http://download.grml.org/grml32-full_2013.02.iso.  His original
  reported is at http://bugs.debian.org/707257.
 
  Can you verify with latest linux.git HEAD? It works for me there on
  64bit. There were a lot of problems fixed in this area in 3.9/3.10 time 
  frame,
  so it would be helpful if you'll test 32bit before I install one myself.
 
 
 Kernel version 3.9.4-1 (linux-image-3.9-1-686-pae) made things worse.
 
 The virtual machine tries to boot the kernel, but stops after a few
 seconds and the kern.log shows:
At what point does it stop?

 
 kernel: [13851.000412] kvm [7482]: vcpu0 disabled perfctr wrmsr: 0xc1
 data 0x
 
That's harmless.

 
 virtual machine was started with:
 qemu-system-i386 -machine accel=kvm -m 512 -cdrom grml32-full_2013.02.iso
 
 qemu-system-x86: 1.5.0+dfsg-3

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-05 Thread Stefan Pietsch

On 05.06.2013 14:10, Gleb Natapov wrote:
 On Wed, Jun 05, 2013 at 01:57:25PM +0200, Stefan Pietsch wrote:
 On 19.05.2013 14:32, Gleb Natapov wrote:
 On Sun, May 19, 2013 at 02:00:31AM +0100, Ben Hutchings wrote:
 Dear KVM maintainers, it appears that there is a gap in x86 emulation,
 at least on a 32-bit host.  Stefan found this when running GRML, a live
 distribution which can be downloaded from:
 http://download.grml.org/grml32-full_2013.02.iso.  His original
 reported is at http://bugs.debian.org/707257.

 Can you verify with latest linux.git HEAD? It works for me there on
 64bit. There were a lot of problems fixed in this area in 3.9/3.10 time 
 frame,
 so it would be helpful if you'll test 32bit before I install one myself.


 Kernel version 3.9.4-1 (linux-image-3.9-1-686-pae) made things worse.

 The virtual machine tries to boot the kernel, but stops after a few
 seconds and the kern.log shows:
 At what point does it stop?


The machine stops at:

Performance Events: Broken PMU hardware detected, using software events
only.
Failed to access perfctr msr (MSR c1 is 0)
Enabling APIC mode:  Flat.  Using 1 I/O APICs

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Planning the merge of KVM/arm64

2013-06-05 Thread Gleb Natapov

On Wed, Jun 05, 2013 at 10:31:46AM +0100, Catalin Marinas wrote:
 On Wed, Jun 05, 2013 at 07:01:05AM +0100, Gleb Natapov wrote:
  On Tue, Jun 04, 2013 at 10:57:32PM -0700, Christoffer Dall wrote:
   On 4 June 2013 09:37, Gleb Natapov g...@redhat.com wrote:
On Tue, Jun 04, 2013 at 05:51:41PM +0200, Paolo Bonzini wrote:
Il 04/06/2013 17:43, Christoffer Dall ha scritto:
 Hi Paolo,

 I don't think this is an issue. Gleb and Marcelo for example pulled
 RMK's stable tree for my KVM/ARM updates for the 3.10 merge window 
 and
 that wasn't an issue.  If Linus pulls the kvm/next tree first the
 diffstat should be similar and everything clean enough, no?

 Catalin has previously expressed his wish to upstream the kvm/arm64
 patches directly through him given the churn in a completely new
 architecture and he wants to make sure that everything looks right.

 It's a pretty clean implementation with quite few dependencies and
 merging as a working series should be a priority instead of the
 Kconfig hack, imho.
   
Ok, let's see what Gleb says.
   
I have no objection to merge arm64 kvm trough Catalin if it mean less
churn for everyone. That's what we did with arm and mips. Arm64 kvm
has a dependency on kvm.git next though, so how Catalin make sure that
everything looks right? Will he merge kvm.git/next to arm64 tree?
   
   Yes, that was the idea. Everything in kvm/next is considered stable, 
   right?
   
  Right. Catalin should wait for kvm.git to be pulled by Linus next merge
  windows before sending his pull request then.
 
 I think it's better if I push the bulk of the arm64 KVM branch but
 without Kconfig patch enabling it. This branch would be based on
 mainline rather than kvm/next. Once your code goes in mainline, I'll
 just push the Kconfig entry (for bisection reasons, it could be after
 -rc1). This would keep the pull-request diffstat cleaner.
 
If there will be no non trivial conflicts between your tree and kvm/next
it should be OK too.

 As we discussed some time ago, after the core arm64 KVM is merged you
 will use the same workflow as for arm (merge via the kvm tree).
 
 Thanks.
 
 -- 
 Catalin

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori

Michael S. Tsirkin m...@redhat.com writes:

 On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
 You mean make BAR0 an MMIO BAR?
 Yes, it would break current windows guests.
 Further, as long as we use same address to notify all queues,
 we would also need to decode the instruction on x86 and that's
 measureably slower than PIO.
 We could go back to discussing hypercall use for notifications,
 but that has its own set of issues...

So... does violating the PCI-e spec really matter?  Is it preventing
any guest from working properly?

I don't think we should rush an ABI breakage if the only benefit is
claiming spec compliance.

Regards,

Anthony Liguori


 -- 
 MST
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Planning the merge of KVM/arm64

2013-06-05 Thread Marc Zyngier

On 05/06/13 13:57, Gleb Natapov wrote:
 On Wed, Jun 05, 2013 at 10:31:46AM +0100, Catalin Marinas wrote:
 On Wed, Jun 05, 2013 at 07:01:05AM +0100, Gleb Natapov wrote:
 On Tue, Jun 04, 2013 at 10:57:32PM -0700, Christoffer Dall wrote:
 On 4 June 2013 09:37, Gleb Natapov g...@redhat.com wrote:
 On Tue, Jun 04, 2013 at 05:51:41PM +0200, Paolo Bonzini wrote:
 Il 04/06/2013 17:43, Christoffer Dall ha scritto:
 Hi Paolo,

 I don't think this is an issue. Gleb and Marcelo for example pulled
 RMK's stable tree for my KVM/ARM updates for the 3.10 merge window and
 that wasn't an issue.  If Linus pulls the kvm/next tree first the
 diffstat should be similar and everything clean enough, no?

 Catalin has previously expressed his wish to upstream the kvm/arm64
 patches directly through him given the churn in a completely new
 architecture and he wants to make sure that everything looks right.

 It's a pretty clean implementation with quite few dependencies and
 merging as a working series should be a priority instead of the
 Kconfig hack, imho.

 Ok, let's see what Gleb says.

 I have no objection to merge arm64 kvm trough Catalin if it mean less
 churn for everyone. That's what we did with arm and mips. Arm64 kvm
 has a dependency on kvm.git next though, so how Catalin make sure that
 everything looks right? Will he merge kvm.git/next to arm64 tree?

 Yes, that was the idea. Everything in kvm/next is considered stable, right?

 Right. Catalin should wait for kvm.git to be pulled by Linus next merge
 windows before sending his pull request then.

 I think it's better if I push the bulk of the arm64 KVM branch but
 without Kconfig patch enabling it. This branch would be based on
 mainline rather than kvm/next. Once your code goes in mainline, I'll
 just push the Kconfig entry (for bisection reasons, it could be after
 -rc1). This would keep the pull-request diffstat cleaner.

 If there will be no non trivial conflicts between your tree and kvm/next
 it should be OK too.

In order to make sure no userspace ABI breakage occur during the merge,
can you please make sure that the following values are reserved:
- Capability KVM_CAP_ARM_EL1_32BIT, 93
- ONE_REG architecture KVM_REG_ARM64, 0x6000ULL

So far, nothing clashes with it in kvm/next, but I'd like to be 100% sure...

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] vhost: Make local function static

2013-06-05 Thread Asias He

$ make C=1 M=drivers/vhost

drivers/vhost/net.c:168:5: warning: symbol 'vhost_net_set_ubuf_info' was not 
declared. Should it be static?
drivers/vhost/net.c:194:6: warning: symbol 'vhost_net_vq_reset' was not 
declared. Should it be static?
drivers/vhost/scsi.c:219:6: warning: symbol 'tcm_vhost_done_inflight' was not 
declared. Should it be static?

Signed-off-by: Asias He as...@redhat.com
---
 drivers/vhost/net.c  | 4 ++--
 drivers/vhost/scsi.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 05bdc3c..f800901 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -165,7 +165,7 @@ static void vhost_net_clear_ubuf_info(struct vhost_net *n)
}
 }
 
-int vhost_net_set_ubuf_info(struct vhost_net *n)
+static int vhost_net_set_ubuf_info(struct vhost_net *n)
 {
bool zcopy;
int i;
@@ -191,7 +191,7 @@ err:
return -ENOMEM;
 }
 
-void vhost_net_vq_reset(struct vhost_net *n)
+static void vhost_net_vq_reset(struct vhost_net *n)
 {
int i;
 
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index de6d817..35ab0ce 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -216,7 +216,7 @@ static int iov_num_pages(struct iovec *iov)
   ((unsigned long)iov-iov_base  PAGE_MASK))  PAGE_SHIFT;
 }
 
-void tcm_vhost_done_inflight(struct kref *kref)
+static void tcm_vhost_done_inflight(struct kref *kref)
 {
struct vhost_scsi_inflight *inflight;
 
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of emulating multibyte NOP

2013-06-05 Thread Gleb Natapov

On Wed, Jun 05, 2013 at 05:41:41PM +0800, 李春奇 Arthur Chunqi Li wrote:
 Add multibyte NOP test case to kvm-unit-tests. This version adds test
 cases into  x86/realmode.c. This can test one of bugs when booting
 RHEL5.9 64-bit.
 
The patch is mangled. Lines are wrapped, tabs are replaced with spaces.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  x86/realmode.c |   24 
  1 file changed, 24 insertions(+)
 
 diff --git a/x86/realmode.c b/x86/realmode.c
 index 981be08..e6e48c9 100644
 --- a/x86/realmode.c
 +++ b/x86/realmode.c
 @@ -1504,6 +1504,29 @@ static void test_fninit(void)
   report(fninit, 0, fsw == 0  (fcw  0x103f) == 0x003f);
  }
 
 +static void test_nopl(void)
 +{
 +MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop
 +MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop
 +MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop
 +MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop
 +MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); //
 5 bytes nop
 +MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00,
 0x00\n\r); // 6 bytes nop
 +MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00,
 0x00\n\r); // 7 bytes nop
 +MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00,
 0x00, 0x00\n\r); // 8 bytes nop
 +MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00,
 0x00, 0x00, 0x00\n\r); // 9 bytes nop
 +exec_in_big_real_mode(insn_nopl1);
 +exec_in_big_real_mode(insn_nopl2);
 +exec_in_big_real_mode(insn_nopl3);
 +exec_in_big_real_mode(insn_nopl4);
 +exec_in_big_real_mode(insn_nopl5);
 +exec_in_big_real_mode(insn_nopl6);
 +exec_in_big_real_mode(insn_nopl7);
 +exec_in_big_real_mode(insn_nopl8);
 +exec_in_big_real_mode(insn_nopl9);
 +report(nopl, 0, 1);
 +}
 +
  void realmode_start(void)
  {
   test_null();
 @@ -1548,6 +1571,7 @@ void realmode_start(void)
   test_xlat();
   test_salc();
   test_fninit();
 + test_nopl();
 
   exit(0);
  }
 --
 1.7.9.5

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input

2013-06-05 Thread Gleb Natapov

On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote:
 kvm_add_routing_entry makes an attempt to
 zero-initialize any new routing entry.
 However, it fails to initialize padding
 within the u field of the structure
 kvm_irq_routing_entry.
 
 Other functions like kvm_irqchip_update_msi_route
 also fail to initialize the padding field in
 kvm_irq_routing_entry.
 
 While mostly harmless, this would prevent us from
 reusing these fields for something useful in
 the future.
 
The fact that kernel does not check them for zero value is what will
prevents us from doing so.

 It's better to just make sure all input is initialized.
 
 Once it is, we can also drop complex field by field assignment and just
 do the simple *a = *b to update a route entry.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  kvm-all.c | 19 +++
  1 file changed, 7 insertions(+), 12 deletions(-)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index 405480e..f119ce1 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s,
  }
  n = s-irq_routes-nr++;
  new = s-irq_routes-entries[n];
 -memset(new, 0, sizeof(*new));
 -new-gsi = entry-gsi;
 -new-type = entry-type;
 -new-flags = entry-flags;
 -new-u = entry-u;
 +
 +*new = *entry;
  
  set_gsi(s, entry-gsi);
  
 @@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s,
  continue;
  }
  
 -entry-type = new_entry-type;
 -entry-flags = new_entry-flags;
 -entry-u = new_entry-u;
 +*entry = *new_entry;
  
  kvm_irqchip_commit_routes(s);
  
 @@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s,
  
  void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin)
  {
 -struct kvm_irq_routing_entry e;
 +struct kvm_irq_routing_entry e = {};
  
  assert(pin  s-gsi_count);
  
 @@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
  return virq;
  }
  
 -route = g_malloc(sizeof(KVMMSIRoute));
 +route = g_malloc0(sizeof(KVMMSIRoute));
  route-kroute.gsi = virq;
  route-kroute.type = KVM_IRQ_ROUTING_MSI;
  route-kroute.flags = 0;
 @@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
  
  int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
  {
 -struct kvm_irq_routing_entry kroute;
 +struct kvm_irq_routing_entry kroute = {};
  int virq;
  
  if (!kvm_gsi_routing_enabled()) {
 @@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage 
 msg)
  
  int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg)
  {
 -struct kvm_irq_routing_entry kroute;
 +struct kvm_irq_routing_entry kroute = {};
  
  if (!kvm_irqchip_in_kernel()) {
  return -ENOSYS;
 -- 
 MST

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vhost_net: clear msg.control for non-zerocopy case during tx

2013-06-05 Thread Sergei Shtylyov


Hello.

On 05-06-2013 11:40, Jason Wang wrote:


When we decide not use zero-copy, msg.control should be set to NULL otherwise
macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs
wrongly.



Bug were introduced by commit cedb9bdce099206290a2bdd02ce47a7b253b6a84
(vhost-net: skip head management if no outstanding).



This solves the following warnings:



WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]()
Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc 
openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun]
CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566
Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011
a0198323 88007c9ebd08 81796b73 88007c9ebd48
8103d66b 7b773e20 8800779f 8800779f43f0
8800779f8418 015c 0062 88007c9ebd58
Call Trace:
[81796b73] dump_stack+0x19/0x1e
[8103d66b] warn_slowpath_common+0x6b/0xa0
[8103d6b5] warn_slowpath_null+0x15/0x20
[a0197627] handle_tx+0x477/0x4b0 [vhost_net]
[a0197690] handle_tx_kick+0x10/0x20 [vhost_net]
[a019541e] vhost_worker+0xfe/0x1a0 [vhost_net]
[a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[81061f46] kthread+0xc6/0xd0
[81061e80] ? kthread_freezable_should_stop+0x70/0x70
[817a1aec] ret_from_fork+0x7c/0xb0
[81061e80] ? kthread_freezable_should_stop+0x70/0x70



Signed-off-by: Jason Wang jasow...@redhat.com
---
  drivers/vhost/net.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)



diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 2b51e23..b07d96b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -436,7 +436,8 @@ static void handle_tx(struct vhost_net *net)
kref_get(ubufs-kref);
}
nvq-upend_idx = (nvq-upend_idx + 1) % UIO_MAXIOV;
-   }
+   } else


   You have to use {} on the *else* branch if you have it of the *if* 
branch (and vice versa), according to Documentation/CodingStyle.



+   msg.msg_control = NULL;


WBR, Sergei

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 04:00:20PM +0300, Gleb Natapov wrote:
 On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote:
  kvm_add_routing_entry makes an attempt to
  zero-initialize any new routing entry.
  However, it fails to initialize padding
  within the u field of the structure
  kvm_irq_routing_entry.
  
  Other functions like kvm_irqchip_update_msi_route
  also fail to initialize the padding field in
  kvm_irq_routing_entry.
  
  While mostly harmless, this would prevent us from
  reusing these fields for something useful in
  the future.
  
 The fact that kernel does not check them for zero value is what will
 prevents us from doing so.

Well we can not change kernel now (it would break userspace)
but we can start zeroing everything in userspace.

Also, checkers like coverity might get confused by this.

Finally, simpler assignment and comparison make it worth it,
don't they?

  It's better to just make sure all input is initialized.
  
  Once it is, we can also drop complex field by field assignment and just
  do the simple *a = *b to update a route entry.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   kvm-all.c | 19 +++
   1 file changed, 7 insertions(+), 12 deletions(-)
  
  diff --git a/kvm-all.c b/kvm-all.c
  index 405480e..f119ce1 100644
  --- a/kvm-all.c
  +++ b/kvm-all.c
  @@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s,
   }
   n = s-irq_routes-nr++;
   new = s-irq_routes-entries[n];
  -memset(new, 0, sizeof(*new));
  -new-gsi = entry-gsi;
  -new-type = entry-type;
  -new-flags = entry-flags;
  -new-u = entry-u;
  +
  +*new = *entry;
   
   set_gsi(s, entry-gsi);
   
  @@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s,
   continue;
   }
   
  -entry-type = new_entry-type;
  -entry-flags = new_entry-flags;
  -entry-u = new_entry-u;
  +*entry = *new_entry;
   
   kvm_irqchip_commit_routes(s);
   
  @@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s,
   
   void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin)
   {
  -struct kvm_irq_routing_entry e;
  +struct kvm_irq_routing_entry e = {};
   
   assert(pin  s-gsi_count);
   
  @@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
   return virq;
   }
   
  -route = g_malloc(sizeof(KVMMSIRoute));
  +route = g_malloc0(sizeof(KVMMSIRoute));
   route-kroute.gsi = virq;
   route-kroute.type = KVM_IRQ_ROUTING_MSI;
   route-kroute.flags = 0;
  @@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
   
   int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
   {
  -struct kvm_irq_routing_entry kroute;
  +struct kvm_irq_routing_entry kroute = {};
   int virq;
   
   if (!kvm_gsi_routing_enabled()) {
  @@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage 
  msg)
   
   int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg)
   {
  -struct kvm_irq_routing_entry kroute;
  +struct kvm_irq_routing_entry kroute = {};
   
   if (!kvm_irqchip_in_kernel()) {
   return -ENOSYS;
  -- 
  MST
 
 --
   Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input

2013-06-05 Thread Gleb Natapov

On Wed, Jun 05, 2013 at 05:02:07PM +0300, Michael S. Tsirkin wrote:
 On Wed, Jun 05, 2013 at 04:00:20PM +0300, Gleb Natapov wrote:
  On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote:
   kvm_add_routing_entry makes an attempt to
   zero-initialize any new routing entry.
   However, it fails to initialize padding
   within the u field of the structure
   kvm_irq_routing_entry.
   
   Other functions like kvm_irqchip_update_msi_route
   also fail to initialize the padding field in
   kvm_irq_routing_entry.
   
   While mostly harmless, this would prevent us from
   reusing these fields for something useful in
   the future.
   
  The fact that kernel does not check them for zero value is what will
  prevents us from doing so.
 
 Well we can not change kernel now (it would break userspace)
 but we can start zeroing everything in userspace.
 
 Also, checkers like coverity might get confused by this.
 
 Finally, simpler assignment and comparison make it worth it,
 don't they?
 
I am not at all against the patch! Just pointing out a mistake in the
commit message.

   It's better to just make sure all input is initialized.
   
   Once it is, we can also drop complex field by field assignment and just
   do the simple *a = *b to update a route entry.
   
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   ---
kvm-all.c | 19 +++
1 file changed, 7 insertions(+), 12 deletions(-)
   
   diff --git a/kvm-all.c b/kvm-all.c
   index 405480e..f119ce1 100644
   --- a/kvm-all.c
   +++ b/kvm-all.c
   @@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s,
}
n = s-irq_routes-nr++;
new = s-irq_routes-entries[n];
   -memset(new, 0, sizeof(*new));
   -new-gsi = entry-gsi;
   -new-type = entry-type;
   -new-flags = entry-flags;
   -new-u = entry-u;
   +
   +*new = *entry;

set_gsi(s, entry-gsi);

   @@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s,
continue;
}

   -entry-type = new_entry-type;
   -entry-flags = new_entry-flags;
   -entry-u = new_entry-u;
   +*entry = *new_entry;

kvm_irqchip_commit_routes(s);

   @@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s,

void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int 
   pin)
{
   -struct kvm_irq_routing_entry e;
   +struct kvm_irq_routing_entry e = {};

assert(pin  s-gsi_count);

   @@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage 
   msg)
return virq;
}

   -route = g_malloc(sizeof(KVMMSIRoute));
   +route = g_malloc0(sizeof(KVMMSIRoute));
route-kroute.gsi = virq;
route-kroute.type = KVM_IRQ_ROUTING_MSI;
route-kroute.flags = 0;
   @@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage 
   msg)

int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
{
   -struct kvm_irq_routing_entry kroute;
   +struct kvm_irq_routing_entry kroute = {};
int virq;

if (!kvm_gsi_routing_enabled()) {
   @@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, 
   MSIMessage msg)

int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg)
{
   -struct kvm_irq_routing_entry kroute;
   +struct kvm_irq_routing_entry kroute = {};

if (!kvm_irqchip_in_kernel()) {
return -ENOSYS;
   -- 
   MST
  
  --
  Gleb.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
  You mean make BAR0 an MMIO BAR?
  Yes, it would break current windows guests.
  Further, as long as we use same address to notify all queues,
  we would also need to decode the instruction on x86 and that's
  measureably slower than PIO.
  We could go back to discussing hypercall use for notifications,
  but that has its own set of issues...
 
 So... does violating the PCI-e spec really matter?  Is it preventing
 any guest from working properly?

Yes, absolutely, this wording in spec is not there without reason.

Existing guests allocate io space for PCI express ports in
multiples on 4K.

Since each express device is behind such a port, this means
at most 15 such devices can use IO ports in a system.

That's why to make a pci express virtio device,
we must allow MMIO and/or some other communication
mechanism as the spec requires.

That's on x86.

Besides x86, there are achitectures where IO is unavailable or very slow.

 I don't think we should rush an ABI breakage if the only benefit is
 claiming spec compliance.
 
 Regards,
 
 Anthony Liguori

Why do you bring this up? No one advocates any ABI breakage,
I only suggest extensions.


 
  -- 
  MST
  --
  To unsubscribe from this list: send the line unsubscribe kvm in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 05:04:55PM +0300, Gleb Natapov wrote:
 On Wed, Jun 05, 2013 at 05:02:07PM +0300, Michael S. Tsirkin wrote:
  On Wed, Jun 05, 2013 at 04:00:20PM +0300, Gleb Natapov wrote:
   On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote:
kvm_add_routing_entry makes an attempt to
zero-initialize any new routing entry.
However, it fails to initialize padding
within the u field of the structure
kvm_irq_routing_entry.

Other functions like kvm_irqchip_update_msi_route
also fail to initialize the padding field in
kvm_irq_routing_entry.

While mostly harmless, this would prevent us from
reusing these fields for something useful in
the future.

   The fact that kernel does not check them for zero value is what will
   prevents us from doing so.
  
  Well we can not change kernel now (it would break userspace)
  but we can start zeroing everything in userspace.
  
  Also, checkers like coverity might get confused by this.
  
  Finally, simpler assignment and comparison make it worth it,
  don't they?
  
 I am not at all against the patch! Just pointing out a mistake in the
 commit message.

I think we can agree both userspace not initializing them and kernel not
checking it are mistakes?

Anyway ... could you commit this tweaking the commit
message in a way that you consider appropriate?
Or want me to repost?


It's better to just make sure all input is initialized.

Once it is, we can also drop complex field by field assignment and just
do the simple *a = *b to update a route entry.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 kvm-all.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 405480e..f119ce1 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s,
 }
 n = s-irq_routes-nr++;
 new = s-irq_routes-entries[n];
-memset(new, 0, sizeof(*new));
-new-gsi = entry-gsi;
-new-type = entry-type;
-new-flags = entry-flags;
-new-u = entry-u;
+
+*new = *entry;
 
 set_gsi(s, entry-gsi);
 
@@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s,
 continue;
 }
 
-entry-type = new_entry-type;
-entry-flags = new_entry-flags;
-entry-u = new_entry-u;
+*entry = *new_entry;
 
 kvm_irqchip_commit_routes(s);
 
@@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s,
 
 void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int 
pin)
 {
-struct kvm_irq_routing_entry e;
+struct kvm_irq_routing_entry e = {};
 
 assert(pin  s-gsi_count);
 
@@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage 
msg)
 return virq;
 }
 
-route = g_malloc(sizeof(KVMMSIRoute));
+route = g_malloc0(sizeof(KVMMSIRoute));
 route-kroute.gsi = virq;
 route-kroute.type = KVM_IRQ_ROUTING_MSI;
 route-kroute.flags = 0;
@@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage 
msg)
 
 int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
 {
-struct kvm_irq_routing_entry kroute;
+struct kvm_irq_routing_entry kroute = {};
 int virq;
 
 if (!kvm_gsi_routing_enabled()) {
@@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, 
MSIMessage msg)
 
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg)
 {
-struct kvm_irq_routing_entry kroute;
+struct kvm_irq_routing_entry kroute = {};
 
 if (!kvm_irqchip_in_kernel()) {
 return -ENOSYS;
-- 
MST
   
   --
 Gleb.
 
 --
   Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input

2013-06-05 Thread Gleb Natapov

On Wed, Jun 05, 2013 at 05:11:44PM +0300, Michael S. Tsirkin wrote:
 On Wed, Jun 05, 2013 at 05:04:55PM +0300, Gleb Natapov wrote:
  On Wed, Jun 05, 2013 at 05:02:07PM +0300, Michael S. Tsirkin wrote:
   On Wed, Jun 05, 2013 at 04:00:20PM +0300, Gleb Natapov wrote:
On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote:
 kvm_add_routing_entry makes an attempt to
 zero-initialize any new routing entry.
 However, it fails to initialize padding
 within the u field of the structure
 kvm_irq_routing_entry.
 
 Other functions like kvm_irqchip_update_msi_route
 also fail to initialize the padding field in
 kvm_irq_routing_entry.
 
 While mostly harmless, this would prevent us from
 reusing these fields for something useful in
 the future.
 
The fact that kernel does not check them for zero value is what will
prevents us from doing so.
   
   Well we can not change kernel now (it would break userspace)
   but we can start zeroing everything in userspace.
   
   Also, checkers like coverity might get confused by this.
   
   Finally, simpler assignment and comparison make it worth it,
   don't they?
   
  I am not at all against the patch! Just pointing out a mistake in the
  commit message.
 
 I think we can agree both userspace not initializing them and kernel not
 checking it are mistakes?
 
Mistake that cannot be fixed at this point.

 Anyway ... could you commit this tweaking the commit
 message in a way that you consider appropriate?
 Or want me to repost?
 
No need to report.

 
 It's better to just make sure all input is initialized.
 
 Once it is, we can also drop complex field by field assignment and 
 just
 do the simple *a = *b to update a route entry.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  kvm-all.c | 19 +++
  1 file changed, 7 insertions(+), 12 deletions(-)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index 405480e..f119ce1 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s,
  }
  n = s-irq_routes-nr++;
  new = s-irq_routes-entries[n];
 -memset(new, 0, sizeof(*new));
 -new-gsi = entry-gsi;
 -new-type = entry-type;
 -new-flags = entry-flags;
 -new-u = entry-u;
 +
 +*new = *entry;
  
  set_gsi(s, entry-gsi);
  
 @@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s,
  continue;
  }
  
 -entry-type = new_entry-type;
 -entry-flags = new_entry-flags;
 -entry-u = new_entry-u;
 +*entry = *new_entry;
  
  kvm_irqchip_commit_routes(s);
  
 @@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s,
  
  void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, 
 int pin)
  {
 -struct kvm_irq_routing_entry e;
 +struct kvm_irq_routing_entry e = {};
  
  assert(pin  s-gsi_count);
  
 @@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, 
 MSIMessage msg)
  return virq;
  }
  
 -route = g_malloc(sizeof(KVMMSIRoute));
 +route = g_malloc0(sizeof(KVMMSIRoute));
  route-kroute.gsi = virq;
  route-kroute.type = KVM_IRQ_ROUTING_MSI;
  route-kroute.flags = 0;
 @@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, 
 MSIMessage msg)
  
  int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
  {
 -struct kvm_irq_routing_entry kroute;
 +struct kvm_irq_routing_entry kroute = {};
  int virq;
  
  if (!kvm_gsi_routing_enabled()) {
 @@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, 
 MSIMessage msg)
  
  int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage 
 msg)
  {
 -struct kvm_irq_routing_entry kroute;
 +struct kvm_irq_routing_entry kroute = {};
  
  if (!kvm_irqchip_in_kernel()) {
  return -ENOSYS;
 -- 
 MST

--
Gleb.
  
  --
  Gleb.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 05:12:32PM +0300, Gleb Natapov wrote:
 On Wed, Jun 05, 2013 at 05:11:44PM +0300, Michael S. Tsirkin wrote:
  On Wed, Jun 05, 2013 at 05:04:55PM +0300, Gleb Natapov wrote:
   On Wed, Jun 05, 2013 at 05:02:07PM +0300, Michael S. Tsirkin wrote:
On Wed, Jun 05, 2013 at 04:00:20PM +0300, Gleb Natapov wrote:
 On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote:
  kvm_add_routing_entry makes an attempt to
  zero-initialize any new routing entry.
  However, it fails to initialize padding
  within the u field of the structure
  kvm_irq_routing_entry.
  
  Other functions like kvm_irqchip_update_msi_route
  also fail to initialize the padding field in
  kvm_irq_routing_entry.
  
  While mostly harmless, this would prevent us from
  reusing these fields for something useful in
  the future.
  
 The fact that kernel does not check them for zero value is what will
 prevents us from doing so.

Well we can not change kernel now (it would break userspace)
but we can start zeroing everything in userspace.

Also, checkers like coverity might get confused by this.

Finally, simpler assignment and comparison make it worth it,
don't they?

   I am not at all against the patch! Just pointing out a mistake in the
   commit message.
  
  I think we can agree both userspace not initializing them and kernel not
  checking it are mistakes?
  
 Mistake that cannot be fixed at this point.
 
  Anyway ... could you commit this tweaking the commit
  message in a way that you consider appropriate?
  Or want me to repost?
  
 No need to report.

Thanks.

  
  It's better to just make sure all input is initialized.
  
  Once it is, we can also drop complex field by field assignment and 
  just
  do the simple *a = *b to update a route entry.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   kvm-all.c | 19 +++
   1 file changed, 7 insertions(+), 12 deletions(-)
  
  diff --git a/kvm-all.c b/kvm-all.c
  index 405480e..f119ce1 100644
  --- a/kvm-all.c
  +++ b/kvm-all.c
  @@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState 
  *s,
   }
   n = s-irq_routes-nr++;
   new = s-irq_routes-entries[n];
  -memset(new, 0, sizeof(*new));
  -new-gsi = entry-gsi;
  -new-type = entry-type;
  -new-flags = entry-flags;
  -new-u = entry-u;
  +
  +*new = *entry;
   
   set_gsi(s, entry-gsi);
   
  @@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState 
  *s,
   continue;
   }
   
  -entry-type = new_entry-type;
  -entry-flags = new_entry-flags;
  -entry-u = new_entry-u;
  +*entry = *new_entry;
   
   kvm_irqchip_commit_routes(s);
   
  @@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState 
  *s,
   
   void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, 
  int pin)
   {
  -struct kvm_irq_routing_entry e;
  +struct kvm_irq_routing_entry e = {};
   
   assert(pin  s-gsi_count);
   
  @@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, 
  MSIMessage msg)
   return virq;
   }
   
  -route = g_malloc(sizeof(KVMMSIRoute));
  +route = g_malloc0(sizeof(KVMMSIRoute));
   route-kroute.gsi = virq;
   route-kroute.type = KVM_IRQ_ROUTING_MSI;
   route-kroute.flags = 0;
  @@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, 
  MSIMessage msg)
   
   int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
   {
  -struct kvm_irq_routing_entry kroute;
  +struct kvm_irq_routing_entry kroute = {};
   int virq;
   
   if (!kvm_gsi_routing_enabled()) {
  @@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, 
  MSIMessage msg)
   
   int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage 
  msg)
   {
  -struct kvm_irq_routing_entry kroute;
  +struct kvm_irq_routing_entry kroute = {};
   
   if (!kvm_irqchip_in_kernel()) {
   return -ENOSYS;
  -- 
  MST
 
 --
   Gleb.
   
   --
 Gleb.
 
 --
   Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] kvm: zero-initialize KVM_SET_GSI_ROUTING input

2013-06-05 Thread Gleb Natapov

On Tue, Jun 04, 2013 at 02:52:32PM +0300, Michael S. Tsirkin wrote:
 kvm_add_routing_entry makes an attempt to
 zero-initialize any new routing entry.
 However, it fails to initialize padding
 within the u field of the structure
 kvm_irq_routing_entry.
 
 Other functions like kvm_irqchip_update_msi_route
 also fail to initialize the padding field in
 kvm_irq_routing_entry.
 
 While mostly harmless, this would prevent us from
 reusing these fields for something useful in
 the future.
 
 It's better to just make sure all input is initialized.
 
 Once it is, we can also drop complex field by field assignment and just
 do the simple *a = *b to update a route entry.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
Applied, thanks.

 ---
  kvm-all.c | 19 +++
  1 file changed, 7 insertions(+), 12 deletions(-)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index 405480e..f119ce1 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s,
  }
  n = s-irq_routes-nr++;
  new = s-irq_routes-entries[n];
 -memset(new, 0, sizeof(*new));
 -new-gsi = entry-gsi;
 -new-type = entry-type;
 -new-flags = entry-flags;
 -new-u = entry-u;
 +
 +*new = *entry;
  
  set_gsi(s, entry-gsi);
  
 @@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s,
  continue;
  }
  
 -entry-type = new_entry-type;
 -entry-flags = new_entry-flags;
 -entry-u = new_entry-u;
 +*entry = *new_entry;
  
  kvm_irqchip_commit_routes(s);
  
 @@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s,
  
  void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin)
  {
 -struct kvm_irq_routing_entry e;
 +struct kvm_irq_routing_entry e = {};
  
  assert(pin  s-gsi_count);
  
 @@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
  return virq;
  }
  
 -route = g_malloc(sizeof(KVMMSIRoute));
 +route = g_malloc0(sizeof(KVMMSIRoute));
  route-kroute.gsi = virq;
  route-kroute.type = KVM_IRQ_ROUTING_MSI;
  route-kroute.flags = 0;
 @@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
  
  int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
  {
 -struct kvm_irq_routing_entry kroute;
 +struct kvm_irq_routing_entry kroute = {};
  int virq;
  
  if (!kvm_gsi_routing_enabled()) {
 @@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage 
 msg)
  
  int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg)
  {
 -struct kvm_irq_routing_entry kroute;
 +struct kvm_irq_routing_entry kroute = {};
  
  if (!kvm_irqchip_in_kernel()) {
  return -ENOSYS;
 -- 
 MST

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of emulating multibyte NOP

2013-06-05 Thread 李春奇

Hi Gleb,
I generate this mail by git send-email and I think the format is OK.

This is my first try to commit a patch in open source community. Sorry
for annoying you guys so much.

Thanks,
Arthur

On Wed, Jun 5, 2013 at 10:54 PM, Arthur Chunqi Li yzt...@gmail.com wrote:
 Add multibyte NOP test case to kvm-unit-tests. This version adds test cases 
 into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  x86/realmode.c |   24 
  1 file changed, 24 insertions(+)

 diff --git a/x86/realmode.c b/x86/realmode.c
 index 981be08..e103ca6 100644
 --- a/x86/realmode.c
 +++ b/x86/realmode.c
 @@ -1504,6 +1504,29 @@ static void test_fninit(void)
 report(fninit, 0, fsw == 0  (fcw  0x103f) == 0x003f);
  }

 +static void test_nopl(void)
 +{
 +   MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop
 +   MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop
 +   MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop
 +   MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop
 +   MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes 
 nop
 +   MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 
 bytes nop
 +   MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); 
 // 7 bytes nop
 +   MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
 0x00\n\r); // 8 bytes nop
 +   MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
 0x00\n\r); // 9 bytes nop
 +   exec_in_big_real_mode(insn_nopl1);
 +   exec_in_big_real_mode(insn_nopl2);
 +   exec_in_big_real_mode(insn_nopl3);
 +   exec_in_big_real_mode(insn_nopl4);
 +   exec_in_big_real_mode(insn_nopl5);
 +   exec_in_big_real_mode(insn_nopl6);
 +   exec_in_big_real_mode(insn_nopl7);
 +   exec_in_big_real_mode(insn_nopl8);
 +   exec_in_big_real_mode(insn_nopl9);
 +   report(nopl, 0, 1);
 +}
 +
  void realmode_start(void)
  {
 test_null();
 @@ -1548,6 +1571,7 @@ void realmode_start(void)
 test_xlat();
 test_salc();
 test_fninit();
 +   test_nopl();

 exit(0);
  }
 --
 1.7.9.5




-- 
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori

Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
  You mean make BAR0 an MMIO BAR?
  Yes, it would break current windows guests.
  Further, as long as we use same address to notify all queues,
  we would also need to decode the instruction on x86 and that's
  measureably slower than PIO.
  We could go back to discussing hypercall use for notifications,
  but that has its own set of issues...
 
 So... does violating the PCI-e spec really matter?  Is it preventing
 any guest from working properly?

 Yes, absolutely, this wording in spec is not there without reason.

 Existing guests allocate io space for PCI express ports in
 multiples on 4K.

 Since each express device is behind such a port, this means
 at most 15 such devices can use IO ports in a system.

 That's why to make a pci express virtio device,
 we must allow MMIO and/or some other communication
 mechanism as the spec requires.

This is precisely why this is an ABI breaker.

If you disable IO bars in the BIOS, than the interface that the OS sees
will *not have an IO bar*.

This *breaks existing guests*.

Any time the programming interfaces changes on a PCI device, the
revision ID and/or device ID must change.  The spec is very clear about
this.

We cannot disable the IO BAR without changing revision ID/device ID.

 That's on x86.

 Besides x86, there are achitectures where IO is unavailable or very slow.

 I don't think we should rush an ABI breakage if the only benefit is
 claiming spec compliance.
 
 Regards,
 
 Anthony Liguori

 Why do you bring this up? No one advocates any ABI breakage,
 I only suggest extensions.

It's an ABI breakage.  You're claiming that the guests you tested
handle the breakage reasonably but it is unquestionably an ABI breakage.

Regards,

Anthony Liguori



 
  -- 
  MST
  --
  To unsubscribe from this list: send the line unsubscribe kvm in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vfio: fix crash on rmmod

2013-06-05 Thread Alex Williamson

On Wed, 2013-06-05 at 16:03 +1000, Alexey Kardashevskiy wrote:
 devtmpfs_delete_node() calls devnode() callback with mode==NULL but
 vfio still tries to write there.
 
 The patch fixes this.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 ---

Oops.  Applied.  The mode change just went in for 3.10, so I'll get this
in before the final rc.  Thanks,

Alex

 Steps to reproduce on freshly booted system with no devices given to VFIO:
 modprobe vfio
 rmmod vfio_iommu_spapr_tce
 rmmod vfio
 ---
  drivers/vfio/vfio.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index 523c121..259ad28 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -1360,7 +1360,7 @@ static const struct file_operations vfio_device_fops = {
   */
  static char *vfio_devnode(struct device *dev, umode_t *mode)
  {
 - if (MINOR(dev-devt) == 0)
 + if (mode  (MINOR(dev-devt) == 0))
   *mode = S_IRUGO | S_IWUGO;
  
   return kasprintf(GFP_KERNEL, vfio/%s, dev_name(dev));



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of emulating multibyte NOP

2013-06-05 Thread Gleb Natapov

On Wed, Jun 05, 2013 at 10:56:54PM +0800, 李春奇 Arthur Chunqi Li wrote:
 Hi Gleb,
 I generate this mail by git send-email and I think the format is OK.
 
But I have not received the email, only this your reply to it.

 This is my first try to commit a patch in open source community. Sorry
 for annoying you guys so much.
 
 Thanks,
No problem.

 Arthur
 
 On Wed, Jun 5, 2013 at 10:54 PM, Arthur Chunqi Li yzt...@gmail.com wrote:
  Add multibyte NOP test case to kvm-unit-tests. This version adds test cases 
  into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit.
 
  Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
  ---
   x86/realmode.c |   24 
   1 file changed, 24 insertions(+)
 
  diff --git a/x86/realmode.c b/x86/realmode.c
  index 981be08..e103ca6 100644
  --- a/x86/realmode.c
  +++ b/x86/realmode.c
  @@ -1504,6 +1504,29 @@ static void test_fninit(void)
  report(fninit, 0, fsw == 0  (fcw  0x103f) == 0x003f);
   }
 
  +static void test_nopl(void)
  +{
  +   MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop
  +   MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop
  +   MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop
  +   MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop
  +   MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 
  bytes nop
  +   MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 
  6 bytes nop
  +   MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 
  0x00\n\r); // 7 bytes nop
  +   MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
  0x00\n\r); // 8 bytes nop
  +   MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 
  0x00, 0x00\n\r); // 9 bytes nop
  +   exec_in_big_real_mode(insn_nopl1);
  +   exec_in_big_real_mode(insn_nopl2);
  +   exec_in_big_real_mode(insn_nopl3);
  +   exec_in_big_real_mode(insn_nopl4);
  +   exec_in_big_real_mode(insn_nopl5);
  +   exec_in_big_real_mode(insn_nopl6);
  +   exec_in_big_real_mode(insn_nopl7);
  +   exec_in_big_real_mode(insn_nopl8);
  +   exec_in_big_real_mode(insn_nopl9);
  +   report(nopl, 0, 1);
  +}
  +
   void realmode_start(void)
   {
  test_null();
  @@ -1548,6 +1571,7 @@ void realmode_start(void)
  test_xlat();
  test_salc();
  test_fninit();
  +   test_nopl();
 
  exit(0);
   }
  --
  1.7.9.5
 
 
 
 
 -- 
 Arthur Chunqi Li
 Department of Computer Science
 School of EECS
 Peking University
 Beijing, China

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of emulating multibyte NOP

2013-06-05 Thread 李春奇

On Wed, Jun 5, 2013 at 11:11 PM, Gleb Natapov g...@redhat.com wrote:
 On Wed, Jun 05, 2013 at 10:56:54PM +0800, 李春奇 Arthur Chunqi Li wrote:
 Hi Gleb,
 I generate this mail by git send-email and I think the format is OK.

 But I have not received the email, only this your reply to it.
Maybe the initial mail is in your spam box.


 This is my first try to commit a patch in open source community. Sorry
 for annoying you guys so much.

 Thanks,
 No problem.

 Arthur

 On Wed, Jun 5, 2013 at 10:54 PM, Arthur Chunqi Li yzt...@gmail.com wrote:
  Add multibyte NOP test case to kvm-unit-tests. This version adds test 
  cases into x86/realmode.c. This can test one of bugs when booting RHEL5.9 
  64-bit.
 
  Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
  ---
   x86/realmode.c |   24 
   1 file changed, 24 insertions(+)
 
  diff --git a/x86/realmode.c b/x86/realmode.c
  index 981be08..e103ca6 100644
  --- a/x86/realmode.c
  +++ b/x86/realmode.c
  @@ -1504,6 +1504,29 @@ static void test_fninit(void)
  report(fninit, 0, fsw == 0  (fcw  0x103f) == 0x003f);
   }
 
  +static void test_nopl(void)
  +{
  +   MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop
  +   MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop
  +   MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop
  +   MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop
  +   MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 
  bytes nop
  +   MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 
  6 bytes nop
  +   MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 
  0x00\n\r); // 7 bytes nop
  +   MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
  0x00\n\r); // 8 bytes nop
  +   MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 
  0x00, 0x00\n\r); // 9 bytes nop
  +   exec_in_big_real_mode(insn_nopl1);
  +   exec_in_big_real_mode(insn_nopl2);
  +   exec_in_big_real_mode(insn_nopl3);
  +   exec_in_big_real_mode(insn_nopl4);
  +   exec_in_big_real_mode(insn_nopl5);
  +   exec_in_big_real_mode(insn_nopl6);
  +   exec_in_big_real_mode(insn_nopl7);
  +   exec_in_big_real_mode(insn_nopl8);
  +   exec_in_big_real_mode(insn_nopl9);
  +   report(nopl, 0, 1);
  +}
  +
   void realmode_start(void)
   {
  test_null();
  @@ -1548,6 +1571,7 @@ void realmode_start(void)
  test_xlat();
  test_salc();
  test_fninit();
  +   test_nopl();
 
  exit(0);
   }
  --
  1.7.9.5
 



 --
 Arthur Chunqi Li
 Department of Computer Science
 School of EECS
 Peking University
 Beijing, China

 --
 Gleb.



--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 10:08:37AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
   You mean make BAR0 an MMIO BAR?
   Yes, it would break current windows guests.
   Further, as long as we use same address to notify all queues,
   we would also need to decode the instruction on x86 and that's
   measureably slower than PIO.
   We could go back to discussing hypercall use for notifications,
   but that has its own set of issues...
  
  So... does violating the PCI-e spec really matter?  Is it preventing
  any guest from working properly?
 
  Yes, absolutely, this wording in spec is not there without reason.
 
  Existing guests allocate io space for PCI express ports in
  multiples on 4K.
 
  Since each express device is behind such a port, this means
  at most 15 such devices can use IO ports in a system.
 
  That's why to make a pci express virtio device,
  we must allow MMIO and/or some other communication
  mechanism as the spec requires.
 
 This is precisely why this is an ABI breaker.
 
 If you disable IO bars in the BIOS, than the interface that the OS sees
 will *not have an IO bar*.
 
 This *breaks existing guests*.
 Any time the programming interfaces changes on a PCI device, the
 revision ID and/or device ID must change.  The spec is very clear about
 this.
 
 We cannot disable the IO BAR without changing revision ID/device ID.
 

But it's a bios/PC issue. It's not a device issue.

Anyway, let's put express aside.

It's easy to create non-working setups with pci, today:

- create 16 pci bridges
- put one virtio device behind each

boom

Try it.

I want to fix that.


  That's on x86.
 
  Besides x86, there are achitectures where IO is unavailable or very slow.
 
  I don't think we should rush an ABI breakage if the only benefit is
  claiming spec compliance.
  
  Regards,
  
  Anthony Liguori
 
  Why do you bring this up? No one advocates any ABI breakage,
  I only suggest extensions.
 
 It's an ABI breakage.  You're claiming that the guests you tested
 handle the breakage reasonably but it is unquestionably an ABI breakage.
 
 Regards,
 
 Anthony Liguori

Adding BAR is not an ABI breakage, do we agree on that?

Disabling IO would be but I am not proposing disabling IO.

Guests might disable IO.

I propose a way to make virtio still work if they do.

This is *fixing* things. Not breaking.

 
 
  
   -- 
   MST
   --
   To unsubscribe from this list: send the line unsubscribe kvm in
   the body of a message to majord...@vger.kernel.org
   More majordomo info at  http://vger.kernel.org/majordomo-info.html
  --
  To unsubscribe from this list: send the line unsubscribe kvm in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of emulating multibyte NOP

2013-06-05 Thread 李春奇

On Wed, Jun 5, 2013 at 11:17 PM, Gleb Natapov g...@redhat.com wrote:
 On Wed, Jun 05, 2013 at 11:13:37PM +0800, 李春奇 Arthur Chunqi Li wrote:
 On Wed, Jun 5, 2013 at 11:11 PM, Gleb Natapov g...@redhat.com wrote:
  On Wed, Jun 05, 2013 at 10:56:54PM +0800, 李春奇 Arthur Chunqi Li wrote:
  Hi Gleb,
  I generate this mail by git send-email and I think the format is OK.
 
  But I have not received the email, only this your reply to it.
 Maybe the initial mail is in your spam box.

 Doubt it. I do not see it in mailing list archive either:
 http://news.gmane.org/gmane.comp.emulators.kvm.devel

 What git command line you've used to sent the email?
git send-email --to kvm@vger.kernel.org --cc pbonz...@redhat.com --cc
g...@redhat.com  0001-Test-case-of-emulating-multibyte-NOP.patch

Should I need to add some smtp options?

Arthur

 
  This is my first try to commit a patch in open source community. Sorry
  for annoying you guys so much.
 
  Thanks,
  No problem.
 
  Arthur
 
  On Wed, Jun 5, 2013 at 10:54 PM, Arthur Chunqi Li yzt...@gmail.com 
  wrote:
   Add multibyte NOP test case to kvm-unit-tests. This version adds test 
   cases into x86/realmode.c. This can test one of bugs when booting 
   RHEL5.9 64-bit.
  
   Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
   ---
x86/realmode.c |   24 
1 file changed, 24 insertions(+)
  
   diff --git a/x86/realmode.c b/x86/realmode.c
   index 981be08..e103ca6 100644
   --- a/x86/realmode.c
   +++ b/x86/realmode.c
   @@ -1504,6 +1504,29 @@ static void test_fninit(void)
   report(fninit, 0, fsw == 0  (fcw  0x103f) == 0x003f);
}
  
   +static void test_nopl(void)
   +{
   +   MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop
   +   MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop
   +   MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop
   +   MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes 
   nop
   +   MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 
   bytes nop
   +   MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); 
   // 6 bytes nop
   +   MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 
   0x00\n\r); // 7 bytes nop
   +   MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
   0x00\n\r); // 8 bytes nop
   +   MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 
   0x00, 0x00\n\r); // 9 bytes nop
   +   exec_in_big_real_mode(insn_nopl1);
   +   exec_in_big_real_mode(insn_nopl2);
   +   exec_in_big_real_mode(insn_nopl3);
   +   exec_in_big_real_mode(insn_nopl4);
   +   exec_in_big_real_mode(insn_nopl5);
   +   exec_in_big_real_mode(insn_nopl6);
   +   exec_in_big_real_mode(insn_nopl7);
   +   exec_in_big_real_mode(insn_nopl8);
   +   exec_in_big_real_mode(insn_nopl9);
   +   report(nopl, 0, 1);
   +}
   +
void realmode_start(void)
{
   test_null();
   @@ -1548,6 +1571,7 @@ void realmode_start(void)
   test_xlat();
   test_salc();
   test_fninit();
   +   test_nopl();
  
   exit(0);
}
   --
   1.7.9.5
  
 
 
 
  --
  Arthur Chunqi Li
  Department of Computer Science
  School of EECS
  Peking University
  Beijing, China
 
  --
  Gleb.



 --
 Arthur Chunqi Li
 Department of Computer Science
 School of EECS
 Peking University
 Beijing, China

 --
 Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of emulating multibyte NOP

2013-06-05 Thread Gleb Natapov

On Wed, Jun 05, 2013 at 11:13:37PM +0800, 李春奇 Arthur Chunqi Li wrote:
 On Wed, Jun 5, 2013 at 11:11 PM, Gleb Natapov g...@redhat.com wrote:
  On Wed, Jun 05, 2013 at 10:56:54PM +0800, 李春奇 Arthur Chunqi Li wrote:
  Hi Gleb,
  I generate this mail by git send-email and I think the format is OK.
 
  But I have not received the email, only this your reply to it.
 Maybe the initial mail is in your spam box.
 
Doubt it. I do not see it in mailing list archive either:
http://news.gmane.org/gmane.comp.emulators.kvm.devel

What git command line you've used to sent the email?

 
  This is my first try to commit a patch in open source community. Sorry
  for annoying you guys so much.
 
  Thanks,
  No problem.
 
  Arthur
 
  On Wed, Jun 5, 2013 at 10:54 PM, Arthur Chunqi Li yzt...@gmail.com wrote:
   Add multibyte NOP test case to kvm-unit-tests. This version adds test 
   cases into x86/realmode.c. This can test one of bugs when booting 
   RHEL5.9 64-bit.
  
   Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
   ---
x86/realmode.c |   24 
1 file changed, 24 insertions(+)
  
   diff --git a/x86/realmode.c b/x86/realmode.c
   index 981be08..e103ca6 100644
   --- a/x86/realmode.c
   +++ b/x86/realmode.c
   @@ -1504,6 +1504,29 @@ static void test_fninit(void)
   report(fninit, 0, fsw == 0  (fcw  0x103f) == 0x003f);
}
  
   +static void test_nopl(void)
   +{
   +   MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop
   +   MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop
   +   MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop
   +   MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes 
   nop
   +   MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 
   bytes nop
   +   MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); 
   // 6 bytes nop
   +   MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 
   0x00\n\r); // 7 bytes nop
   +   MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
   0x00\n\r); // 8 bytes nop
   +   MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 
   0x00, 0x00\n\r); // 9 bytes nop
   +   exec_in_big_real_mode(insn_nopl1);
   +   exec_in_big_real_mode(insn_nopl2);
   +   exec_in_big_real_mode(insn_nopl3);
   +   exec_in_big_real_mode(insn_nopl4);
   +   exec_in_big_real_mode(insn_nopl5);
   +   exec_in_big_real_mode(insn_nopl6);
   +   exec_in_big_real_mode(insn_nopl7);
   +   exec_in_big_real_mode(insn_nopl8);
   +   exec_in_big_real_mode(insn_nopl9);
   +   report(nopl, 0, 1);
   +}
   +
void realmode_start(void)
{
   test_null();
   @@ -1548,6 +1571,7 @@ void realmode_start(void)
   test_xlat();
   test_salc();
   test_fninit();
   +   test_nopl();
  
   exit(0);
}
   --
   1.7.9.5
  
 
 
 
  --
  Arthur Chunqi Li
  Department of Computer Science
  School of EECS
  Peking University
  Beijing, China
 
  --
  Gleb.
 
 
 
 --
 Arthur Chunqi Li
 Department of Computer Science
 School of EECS
 Peking University
 Beijing, China

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of emulating multibyte NOP

2013-06-05 Thread Gleb Natapov

On Wed, Jun 05, 2013 at 11:22:19PM +0800, 李春奇 Arthur Chunqi Li wrote:
 On Wed, Jun 5, 2013 at 11:17 PM, Gleb Natapov g...@redhat.com wrote:
  On Wed, Jun 05, 2013 at 11:13:37PM +0800, 李春奇 Arthur Chunqi Li wrote:
  On Wed, Jun 5, 2013 at 11:11 PM, Gleb Natapov g...@redhat.com wrote:
   On Wed, Jun 05, 2013 at 10:56:54PM +0800, 李春奇 Arthur Chunqi Li wrote:
   Hi Gleb,
   I generate this mail by git send-email and I think the format is OK.
  
   But I have not received the email, only this your reply to it.
  Maybe the initial mail is in your spam box.
 
  Doubt it. I do not see it in mailing list archive either:
  http://news.gmane.org/gmane.comp.emulators.kvm.devel
 
  What git command line you've used to sent the email?
 git send-email --to kvm@vger.kernel.org --cc pbonz...@redhat.com --cc
 g...@redhat.com  0001-Test-case-of-emulating-multibyte-NOP.patch
 
 Should I need to add some smtp options?
 
You need to configure smtp server. Just like you need to do that for any
other email application.
 
 Arthur
 
  
   This is my first try to commit a patch in open source community. Sorry
   for annoying you guys so much.
  
   Thanks,
   No problem.
  
   Arthur
  
   On Wed, Jun 5, 2013 at 10:54 PM, Arthur Chunqi Li yzt...@gmail.com 
   wrote:
Add multibyte NOP test case to kvm-unit-tests. This version adds test 
cases into x86/realmode.c. This can test one of bugs when booting 
RHEL5.9 64-bit.
   
Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/realmode.c |   24 
 1 file changed, 24 insertions(+)
   
diff --git a/x86/realmode.c b/x86/realmode.c
index 981be08..e103ca6 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -1504,6 +1504,29 @@ static void test_fninit(void)
report(fninit, 0, fsw == 0  (fcw  0x103f) == 0x003f);
 }
   
+static void test_nopl(void)
+{
+   MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop
+   MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop
+   MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop
+   MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 
bytes nop
+   MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 
5 bytes nop
+   MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 
0x00\n\r); // 6 bytes nop
+   MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 
0x00\n\r); // 7 bytes nop
+   MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 
0x00, 0x00\n\r); // 8 bytes nop
+   MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 
0x00, 0x00, 0x00\n\r); // 9 bytes nop
+   exec_in_big_real_mode(insn_nopl1);
+   exec_in_big_real_mode(insn_nopl2);
+   exec_in_big_real_mode(insn_nopl3);
+   exec_in_big_real_mode(insn_nopl4);
+   exec_in_big_real_mode(insn_nopl5);
+   exec_in_big_real_mode(insn_nopl6);
+   exec_in_big_real_mode(insn_nopl7);
+   exec_in_big_real_mode(insn_nopl8);
+   exec_in_big_real_mode(insn_nopl9);
+   report(nopl, 0, 1);
+}
+
 void realmode_start(void)
 {
test_null();
@@ -1548,6 +1571,7 @@ void realmode_start(void)
test_xlat();
test_salc();
test_fninit();
+   test_nopl();
   
exit(0);
 }
--
1.7.9.5
   
  
  
  
   --
   Arthur Chunqi Li
   Department of Computer Science
   School of EECS
   Peking University
   Beijing, China
  
   --
   Gleb.
 
 
 
  --
  Arthur Chunqi Li
  Department of Computer Science
  School of EECS
  Peking University
  Beijing, China
 
  --
  Gleb.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori

Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 10:08:37AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
   You mean make BAR0 an MMIO BAR?
   Yes, it would break current windows guests.
   Further, as long as we use same address to notify all queues,
   we would also need to decode the instruction on x86 and that's
   measureably slower than PIO.
   We could go back to discussing hypercall use for notifications,
   but that has its own set of issues...
  
  So... does violating the PCI-e spec really matter?  Is it preventing
  any guest from working properly?
 
  Yes, absolutely, this wording in spec is not there without reason.
 
  Existing guests allocate io space for PCI express ports in
  multiples on 4K.
 
  Since each express device is behind such a port, this means
  at most 15 such devices can use IO ports in a system.
 
  That's why to make a pci express virtio device,
  we must allow MMIO and/or some other communication
  mechanism as the spec requires.
 
 This is precisely why this is an ABI breaker.
 
 If you disable IO bars in the BIOS, than the interface that the OS sees
 will *not have an IO bar*.
 
 This *breaks existing guests*.
 Any time the programming interfaces changes on a PCI device, the
 revision ID and/or device ID must change.  The spec is very clear about
 this.
 
 We cannot disable the IO BAR without changing revision ID/device ID.
 

 But it's a bios/PC issue. It's not a device issue.

 Anyway, let's put express aside.

 It's easy to create non-working setups with pci, today:

 - create 16 pci bridges
 - put one virtio device behind each

 boom

 Try it.

 I want to fix that.


  That's on x86.
 
  Besides x86, there are achitectures where IO is unavailable or very slow.
 
  I don't think we should rush an ABI breakage if the only benefit is
  claiming spec compliance.
  
  Regards,
  
  Anthony Liguori
 
  Why do you bring this up? No one advocates any ABI breakage,
  I only suggest extensions.
 
 It's an ABI breakage.  You're claiming that the guests you tested
 handle the breakage reasonably but it is unquestionably an ABI breakage.
 
 Regards,
 
 Anthony Liguori

 Adding BAR is not an ABI breakage, do we agree on that?

 Disabling IO would be but I am not proposing disabling IO.

 Guests might disable IO.

Look, it's very simple.

If the failure in the guest is that BAR0 mapping fails because the
device is enabled but the BAR is disabled, then you've broken the ABI.

And what's worse is that this isn't for an obscure scenario (like having
15 PCI bridges) but for something that would become the standard
scenario (using a PCI-e bus).

We need to either bump the revision ID or the device ID if we do this.

Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PULL] vhost: cleanups and fixes

2013-06-05 Thread Michael S. Tsirkin

On Thu, May 02, 2013 at 12:49:42PM -0700, Linus Torvalds wrote:
 On Thu, May 2, 2013 at 12:33 PM, Michael S. Tsirkin m...@redhat.com wrote:
 
  I prefer not rebasing,
 
 Good.
 
will play with git to see why
  does request-pull get me a wrong diffstat and how
  to trick it into doing the right thing.
  Maybe merging my branch into master will do this.
 
 No, don't do an unnecessary merge just to get the diffstat right.
 
 git pull-request ends up assuming that there are no back-merges, and
 that you have a uniquely defined single shared merge base. That allows
 pull-request to just generate the diff directly from that merge base,
 without actually trying to do the merge itself (which may have
 conflicts etc).
 
 But because git pull-request doesn't actually *do* the merge, it means
 that it will fail to give the correct diffstat if the tree is
 complicated and has multiple merge bases, and it can't really figure
 what the original shared state was before the development.
 
 This is just one reason I do *not* want to see back-merges. They make
 history harder to read not just for humans.
 
 You can either ignore the problem (I'll see the real diffstat when I
 actually do the merge), or you can do a test-merge yourself (that you
 do *not* then push out in the development branch - keep it in a
 temporary branch for your own edification or just delete it after
 doing the merge, and don't do development on it!)
 
 In this case, it's an indirect back-merge: you back-merged a commit
 from the target tree that I have now merged, so it has become a
 back-merge. I'm not sure why you did that - if you needed to start
 from that state, it would actually have been better to just start at
 that state instead of merging it.

OK I'm in that situation again. I have some vhost-net patches that depend on
changes in tip.
But I also have a vhost-next branch with unrelated changes, that
I started from -rc3.

Previously I would just merge tip into vhost-next, then everyone's
happy, but it will create an implicit back-merge again, won't it?

So what should I do?

Sorry about being dense.

 But whatever. You can get the
 diffstat by using your merge as the base, so
 
 git diff -M --stat --summary bc7562355fda..
 
 in your branch should get the right result without any merges etc..
 But please do send me a proper pull request.
 
   Linus
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Test case of emulating multibyte NOP

2013-06-05 Thread Arthur Chunqi Li

Add multibyte NOP test case to kvm-unit-tests. This version adds test cases 
into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/realmode.c |   24 
 1 file changed, 24 insertions(+)

diff --git a/x86/realmode.c b/x86/realmode.c
index 981be08..e103ca6 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -1504,6 +1504,29 @@ static void test_fninit(void)
report(fninit, 0, fsw == 0  (fcw  0x103f) == 0x003f);
 }
 
+static void test_nopl(void)
+{
+   MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop
+   MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop
+   MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop
+   MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop
+   MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop
+   MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 
bytes nop
+   MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); 
// 7 bytes nop
+   MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
0x00\n\r); // 8 bytes nop
+   MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
0x00\n\r); // 9 bytes nop
+   exec_in_big_real_mode(insn_nopl1);
+   exec_in_big_real_mode(insn_nopl2);
+   exec_in_big_real_mode(insn_nopl3);
+   exec_in_big_real_mode(insn_nopl4);
+   exec_in_big_real_mode(insn_nopl5);
+   exec_in_big_real_mode(insn_nopl6);
+   exec_in_big_real_mode(insn_nopl7);
+   exec_in_big_real_mode(insn_nopl8);
+   exec_in_big_real_mode(insn_nopl9);
+   report(nopl, 0, 1);
+}
+
 void realmode_start(void)
 {
test_null();
@@ -1548,6 +1571,7 @@ void realmode_start(void)
test_xlat();
test_salc();
test_fninit();
+   test_nopl();
 
exit(0);
 }
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Test case of emulating multibyte NOP

2013-06-05 Thread yzt356

From: Arthur Chunqi Li yzt...@gmail.com

Add multibyte NOP test case to kvm-unit-tests. This version adds test cases 
into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/realmode.c |   24 
 1 file changed, 24 insertions(+)

diff --git a/x86/realmode.c b/x86/realmode.c
index 981be08..e103ca6 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -1504,6 +1504,29 @@ static void test_fninit(void)
report(fninit, 0, fsw == 0  (fcw  0x103f) == 0x003f);
 }
 
+static void test_nopl(void)
+{
+   MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop
+   MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop
+   MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop
+   MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop
+   MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop
+   MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 
bytes nop
+   MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); 
// 7 bytes nop
+   MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
0x00\n\r); // 8 bytes nop
+   MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
0x00\n\r); // 9 bytes nop
+   exec_in_big_real_mode(insn_nopl1);
+   exec_in_big_real_mode(insn_nopl2);
+   exec_in_big_real_mode(insn_nopl3);
+   exec_in_big_real_mode(insn_nopl4);
+   exec_in_big_real_mode(insn_nopl5);
+   exec_in_big_real_mode(insn_nopl6);
+   exec_in_big_real_mode(insn_nopl7);
+   exec_in_big_real_mode(insn_nopl8);
+   exec_in_big_real_mode(insn_nopl9);
+   report(nopl, 0, 1);
+}
+
 void realmode_start(void)
 {
test_null();
@@ -1548,6 +1571,7 @@ void realmode_start(void)
test_xlat();
test_salc();
test_fninit();
+   test_nopl();
 
exit(0);
 }
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of emulating multibyte NOP

2013-06-05 Thread Gleb Natapov

This time the email is perfect :)

On Thu, Jun 06, 2013 at 12:02:52AM +0800, Arthur Chunqi Li wrote:
 Add multibyte NOP test case to kvm-unit-tests. This version adds test cases 
 into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit.
 
 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  x86/realmode.c |   24 
  1 file changed, 24 insertions(+)
 
 diff --git a/x86/realmode.c b/x86/realmode.c
 index 981be08..e103ca6 100644
 --- a/x86/realmode.c
 +++ b/x86/realmode.c
 @@ -1504,6 +1504,29 @@ static void test_fninit(void)
   report(fninit, 0, fsw == 0  (fcw  0x103f) == 0x003f);
  }
  
 +static void test_nopl(void)
 +{
 + MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop
 + MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop
 + MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop
 + MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop
But all nops below that are not supported in 16 bit mode. You can
disassemble realmode.elf in 16bit node (objdump -z -d -mi8086
x86/realmode.elf) and check yourself. Lets not complicate things for now
and test only those that are easy to test.

 + MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes nop
 + MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 
 bytes nop
 + MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); 
 // 7 bytes nop
 + MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
 0x00\n\r); // 8 bytes nop
 + MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
 0x00\n\r); // 9 bytes nop
 + exec_in_big_real_mode(insn_nopl1);
 + exec_in_big_real_mode(insn_nopl2);
 + exec_in_big_real_mode(insn_nopl3);
 + exec_in_big_real_mode(insn_nopl4);
 + exec_in_big_real_mode(insn_nopl5);
 + exec_in_big_real_mode(insn_nopl6);
 + exec_in_big_real_mode(insn_nopl7);
 + exec_in_big_real_mode(insn_nopl8);
 + exec_in_big_real_mode(insn_nopl9);
 + report(nopl, 0, 1);
 +}
 +
  void realmode_start(void)
  {
   test_null();
 @@ -1548,6 +1571,7 @@ void realmode_start(void)
   test_xlat();
   test_salc();
   test_fninit();
 + test_nopl();
  
   exit(0);
  }
 -- 
 1.7.9.5

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Test case of emulating multibyte NOP

2013-06-05 Thread 李春奇

On Thu, Jun 6, 2013 at 12:13 AM, Gleb Natapov g...@redhat.com wrote:
 This time the email is perfect :)

 On Thu, Jun 06, 2013 at 12:02:52AM +0800, Arthur Chunqi Li wrote:
 Add multibyte NOP test case to kvm-unit-tests. This version adds test cases 
 into x86/realmode.c. This can test one of bugs when booting RHEL5.9 64-bit.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  x86/realmode.c |   24 
  1 file changed, 24 insertions(+)

 diff --git a/x86/realmode.c b/x86/realmode.c
 index 981be08..e103ca6 100644
 --- a/x86/realmode.c
 +++ b/x86/realmode.c
 @@ -1504,6 +1504,29 @@ static void test_fninit(void)
   report(fninit, 0, fsw == 0  (fcw  0x103f) == 0x003f);
  }

 +static void test_nopl(void)
 +{
 + MK_INSN(nopl1, .byte 0x90\n\r); // 1 byte nop
 + MK_INSN(nopl2, .byte 0x66, 0x90\n\r); // 2 bytes nop
 + MK_INSN(nopl3, .byte 0x0f, 0x1f, 0x00\n\r); // 3 bytes nop
 + MK_INSN(nopl4, .byte 0x0f, 0x1f, 0x40, 0x00\n\r); // 4 bytes nop
 But all nops below that are not supported in 16 bit mode. You can
 disassemble realmode.elf in 16bit node (objdump -z -d -mi8086
 x86/realmode.elf) and check yourself. Lets not complicate things for now
 and test only those that are easy to test.
Yes. But what if a 7-bytes nop runs in 16bit mode? Just the same as
https://bugzilla.redhat.com/show_bug.cgi?id=967652

DR6=0ff0 DR7=0400
EFER=0500
Code=00 00 e9 50 ff ff ff 00 00 00 00 85 d2 74 20 45 31 c0 31 c9 0f
1f 80 00 00 00 00 0f b6 04 31 41 83 c0 01 88 04 39 48 83 c1 01 41 39
d0 75 ec 48 89 f8

The error code is 0f 1f 80 00 00 00 00, which is a 7-bytes nop. Will
the emulator runs well in that case when booting RHEL5.9 64-bit?

Arthur



 + MK_INSN(nopl5, .byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 5 bytes 
 nop
 + MK_INSN(nopl6, .byte 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00\n\r); // 6 
 bytes nop
 + MK_INSN(nopl7, .byte 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00\n\r); 
 // 7 bytes nop
 + MK_INSN(nopl8, .byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
 0x00\n\r); // 8 bytes nop
 + MK_INSN(nopl9, .byte 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 
 0x00\n\r); // 9 bytes nop
 + exec_in_big_real_mode(insn_nopl1);
 + exec_in_big_real_mode(insn_nopl2);
 + exec_in_big_real_mode(insn_nopl3);
 + exec_in_big_real_mode(insn_nopl4);
 + exec_in_big_real_mode(insn_nopl5);
 + exec_in_big_real_mode(insn_nopl6);
 + exec_in_big_real_mode(insn_nopl7);
 + exec_in_big_real_mode(insn_nopl8);
 + exec_in_big_real_mode(insn_nopl9);
 + report(nopl, 0, 1);
 +}
 +
  void realmode_start(void)
  {
   test_null();
 @@ -1548,6 +1571,7 @@ void realmode_start(void)
   test_xlat();
   test_salc();
   test_fninit();
 + test_nopl();

   exit(0);
  }
 --
 1.7.9.5

 --
 Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 0/6] KVM: PPC: Book3E: AltiVec support

2013-06-05 Thread Scott Wood


On 06/05/2013 02:10:07 AM, Caraman Mihai Claudiu-B02008 wrote:

 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, June 05, 2013 12:39 AM
 To: Caraman Mihai Claudiu-B02008
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org; Alexander Graf
 Subject: Re: [RFC PATCH 0/6] KVM: PPC: Book3E: AltiVec support

 On 06/03/2013 03:54:22 PM, Mihai Caraman wrote:
  Mihai Caraman (6):
KVM: PPC: Book3E: Fix AltiVec interrupt numbers and build  
breakage

KVM: PPC: Book3E: Refactor SPE_FP exit handling
KVM: PPC: Book3E: Rename IRQPRIO names to accommodate ALTIVEC
KVM: PPC: Book3E: Add AltiVec support
KVM: PPC: Book3E: Add ONE_REG AltiVec support
KVM: PPC: Book3E: Enhance FPU laziness
 
   arch/powerpc/include/asm/kvm_asm.h|   16 ++-
   arch/powerpc/kvm/booke.c  |  189
  
   arch/powerpc/kvm/booke.h  |4 +-
   arch/powerpc/kvm/bookehv_interrupts.S |8 +-
   arch/powerpc/kvm/e500.c   |   10 +-
   arch/powerpc/kvm/e500_emulate.c   |8 +-
   arch/powerpc/kvm/e500mc.c |   10 ++-
   7 files changed, 199 insertions(+), 46 deletions(-)

 This looks like a bit much for 3.10 (certainly, subject lines like
 refactor and enhance and add support aren't going to make  
Linus

 happy given that we're past rc4) so I think we should apply
 http://patchwork.ozlabs.org/patch/242896/ for 3.10.  Then for 3.11,
 revert it after applying this patchset.


Why not 1/6 plus e6500 removal?


1/6 is not a bugfix.

-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 10:08:37AM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
   Michael S. Tsirkin m...@redhat.com writes:
   
On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
You mean make BAR0 an MMIO BAR?
Yes, it would break current windows guests.
Further, as long as we use same address to notify all queues,
we would also need to decode the instruction on x86 and that's
measureably slower than PIO.
We could go back to discussing hypercall use for notifications,
but that has its own set of issues...
   
   So... does violating the PCI-e spec really matter?  Is it preventing
   any guest from working properly?
  
   Yes, absolutely, this wording in spec is not there without reason.
  
   Existing guests allocate io space for PCI express ports in
   multiples on 4K.
  
   Since each express device is behind such a port, this means
   at most 15 such devices can use IO ports in a system.
  
   That's why to make a pci express virtio device,
   we must allow MMIO and/or some other communication
   mechanism as the spec requires.
  
  This is precisely why this is an ABI breaker.
  
  If you disable IO bars in the BIOS, than the interface that the OS sees
  will *not have an IO bar*.
  
  This *breaks existing guests*.
  Any time the programming interfaces changes on a PCI device, the
  revision ID and/or device ID must change.  The spec is very clear about
  this.
  
  We cannot disable the IO BAR without changing revision ID/device ID.
  
 
  But it's a bios/PC issue. It's not a device issue.
 
  Anyway, let's put express aside.
 
  It's easy to create non-working setups with pci, today:
 
  - create 16 pci bridges
  - put one virtio device behind each
 
  boom
 
  Try it.
 
  I want to fix that.
 
 
   That's on x86.
  
   Besides x86, there are achitectures where IO is unavailable or very slow.
  
   I don't think we should rush an ABI breakage if the only benefit is
   claiming spec compliance.
   
   Regards,
   
   Anthony Liguori
  
   Why do you bring this up? No one advocates any ABI breakage,
   I only suggest extensions.
  
  It's an ABI breakage.  You're claiming that the guests you tested
  handle the breakage reasonably but it is unquestionably an ABI breakage.
  
  Regards,
  
  Anthony Liguori
 
  Adding BAR is not an ABI breakage, do we agree on that?
 
  Disabling IO would be but I am not proposing disabling IO.
 
  Guests might disable IO.
 
 Look, it's very simple.
 
 If the failure in the guest is that BAR0 mapping fails because the
 device is enabled but the BAR is disabled,

There's no such thing as device is enabled and neither
BAR is disabled in PCI spec.

Spec says IO and memory can be enabled/disabled, separately.
PCI Express spec says devices should work without IO.

So modern guests will assume it's ok to work without IO,
and it will become more common in the future.

 then you've broken the ABI.


No.  It means that the ABI is broken.

Guests can disable IO *today* and when they do things don't work.

 
 And what's worse is that this isn't for an obscure scenario (like having
 15 PCI bridges) but for something that would become the standard
 scenario (using a PCI-e bus).
 
 We need to either bump the revision ID or the device ID if we do this.
 
 Regards,
 
 Anthony Liguori

We only need to do it if we do a change that breaks guests.

Please find a guest that is broken by the patches. You won't find any.


-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori

Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
 Look, it's very simple.
 We only need to do it if we do a change that breaks guests.

 Please find a guest that is broken by the patches. You won't find any.

I think the problem in this whole discussion is that we're talking past
each other.

Here is my understanding:

1) PCI-e says that you must be able to disable IO bars and still have a
functioning device.

2) It says (1) because you must size IO bars to 4096 which means that
practically speaking, once you enable a dozen or so PIO bars, you run
out of PIO space (16 * 4k == 64k and not all that space can be used).

virtio-pci uses a IO bars exclusively today.  Existing guest drivers
assume that there is an IO bar that contains the virtio-pci registers.

So let's consider the following scenarios:

QEMU of today:

1) qemu -drive file=ubuntu-13.04.img,if=virtio

This works today.  Does adding an MMIO bar at BAR1 break this?
Certainly not if the device is behind a PCI bus...

But are we going to put devices behind a PCI-e bus by default?  Are we
going to ask the user to choose whether devices are put behind a legacy
bus or the express bus?

What happens if we put the device behind a PCI-e bus by default?  Well,
it can still work.  That is, until we do something like this:

2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
-device virtio-balloon..

Such that we have more than a dozen or so devices.  This works
perfectly fine today.  It works fine because we've designed virtio to
make sure it works fine.  Quoting the spec:

Configuration space is generally used for rarely-changing or
 initialization-time parameters. But it is a limited resource, so it
 might be better to use a virtqueue to update configuration information
 (the network device does this for filtering, otherwise the table in the
 config space could potentially be very large).

In fact, we can have 100s of PCI devices today without running out of IO
space because we're so careful about this.

So if we switch to using PCI-e by default *and* we keep virtio-pci
without modifying the device IDs, then very frequently we are going to
break existing guests because the drivers they already have no longer
work.

A few virtio-serial channels, a few block devices, a couple of network
adapters, the balloon and RNG driver, and we hit the IO space limit
pretty damn quickly so this is not a contrived scenario at all.  I would
expect that we frequently run into this if we don't address this problem.

So we have a few options:

1) Punt all of this complexity to libvirt et al and watch people make
   the wrong decisions about when to use PCI-e.  This will become yet
   another example of KVM being too hard to configure.

2) Enable PCI-e by default and just force people to upgrade their
   drivers.

3) Don't use PCI-e by default but still add BAR1 to virtio-pci

4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give
   it a new device/vendor ID.   Continue to use virtio-pci for existing
   devices potentially adding virtio-{net,blk,...}-pcie variants for
   people that care to use them.

I think 1 == 2 == 3 and I view 2 as an ABI breaker.  libvirt does like
policy so they're going to make a simple decision and always use the
same bus by default.  I suspect if we made PCI the default, they might
just always set the PCI-e flag just because.

There are hundreds of thousands if not millions of guests with existing
virtio-pci drivers.  Forcing them to upgrade better have an extremely
good justification.

I think 4 is the best path forward.  It's better for users (guests
continue to work as they always have).  There's less confusion about
enabling PCI-e support--you must ask for the virtio-pcie variant and you
must have a virtio-pcie driver.  It's easy to explain.

It also maps to what regular hardware does.  I highly doubt that there
are any real PCI cards that made the shift from PCI to PCI-e without
bumping at least a revision ID.

It also means we don't need to play games about sometimes enabling IO
bars and sometimes not.

Regards,

Anthony Liguori



 -- 
 MST
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 2/6] KVM: PPC: Book3E: Refactor SPE_FP exit handling

2013-06-05 Thread Scott Wood


On 06/05/2013 02:29:47 AM, Caraman Mihai Claudiu-B02008 wrote:

case BOOKE_INTERRUPT_SPE_FP_ROUND:
  +#ifdef CONFIG_SPE
kvmppc_booke_queue_irqprio(vcpu,
  BOOKE_IRQPRIO_SPE_FP_ROUND);
r = RESUME_GUEST;
break;

 Why not use kvmppc_supports_spe() here, for consistency?

I added cpu_has_feature(CPU_FTR_SPE) for the case specified above,  
but here
SPE_FP_ROUND is not shared with ALTIVEC. CONFIG_SPE is used in other  
places

in KVM without this check, shouldn't be all or nothing?


I'd rather it be consistent, at least between handling one exception  
and another.


-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Error while running android as guest ---FastModels

2013-06-05 Thread Christoffer Dall

On Wed, Jun 05, 2013 at 08:31:13PM +0200, Mai Daftedar wrote:
 Dear All,
 
 After following the Android on Fast Models pdf. I managed to successfully
 run android as the host and when running the following command to run
 android as a guest:
 
 ./qemu-system-arm \
 -vnc :0 \
 -k en-us \
 -enable-kvm \
 -serial stdio \
 -kernel zImage \
 -m 512 -M vexpress-a15 -cpu cortex-a15 \
 -drive file=./android.img,id=virtio-blk,if=none \
 -device virtio-blk,drive=virtio-blk,transport=virtio-mmio.0 \
 -append virtio_mmio.device=1M@0x4e00:74:0 init=/init
 console=ttyAMA0 mem=512M root=/dev/vda rw
 
 The following errors appear:
 
 *device virtio-blk, drive=virtio-blk, transport=virtio-mmio.0: No 'PCI' bus
 found for device virtio-blk-pci*
 
 Any suggestions on what I am missing out
 

Sounds like you're using the wrong version of QEMU.  For virtio to work
you need to use the rather old kvm-arm-virtio branch in my tree (it may
be replicated elsewhere too).

There is ongoing work to get this support updated to newer QEMU
implementations.

-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
  Look, it's very simple.
  We only need to do it if we do a change that breaks guests.
 
  Please find a guest that is broken by the patches. You won't find any.
 
 I think the problem in this whole discussion is that we're talking past
 each other.
 
 Here is my understanding:
 
 1) PCI-e says that you must be able to disable IO bars and still have a
 functioning device.
 
 2) It says (1) because you must size IO bars to 4096 which means that
 practically speaking, once you enable a dozen or so PIO bars, you run
 out of PIO space (16 * 4k == 64k and not all that space can be used).


Let me add 3 other issues which I mentioned and you seem to miss:

3) architectures which don't have fast access to IO ports, exist
   virtio does not work there ATM

4) setups with many PCI bridges exist and have the same issue
   as PCI express. virtio does not work there ATM

5) On x86, even with nested page tables, firmware only decodes
   the page address on an invalid PTE, not the data. You need to
   emulate the guest to get at the data. Without
   nested page tables, we have to do page table walk and emulate
   to get both address and data. Since this is how MMIO
   is implemented in kvm on x86, MMIO is much slower than PIO
   (with nested page tables by a factor of 2, did not test without).

 virtio-pci uses a IO bars exclusively today.  Existing guest drivers
 assume that there is an IO bar that contains the virtio-pci registers.
 So let's consider the following scenarios:
 
 QEMU of today:
 
 1) qemu -drive file=ubuntu-13.04.img,if=virtio
 
 This works today.  Does adding an MMIO bar at BAR1 break this?
 Certainly not if the device is behind a PCI bus...
 
 But are we going to put devices behind a PCI-e bus by default?  Are we
 going to ask the user to choose whether devices are put behind a legacy
 bus or the express bus?
 
 What happens if we put the device behind a PCI-e bus by default?  Well,
 it can still work.  That is, until we do something like this:
 
 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
 -device virtio-balloon..
 
 Such that we have more than a dozen or so devices.  This works
 perfectly fine today.  It works fine because we've designed virtio to
 make sure it works fine.  Quoting the spec:
 
 Configuration space is generally used for rarely-changing or
  initialization-time parameters. But it is a limited resource, so it
  might be better to use a virtqueue to update configuration information
  (the network device does this for filtering, otherwise the table in the
  config space could potentially be very large).
 
 In fact, we can have 100s of PCI devices today without running out of IO
 space because we're so careful about this.
 
 So if we switch to using PCI-e by default *and* we keep virtio-pci
 without modifying the device IDs, then very frequently we are going to
 break existing guests because the drivers they already have no longer
 work.
 
 A few virtio-serial channels, a few block devices, a couple of network
 adapters, the balloon and RNG driver, and we hit the IO space limit
 pretty damn quickly so this is not a contrived scenario at all.  I would
 expect that we frequently run into this if we don't address this problem.
 
 So we have a few options:
 1) Punt all of this complexity to libvirt et al and watch people make
the wrong decisions about when to use PCI-e.  This will become yet
another example of KVM being too hard to configure.
 
 2) Enable PCI-e by default and just force people to upgrade their
drivers.
 
 3) Don't use PCI-e by default but still add BAR1 to virtio-pci
 
 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),

We can't do this - it will hurt performance.

give
it a new device/vendor ID.   Continue to use virtio-pci for existing
devices potentially adding virtio-{net,blk,...}-pcie variants for
people that care to use them.
 
 I think 1 == 2 == 3 and I view 2 as an ABI breaker.

Why do you think 2 == 3? 2 changes default behaviour. 3 does not.

 libvirt does like
 policy so they're going to make a simple decision and always use the
 same bus by default.  I suspect if we made PCI the default, they might
 just always set the PCI-e flag just because.

This sounds very strange. But let's assume you are right for
the sake of the argument ...

 There are hundreds of thousands if not millions of guests with existing
 virtio-pci drivers.  Forcing them to upgrade better have an extremely
 good justification.
 
 I think 4 is the best path forward.  It's better for users (guests
 continue to work as they always have).  There's less confusion about
 enabling PCI-e support--you must ask for the virtio-pcie variant and you
 must have a virtio-pcie driver.  It's easy to explain.

I don't think how this changes the situation.

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 10:43:17PM +0300, Michael S. Tsirkin wrote:
 On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
   Look, it's very simple.
   We only need to do it if we do a change that breaks guests.
  
   Please find a guest that is broken by the patches. You won't find any.
  
  I think the problem in this whole discussion is that we're talking past
  each other.
  
  Here is my understanding:
  
  1) PCI-e says that you must be able to disable IO bars and still have a
  functioning device.
  
  2) It says (1) because you must size IO bars to 4096 which means that
  practically speaking, once you enable a dozen or so PIO bars, you run
  out of PIO space (16 * 4k == 64k and not all that space can be used).
 
 
 Let me add 3 other issues which I mentioned and you seem to miss:
 
 3) architectures which don't have fast access to IO ports, exist
virtio does not work there ATM
 
 4) setups with many PCI bridges exist and have the same issue
as PCI express. virtio does not work there ATM
 
 5) On x86, even with nested page tables, firmware only decodes
the page address on an invalid PTE, not the data. You need to
emulate the guest to get at the data. Without
nested page tables, we have to do page table walk and emulate
to get both address and data. Since this is how MMIO
is implemented in kvm on x86, MMIO is much slower than PIO
(with nested page tables by a factor of 2, did not test without).

Oh I forgot:

6) access to MMIO BARs is painful in the BIOS environment
   so BIOS would typically need to enable IO for the boot device.



  virtio-pci uses a IO bars exclusively today.  Existing guest drivers
  assume that there is an IO bar that contains the virtio-pci registers.
  So let's consider the following scenarios:
  
  QEMU of today:
  
  1) qemu -drive file=ubuntu-13.04.img,if=virtio
  
  This works today.  Does adding an MMIO bar at BAR1 break this?
  Certainly not if the device is behind a PCI bus...
  
  But are we going to put devices behind a PCI-e bus by default?  Are we
  going to ask the user to choose whether devices are put behind a legacy
  bus or the express bus?
  
  What happens if we put the device behind a PCI-e bus by default?  Well,
  it can still work.  That is, until we do something like this:
  
  2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
  -device virtio-balloon..
  
  Such that we have more than a dozen or so devices.  This works
  perfectly fine today.  It works fine because we've designed virtio to
  make sure it works fine.  Quoting the spec:
  
  Configuration space is generally used for rarely-changing or
   initialization-time parameters. But it is a limited resource, so it
   might be better to use a virtqueue to update configuration information
   (the network device does this for filtering, otherwise the table in the
   config space could potentially be very large).
  
  In fact, we can have 100s of PCI devices today without running out of IO
  space because we're so careful about this.
  
  So if we switch to using PCI-e by default *and* we keep virtio-pci
  without modifying the device IDs, then very frequently we are going to
  break existing guests because the drivers they already have no longer
  work.
  
  A few virtio-serial channels, a few block devices, a couple of network
  adapters, the balloon and RNG driver, and we hit the IO space limit
  pretty damn quickly so this is not a contrived scenario at all.  I would
  expect that we frequently run into this if we don't address this problem.
  
  So we have a few options:
  1) Punt all of this complexity to libvirt et al and watch people make
 the wrong decisions about when to use PCI-e.  This will become yet
 another example of KVM being too hard to configure.
  
  2) Enable PCI-e by default and just force people to upgrade their
 drivers.
  
  3) Don't use PCI-e by default but still add BAR1 to virtio-pci
  
  4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),
 
 We can't do this - it will hurt performance.
 
 give
 it a new device/vendor ID.   Continue to use virtio-pci for existing
 devices potentially adding virtio-{net,blk,...}-pcie variants for
 people that care to use them.
  
  I think 1 == 2 == 3 and I view 2 as an ABI breaker.
 
 Why do you think 2 == 3? 2 changes default behaviour. 3 does not.
 
  libvirt does like
  policy so they're going to make a simple decision and always use the
  same bus by default.  I suspect if we made PCI the default, they might
  just always set the PCI-e flag just because.
 
 This sounds very strange. But let's assume you are right for
 the sake of the argument ...
 
  There are hundreds of thousands if not millions of guests with existing
  virtio-pci drivers.  Forcing them to upgrade better have an extremely
  good

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
  Look, it's very simple.
  We only need to do it if we do a change that breaks guests.
 
  Please find a guest that is broken by the patches. You won't find any.
 
 I think the problem in this whole discussion is that we're talking past
 each other.
 
 Here is my understanding:
 
 1) PCI-e says that you must be able to disable IO bars and still have a
 functioning device.
 
 2) It says (1) because you must size IO bars to 4096 which means that
 practically speaking, once you enable a dozen or so PIO bars, you run
 out of PIO space (16 * 4k == 64k and not all that space can be used).
 
 virtio-pci uses a IO bars exclusively today.  Existing guest drivers
 assume that there is an IO bar that contains the virtio-pci registers.
 
 So let's consider the following scenarios:
 
 QEMU of today:
 
 1) qemu -drive file=ubuntu-13.04.img,if=virtio
 
 This works today.  Does adding an MMIO bar at BAR1 break this?
 Certainly not if the device is behind a PCI bus...
 
 But are we going to put devices behind a PCI-e bus by default?  Are we
 going to ask the user to choose whether devices are put behind a legacy
 bus or the express bus?
 
 What happens if we put the device behind a PCI-e bus by default?  Well,
 it can still work.  That is, until we do something like this:
 
 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
 -device virtio-balloon..
 
 Such that we have more than a dozen or so devices.  This works
 perfectly fine today.  It works fine because we've designed virtio to
 make sure it works fine.  Quoting the spec:
 
 Configuration space is generally used for rarely-changing or
  initialization-time parameters. But it is a limited resource, so it
  might be better to use a virtqueue to update configuration information
  (the network device does this for filtering, otherwise the table in the
  config space could potentially be very large).
 
 In fact, we can have 100s of PCI devices today without running out of IO
 space because we're so careful about this.
 
 So if we switch to using PCI-e by default *and* we keep virtio-pci
 without modifying the device IDs, then very frequently we are going to
 break existing guests because the drivers they already have no longer
 work.
 
 A few virtio-serial channels, a few block devices, a couple of network
 adapters, the balloon and RNG driver, and we hit the IO space limit
 pretty damn quickly so this is not a contrived scenario at all.  I would
 expect that we frequently run into this if we don't address this problem.
 
 So we have a few options:
 
 1) Punt all of this complexity to libvirt et al and watch people make
the wrong decisions about when to use PCI-e.  This will become yet
another example of KVM being too hard to configure.
 
 2) Enable PCI-e by default and just force people to upgrade their
drivers.
 
 3) Don't use PCI-e by default but still add BAR1 to virtio-pci
 
 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give
it a new device/vendor ID.   Continue to use virtio-pci for existing
devices potentially adding virtio-{net,blk,...}-pcie variants for
people that care to use them.

For the record, with respect to PCI-e discussion, I have no
problem with the idea of changing the device ID or
revision id and asking guests to upgrade if they
want to use a pcie device.

That's not exactly 4 however.


I see no reason to couple PCI-e with MMIO discussion,
that's just one of the reasons to support MMIO.


 I think 1 == 2 == 3 and I view 2 as an ABI breaker.  libvirt does like
 policy so they're going to make a simple decision and always use the
 same bus by default.  I suspect if we made PCI the default, they might
 just always set the PCI-e flag just because.
 
 There are hundreds of thousands if not millions of guests with existing
 virtio-pci drivers.  Forcing them to upgrade better have an extremely
 good justification.
 
 I think 4 is the best path forward.  It's better for users (guests
 continue to work as they always have).  There's less confusion about
 enabling PCI-e support--you must ask for the virtio-pcie variant and you
 must have a virtio-pcie driver.  It's easy to explain.
 
 It also maps to what regular hardware does.  I highly doubt that there
 are any real PCI cards that made the shift from PCI to PCI-e without
 bumping at least a revision ID.
 
 It also means we don't need to play games about sometimes enabling IO
 bars and sometimes not.
 
 Regards,
 
 Anthony Liguori
 
 
 
  -- 
  MST
  --
  To unsubscribe from this list: send the line unsubscribe kvm in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori

Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
  Look, it's very simple.
  We only need to do it if we do a change that breaks guests.
 
  Please find a guest that is broken by the patches. You won't find any.
 
 I think the problem in this whole discussion is that we're talking past
 each other.
 
 Here is my understanding:
 
 1) PCI-e says that you must be able to disable IO bars and still have a
 functioning device.
 
 2) It says (1) because you must size IO bars to 4096 which means that
 practically speaking, once you enable a dozen or so PIO bars, you run
 out of PIO space (16 * 4k == 64k and not all that space can be used).


 Let me add 3 other issues which I mentioned and you seem to miss:

 3) architectures which don't have fast access to IO ports, exist
virtio does not work there ATM

Which architectures have PCI but no IO ports?

 4) setups with many PCI bridges exist and have the same issue
as PCI express. virtio does not work there ATM

This is not virtio specific.  This is true for all devices that use IO.

 5) On x86, even with nested page tables, firmware only decodes
the page address on an invalid PTE, not the data. You need to
emulate the guest to get at the data. Without
nested page tables, we have to do page table walk and emulate
to get both address and data. Since this is how MMIO
is implemented in kvm on x86, MMIO is much slower than PIO
(with nested page tables by a factor of 2, did not test without).

Am well aware of this, this is why we use PIO.

I fully agree with you that when we do MMIO, we should switch the
notification mechanism to avoid encoding anything meaningful as data.

 virtio-pci uses a IO bars exclusively today.  Existing guest drivers
 assume that there is an IO bar that contains the virtio-pci registers.
 So let's consider the following scenarios:
 
 QEMU of today:
 
 1) qemu -drive file=ubuntu-13.04.img,if=virtio
 
 This works today.  Does adding an MMIO bar at BAR1 break this?
 Certainly not if the device is behind a PCI bus...
 
 But are we going to put devices behind a PCI-e bus by default?  Are we
 going to ask the user to choose whether devices are put behind a legacy
 bus or the express bus?
 
 What happens if we put the device behind a PCI-e bus by default?  Well,
 it can still work.  That is, until we do something like this:
 
 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
 -device virtio-balloon..
 
 Such that we have more than a dozen or so devices.  This works
 perfectly fine today.  It works fine because we've designed virtio to
 make sure it works fine.  Quoting the spec:
 
 Configuration space is generally used for rarely-changing or
  initialization-time parameters. But it is a limited resource, so it
  might be better to use a virtqueue to update configuration information
  (the network device does this for filtering, otherwise the table in the
  config space could potentially be very large).
 
 In fact, we can have 100s of PCI devices today without running out of IO
 space because we're so careful about this.
 
 So if we switch to using PCI-e by default *and* we keep virtio-pci
 without modifying the device IDs, then very frequently we are going to
 break existing guests because the drivers they already have no longer
 work.
 
 A few virtio-serial channels, a few block devices, a couple of network
 adapters, the balloon and RNG driver, and we hit the IO space limit
 pretty damn quickly so this is not a contrived scenario at all.  I would
 expect that we frequently run into this if we don't address this problem.
 
 So we have a few options:
 1) Punt all of this complexity to libvirt et al and watch people make
the wrong decisions about when to use PCI-e.  This will become yet
another example of KVM being too hard to configure.
 
 2) Enable PCI-e by default and just force people to upgrade their
drivers.
 
 3) Don't use PCI-e by default but still add BAR1 to virtio-pci
 
 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),

 We can't do this - it will hurt performance.

Can you explain?  I thought the whole trick with separating out the
virtqueue notification register was to regain the performance?

give
it a new device/vendor ID.   Continue to use virtio-pci for existing
devices potentially adding virtio-{net,blk,...}-pcie variants for
people that care to use them.
 
 I think 1 == 2 == 3 and I view 2 as an ABI breaker.

 Why do you think 2 == 3? 2 changes default behaviour. 3 does not.

It doesn't change the default behavior but then we're pushing the
decision of when to use pci-e to the user.  They have to understand that
there can be subtle breakages because the virtio-pci driver may not work
if they are using an old guest.

 libvirt does like
 policy so they're

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori

Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 10:43:17PM +0300, Michael S. Tsirkin wrote:
 On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
   Look, it's very simple.
   We only need to do it if we do a change that breaks guests.
  
   Please find a guest that is broken by the patches. You won't find any.
  
  I think the problem in this whole discussion is that we're talking past
  each other.
  
  Here is my understanding:
  
  1) PCI-e says that you must be able to disable IO bars and still have a
  functioning device.
  
  2) It says (1) because you must size IO bars to 4096 which means that
  practically speaking, once you enable a dozen or so PIO bars, you run
  out of PIO space (16 * 4k == 64k and not all that space can be used).
 
 
 Let me add 3 other issues which I mentioned and you seem to miss:
 
 3) architectures which don't have fast access to IO ports, exist
virtio does not work there ATM
 
 4) setups with many PCI bridges exist and have the same issue
as PCI express. virtio does not work there ATM
 
 5) On x86, even with nested page tables, firmware only decodes
the page address on an invalid PTE, not the data. You need to
emulate the guest to get at the data. Without
nested page tables, we have to do page table walk and emulate
to get both address and data. Since this is how MMIO
is implemented in kvm on x86, MMIO is much slower than PIO
(with nested page tables by a factor of 2, did not test without).

 Oh I forgot:

 6) access to MMIO BARs is painful in the BIOS environment
so BIOS would typically need to enable IO for the boot device.

But if you want to boot from the 16th device, the BIOS needs to solve
this problem anyway.

Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 6/6] KVM: PPC: Book3E: Enhance FPU laziness

2013-06-05 Thread Scott Wood

On 06/05/2013 04:14:21 AM, Caraman Mihai Claudiu-B02008 wrote:

 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, June 05, 2013 1:54 AM
 To: Caraman Mihai Claudiu-B02008
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org; Caraman Mihai Claudiu-B02008
 Subject: Re: [RFC PATCH 6/6] KVM: PPC: Book3E: Enhance FPU laziness

 On 06/03/2013 03:54:28 PM, Mihai Caraman wrote:
  Adopt AltiVec approach to increase laziness by calling
  kvmppc_load_guest_fp()
  just before returning to guest instaed of each sched in.

  Signed-off-by: Mihai Caraman mihai.cara...@freescale.com

 If you did this *before* adding Altivec it would have saved a  
question

 in an earlier patch. :-)

I kept asking myself about the order and in the end I decided that  
this is
an improvement originated from AltiVec work. FPU may be further  
cleaned up

(get rid of active state, etc).

  ---
   arch/powerpc/kvm/booke.c  |1 +
   arch/powerpc/kvm/e500mc.c |2 --
   2 files changed, 1 insertions(+), 2 deletions(-)

  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
  index 019496d..5382238 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -1258,6 +1258,7 @@ int kvmppc_handle_exit(struct kvm_run *run,
  struct kvm_vcpu *vcpu,
} else {
kvmppc_lazy_ee_enable();
kvmppc_load_guest_altivec(vcpu);
  + kvmppc_load_guest_fp(vcpu);
}
}

 You should probably do these before kvmppc_lazy_ee_enable().

Why? I wanted to look like part of lightweight_exit.

We want to minimize the portion of the code that runs with interrupts  
disabled while telling tracers that interrupts are enabled.  We want to  
minimize the C code run with lazy EE in an inconsistent state.

The same applies to kvm_vcpu_run()...

 Actually, I don't think this is a good idea at all.  As I understand
 it, you're not supposed to take kernel ownersship of floating point  
in

 non-atomic context, because an interrupt could itself call
 enable_kernel_fp().

So lightweight_exit isn't executed in atomic context?

Ignore this, I misread what the patch was doing.  I thought you were  
doing the opposite you did. :-P

As such, this patch appears to fix the thing I was complaining about --  
before, we could have taken an interrupt after kvmppc_core_vcpu_load(),  
and that interrupt could have claimed the floating point (unlikely with  
the kernel as is, but you never know what could happen in the future or  
out-of-tree...).

Will be lazyee fixes including  kvmppc_fix_ee_before_entry() in 3.10?
64-bit Book3E KVM is unreliable without them. Should we disable e5500  
too

for 3.10?

I hope so...  I meant to ask Gleb to take them while Alex was away, but  
I forgot about them. :-P

Alex, are you back from vacation yet?

-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] virtio_balloon: leak_balloon(): only tell host if we got pages deflated

2013-06-05 Thread Luiz Capitulino

The balloon_page_dequeue() function can return NULL. If it does for
the first page being freed, then leak_balloon() will create a
scatter list with len=0. Which in turn seems to generate an invalid
virtio request.

Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---

PS: I didn't get this in practice. I found it by code review. On the other
hand, automatic-ballooning was able to put such invalid requests in
the virtqueue and QEMU would explode...

PPS: Very lightly tested

 drivers/virtio/virtio_balloon.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index bd3ae32..71af7b5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 * is true, we *have* to do it in this order
 */
-   tell_host(vb, vb-deflate_vq);
+   if (vb-num_pfns != 0)
+   tell_host(vb, vb-deflate_vq);
mutex_unlock(vb-balloon_lock);
release_pages_by_pfn(vb-pfns, vb-num_pfns);
 }
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread H. Peter Anvin

On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:
 
 Spec says IO and memory can be enabled/disabled, separately.
 PCI Express spec says devices should work without IO.
 

For native endpoints.  Currently virtio would be a legacy endpoint
which is quite correct -- it is compatible with a legacy interface.

-hpa

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 03:42:57PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
   Look, it's very simple.
   We only need to do it if we do a change that breaks guests.
  
   Please find a guest that is broken by the patches. You won't find any.
  
  I think the problem in this whole discussion is that we're talking past
  each other.
  
  Here is my understanding:
  
  1) PCI-e says that you must be able to disable IO bars and still have a
  functioning device.
  
  2) It says (1) because you must size IO bars to 4096 which means that
  practically speaking, once you enable a dozen or so PIO bars, you run
  out of PIO space (16 * 4k == 64k and not all that space can be used).
 
 
  Let me add 3 other issues which I mentioned and you seem to miss:
 
  3) architectures which don't have fast access to IO ports, exist
 virtio does not work there ATM
 
 Which architectures have PCI but no IO ports?
 
  4) setups with many PCI bridges exist and have the same issue
 as PCI express. virtio does not work there ATM
 
 This is not virtio specific.  This is true for all devices that use IO.

Absolutely. And you will find that modern devices make use of IO
ports optional.

  5) On x86, even with nested page tables, firmware only decodes
 the page address on an invalid PTE, not the data. You need to
 emulate the guest to get at the data. Without
 nested page tables, we have to do page table walk and emulate
 to get both address and data. Since this is how MMIO
 is implemented in kvm on x86, MMIO is much slower than PIO
 (with nested page tables by a factor of 2, did not test without).
 
 Am well aware of this, this is why we use PIO.
 
 I fully agree with you that when we do MMIO, we should switch the
 notification mechanism to avoid encoding anything meaningful as data.
 
  virtio-pci uses a IO bars exclusively today.  Existing guest drivers
  assume that there is an IO bar that contains the virtio-pci registers.
  So let's consider the following scenarios:
  
  QEMU of today:
  
  1) qemu -drive file=ubuntu-13.04.img,if=virtio
  
  This works today.  Does adding an MMIO bar at BAR1 break this?
  Certainly not if the device is behind a PCI bus...
  
  But are we going to put devices behind a PCI-e bus by default?  Are we
  going to ask the user to choose whether devices are put behind a legacy
  bus or the express bus?
  
  What happens if we put the device behind a PCI-e bus by default?  Well,
  it can still work.  That is, until we do something like this:
  
  2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
  -device virtio-balloon..
  
  Such that we have more than a dozen or so devices.  This works
  perfectly fine today.  It works fine because we've designed virtio to
  make sure it works fine.  Quoting the spec:
  
  Configuration space is generally used for rarely-changing or
   initialization-time parameters. But it is a limited resource, so it
   might be better to use a virtqueue to update configuration information
   (the network device does this for filtering, otherwise the table in the
   config space could potentially be very large).
  
  In fact, we can have 100s of PCI devices today without running out of IO
  space because we're so careful about this.
  
  So if we switch to using PCI-e by default *and* we keep virtio-pci
  without modifying the device IDs, then very frequently we are going to
  break existing guests because the drivers they already have no longer
  work.
  
  A few virtio-serial channels, a few block devices, a couple of network
  adapters, the balloon and RNG driver, and we hit the IO space limit
  pretty damn quickly so this is not a contrived scenario at all.  I would
  expect that we frequently run into this if we don't address this problem.
  
  So we have a few options:
  1) Punt all of this complexity to libvirt et al and watch people make
 the wrong decisions about when to use PCI-e.  This will become yet
 another example of KVM being too hard to configure.
  
  2) Enable PCI-e by default and just force people to upgrade their
 drivers.
  
  3) Don't use PCI-e by default but still add BAR1 to virtio-pci
  
  4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),
 
  We can't do this - it will hurt performance.
 
 Can you explain?  I thought the whole trick with separating out the
 virtqueue notification register was to regain the performance?

Yes but this trick only works well with NPT (it's still a bit
slower than PIO but not so drastically).
Without NPT you still need a page walk so it will be slow.

 give
 it a new device/vendor ID.   Continue to use virtio-pci for existing
 devices potentially adding virtio-{net,blk,...}-pcie variants for
 people that care

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread H. Peter Anvin

On 06/05/2013 01:45 PM, Anthony Liguori wrote:
 
 But if you want to boot from the 16th device, the BIOS needs to solve
 this problem anyway.
 

No.  Just have the BIOS plumb the path to the device it wants to boot
from.  Problem solved.

-hpa


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 03:45:09PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 10:43:17PM +0300, Michael S. Tsirkin wrote:
  On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
   Michael S. Tsirkin m...@redhat.com writes:
   
On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
Look, it's very simple.
We only need to do it if we do a change that breaks guests.
   
Please find a guest that is broken by the patches. You won't find any.
   
   I think the problem in this whole discussion is that we're talking past
   each other.
   
   Here is my understanding:
   
   1) PCI-e says that you must be able to disable IO bars and still have a
   functioning device.
   
   2) It says (1) because you must size IO bars to 4096 which means that
   practically speaking, once you enable a dozen or so PIO bars, you run
   out of PIO space (16 * 4k == 64k and not all that space can be used).
  
  
  Let me add 3 other issues which I mentioned and you seem to miss:
  
  3) architectures which don't have fast access to IO ports, exist
 virtio does not work there ATM
  
  4) setups with many PCI bridges exist and have the same issue
 as PCI express. virtio does not work there ATM
  
  5) On x86, even with nested page tables, firmware only decodes
 the page address on an invalid PTE, not the data. You need to
 emulate the guest to get at the data. Without
 nested page tables, we have to do page table walk and emulate
 to get both address and data. Since this is how MMIO
 is implemented in kvm on x86, MMIO is much slower than PIO
 (with nested page tables by a factor of 2, did not test without).
 
  Oh I forgot:
 
  6) access to MMIO BARs is painful in the BIOS environment
 so BIOS would typically need to enable IO for the boot device.
 
 But if you want to boot from the 16th device, the BIOS needs to solve
 this problem anyway.
 
 Regards,
 
 Anthony Liguori

But it's solvable: just enable devices one by one to boot from them.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin

On Wed, Jun 05, 2013 at 02:10:03PM -0700, H. Peter Anvin wrote:
 On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:
  
  Spec says IO and memory can be enabled/disabled, separately.
  PCI Express spec says devices should work without IO.
  
 
 For native endpoints.  Currently virtio would be a legacy endpoint
 which is quite correct -- it is compatible with a legacy interface.
 
   -hpa

Yes. But it's not the spec wording that we care for most.
It's supporting setups that have trouble using IO.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] virtio_balloon: leak_balloon(): only tell host if we got pages deflated

2013-06-05 Thread Rafael Aquini

On Wed, Jun 05, 2013 at 05:10:31PM -0400, Luiz Capitulino wrote:
 The balloon_page_dequeue() function can return NULL. If it does for
 the first page being freed, then leak_balloon() will create a
 scatter list with len=0. Which in turn seems to generate an invalid
 virtio request.
 
 Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
 ---
 
 PS: I didn't get this in practice. I found it by code review. On the other
 hand, automatic-ballooning was able to put such invalid requests in
 the virtqueue and QEMU would explode...


Nice catch! The patch looks sane and replicates the check done at
fill_balloon(). I think we also could use this P.S. as a commentary 
to let others aware of this scenario. Thanks Luiz!

Acked-by: Rafael Aquini aqu...@redhat.com

 
 PPS: Very lightly tested

  drivers/virtio/virtio_balloon.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
 index bd3ae32..71af7b5 100644
 --- a/drivers/virtio/virtio_balloon.c
 +++ b/drivers/virtio/virtio_balloon.c
 @@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, 
 size_t num)
* virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
* is true, we *have* to do it in this order
*/
 - tell_host(vb, vb-deflate_vq);
 + if (vb-num_pfns != 0)
 + tell_host(vb, vb-deflate_vq);
   mutex_unlock(vb-balloon_lock);
   release_pages_by_pfn(vb-pfns, vb-num_pfns);
  }
 -- 
 1.8.1.4
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori

H. Peter Anvin h...@zytor.com writes:

 On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:
 
 Spec says IO and memory can be enabled/disabled, separately.
 PCI Express spec says devices should work without IO.
 

 For native endpoints.  Currently virtio would be a legacy endpoint
 which is quite correct -- it is compatible with a legacy interface.

Do legacy endpoints also use 4k for BARs?

If not, can't we use a new device id for native endpoints and call it a
day?  Legacy endpoints would continue using the existing BAR layout.

Regards,

Anthony Liguori


   -hpa

 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori

Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 03:42:57PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
 Can you explain?  I thought the whole trick with separating out the
 virtqueue notification register was to regain the performance?

 Yes but this trick only works well with NPT (it's still a bit
 slower than PIO but not so drastically).
 Without NPT you still need a page walk so it will be slow.

Do you mean NPT/EPT?

If your concern is shadow paging, then I think you're concerned about
hardware that is so slow to start with that it's not worth considering.

  It also maps to what regular hardware does.  I highly doubt that there
  are any real PCI cards that made the shift from PCI to PCI-e without
  bumping at least a revision ID.
 
  Only because the chance it's 100% compatible on the software level is 0.
  It always has some hardware specific quirks.
  No such excuse here.
 
  It also means we don't need to play games about sometimes enabling IO
  bars and sometimes not.
 
  This last paragraph is wrong, it ignores the issues 3) to 5) 
  I added above.
 
  If you do take them into account:
 - there are reasons to add MMIO BAR to PCI,
   even without PCI express
 
 So far, the only reason you've provided is it doesn't work on some
 architectures.  Which architectures?

 PowerPC wants this.

Existing PowerPC remaps PIO to MMAP so it works fine today.

Future platforms may not do this but future platforms can use a
different device.  They certainly won't be able to use the existing
drivers anyway.

Ben, am I wrong here?

 - we won't be able to drop IO BAR from virtio
 
 An IO BAR is useless if it means we can't have more than 12 devices.


 It's not useless. A smart BIOS can enable devices one by one as
 it tries to boot from them.

A smart BIOS can also use MMIO to program virtio.

Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread H. Peter Anvin

On 06/05/2013 02:50 PM, Anthony Liguori wrote:
 H. Peter Anvin h...@zytor.com writes:
 
 On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:

 Spec says IO and memory can be enabled/disabled, separately.
 PCI Express spec says devices should work without IO.


 For native endpoints.  Currently virtio would be a legacy endpoint
 which is quite correct -- it is compatible with a legacy interface.
 
 Do legacy endpoints also use 4k for BARs?

There are no 4K BARs.  In fact, I/O BARs are restricted by spec (there
is no technical enforcement, however) to 256 bytes.

The 4K come from the upstream bridge windows, which are only 4K granular
(historic stuff from when bridges were assumed rare.)  However, there
can be multiple devices, functions, and BARs inside that window.

The issue with PCIe is that each PCIe port is a bridge, so in reality
there is only one real device per bus number.

 If not, can't we use a new device id for native endpoints and call it a
 day?  Legacy endpoints would continue using the existing BAR layout.

Definitely an option.  However, we want to be able to boot from native
devices, too, so having an I/O BAR (which would not be used by the OS
driver) should still at the very least be an option.

-hpa

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori

H. Peter Anvin h...@zytor.com writes:

 On 06/05/2013 02:50 PM, Anthony Liguori wrote:
 H. Peter Anvin h...@zytor.com writes:
 
 On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:

 Spec says IO and memory can be enabled/disabled, separately.
 PCI Express spec says devices should work without IO.


 For native endpoints.  Currently virtio would be a legacy endpoint
 which is quite correct -- it is compatible with a legacy interface.
 
 Do legacy endpoints also use 4k for BARs?

 There are no 4K BARs.  In fact, I/O BARs are restricted by spec (there
 is no technical enforcement, however) to 256 bytes.

 The 4K come from the upstream bridge windows, which are only 4K granular
 (historic stuff from when bridges were assumed rare.)  However, there
 can be multiple devices, functions, and BARs inside that window.

Got it.


 The issue with PCIe is that each PCIe port is a bridge, so in reality
 there is only one real device per bus number.

 If not, can't we use a new device id for native endpoints and call it a
 day?  Legacy endpoints would continue using the existing BAR layout.

 Definitely an option.  However, we want to be able to boot from native
 devices, too, so having an I/O BAR (which would not be used by the OS
 driver) should still at the very least be an option.

What makes it so difficult to work with an MMIO bar for PCI-e?

With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight
forward.  Is there something special about PCI-e here?

Regards,

Anthony Liguori


   -hpa

 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Benjamin Herrenschmidt

On Wed, 2013-06-05 at 16:53 -0500, Anthony Liguori wrote:

 Existing PowerPC remaps PIO to MMAP so it works fine today.
 
 Future platforms may not do this but future platforms can use a
 different device.  They certainly won't be able to use the existing
 drivers anyway.
 
 Ben, am I wrong here?

Well, we do remap PIO, though it's ugly and annoying so we'd rather
avoid it alltogether :-)

Our latest HW PHBs don't support PIO at all, so while we can still
support it in the virtual ones in the guest for ever, it's becoming
pretty clear that PIO is on the way out ...

- we won't be able to drop IO BAR from virtio
  
  An IO BAR is useless if it means we can't have more than 12 devices.
 
 
  It's not useless. A smart BIOS can enable devices one by one as
  it tries to boot from them.
 
 A smart BIOS can also use MMIO to program virtio.

Indeed :-)

I see no reason why not providing both access path though. Have the PIO
BAR there for compatibility/legacy/BIOS/x86 purposes and *also* have the
MMIO window which I'd be happy to favor on power.

We could even put somewhere in there a feature bit set by qemu to
indicate whether it thinks PIO or MMIO is faster on a given platform if
you really think that's worth it (I don't).

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori

Benjamin Herrenschmidt b...@kernel.crashing.org writes:

 On Wed, 2013-06-05 at 16:53 -0500, Anthony Liguori wrote:

 A smart BIOS can also use MMIO to program virtio.

 Indeed :-)

 I see no reason why not providing both access path though. Have the PIO
 BAR there for compatibility/legacy/BIOS/x86 purposes and *also* have the
 MMIO window which I'd be happy to favor on power.

 We could even put somewhere in there a feature bit set by qemu to
 indicate whether it thinks PIO or MMIO is faster on a given platform if
 you really think that's worth it (I don't).

That's okay, but what I'm most concerned about is compatibility.

A virtio PCI device that's a native endpoint needs to have a different
device ID than one that is a legacy endpoint.  The current drivers
have no hope of working (well) with virtio PCI devices exposed as native
endpoints.

I don't care if the native PCI endpoint also has a PIO bar.  But it
seems silly (and confusing) to me to make that layout be the legacy
layout verses a straight mirror of the new layout if we're already
changing the device ID.

In addition, it doesn't seem at all necessary to have an MMIO bar to the
legacy device.  If the reason you want MMIO is to avoid using PIO, then
you break existing drivers because they assume PIO.  If you are breaking
existing drivers then you should change the device ID.

If strictly speaking it's just that MMIO is a bit faster, I'm not sure
that complexity is worth it without seeing performance numbers first.

Regards,

Anthony Liguori


 Cheers,
 Ben.


 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread H. Peter Anvin

On 06/05/2013 03:08 PM, Anthony Liguori wrote:

 Definitely an option.  However, we want to be able to boot from native
 devices, too, so having an I/O BAR (which would not be used by the OS
 driver) should still at the very least be an option.
 
 What makes it so difficult to work with an MMIO bar for PCI-e?
 
 With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight
 forward.  Is there something special about PCI-e here?
 

It's not tracking allocation.  It is that accessing memory above 1 MiB
is incredibly painful in the BIOS environment, which basically means
MMIO is inaccessible.

-hpa


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] virtio_balloon: leak_balloon(): only tell host if we got pages deflated

2013-06-05 Thread Luiz Capitulino

On Wed, 5 Jun 2013 18:24:49 -0300
Rafael Aquini aqu...@redhat.com wrote:

 On Wed, Jun 05, 2013 at 05:10:31PM -0400, Luiz Capitulino wrote:
  The balloon_page_dequeue() function can return NULL. If it does for
  the first page being freed, then leak_balloon() will create a
  scatter list with len=0. Which in turn seems to generate an invalid
  virtio request.
  
  Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
  ---
  
  PS: I didn't get this in practice. I found it by code review. On the other
  hand, automatic-ballooning was able to put such invalid requests in
  the virtqueue and QEMU would explode...
 
 
 Nice catch! The patch looks sane and replicates the check done at
 fill_balloon(). I think we also could use this P.S. as a commentary 
 to let others aware of this scenario. Thanks Luiz!

Want me to respin?

 Acked-by: Rafael Aquini aqu...@redhat.com

Thanks for your review!

 
  
  PPS: Very lightly tested
 
   drivers/virtio/virtio_balloon.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)
  
  diff --git a/drivers/virtio/virtio_balloon.c 
  b/drivers/virtio/virtio_balloon.c
  index bd3ae32..71af7b5 100644
  --- a/drivers/virtio/virtio_balloon.c
  +++ b/drivers/virtio/virtio_balloon.c
  @@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, 
  size_t num)
   * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
   * is true, we *have* to do it in this order
   */
  -   tell_host(vb, vb-deflate_vq);
  +   if (vb-num_pfns != 0)
  +   tell_host(vb, vb-deflate_vq);
  mutex_unlock(vb-balloon_lock);
  release_pages_by_pfn(vb-pfns, vb-num_pfns);
   }
  -- 
  1.8.1.4
  
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Benjamin Herrenschmidt

On Wed, 2013-06-05 at 17:53 -0500, Anthony Liguori wrote:
 If strictly speaking it's just that MMIO is a bit faster, I'm not sure
 that complexity is worth it without seeing performance numbers first.

You mean PIO. I agree with all your points here. The only thing I
saw as an option would be to add a PIO BAR that is an exact mirror
of the MMIO BAR if and only if:

 - PIO is indeed significantly faster on platforms we care about

and/or

 - There is significant simplification in things like BIOSes in
   using PIO over MMIO

I personally don't care and would chose to use MMIO always including in
firmware on ppc.

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] virtio_balloon: leak_balloon(): only tell host if we got pages deflated

2013-06-05 Thread Rafael Aquini

On Wed, Jun 05, 2013 at 07:08:44PM -0400, Luiz Capitulino wrote:
 On Wed, 5 Jun 2013 18:24:49 -0300
 Rafael Aquini aqu...@redhat.com wrote:
 
  On Wed, Jun 05, 2013 at 05:10:31PM -0400, Luiz Capitulino wrote:
   The balloon_page_dequeue() function can return NULL. If it does for
   the first page being freed, then leak_balloon() will create a
   scatter list with len=0. Which in turn seems to generate an invalid
   virtio request.
   
   Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
   ---
   
   PS: I didn't get this in practice. I found it by code review. On the other
   hand, automatic-ballooning was able to put such invalid requests in
   the virtqueue and QEMU would explode...
  
  
  Nice catch! The patch looks sane and replicates the check done at
  fill_balloon(). I think we also could use this P.S. as a commentary 
  to let others aware of this scenario. Thanks Luiz!
 
 Want me to respin?


That would be great, indeed. I guess the commentary could also go for the same
if case at fill_balloon(), assuming the tests are placed to prevent the same
scenario you described at changelog. You can stick my Ack on it, if reposting.

Cheers!
-- Rafael
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori

H. Peter Anvin h...@zytor.com writes:

 On 06/05/2013 03:08 PM, Anthony Liguori wrote:

 Definitely an option.  However, we want to be able to boot from native
 devices, too, so having an I/O BAR (which would not be used by the OS
 driver) should still at the very least be an option.
 
 What makes it so difficult to work with an MMIO bar for PCI-e?
 
 With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight
 forward.  Is there something special about PCI-e here?
 

 It's not tracking allocation.  It is that accessing memory above 1 MiB
 is incredibly painful in the BIOS environment, which basically means
 MMIO is inaccessible.

Oh, you mean in real mode.

SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
There are loads of ASSERT32FLAT()s in the code to make sure of this.

Regards,

Anthony Liguori


   -hpa


 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] virtio_balloon: leak_balloon(): only tell host if we got pages deflated

2013-06-05 Thread Luiz Capitulino

The balloon_page_dequeue() function can return NULL. If it does for
the first page being freed, then leak_balloon() will create a
scatter list with len=0. Which in turn seems to generate an invalid
virtio request.

I didn't get this in practice, I found it by code review. On the other
hand, such an invalid virtio request will cause errors in QEMU and
fill_balloon() also performs the same check implemented by this commit.

Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
Acked-by: Rafael Aquini aqu...@redhat.com
---

o v2

 - Improve changelog

 drivers/virtio/virtio_balloon.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index bd3ae32..71af7b5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 * is true, we *have* to do it in this order
 */
-   tell_host(vb, vb-deflate_vq);
+   if (vb-num_pfns != 0)
+   tell_host(vb, vb-deflate_vq);
mutex_unlock(vb-balloon_lock);
release_pages_by_pfn(vb-pfns, vb-num_pfns);
 }
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Intercepting task switches in svm/vmx with tdp enabled

2013-06-05 Thread Leo Prasath

Thanks much for the reply. It seems Linux stopped using the hardware
context switch mechanisms ( like far jmp ) since kernel version 2.2 (
per understanding linux kernel book ).

For now, I am just going to use cr3 write interception to detect guest
process context switches. ( on a related note however, with linux
running in a single cpu guest vm, I see interceptions printing writes
to cr3 with same value as the one that already is in the register -
possibly threads  or other scenarios )

Thanks,
-Leo


On Wed, Jun 5, 2013 at 1:16 AM, Gleb Natapov g...@redhat.com wrote:
 On Wed, Jun 05, 2013 at 12:51:29AM -0500, Leo Prasath wrote:
 Hi,

 I am interested in intercepting task switches in vmx/svm in 64 bit
 mode with ept/npt enabled.
 However, I am not seeing the exit code due to task switch ( 9 for vmx
 and 125 for svm ) in the list of vm exits that I see in a typical
 guest run.
 I do not think task switch exit means what you think it means. This is
 not OS context switches, but some x86 cpu concept of task that can be
 switched by using HW mechanism. No modern OS uses it. Actually in 64 bit
 mode it does not exists at all.

 I log the vm exit codes in the x86/svm.c:handle_exit method for svm
 and x86/vmx.c:vmx_handle_exit for vmx.

 Any pointers regarding this is very much appreciated.

 On a related note, does cr3 write interception approximate task switch
 interception ?
 Depending on how OS works. For Linux it is probably true (if cr3 value
 changes).

 ( I was able to intercept cr3 writes with svm while npt was enabled.
 but with vmx, I could intercept cr3 writes only with ept disabled )

 Thanks,
 Leo

 Looking through the manuals, svm has a control bit in VMCS for
 enabling / disabling task switch interception while vmx does not seem
 to have such a control bit.
 Again, this is not task switch you are looking for.

 -
 Excerpts from the manuals :

 Intel
 --

 Exit reason #9 indicates a vm exit due to task switch.

 Vol. 3C 24-9 : Some instructions cause VM exits regardless of the
 settings of the processor-based VM-execution controls (see Section
 25.1.2), as
 do task switches (see Section 25.2).

 Vol. 3C 25-6 : Task switches. Task switches are not allowed in VMX
 non-root operation. Any attempt to effect a task switch in VMX
 non-root operation causes a VM exit. See Section 25.4.2

 AMD
 ---

 Intercept code to look for is: 7Dh VMEXIT_TASK_SWITCH task switch

 15.14 AMD64 Technology Miscellaneous Intercepts : The SVM architecture
 includes intercepts to handle task switches, processor freezes due to
 FERR, and shutdown operations.
 Task switches can modify several resources that a VMM may want to
 protect (CR3, EFLAGS, LDT).  However, instead of checking various
 intercepts (e.g., CR3 Write, LDTR Write) individually, task switches
 check only a single intercept bit.

 Page 581 : Layout of VMCB says Byte offset 00Ch : bit 29 Intercept
 task switches.

 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

 --
 Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vhost_net: clear msg.control for non-zerocopy case during tx

2013-06-05 Thread Jason Wang

On 06/05/2013 09:44 PM, Sergei Shtylyov wrote:
 Hello.

 On 05-06-2013 11:40, Jason Wang wrote:

 When we decide not use zero-copy, msg.control should be set to NULL
 otherwise
 macvtap/tap may set zerocopy callbacks which may decrease the kref of
 ubufs
 wrongly.

 Bug were introduced by commit cedb9bdce099206290a2bdd02ce47a7b253b6a84
 (vhost-net: skip head management if no outstanding).

 This solves the following warnings:

 WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]()
 Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge
 stp llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun]
 CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566
 Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011
 a0198323 88007c9ebd08 81796b73 88007c9ebd48
 8103d66b 7b773e20 8800779f 8800779f43f0
 8800779f8418 015c 0062 88007c9ebd58
 Call Trace:
 [81796b73] dump_stack+0x19/0x1e
 [8103d66b] warn_slowpath_common+0x6b/0xa0
 [8103d6b5] warn_slowpath_null+0x15/0x20
 [a0197627] handle_tx+0x477/0x4b0 [vhost_net]
 [a0197690] handle_tx_kick+0x10/0x20 [vhost_net]
 [a019541e] vhost_worker+0xfe/0x1a0 [vhost_net]
 [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
 [a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
 [81061f46] kthread+0xc6/0xd0
 [81061e80] ? kthread_freezable_should_stop+0x70/0x70
 [817a1aec] ret_from_fork+0x7c/0xb0
 [81061e80] ? kthread_freezable_should_stop+0x70/0x70

 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
   drivers/vhost/net.c |3 ++-
   1 files changed, 2 insertions(+), 1 deletions(-)

 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index 2b51e23..b07d96b 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -436,7 +436,8 @@ static void handle_tx(struct vhost_net *net)
   kref_get(ubufs-kref);
   }
   nvq-upend_idx = (nvq-upend_idx + 1) % UIO_MAXIOV;
 -}
 +} else

You have to use {} on the *else* branch if you have it of the *if*
 branch (and vice versa), according to Documentation/CodingStyle.

checkpatch.pl didn't complain this, will send v2.

Thanks

 +msg.msg_control = NULL;

 WBR, Sergei

 -- 
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH V2] vhost_net: clear msg.control for non-zerocopy case during tx

2013-06-05 Thread Jason Wang

When we decide not use zero-copy, msg.control should be set to NULL otherwise
macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs
wrongly.

Bug were introduced by commit cedb9bdce099206290a2bdd02ce47a7b253b6a84
(vhost-net: skip head management if no outstanding).

This solves the following warnings:

WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]()
Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc 
openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun]
CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566
Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011
a0198323 88007c9ebd08 81796b73 88007c9ebd48
8103d66b 7b773e20 8800779f 8800779f43f0
8800779f8418 015c 0062 88007c9ebd58
Call Trace:
[81796b73] dump_stack+0x19/0x1e
[8103d66b] warn_slowpath_common+0x6b/0xa0
[8103d6b5] warn_slowpath_null+0x15/0x20
[a0197627] handle_tx+0x477/0x4b0 [vhost_net]
[a0197690] handle_tx_kick+0x10/0x20 [vhost_net]
[a019541e] vhost_worker+0xfe/0x1a0 [vhost_net]
[a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[a0195320] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[81061f46] kthread+0xc6/0xd0
[81061e80] ? kthread_freezable_should_stop+0x70/0x70
[817a1aec] ret_from_fork+0x7c/0xb0
[81061e80] ? kthread_freezable_should_stop+0x70/0x70

Acked-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
The patch is needed for -stable.

Changes from v1:
- code style issue fix
---
 drivers/vhost/net.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 2b51e23..518622d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -436,6 +436,8 @@ static void handle_tx(struct vhost_net *net)
kref_get(ubufs-kref);
}
nvq-upend_idx = (nvq-upend_idx + 1) % UIO_MAXIOV;
+   } else {
+   msg.msg_control = NULL;
}
/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock-ops-sendmsg(NULL, sock, msg, len);
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Rusty Russell

Anthony Liguori aligu...@us.ibm.com writes:
 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give
it a new device/vendor ID.   Continue to use virtio-pci for existing
devices potentially adding virtio-{net,blk,...}-pcie variants for
people that care to use them.

Now you have a different compatibility problem; how do you know the
guest supports the new virtio-pcie net?

If you put a virtio-pci card behind a PCI-e bridge today, it's not
compliant, but AFAICT it will Just Work.  (Modulo the 16-dev limit).

I've been assuming we'd avoid a flag day change; that devices would
look like existing virtio-pci with capabilities indicating the new
config layout.

 I think 4 is the best path forward.  It's better for users (guests
 continue to work as they always have).  There's less confusion about
 enabling PCI-e support--you must ask for the virtio-pcie variant and you
 must have a virtio-pcie driver.  It's easy to explain.

Removing both forward and backward compatibility is easy to explain, but
I think it'll be harder to deploy.  This is your area though, so perhaps
I'm wrong.

 It also maps to what regular hardware does.  I highly doubt that there
 are any real PCI cards that made the shift from PCI to PCI-e without
 bumping at least a revision ID.

Noone expected the new cards to Just Work with old OSes: a new machine
meant a new OS and new drivers.  Hardware vendors like that.

Since virtualization often involves legacy, our priorities might be
different.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm-unit-tests: Add test case for accessing bpl via modr/m

2013-06-05 Thread Arthur Chunqi Li

Test access to %bpl via modr/m addressing mode. This case can test another bug 
in the boot of RHEL5.9 64-bit.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/emulator.c |   41 +
 1 file changed, 41 insertions(+)

diff --git a/x86/emulator.c b/x86/emulator.c
index 96576e5..3563971 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -901,6 +901,45 @@ static void test_simplealu(u32 *mem)
 report(test, *mem == 0x8400);
 }
 
+static void test_bpl_modrm(uint64_t *mem, uint8_t *insn_page,
+   uint8_t *alt_insn_page, void *insn_ram)
+{
+   ulong *cr3 = (ulong *)read_cr3();
+   uint16_t cx = 0;
+
+   // Pad with RET instructions
+   memset(insn_page, 0xc3, 4096);
+   memset(alt_insn_page, 0xc3, 4096);
+   // Place a trapping instruction in the page to trigger a VMEXIT
+   insn_page[0] = 0x66; // mov $0x4321, %cx
+   insn_page[1] = 0xb9;
+   insn_page[2] = 0x21;
+   insn_page[3] = 0x43;
+   insn_page[4] = 0x89; // mov %eax, (%rax)
+   insn_page[5] = 0x00;
+   insn_page[6] = 0x90; // nop
+   // Place mov %cl, %bpl in alt_insn_page for emulator to execuate
+   // If emulator mistaken addressing %bpl, %cl may be moved to %ch
+   // %cx will be broken to 0x2121, not 0x4321
+   alt_insn_page[4] = 0x40;
+   alt_insn_page[5] = 0x88;
+   alt_insn_page[6] = 0xcd;
+
+   // Load the code TLB with insn_page, but point the page tables at
+   // alt_insn_page (and keep the data TLB clear, for AMD decode assist).
+   // This will make the CPU trap on the insn_page instruction but the
+   // hypervisor will see alt_insn_page.
+   install_page(cr3, virt_to_phys(insn_page), insn_ram);
+   // Load code TLB
+   invlpg(insn_ram);
+   asm volatile(call *%0 : : r(insn_ram+3));
+   // Trap, let hypervisor emulate at alt_insn_page
+   install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
+   asm volatile(call *%0 : : r(insn_ram), a(mem));
+   asm volatile(:=c(cx));
+   report(access bpl in modr/m, cx == 0x4321);
+}
+
 int main()
 {
void *mem;
@@ -964,6 +1003,8 @@ int main()
 
test_string_io_mmio(mem);
 
+   test_bpl_modrm(mem, insn_page, alt_insn_page, insn_ram);
+
printf(\nSUMMARY: %d tests, %d failures\n, tests, fails);
return fails ? 1 : 0;
 }
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 >

1 - 100 of 113 matches

Mail list logo