Re: [GIT PULL] KVM/ARM Updates for 3.12

2013-09-01 Thread Gleb Natapov
On Fri, Aug 30, 2013 at 03:59:53PM -0700, Christoffer Dall wrote:
 Hi Gleb and Paolo,
 
 The following changes since commit cc2df20c7c4ce594c3e17e9cc260c330646012c8:
 
   KVM: x86: Update symbolic exit codes (2013-08-13 16:58:42 +0200)
 
 are available in the git repository at:
 
   git://git.linaro.org/people/cdall/linux-kvm-arm.git tags/kvm-arm-for-3.12
 
 for you to fetch changes up to 1fe40f6d39d23f39e643607a3e1883bfc74f1244:
 
   ARM: KVM: Add newlines to panic strings (2013-08-30 15:48:02 -0700)
 
Pulled, thanks.

 
 KVM/ARM Updates for Linux 3.12
 
 
 Christoffer Dall (4):
   ARM: KVM: Fix kvm_set_pte assignment
   ARM: KVM: Simplify tracepoint text
   ARM: KVM: Work around older compiler bug
   ARM: KVM: Add newlines to panic strings
 
  arch/arm/include/asm/kvm_mmu.h |2 +-
  arch/arm/kvm/interrupts.S  |8 
  arch/arm/kvm/reset.c   |2 +-
  arch/arm/kvm/trace.h   |7 +++
  4 files changed, 9 insertions(+), 10 deletions(-)

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots

2013-09-01 Thread Gleb Natapov
On Fri, Aug 30, 2013 at 02:41:37PM +0200, Paolo Bonzini wrote:
 Page tables in a read-only memory slot will currently cause a triple
 fault because the page walker uses gfn_to_hva and it fails on such a slot.
 
 OVMF uses such a page table; however, real hardware seems to be fine with
 that as long as the accessed/dirty bits are set.  Save whether the slot
 is readonly, and later check it when updating the accessed and dirty bits.
 
The fix looks OK to me, but I have some comments below.

 Cc: sta...@vger.kernel.org
 Cc: g...@redhat.com
 Cc: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
   CCing to stable@ since the regression was introduced with
   support for readonly memory slots.
 
  arch/x86/kvm/paging_tmpl.h |  7 ++-
  include/linux/kvm_host.h   |  1 +
  virt/kvm/kvm_main.c| 14 +-
  3 files changed, 16 insertions(+), 6 deletions(-)
 
 diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
 index 0433301..dadc5c0 100644
 --- a/arch/x86/kvm/paging_tmpl.h
 +++ b/arch/x86/kvm/paging_tmpl.h
 @@ -99,6 +99,7 @@ struct guest_walker {
   pt_element_t prefetch_ptes[PTE_PREFETCH_NUM];
   gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
   pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
 + bool pte_writable[PT_MAX_FULL_LEVELS];
   unsigned pt_access;
   unsigned pte_access;
   gfn_t gfn;
 @@ -235,6 +236,9 @@ static int FNAME(update_accessed_dirty_bits)(struct 
 kvm_vcpu *vcpu,
   if (pte == orig_pte)
   continue;
  
 + if (unlikely(!walker->pte_writable[level - 1]))
 + return -EACCES;
 +
   ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index, 
 orig_pte, pte);
   if (ret)
   return ret;
 @@ -309,7 +313,8 @@ retry_walk:
   goto error;
   real_gfn = gpa_to_gfn(real_gfn);
  
 - host_addr = gfn_to_hva(vcpu->kvm, real_gfn);
 + host_addr = gfn_to_hva_read(vcpu->kvm, real_gfn,
 + &walker->pte_writable[walker->level - 1]);
The use of gfn_to_hva_read is misleading: the code can still write into
the gfn. Let's rename gfn_to_hva_read() to gfn_to_hva_prot() and
gfn_to_hva() to gfn_to_hva_write().

This makes me wonder whether there are other places where gfn_to_hva() was
used but gfn_to_hva_prot() should have been:
 - kvm_host_page_size() looks incorrect. We never use huge pages to map
   read-only memory slots currently.
 - kvm_handle_bad_page() also looks incorrect and may cause an incorrect
   address to be reported to userspace.
 - kvm_setup_async_pf() is also incorrect. It makes all page faults on
   read-only slots synchronous.
 - kvm_vm_fault() looks OK since the function assumes write only slots,
   but it is obsolete and should be deleted anyway.

The others in generic and x86 code look OK; somebody needs to check the ppc
and arm code.
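
To make the proposed naming concrete, here is a rough user-space sketch of
the two lookups. It is only an illustration: the toy_* types, constants and
functions are made up for this example and are not the kernel definitions.
The idea is that a "prot" lookup succeeds on any slot and reports whether
the slot is writable, while a "write" lookup refuses read-only slots.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the kernel types; NOT the real KVM definitions. */
typedef uint64_t gfn_t;

#define TOY_PAGE_SHIFT   12
#define TOY_MEM_READONLY (1u << 0)
#define TOY_BAD_HVA      (~0ul)

struct toy_memslot {
    gfn_t base_gfn;
    unsigned long npages;
    unsigned long userspace_addr;
    unsigned int flags;
};

/* Linear search for the slot that contains gfn; NULL if there is none. */
static struct toy_memslot *toy_gfn_to_memslot(struct toy_memslot *slots,
                                              size_t nslots, gfn_t gfn)
{
    for (size_t i = 0; i < nslots; i++) {
        if (gfn >= slots[i].base_gfn &&
            gfn < slots[i].base_gfn + slots[i].npages)
            return &slots[i];
    }
    return NULL;
}

/* Lookup that works on any slot and optionally reports writability. */
unsigned long toy_gfn_to_hva_prot(struct toy_memslot *slots, size_t nslots,
                                  gfn_t gfn, bool *writable)
{
    struct toy_memslot *slot = toy_gfn_to_memslot(slots, nslots, gfn);

    if (!slot)
        return TOY_BAD_HVA;
    if (writable)
        *writable = !(slot->flags & TOY_MEM_READONLY);
    return slot->userspace_addr +
           ((unsigned long)(gfn - slot->base_gfn) << TOY_PAGE_SHIFT);
}

/* Lookup for callers that intend to write: read-only slots are an error. */
unsigned long toy_gfn_to_hva_write(struct toy_memslot *slots, size_t nslots,
                                   gfn_t gfn)
{
    bool writable = false;
    unsigned long hva = toy_gfn_to_hva_prot(slots, nslots, gfn, &writable);

    if (hva == TOY_BAD_HVA || !writable)
        return TOY_BAD_HVA;
    return hva;
}
```

With this split, a caller such as the page walker would use the "prot"
variant and check the returned flag before trying to update the
accessed/dirty bits.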


   if (unlikely(kvm_is_error_hva(host_addr)))
   goto error;
  
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index ca645a0..22f9cdf 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -533,6 +533,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, 
 struct page **pages,
  
  struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
  unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 +unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn, bool *writable);
  unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
  void kvm_release_page_clean(struct page *page);
  void kvm_release_page_dirty(struct page *page);
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index f7e4334..418d037 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -1078,11 +1078,15 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
  EXPORT_SYMBOL_GPL(gfn_to_hva);
  
  /*
 - * The hva returned by this function is only allowed to be read.
 - * It should pair with kvm_read_hva() or kvm_read_hva_atomic().
 + * If writable is set to false, the hva returned by this function is only
 + * allowed to be read.
   */
 -static unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn)
 +unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn, bool *writable)
  {
 + struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
 + if (writable)
 + *writable = !memslot_is_readonly(slot);
 +
   return __gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL, false);
  }
  
 @@ -1450,7 +1454,7 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, 
 void *data, int offset,
   int r;
   unsigned long addr;
  
 - addr = gfn_to_hva_read(kvm, gfn);
 + addr = gfn_to_hva_read(kvm, gfn, NULL);
   if (kvm_is_error_hva(addr))
   return -EFAULT;
   r = kvm_read_hva(data, (void __user *)addr + offset, len);
 @@ -1488,7 +1492,7 @@ int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, 

Re: [PATCH v2] kvm: warn if num cpus is greater than num recommended

2013-09-01 Thread Gleb Natapov
On Fri, Aug 23, 2013 at 03:24:37PM +0200, Andrew Jones wrote:
 The comment in kvm_max_vcpus() states that it's using the recommended
 procedure from the kernel API documentation to get the max number
 of vcpus that kvm supports. It is, but by always returning the
 maximum number supported. The maximum number should only be used
 for development purposes. qemu should check KVM_CAP_NR_VCPUS for
 the recommended number of vcpus. This patch adds a warning if a user
 specifies a number of cpus between the recommended and max.
 
 v2:
 Incorporate tests for max_cpus, which specifies the maximum number
 of hotpluggable cpus. An additional note is that the message for
 the fail case was slightly changed, 'exceeds max cpus' to
 'exceeds the maximum cpus'. If this is unacceptable change for
 users like libvirt, then I'll need to spin a v3.
 
Looks good to me. Any ACKs, objections?

 Signed-off-by: Andrew Jones drjo...@redhat.com
 ---
  kvm-all.c | 69 
 ---
  1 file changed, 40 insertions(+), 29 deletions(-)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index a2d49786365e3..021f5f47e53da 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1322,24 +1322,20 @@ static int kvm_irqchip_create(KVMState *s)
  return 0;
  }
  
 -static int kvm_max_vcpus(KVMState *s)
 +/* Find number of supported CPUs using the recommended
 + * procedure from the kernel API documentation to cope with
 + * older kernels that may be missing capabilities.
 + */
 +static int kvm_recommended_vcpus(KVMState *s)
  {
 -int ret;
 -
 -/* Find number of supported CPUs using the recommended
 - * procedure from the kernel API documentation to cope with
 - * older kernels that may be missing capabilities.
 - */
 -ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
 -if (ret) {
 -return ret;
 -}
 -ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
 -if (ret) {
 -return ret;
 -}
 +int ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
 +return (ret) ? ret : 4;
 +}
  
 -return 4;
 +static int kvm_max_vcpus(KVMState *s)
 +{
 +int ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
 +return (ret) ? ret : kvm_recommended_vcpus(s);
  }
  
  int kvm_init(void)
 @@ -1347,11 +1343,19 @@ int kvm_init(void)
  static const char upgrade_note[] =
  "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
  "(see http://sourceforge.net/projects/kvm).\n";
 +struct {
 +const char *name;
 +int num;
 +} num_cpus[] = {
 +{ "SMP",          smp_cpus },
 +{ "hotpluggable", max_cpus },
 +{ NULL, }
 +}, *nc = num_cpus;
 +int soft_vcpus_limit, hard_vcpus_limit;
  KVMState *s;
  const KVMCapabilityInfo *missing_cap;
  int ret;
  int i;
 -int max_vcpus;
  
  s = g_malloc0(sizeof(KVMState));
  
 @@ -1392,19 +1396,26 @@ int kvm_init(void)
  goto err;
  }
  
 -max_vcpus = kvm_max_vcpus(s);
 -if (smp_cpus > max_vcpus) {
 -ret = -EINVAL;
 -fprintf(stderr, "Number of SMP cpus requested (%d) exceeds max cpus "
 -"supported by KVM (%d)\n", smp_cpus, max_vcpus);
 -goto err;
 -}
 +/* check the vcpu limits */
 +soft_vcpus_limit = kvm_recommended_vcpus(s);
 +hard_vcpus_limit = kvm_max_vcpus(s);
  
 -if (max_cpus > max_vcpus) {
 -ret = -EINVAL;
 -fprintf(stderr, "Number of hotpluggable cpus requested (%d) exceeds max cpus "
 -"supported by KVM (%d)\n", max_cpus, max_vcpus);
 -goto err;
 +while (nc->name) {
 +if (nc->num > soft_vcpus_limit) {
 +fprintf(stderr,
 +"Warning: Number of %s cpus requested (%d) exceeds "
 +"the recommended cpus supported by KVM (%d)\n",
 +nc->name, nc->num, soft_vcpus_limit);
 +
 +if (nc->num > hard_vcpus_limit) {
 +ret = -EINVAL;
 +fprintf(stderr, "Number of %s cpus requested (%d) exceeds "
 +"the maximum cpus supported by KVM (%d)\n",
 +nc->name, nc->num, hard_vcpus_limit);
 +goto err;
 +}
 +}
 +nc++;
  }
  
  s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
 -- 
 1.8.1.4
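
For readers skimming the thread, the soft/hard limit logic of the patch can
be sketched as a small stand-alone function. This is only an illustration:
check_vcpu_count() and the vcpu_check enum are hypothetical names, and the
limits are passed in rather than queried with kvm_check_extension().

```c
#include <stdio.h>

enum vcpu_check { VCPU_FAIL = -1, VCPU_OK = 0, VCPU_WARN = 1 };

/*
 * Hypothetical helper: classify one requested CPU count against the
 * recommended (soft) and maximum (hard) limits. As in the patch, a count
 * above the hard limit also triggers the soft-limit warning first.
 */
enum vcpu_check check_vcpu_count(const char *name, int num,
                                 int soft_limit, int hard_limit)
{
    if (num <= soft_limit)
        return VCPU_OK;

    fprintf(stderr,
            "Warning: Number of %s cpus requested (%d) exceeds "
            "the recommended cpus supported by KVM (%d)\n",
            name, num, soft_limit);

    if (num > hard_limit) {
        fprintf(stderr, "Number of %s cpus requested (%d) exceeds "
                "the maximum cpus supported by KVM (%d)\n",
                name, num, hard_limit);
        return VCPU_FAIL;
    }
    return VCPU_WARN;
}
```

A caller would run this once for smp_cpus and once for max_cpus and abort
initialization only on VCPU_FAIL.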

--
Gleb.


Re: [PATCH v2] cpu: Move cpu state syncs up into cpu_dump_state()

2013-09-01 Thread Gleb Natapov
On Tue, Aug 27, 2013 at 12:19:10PM +0100, James Hogan wrote:
 The x86 and ppc targets call cpu_synchronize_state() from their
 *_cpu_dump_state() callbacks to ensure that up to date state is dumped
 when KVM is enabled (for example when a KVM internal error occurs).
 
 Move this call up into the generic cpu_dump_state() function so that
 other KVM targets (namely MIPS) can take advantage of it.
 
 This requires kvm_cpu_synchronize_state() and cpu_synchronize_state() to
 be moved out of the #ifdef NEED_CPU_H in sysemu/kvm.h so that they're
 accessible to qom/cpu.c.
 
Applied, thanks.

 Signed-off-by: James Hogan james.ho...@imgtec.com
 Cc: Andreas Färber afaer...@suse.de
 Cc: Alexander Graf ag...@suse.de
 Cc: Gleb Natapov g...@redhat.com
 Cc: qemu-...@nongnu.org
 Cc: kvm@vger.kernel.org
 ---
 Changes in v2 (was kvm: sync cpu state on internal error before dump)
  - rewrite to fix in cpu_dump_state() (Gleb Natapov)
 ---
  include/sysemu/kvm.h   | 20 ++--
  qom/cpu.c  |  1 +
  target-i386/helper.c   |  2 --
  target-ppc/translate.c |  2 --
  4 files changed, 11 insertions(+), 14 deletions(-)
 
 diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
 index de74411..71a0186 100644
 --- a/include/sysemu/kvm.h
 +++ b/include/sysemu/kvm.h
 @@ -270,16 +270,6 @@ int kvm_check_extension(KVMState *s, unsigned int 
 extension);
  
  uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
uint32_t index, int reg);
 -void kvm_cpu_synchronize_state(CPUState *cpu);
 -
 -/* generic hooks - to be moved/refactored once there are more users */
 -
 -static inline void cpu_synchronize_state(CPUState *cpu)
 -{
 -if (kvm_enabled()) {
 -kvm_cpu_synchronize_state(cpu);
 -}
 -}
  
  #if !defined(CONFIG_USER_ONLY)
  int kvm_physical_memory_addr_from_host(KVMState *s, void *ram_addr,
 @@ -288,9 +278,19 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void 
 *ram_addr,
  
  #endif /* NEED_CPU_H */
  
 +void kvm_cpu_synchronize_state(CPUState *cpu);
  void kvm_cpu_synchronize_post_reset(CPUState *cpu);
  void kvm_cpu_synchronize_post_init(CPUState *cpu);
  
 +/* generic hooks - to be moved/refactored once there are more users */
 +
 +static inline void cpu_synchronize_state(CPUState *cpu)
 +{
 +if (kvm_enabled()) {
 +kvm_cpu_synchronize_state(cpu);
 +}
 +}
 +
  static inline void cpu_synchronize_post_reset(CPUState *cpu)
  {
  if (kvm_enabled()) {
 diff --git a/qom/cpu.c b/qom/cpu.c
 index aa95108..cfe7e24 100644
 --- a/qom/cpu.c
 +++ b/qom/cpu.c
 @@ -174,6 +174,7 @@ void cpu_dump_state(CPUState *cpu, FILE *f, 
 fprintf_function cpu_fprintf,
  CPUClass *cc = CPU_GET_CLASS(cpu);
  
  if (cc->dump_state) {
 +cpu_synchronize_state(cpu);
  cc->dump_state(cpu, f, cpu_fprintf, flags);
  }
  }
 diff --git a/target-i386/helper.c b/target-i386/helper.c
 index bf3e2ac..2aecfd0 100644
 --- a/target-i386/helper.c
 +++ b/target-i386/helper.c
 @@ -188,8 +188,6 @@ void x86_cpu_dump_state(CPUState *cs, FILE *f, 
 fprintf_function cpu_fprintf,
  char cc_op_name[32];
  static const char *seg_name[6] = { "ES", "CS", "SS", "DS", "FS", "GS" };
  
 -cpu_synchronize_state(cs);
 -
  eflags = cpu_compute_eflags(env);
  #ifdef TARGET_X86_64
  if (env->hflags & HF_CS64_MASK) {
 diff --git a/target-ppc/translate.c b/target-ppc/translate.c
 index f07d70d..c6a6ff8 100644
 --- a/target-ppc/translate.c
 +++ b/target-ppc/translate.c
 @@ -9536,8 +9536,6 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, 
 fprintf_function cpu_fprintf,
  CPUPPCState *env = &cpu->env;
  int i;
  
 -cpu_synchronize_state(cs);
 -
  cpu_fprintf(f, "NIP " TARGET_FMT_lx " LR " TARGET_FMT_lx " CTR "
  TARGET_FMT_lx " XER " TARGET_FMT_lx "\n",
  env->nip, env->lr, env->ctr, cpu_read_xer(env));
 -- 
 1.8.1.2
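
The refactoring pattern here, hoisting a call that every target repeats
into the generic dispatcher, can be sketched outside QEMU as follows. The
toy_* names and the struct layout are invented for this illustration and
do not match QEMU's CPUState or CPUClass.

```c
#include <stdbool.h>
#include <stdio.h>

struct toy_cpu;
typedef void (*toy_dump_fn)(struct toy_cpu *cpu, FILE *f);

struct toy_cpu {
    bool accel_enabled;     /* stand-in for kvm_enabled() */
    bool state_dirty;       /* cached registers are stale */
    int sync_count;         /* how often a sync actually happened */
    toy_dump_fn dump_state; /* per-target callback, may be NULL */
};

/* Generic hook: refresh the cached state only when an accelerator is used. */
static void toy_cpu_synchronize_state(struct toy_cpu *cpu)
{
    if (cpu->accel_enabled && cpu->state_dirty) {
        cpu->state_dirty = false;
        cpu->sync_count++;
    }
}

/* Example target callback; it no longer needs to sync on its own. */
void toy_target_dump_state(struct toy_cpu *cpu, FILE *f)
{
    fprintf(f, "state is %s\n", cpu->state_dirty ? "stale" : "up to date");
}

/* Generic entry point: sync once here so every target benefits. */
void toy_cpu_dump_state(struct toy_cpu *cpu, FILE *f)
{
    if (cpu->dump_state) {
        toy_cpu_synchronize_state(cpu);
        cpu->dump_state(cpu, f);
    }
}
```

A new target then only has to provide its dump callback and gets the
synchronization for free.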
 

--
Gleb.


Re: [PATCH v9 04/13] KVM: PPC: reserve a capability and KVM device type for realmode VFIO

2013-09-01 Thread Gleb Natapov
On Wed, Aug 28, 2013 at 06:37:41PM +1000, Alexey Kardashevskiy wrote:
 This reserves a capability number for upcoming support
 of VFIO-IOMMU DMA operations in real mode.
 
 This reserves a number for a new SPAPR TCE IOMMU KVM device
 which is going to manage lifetime of SPAPR TCE IOMMU object.
 
 This defines an attribute of the SPAPR TCE IOMMU KVM device
 which is going to be used for initialization.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 ---
 Changes:
 v9:
 * KVM ioctl is replaced with SPAPR TCE IOMMU KVM device type with
 KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute
 
 2013/08/15:
 * fixed mistype in comments
 * fixed commit message which says what uses ioctls 0xad and 0xae
 
 2013/07/16:
 * changed the number
 
 2013/07/11:
 * changed order in a file, added comment about a gap in ioctl number
 ---
  arch/powerpc/include/uapi/asm/kvm.h | 8 
  include/uapi/linux/kvm.h| 2 ++
  2 files changed, 10 insertions(+)
 
 diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
 b/arch/powerpc/include/uapi/asm/kvm.h
 index 0fb1a6e..c1ae1e5 100644
 --- a/arch/powerpc/include/uapi/asm/kvm.h
 +++ b/arch/powerpc/include/uapi/asm/kvm.h
 @@ -511,4 +511,12 @@ struct kvm_get_htab_header {
  #define  KVM_XICS_MASKED (1ULL << 41)
  #define  KVM_XICS_PENDING (1ULL << 42)
  
 +/* SPAPR TCE IOMMU device specification */
 +struct kvm_create_spapr_tce_iommu_linkage {
 + __u64 liobn;
 + __u32 fd;
 + __u32 flags;
 +};
 +#define KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE 0
 +
  #endif /* __LINUX_KVM_POWERPC_H */
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 99c2533..9d20630 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_IRQ_XICS 92
  #define KVM_CAP_ARM_EL1_32BIT 93
  #define KVM_CAP_SPAPR_MULTITCE 94
 +#define KVM_CAP_SPAPR_TCE_IOMMU 95
  
You do not need a capability to check for device support. The device API
supports checking for that with the KVM_CREATE_DEVICE_TEST flag to the
KVM_CREATE_DEVICE ioctl.
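
As a rough illustration of that probe from user space (a sketch only:
kvm_probe_device_type() is a hypothetical helper name, and vm_fd is assumed
to be a VM file descriptor obtained with KVM_CREATE_VM):

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: returns 0 if the kernel reports the device type as
 * supported, or a negative errno value (for example -ENODEV) otherwise.
 */
int kvm_probe_device_type(int vm_fd, uint32_t type)
{
    struct kvm_create_device cd;

    memset(&cd, 0, sizeof(cd));
    cd.type = type;
    cd.flags = KVM_CREATE_DEVICE_TEST; /* probe only, nothing is created */

    if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
        return -errno;
    return 0;
}
```

Because the test flag is set, no device is instantiated, so the call is
safe to make before deciding which device type to create.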

  #ifdef KVM_CAP_IRQ_ROUTING
  
 @@ -843,6 +844,7 @@ struct kvm_device_attr {
  #define KVM_DEV_TYPE_FSL_MPIC_20 1
  #define KVM_DEV_TYPE_FSL_MPIC_42 2
  #define KVM_DEV_TYPE_XICS 3
 +#define KVM_DEV_TYPE_SPAPR_TCE_IOMMU 4
  
  /*
   * ioctls for VM fds
 -- 
 1.8.4.rc4

--
Gleb.


Re: [PATCH v9 04/13] KVM: PPC: reserve a capability and KVM device type for realmode VFIO

2013-09-01 Thread Alexey Kardashevskiy
On 09/01/2013 09:27 PM, Gleb Natapov wrote:
 On Wed, Aug 28, 2013 at 06:37:41PM +1000, Alexey Kardashevskiy wrote:
 This reserves a capability number for upcoming support
 of VFIO-IOMMU DMA operations in real mode.

 This reserves a number for a new SPAPR TCE IOMMU KVM device
 which is going to manage lifetime of SPAPR TCE IOMMU object.

 This defines an attribute of the SPAPR TCE IOMMU KVM device
 which is going to be used for initialization.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---
 Changes:
 v9:
 * KVM ioctl is replaced with SPAPR TCE IOMMU KVM device type with
 KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute

 2013/08/15:
 * fixed mistype in comments
 * fixed commit message which says what uses ioctls 0xad and 0xae

 2013/07/16:
 * changed the number

 2013/07/11:
 * changed order in a file, added comment about a gap in ioctl number
 ---
  arch/powerpc/include/uapi/asm/kvm.h | 8 
  include/uapi/linux/kvm.h| 2 ++
  2 files changed, 10 insertions(+)

 diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
 b/arch/powerpc/include/uapi/asm/kvm.h
 index 0fb1a6e..c1ae1e5 100644
 --- a/arch/powerpc/include/uapi/asm/kvm.h
 +++ b/arch/powerpc/include/uapi/asm/kvm.h
 @@ -511,4 +511,12 @@ struct kvm_get_htab_header {
  #define  KVM_XICS_MASKED (1ULL << 41)
  #define  KVM_XICS_PENDING (1ULL << 42)
  
 +/* SPAPR TCE IOMMU device specification */
 +struct kvm_create_spapr_tce_iommu_linkage {
 +__u64 liobn;
 +__u32 fd;
 +__u32 flags;
 +};
 +#define KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE 0
 +
  #endif /* __LINUX_KVM_POWERPC_H */
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 99c2533..9d20630 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_IRQ_XICS 92
  #define KVM_CAP_ARM_EL1_32BIT 93
  #define KVM_CAP_SPAPR_MULTITCE 94
 +#define KVM_CAP_SPAPR_TCE_IOMMU 95
  
 You do not need a capability to check for device support. The device API
 supports checking for that with the KVM_CREATE_DEVICE_TEST flag to the
 KVM_CREATE_DEVICE ioctl.

Hm. I copied my device from KVM_DEV_TYPE_XICS and there is a capability for
it - KVM_CAP_IRQ_XICS. Do we not need both capabilities? Or is XICS special
in some way while SPAPR TCE IOMMU is not? I am confused, sorry.


 
  #ifdef KVM_CAP_IRQ_ROUTING
  
 @@ -843,6 +844,7 @@ struct kvm_device_attr {
  #define KVM_DEV_TYPE_FSL_MPIC_20 1
  #define KVM_DEV_TYPE_FSL_MPIC_42 2
  #define KVM_DEV_TYPE_XICS 3
 +#define KVM_DEV_TYPE_SPAPR_TCE_IOMMU 4
  
  /*
   * ioctls for VM fds
 -- 
 1.8.4.rc4
 
 --
   Gleb.
 


-- 
Alexey


Re: [PATCH v9 04/13] KVM: PPC: reserve a capability and KVM device type for realmode VFIO

2013-09-01 Thread Gleb Natapov
On Sun, Sep 01, 2013 at 09:39:23PM +1000, Alexey Kardashevskiy wrote:
 On 09/01/2013 09:27 PM, Gleb Natapov wrote:
  On Wed, Aug 28, 2013 at 06:37:41PM +1000, Alexey Kardashevskiy wrote:
  This reserves a capability number for upcoming support
  of VFIO-IOMMU DMA operations in real mode.
 
  This reserves a number for a new SPAPR TCE IOMMU KVM device
  which is going to manage lifetime of SPAPR TCE IOMMU object.
 
  This defines an attribute of the SPAPR TCE IOMMU KVM device
  which is going to be used for initialization.
 
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
  ---
  Changes:
  v9:
  * KVM ioctl is replaced with SPAPR TCE IOMMU KVM device type with
  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute
 
  2013/08/15:
  * fixed mistype in comments
  * fixed commit message which says what uses ioctls 0xad and 0xae
 
  2013/07/16:
  * changed the number
 
  2013/07/11:
  * changed order in a file, added comment about a gap in ioctl number
  ---
   arch/powerpc/include/uapi/asm/kvm.h | 8 
   include/uapi/linux/kvm.h| 2 ++
   2 files changed, 10 insertions(+)
 
  diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
  b/arch/powerpc/include/uapi/asm/kvm.h
  index 0fb1a6e..c1ae1e5 100644
  --- a/arch/powerpc/include/uapi/asm/kvm.h
  +++ b/arch/powerpc/include/uapi/asm/kvm.h
  @@ -511,4 +511,12 @@ struct kvm_get_htab_header {
   #define  KVM_XICS_MASKED  (1ULL << 41)
   #define  KVM_XICS_PENDING (1ULL << 42)
   
  +/* SPAPR TCE IOMMU device specification */
  +struct kvm_create_spapr_tce_iommu_linkage {
  +  __u64 liobn;
  +  __u32 fd;
  +  __u32 flags;
  +};
  +#define KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE  0
  +
   #endif /* __LINUX_KVM_POWERPC_H */
  diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
  index 99c2533..9d20630 100644
  --- a/include/uapi/linux/kvm.h
  +++ b/include/uapi/linux/kvm.h
  @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
   #define KVM_CAP_IRQ_XICS 92
   #define KVM_CAP_ARM_EL1_32BIT 93
   #define KVM_CAP_SPAPR_MULTITCE 94
  +#define KVM_CAP_SPAPR_TCE_IOMMU 95
   
  You do not need a capability to check for device support. The device API
  supports checking for that with the KVM_CREATE_DEVICE_TEST flag to the
  KVM_CREATE_DEVICE ioctl.
 
 Hm. I copied my device from KVM_DEV_TYPE_XICS and there is a capability for
 it - KVM_CAP_IRQ_XICS. Do we not need both capabilities? Or is XICS special
 in some way while SPAPR TCE IOMMU is not? I am confused, sorry.
 
 
Looking at it, KVM_CAP_IRQ_XICS/KVM_CAP_IRQ_MPIC are not used to detect
device existence, but to link a device to a vcpu. KVM_CAP_IRQ_MPIC was
introduced separately from the MPIC device code.

--
Gleb.


Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-01 Thread Gleb Natapov
On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
 This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
 and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
 them to user space, which saves time on switching to user space and back.
 
 Both real and virtual modes are supported. The kernel tries to
 handle a TCE request in real mode; if that fails, it passes the request
 to virtual mode to complete the operation. If the virtual mode
 handler fails, the request is passed to user space.
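
The fallback order described above can be pictured with a small sketch.
Everything in it is hypothetical (the enum values, the handler signature
and toy_dispatch_tce()); it only models the order in which the levels are
tried, not the actual hypercall handlers.

```c
#include <stddef.h>

enum toy_tce_result { TOY_TCE_DONE, TOY_TCE_TOO_HARD };
enum toy_tce_level { TOY_BY_REAL_MODE, TOY_BY_VIRT_MODE, TOY_BY_USER_SPACE };

typedef enum toy_tce_result (*toy_tce_handler)(unsigned long liobn,
                                               unsigned long ioba,
                                               unsigned long tce);

/* Sample handlers: one that always succeeds and one that always gives up. */
enum toy_tce_result toy_handler_ok(unsigned long liobn, unsigned long ioba,
                                   unsigned long tce)
{
    (void)liobn; (void)ioba; (void)tce;
    return TOY_TCE_DONE;
}

enum toy_tce_result toy_handler_too_hard(unsigned long liobn,
                                         unsigned long ioba,
                                         unsigned long tce)
{
    (void)liobn; (void)ioba; (void)tce;
    return TOY_TCE_TOO_HARD;
}

/* Try real mode first, then virtual mode, and only then fall back to user space. */
enum toy_tce_level toy_dispatch_tce(toy_tce_handler real_mode,
                                    toy_tce_handler virt_mode,
                                    unsigned long liobn, unsigned long ioba,
                                    unsigned long tce)
{
    if (real_mode && real_mode(liobn, ioba, tce) == TOY_TCE_DONE)
        return TOY_BY_REAL_MODE;
    if (virt_mode && virt_mode(liobn, ioba, tce) == TOY_TCE_DONE)
        return TOY_BY_VIRT_MODE;
    return TOY_BY_USER_SPACE;
}
```

The cheapest level that can complete the request wins, so the exit to user
space is paid only when both in-kernel levels give up.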
 
 The first user of this is VFIO on POWER. Trampolines to the VFIO external
 user API functions are required for this patch.
 
 This adds a SPAPR TCE IOMMU KVM device to associate a logical bus
 number (LIOBN) with a VFIO IOMMU group fd and enable in-kernel handling
 of map/unmap requests. The device supports a single attribute which is
 a struct with LIOBN and IOMMU fd. When the attribute is set, the device
 establishes the connection between KVM and VFIO.
 
 Tests show that this patch increases transmission speed from 220MB/s
 to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
 
 Signed-off-by: Paul Mackerras pau...@samba.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 ---
 
 Changes:
 v9:
 * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with SPAPR TCE IOMMU
 KVM device
 * release_spapr_tce_table() is not shared between different TCE types
 * reduced the patch size by moving VFIO external API
 trampolines to a separate patch
 * moved documentation from Documentation/virtual/kvm/api.txt to
 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 
 v8:
 * fixed warnings from check_patch.pl
 
 2013/07/11:
 * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
 for KVM_BOOK3S_64
 * kvmppc_gpa_to_hva_and_get also returns host phys address. Not much sense
 for this here but the next patch for hugepages support will use it more.
 
 2013/07/06:
 * added realmode arch_spin_lock to protect TCE table from races
 in real and virtual modes
 * POWERPC IOMMU API is changed to support real mode
 * iommu_take_ownership and iommu_release_ownership are protected by
 iommu_table's locks
 * VFIO external user API use rewritten
 * multiple small fixes
 
 2013/06/27:
 * tce_list page is referenced now in order to protect it from accident
 invalidation during H_PUT_TCE_INDIRECT execution
 * added use of the external user VFIO API
 
 2013/06/05:
 * changed capability number
 * changed ioctl number
 * update the doc article number
 
 2013/05/20:
 * removed get_user() from real mode handlers
 * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
 translated TCEs, tries realmode_get_page() on those and if it fails, it
 passes control over the virtual mode handler which tries to finish
 the request handling
 * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
 on a page
 * The only reason to pass the request to user mode now is when the user mode
 did not register TCE table in the kernel, in all other cases the virtual mode
 handler is expected to do the job
 ---
  .../virtual/kvm/devices/spapr_tce_iommu.txt|  37 +++
  arch/powerpc/include/asm/kvm_host.h|   4 +
  arch/powerpc/kvm/book3s_64_vio.c   | 310 
 -
  arch/powerpc/kvm/book3s_64_vio_hv.c| 122 
  arch/powerpc/kvm/powerpc.c |   1 +
  include/linux/kvm_host.h   |   1 +
  virt/kvm/kvm_main.c|   5 +
  7 files changed, 477 insertions(+), 3 deletions(-)
  create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 
 diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt 
 b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 new file mode 100644
 index 000..4bc8fc3
 --- /dev/null
 +++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 @@ -0,0 +1,37 @@
 +SPAPR TCE IOMMU device
 +
 +Capability: KVM_CAP_SPAPR_TCE_IOMMU
 +Architectures: powerpc
 +
 +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
 +
 +Groups:
 +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
 +  Attributes: single attribute with pair { LIOBN, IOMMU fd}
 +
 +This is a completely made-up device which provides an API to link
 +a logical bus number (LIOBN) and an IOMMU group. User space has
 +to create a new SPAPR TCE IOMMU device per logical bus.
 +
Why not have one device that can handle multiple links?

 +LIOBN is a PCI bus identifier from PPC64-server (sPAPR) DMA hypercalls
 +(H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE).
 +IOMMU group is a minimal isolated device set which can be passed to
 +the user space via VFIO.
 +
 +Right after creation the device is in an uninitialized state and requires
 +a KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute to be set.
 +The attribute contains liobn, IOMMU fd and flags:
 +
 +struct kvm_create_spapr_tce_iommu_linkage {
 + __u64 liobn;
 + __u32 fd;
 + __u32 flags;
 +};
 +
 

[PATCH] KVM: PPC: fix couple of memory leaks in MPIC/XICS devices

2013-09-01 Thread Gleb Natapov
XICS failed to free xics structure on error path. MPIC destroy handler
forgot to delete kvm_device structure.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 Be warned that this is not even compile tested.

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 94c1dd4..97adfe8 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -1244,8 +1244,10 @@ static int kvmppc_xics_create(struct kvm_device *dev, 
u32 type)
kvm->arch.xics = xics;
mutex_unlock(&kvm->lock);
 
-   if (ret)
+   if (ret) {
+   kfree(xics);
return ret;
+   }
 
xics_debugfs_init(xics);
 
diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c
index 2861ae9..efbd996 100644
--- a/arch/powerpc/kvm/mpic.c
+++ b/arch/powerpc/kvm/mpic.c
@@ -1635,6 +1635,7 @@ static void mpic_destroy(struct kvm_device *dev)
 
dev->kvm->arch.mpic = NULL;
kfree(opp);
+   kfree(dev);
 }
 
 static int mpic_set_default_irq_routing(struct openpic *opp)
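
For illustration only, the XICS half of the fix follows the usual "free on
the error path" pattern, sketched here with toy user-space types (toy_vm,
toy_xics and the functions below are invented for the example and use
calloc/free instead of the kernel allocators):

```c
#include <errno.h>
#include <stdlib.h>

struct toy_vm { void *irqchip; };
struct toy_xics { struct toy_vm *vm; };

/* Create the per-VM object; on failure nothing stays allocated. */
int toy_xics_create(struct toy_vm *vm)
{
    struct toy_xics *xics = calloc(1, sizeof(*xics));
    int ret = 0;

    if (!xics)
        return -ENOMEM;
    xics->vm = vm;

    /* Only one interrupt controller per VM is allowed. */
    if (vm->irqchip)
        ret = -EEXIST;
    else
        vm->irqchip = xics;

    if (ret) {
        free(xics); /* the free that was missing on the error path */
        return ret;
    }
    return 0;
}

/* Destroy handler: release everything the create path handed out. */
void toy_xics_destroy(struct toy_vm *vm)
{
    free(vm->irqchip);
    vm->irqchip = NULL;
}
```

The second create call on the same VM fails with -EEXIST and, with the
free in place, does not leak the freshly allocated object.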
--
Gleb.


Re: [PATCH 0/2] RFC: KVM: Simple optimization based on Xiao's patch

2013-09-01 Thread Gleb Natapov
On Fri, Aug 30, 2013 at 12:50:11PM +0900, Takuya Yoshikawa wrote:
 I think this patch set answers Gleb's comment.
 
It does. Thanks.

--
Gleb.


Re: [PATCH v2] kvm: warn if num cpus is greater than num recommended

2013-09-01 Thread Marcelo Tosatti
On Fri, Aug 23, 2013 at 03:24:37PM +0200, Andrew Jones wrote:
 The comment in kvm_max_vcpus() states that it's using the recommended
 procedure from the kernel API documentation to get the max number
 of vcpus that kvm supports. It is, but by always returning the
 maximum number supported. The maximum number should only be used
 for development purposes. qemu should check KVM_CAP_NR_VCPUS for
 the recommended number of vcpus. This patch adds a warning if a user
 specifies a number of cpus between the recommended and max.
 
 v2:
 Incorporate tests for max_cpus, which specifies the maximum number
 of hotpluggable cpus. An additional note is that the message for
 the fail case was slightly changed, 'exceeds max cpus' to
 'exceeds the maximum cpus'. If this is unacceptable change for
 users like libvirt, then I'll need to spin a v3.
 
 Signed-off-by: Andrew Jones drjo...@redhat.com
 ---
  kvm-all.c | 69 
 ---
  1 file changed, 40 insertions(+), 29 deletions(-)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index a2d49786365e3..021f5f47e53da 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1322,24 +1322,20 @@ static int kvm_irqchip_create(KVMState *s)
  return 0;
  }
  
 -static int kvm_max_vcpus(KVMState *s)
 +/* Find number of supported CPUs using the recommended
 + * procedure from the kernel API documentation to cope with
 + * older kernels that may be missing capabilities.
 + */
 +static int kvm_recommended_vcpus(KVMState *s)
  {
 -int ret;
 -
 -/* Find number of supported CPUs using the recommended
 - * procedure from the kernel API documentation to cope with
 - * older kernels that may be missing capabilities.
 - */
 -ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
 -if (ret) {
 -return ret;
 -}
 -ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
 -if (ret) {
 -return ret;
 -}
 +int ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
 +return (ret) ? ret : 4;
 +}
  
 -return 4;
 +static int kvm_max_vcpus(KVMState *s)
 +{
 +int ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
 +return (ret) ? ret : kvm_recommended_vcpus(s);
  }
  
  int kvm_init(void)
 @@ -1347,11 +1343,19 @@ int kvm_init(void)
  static const char upgrade_note[] =
  "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
  "(see http://sourceforge.net/projects/kvm).\n";
 +struct {
 +const char *name;
 +int num;
 +} num_cpus[] = {
 +{ "SMP",          smp_cpus },
 +{ "hotpluggable", max_cpus },
 +{ NULL, }
 +}, *nc = num_cpus;
 +int soft_vcpus_limit, hard_vcpus_limit;
  KVMState *s;
  const KVMCapabilityInfo *missing_cap;
  int ret;
  int i;
 -int max_vcpus;
  
  s = g_malloc0(sizeof(KVMState));
  
 @@ -1392,19 +1396,26 @@ int kvm_init(void)
  goto err;
  }
  
 -max_vcpus = kvm_max_vcpus(s);
 -if (smp_cpus > max_vcpus) {
 -ret = -EINVAL;
 -fprintf(stderr, "Number of SMP cpus requested (%d) exceeds max cpus "
 -"supported by KVM (%d)\n", smp_cpus, max_vcpus);
 -goto err;
 -}
 +/* check the vcpu limits */
 +soft_vcpus_limit = kvm_recommended_vcpus(s);
 +hard_vcpus_limit = kvm_max_vcpus(s);
  
 -if (max_cpus > max_vcpus) {
 -ret = -EINVAL;
 -fprintf(stderr, "Number of hotpluggable cpus requested (%d) exceeds max cpus "
 -"supported by KVM (%d)\n", max_cpus, max_vcpus);
 -goto err;
 +while (nc->name) {
 +if (nc->num > soft_vcpus_limit) {
 +fprintf(stderr,
 +"Warning: Number of %s cpus requested (%d) exceeds "
 +"the recommended cpus supported by KVM (%d)\n",
 +nc->name, nc->num, soft_vcpus_limit);
 +
 +if (nc->num > hard_vcpus_limit) {
 +ret = -EINVAL;
 +fprintf(stderr, "Number of %s cpus requested (%d) exceeds "
 +"the maximum cpus supported by KVM (%d)\n",
 +nc->name, nc->num, hard_vcpus_limit);
 +goto err;
 +}
 +}
 +nc++;
  }
  
  s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
 -- 
 1.8.1.4

ACK
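
The two-level check in the patch above can be sketched in isolation as
follows. This is only a sketch under stated assumptions: the names
soft_limit(), hard_limit(), check_vcpus() and the vcpu_verdict enum are
hypothetical and are not part of QEMU. The capability values are passed in
as plain integers instead of being read with kvm_check_extension().

```c
#include <stdio.h>

/* Hypothetical outcome codes, used only in this sketch. */
enum vcpu_verdict { VCPU_OK, VCPU_WARN, VCPU_REJECT };

/* Soft limit: the recommended count if the kernel reports one, else 4. */
static int soft_limit(int cap_nr_vcpus)
{
    return cap_nr_vcpus ? cap_nr_vcpus : 4;
}

/* Hard limit: the maximum if the kernel reports one, else the soft limit. */
static int hard_limit(int cap_max_vcpus, int cap_nr_vcpus)
{
    return cap_max_vcpus ? cap_max_vcpus : soft_limit(cap_nr_vcpus);
}

/* Warn above the soft limit, reject above the hard limit. */
static enum vcpu_verdict check_vcpus(const char *name, int requested,
                                     int soft, int hard)
{
    if (requested <= soft) {
        return VCPU_OK;
    }
    fprintf(stderr, "Warning: Number of %s cpus requested (%d) exceeds "
            "the recommended cpus supported by KVM (%d)\n",
            name, requested, soft);
    if (requested > hard) {
        fprintf(stderr, "Number of %s cpus requested (%d) exceeds "
                "the maximum cpus supported by KVM (%d)\n",
                name, requested, hard);
        return VCPU_REJECT;
    }
    return VCPU_WARN;
}
```

As in the patch, a request above the hard limit prints the warning first and
then the error; only the second case aborts initialization.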



Re: [PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done

2013-09-01 Thread Jason Wang
On 08/31/2013 12:44 AM, Ben Hutchings wrote:
 On Fri, 2013-08-30 at 12:29 +0800, Jason Wang wrote:
 We used to poll the vhost queue before marking DMA as done. This is racy: if
 the vhost thread is woken up before DMA is marked done, the signal can be
 missed. Fix this by always polling the vhost thread after marking DMA done.
 Does this bug only exist in net-next or is it older?  Should the fix go
 to net and stable branches?

This should go to the stable branches too (3.4 and above).

Thanks for checking.

 Ben.

 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/vhost/net.c |9 +
  1 files changed, 5 insertions(+), 4 deletions(-)

 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index ff60c2a..d09c17c 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -308,6 +308,11 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
  struct vhost_virtqueue *vq = ubufs->vq;
  int cnt = atomic_read(&ubufs->kref.refcount);
  
 +/* set len to mark this desc buffers done DMA */
 +vq->heads[ubuf->desc].len = success ?
 +VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
 +vhost_net_ubuf_put(ubufs);
 +
  /*
   * Trigger polling thread if guest stopped submitting new buffers:
   * in this case, the refcount after decrement will eventually reach 1
 @@ -318,10 +323,6 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
   */
  if (cnt <= 2 || !(cnt % 16))
  vhost_poll_queue(&vq->poll);
 -/* set len to mark this desc buffers done DMA */
 -vq->heads[ubuf->desc].len = success ?
 -VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
 -vhost_net_ubuf_put(ubufs);
  }
  
  /* Expects to be always run from workqueue - which acts as



Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-01 Thread Alexey Kardashevskiy
On 09/01/2013 10:06 PM, Gleb Natapov wrote:
 On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
 This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
 and H_STUFF_TCE requests targeting an IOMMU TCE table without passing
 them to user space, which saves time on switching to user space and back.

 Both real and virtual modes are supported. The kernel tries to
 handle a TCE request in real mode; if that fails, it passes the request
 to virtual mode to complete the operation. If the virtual mode
 handler fails, the request is passed to user space.

 The first user of this is VFIO on POWER. Trampolines to the VFIO external
 user API functions are required for this patch.

 This adds a SPAPR TCE IOMMU KVM device to associate a logical bus
 number (LIOBN) with an VFIO IOMMU group fd and enable in-kernel handling
 of map/unmap requests. The device supports a single attribute which is
 a struct with LIOBN and IOMMU fd. When the attribute is set, the device
 establishes the connection between KVM and VFIO.

 Tests show that this patch increases transmission speed from 220MB/s
 to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).

 Signed-off-by: Paul Mackerras pau...@samba.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---

 Changes:
 v9:
 * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with SPAPR TCE IOMMU
 KVM device
 * release_spapr_tce_table() is not shared between different TCE types
 * reduced the patch size by moving VFIO external API
 trampolines to a separate patch
 * moved documentation from Documentation/virtual/kvm/api.txt to
 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt

 v8:
 * fixed warnings from check_patch.pl

 2013/07/11:
 * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
 for KVM_BOOK3S_64
 * kvmppc_gpa_to_hva_and_get also returns host phys address. Not much sense
 for this here but the next patch for hugepages support will use it more.

 2013/07/06:
 * added realmode arch_spin_lock to protect TCE table from races
 in real and virtual modes
 * POWERPC IOMMU API is changed to support real mode
 * iommu_take_ownership and iommu_release_ownership are protected by
 iommu_table's locks
 * VFIO external user API use rewritten
 * multiple small fixes

 2013/06/27:
 * tce_list page is referenced now in order to protect it from accident
 invalidation during H_PUT_TCE_INDIRECT execution
 * added use of the external user VFIO API

 2013/06/05:
 * changed capability number
 * changed ioctl number
 * update the doc article number

 2013/05/20:
 * removed get_user() from real mode handlers
 * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
 translated TCEs, tries realmode_get_page() on those and if it fails, it
 passes control over the virtual mode handler which tries to finish
 the request handling
 * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
 on a page
 * The only reason to pass the request to user mode now is when the user mode
 did not register TCE table in the kernel, in all other cases the virtual mode
 handler is expected to do the job
 ---
  .../virtual/kvm/devices/spapr_tce_iommu.txt|  37 +++
  arch/powerpc/include/asm/kvm_host.h|   4 +
  arch/powerpc/kvm/book3s_64_vio.c   | 310 
 -
  arch/powerpc/kvm/book3s_64_vio_hv.c| 122 
  arch/powerpc/kvm/powerpc.c |   1 +
  include/linux/kvm_host.h   |   1 +
  virt/kvm/kvm_main.c|   5 +
  7 files changed, 477 insertions(+), 3 deletions(-)
  create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt

 diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt 
 b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 new file mode 100644
 index 000..4bc8fc3
 --- /dev/null
 +++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 @@ -0,0 +1,37 @@
 +SPAPR TCE IOMMU device
 +
 +Capability: KVM_CAP_SPAPR_TCE_IOMMU
 +Architectures: powerpc
 +
 +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
 +
 +Groups:
 +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
 +  Attributes: single attribute with pair { LIOBN, IOMMU fd}
 +
 +This is completely made up device which provides API to link
 +logical bus number (LIOBN) and IOMMU group. The user space has
 +to create a new SPAPR TCE IOMMU device per a logical bus.
 +
 Why not have one device that can handle multiple links?


I can do that. If I make it so, it won't even look like a device at all, just
some weird interface to KVM, but ok. What bothers me is the question of what
I will have to do next. Because I can easily predict a
suggestion to move kvmppc_spapr_tce_table's (a links list) from
kvm->arch.spapr_tce_tables to that device, but I cannot do that for obvious
compatibility reasons caused by the fact that the list is already used for
emulated devices (for starters, they need mmap()).

Or 

Re: [PATCH V2 4/6] vhost_net: determine whether or not to use zerocopy at one time

2013-09-01 Thread Jason Wang
On 08/31/2013 02:35 AM, Sergei Shtylyov wrote:
 Hello.

 On 08/30/2013 08:29 AM, Jason Wang wrote:

 Currently, even if the packet length is smaller than
 VHOST_GOODCOPY_LEN, if
 upend_idx != done_idx we still set zcopy_used to true and rollback
 this choice
 later. This could be avoided by determine zerocopy once by checking all
 conditions at one time before.

 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
   drivers/vhost/net.c |   46
 +++---
   1 files changed, 19 insertions(+), 27 deletions(-)

 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index 8a6dd0d..ff60c2a 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -404,43 +404,35 @@ static void handle_tx(struct vhost_net *net)
  iov_length(nvq->hdr, s), hdr_size);
   break;
   }
 -zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
 -   nvq->upend_idx != nvq->done_idx);
 +
 +zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
 + && (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx
 + && vhost_net_tx_select_zcopy(net);

Could you leave && on the first of two lines, matching the previous
 style?


ok.

   /* use msg_control to pass vhost zerocopy ubuf info to skb */
   if (zcopy_used) {
 +struct ubuf_info *ubuf;
 +ubuf = nvq->ubuf_info + nvq->upend_idx;
 +
   vq->heads[nvq->upend_idx].id = head;
 [...]
 +vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
 +ubuf->callback = vhost_zerocopy_callback;
 +ubuf->ctx = nvq->ubufs;
 +ubuf->desc = nvq->upend_idx;
 +msg.msg_control = ubuf;
 +msg.msg_controllen = sizeof(ubuf);

'sizeof(ubuf)' where 'ubuf' is a pointer? Are you sure it shouldn't
 be 'sizeof(*ubuf)'?

Yes, a pointer is sufficient. Vhost allocates an array of ubufs, and
tun/macvtap just need a reference to it.
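
The difference between the two sizeof forms discussed here can be shown with
a small standalone sketch. The struct below is a hypothetical stand-in, not
the kernel's struct ubuf_info, and the helper names are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct ubuf_info. */
struct ubuf_info_sketch {
    void (*callback)(struct ubuf_info_sketch *, int);
    void *ctx;
    unsigned long desc;
};

/* sizeof(ubuf): the size of the pointer, i.e. of the reference itself. */
static size_t size_of_reference(const struct ubuf_info_sketch *ubuf)
{
    assert(ubuf != NULL);
    return sizeof(ubuf);
}

/* sizeof(*ubuf): the size of the object the pointer refers to. */
static size_t size_of_object(const struct ubuf_info_sketch *ubuf)
{
    assert(ubuf != NULL);
    return sizeof(*ubuf);
}
```

Passing sizeof(ubuf) as the control length therefore describes a reference
being handed over, which matches the explanation that the receiver only needs
a pointer into an array that vhost owns.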

 WBR, Sergei



Re: Is fallback vhost_net to qemu for live migrate available?

2013-09-01 Thread Jason Wang
On 08/31/2013 12:45 PM, Qin Chuanyu wrote:
 On 2013/8/30 0:08, Anthony Liguori wrote:
 Hi Qin,

 By change the memory copy and notify mechanism ,currently
 virtio-net with
 vhost_net could run on Xen with good performance.

 I think the key in doing this would be to implement a property
 ioeventfd and irqfd interface in the driver domain kernel.  Just
 hacking vhost_net with Xen specific knowledge would be pretty nasty
 IMHO.

 Yes, I add a kernel module which persist virtio-net pio_addr and msix
 address as what kvm module did. Guest wake up vhost thread by adding a
 hook func in evtchn_interrupt.

 Did you modify the front end driver to do grant table mapping or is
 this all being done by mapping the domain's memory?

 There is nothing changed in front end driver. Currently I use
 alloc_vm_area to get address space, and map the domain's memory as
 what qemu did.

 KVM and Xen represent memory in a very different way.  KVM can only
 track when guest mode code dirties memory.  It relies on QEMU to track
 when guest memory is dirtied by QEMU.  Since vhost is running outside
 of QEMU, vhost also needs to tell QEMU when it has dirtied memory.

 I don't think this is a problem with Xen though.  I believe (although
 could be wrong) that Xen is able to track when either the domain or
 dom0 dirties memory.

 So I think you can simply ignore the dirty logging with vhost and it
 should Just Work.

 Thanks for your advice, I have tried it, without ping, it could
 migrate successfully, but if there has skb been received, domU would
 crash. I guess that because though Xen track domU memory, but it could
 only track memory that changed in DomU. memory changed by Dom0 is out
 of control.


 No, we don't have a mechanism to fallback  to QEMU for the datapath.
 It would be possible but I think it's a bad idea to mix and match the
 two.

 Next I would try to fallback datapath to qemu for three reason:
 1: memory translate mechanism has been changed for vhost_net on
 Xen,so there would be some necessary changed needed for vhost_log in
 kernel.

 2: I also maped IOREQ_PFN page(which is used for communication between
 qemu and Xen) in kernel notify module, so it also needed been marked
 as dirty when tx/rx exist in migrate period.

 3: Most important of all, Michael S. Tsirkin said that he hadn't
 considered about vhost_net migrate on Xen,so there would be some
 changed needed in vhost_log for qemu.

 fallback to qemu seems to much easier, isn't it.

Maybe we can just stop vhost_net in pre_save() and enable it in
post_load()? Then there is no need to enable the dirty logging of vhost_net.


 Regards
 Qin chuanyu





Re: [PATCH V2 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()

2013-09-01 Thread Michael S. Tsirkin
On Fri, Aug 30, 2013 at 12:29:18PM +0800, Jason Wang wrote:
 We tend to batch the used adding and signaling in vhost_zerocopy_callback()
 which may result more than 100 used buffers to be updated in
 vhost_zerocopy_signal_used() in some cases. So wwitch to use

switch

 vhost_add_used_and_signal_n() to avoid multiple calls to
 vhost_add_used_and_signal(). Which means much more less times of used index
 updating and memory barriers.

pls put info on perf gain in commit log too

 
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/vhost/net.c |   13 -
  1 files changed, 8 insertions(+), 5 deletions(-)
 
 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index 280ee66..8a6dd0d 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -281,7 +281,7 @@ static void vhost_zerocopy_signal_used(struct vhost_net 
 *net,
  {
   struct vhost_net_virtqueue *nvq =
   container_of(vq, struct vhost_net_virtqueue, vq);
 - int i;
 + int i, add;
   int j = 0;
  
   for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
 @@ -289,14 +289,17 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
   vhost_net_tx_err(net);
   if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
   vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
 - vhost_add_used_and_signal(vq->dev, vq,
 -   vq->heads[i].id, 0);
   ++j;
   } else
   break;
   }
 - if (j)
 - nvq->done_idx = i;
 + while (j) {
 + add = min(UIO_MAXIOV - nvq->done_idx, j);
 + vhost_add_used_and_signal_n(vq->dev, vq,
 + &vq->heads[nvq->done_idx], add);
 + nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV;
 + j -= add;
 + }
  }
  
  static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 -- 
 1.7.1
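
The batching loop added by this patch splits a run of completed entries in a
circular array into at most two contiguous chunks, so that each chunk can be
handed to one batched call. A minimal standalone sketch of that idea follows.
RING_SIZE, struct flush_log, record_flush() and flush_done() are hypothetical
names used only here; record_flush() stands in for
vhost_add_used_and_signal_n().

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8 /* small stand-in for UIO_MAXIOV */

/* Hypothetical log used to observe which runs were flushed. */
struct flush_log {
    size_t calls;
    size_t start[2];
    size_t count[2];
};

/* Stand-in for the batched signal call: records one contiguous run. */
static void record_flush(struct flush_log *log, size_t start, size_t count)
{
    if (log->calls < 2) {
        log->start[log->calls] = start;
        log->count[log->calls] = count;
    }
    log->calls++;
}

/* Flush `pending` entries starting at done_idx; returns the new done_idx. */
static size_t flush_done(size_t done_idx, size_t pending,
                         struct flush_log *log)
{
    assert(done_idx < RING_SIZE && pending <= RING_SIZE);
    while (pending) {
        size_t room = RING_SIZE - done_idx;          /* entries before wrap */
        size_t add = pending < room ? pending : room;

        record_flush(log, done_idx, add);
        done_idx = (done_idx + add) % RING_SIZE;
        pending -= add;
    }
    return done_idx;
}
```

With done_idx at 6 and five completed entries, the sketch issues one call for
entries 6..7 and a second for entries 0..2, instead of five separate calls.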


Re: [PATCH V2 1/6] vhost_net: make vhost_zerocopy_signal_used() returns void

2013-09-01 Thread Michael S. Tsirkin
tweak subj s/returns/return/

On Fri, Aug 30, 2013 at 12:29:17PM +0800, Jason Wang wrote:
 None of its caller use its return value, so let it return void.
 
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/vhost/net.c |5 ++---
  1 files changed, 2 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index 969a859..280ee66 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -276,8 +276,8 @@ static void copy_iovec_hdr(const struct iovec *from, 
 struct iovec *to,
   * of used idx. Once lower device DMA done contiguously, we will signal KVM
   * guest used idx.
   */
 -static int vhost_zerocopy_signal_used(struct vhost_net *net,
 -   struct vhost_virtqueue *vq)
 +static void vhost_zerocopy_signal_used(struct vhost_net *net,
 +struct vhost_virtqueue *vq)
  {
   struct vhost_net_virtqueue *nvq =
   container_of(vq, struct vhost_net_virtqueue, vq);
 @@ -297,7 +297,6 @@ static int vhost_zerocopy_signal_used(struct vhost_net 
 *net,
   }
   if (j)
   nvq->done_idx = i;
 - return j;
  }
  
  static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 -- 
 1.7.1


Re: [PATCH V2 6/6] vhost_net: correctly limit the max pending buffers

2013-09-01 Thread Michael S. Tsirkin
On Fri, Aug 30, 2013 at 12:29:22PM +0800, Jason Wang wrote:
 As Michael pointed out, we used to limit the max pending DMAs to get better
 cache utilization. But it was not done correctly, since it was only done when
 there were no new buffers submitted from the guest. The guest can easily
 exceed the limit by continuing to send packets.
 
 So this patch moves the check into the main loop. Tests show about a 5%-10%
 improvement in per-cpu throughput for guest tx, but a 5% drop in per-cpu
 transaction rate for a single-session TCP_RR.

Any explanation for the drop? single session TCP_RR is unlikely to
exceed VHOST_MAX_PEND, correct?

 
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/vhost/net.c |   15 ---
  1 files changed, 4 insertions(+), 11 deletions(-)
 
 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index d09c17c..592e1f2 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -363,6 +363,10 @@ static void handle_tx(struct vhost_net *net)
   if (zcopy)
   vhost_zerocopy_signal_used(net, vq);
  
 + if ((nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV ==
 + nvq->done_idx)
 + break;
 +
   head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
ARRAY_SIZE(vq->iov),
&out, &in,
 @@ -372,17 +376,6 @@ static void handle_tx(struct vhost_net *net)
   break;
   /* Nothing new?  Wait for eventfd to tell us they refilled. */
   if (head == vq->num) {
 - int num_pends;
 -
 - /* If more outstanding DMAs, queue the work.
 -  * Handle upend_idx wrap around
 -  */
 - num_pends = likely(nvq->upend_idx >= nvq->done_idx) ?
 - (nvq->upend_idx - nvq->done_idx) :
 - (nvq->upend_idx + UIO_MAXIOV -
 -  nvq->done_idx);
 - if (unlikely(num_pends > VHOST_MAX_PEND))
 - break;
   if (unlikely(vhost_enable_notify(&net->dev, vq))) {
   vhost_disable_notify(&net->dev, vq);
   continue;
 -- 
 1.7.1
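
For reference, the num_pends computation removed by this patch counts the
outstanding entries between done_idx and upend_idx in a circular array and
handles wrap-around explicitly. A minimal sketch of that counting, in both
the branching and the modular form, is given below. The function names are
hypothetical and the ring size is a parameter here rather than UIO_MAXIOV.

```c
#include <assert.h>

/* Branching form, as in the removed code: handle wrap-around explicitly. */
static unsigned ring_pending(unsigned upend_idx, unsigned done_idx,
                             unsigned ring_size)
{
    assert(upend_idx < ring_size && done_idx < ring_size);
    return upend_idx >= done_idx ? upend_idx - done_idx
                                 : upend_idx + ring_size - done_idx;
}

/* Equivalent modular form: add the ring size before subtracting. */
static unsigned ring_pending_mod(unsigned upend_idx, unsigned done_idx,
                                 unsigned ring_size)
{
    assert(upend_idx < ring_size && done_idx < ring_size);
    return (upend_idx + ring_size - done_idx) % ring_size;
}
```

Both forms give the same count for any pair of valid indices; the modular form
is the style the new check in the main loop uses.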


Re: [PATCH v9 04/13] KVM: PPC: reserve a capability and KVM device type for realmode VFIO

2013-09-01 Thread Gleb Natapov
On Wed, Aug 28, 2013 at 06:37:41PM +1000, Alexey Kardashevskiy wrote:
 This reserves a capability number for upcoming support
 of VFIO-IOMMU DMA operations in real mode.
 
 This reserves a number for a new SPAPR TCE IOMMU KVM device
 which is going to manage lifetime of SPAPR TCE IOMMU object.
 
 This defines an attribute of the SPAPR TCE IOMMU KVM device
 which is going to be used for initialization.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 ---
 Changes:
 v9:
 * KVM ioctl is replaced with SPAPR TCE IOMMU KVM device type with
 KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute
 
 2013/08/15:
 * fixed mistype in comments
 * fixed commit message which says what uses ioctls 0xad and 0xae
 
 2013/07/16:
 * changed the number
 
 2013/07/11:
 * changed order in a file, added comment about a gap in ioctl number
 ---
  arch/powerpc/include/uapi/asm/kvm.h | 8 
  include/uapi/linux/kvm.h| 2 ++
  2 files changed, 10 insertions(+)
 
 diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
 b/arch/powerpc/include/uapi/asm/kvm.h
 index 0fb1a6e..c1ae1e5 100644
 --- a/arch/powerpc/include/uapi/asm/kvm.h
 +++ b/arch/powerpc/include/uapi/asm/kvm.h
 @@ -511,4 +511,12 @@ struct kvm_get_htab_header {
  #define  KVM_XICS_MASKED (1ULL << 41)
  #define  KVM_XICS_PENDING (1ULL << 42)
  
 +/* SPAPR TCE IOMMU device specification */
 +struct kvm_create_spapr_tce_iommu_linkage {
 + __u64 liobn;
 + __u32 fd;
 + __u32 flags;
 +};
 +#define KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE 0
 +
  #endif /* __LINUX_KVM_POWERPC_H */
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 99c2533..9d20630 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_IRQ_XICS 92
  #define KVM_CAP_ARM_EL1_32BIT 93
  #define KVM_CAP_SPAPR_MULTITCE 94
 +#define KVM_CAP_SPAPR_TCE_IOMMU 95
  
You do not need capability to check for a device support. Device API
supports checking for that with KVM_CREATE_DEVICE_TEST flag to
KVM_CREATE_DEVICE ioctl.

  #ifdef KVM_CAP_IRQ_ROUTING
  
 @@ -843,6 +844,7 @@ struct kvm_device_attr {
  #define KVM_DEV_TYPE_FSL_MPIC_20 1
  #define KVM_DEV_TYPE_FSL_MPIC_42 2
  #define KVM_DEV_TYPE_XICS 3
 +#define KVM_DEV_TYPE_SPAPR_TCE_IOMMU 4
  
  /*
   * ioctls for VM fds
 -- 
 1.8.4.rc4
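
The KVM_CREATE_DEVICE_TEST probe mentioned in the comment above can be
sketched from user space roughly as follows. This is a minimal sketch,
assuming a Linux build environment with <linux/kvm.h> and a VM file
descriptor obtained elsewhere; the helper name kvm_device_type_supported() is
hypothetical, and the device type value is whatever number the caller wants
to probe.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns 1 if the VM behind vm_fd accepts device `type`, else 0. */
static int kvm_device_type_supported(int vm_fd, uint32_t type)
{
    struct kvm_create_device cd;

    memset(&cd, 0, sizeof(cd));
    cd.type = type;
    cd.flags = KVM_CREATE_DEVICE_TEST; /* probe only, nothing is created */

    return ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) == 0;
}
```

Because the test flag asks the kernel only whether the type is known, no
device is instantiated and no separate capability number is needed for the
check.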

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 04/13] KVM: PPC: reserve a capability and KVM device type for realmode VFIO

2013-09-01 Thread Alexey Kardashevskiy
On 09/01/2013 09:27 PM, Gleb Natapov wrote:
 On Wed, Aug 28, 2013 at 06:37:41PM +1000, Alexey Kardashevskiy wrote:
 This reserves a capability number for upcoming support
 of VFIO-IOMMU DMA operations in real mode.

 This reserves a number for a new SPAPR TCE IOMMU KVM device
 which is going to manage lifetime of SPAPR TCE IOMMU object.

 This defines an attribute of the SPAPR TCE IOMMU KVM device
 which is going to be used for initialization.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---
 Changes:
 v9:
 * KVM ioctl is replaced with SPAPR TCE IOMMU KVM device type with
 KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute

 2013/08/15:
 * fixed mistype in comments
 * fixed commit message which says what uses ioctls 0xad and 0xae

 2013/07/16:
 * changed the number

 2013/07/11:
 * changed order in a file, added comment about a gap in ioctl number
 ---
  arch/powerpc/include/uapi/asm/kvm.h | 8 
  include/uapi/linux/kvm.h| 2 ++
  2 files changed, 10 insertions(+)

 diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
 b/arch/powerpc/include/uapi/asm/kvm.h
 index 0fb1a6e..c1ae1e5 100644
 --- a/arch/powerpc/include/uapi/asm/kvm.h
 +++ b/arch/powerpc/include/uapi/asm/kvm.h
 @@ -511,4 +511,12 @@ struct kvm_get_htab_header {
  #define  KVM_XICS_MASKED (1ULL << 41)
  #define  KVM_XICS_PENDING (1ULL << 42)
  
 +/* SPAPR TCE IOMMU device specification */
 +struct kvm_create_spapr_tce_iommu_linkage {
 +__u64 liobn;
 +__u32 fd;
 +__u32 flags;
 +};
 +#define KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE 0
 +
  #endif /* __LINUX_KVM_POWERPC_H */
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 99c2533..9d20630 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_IRQ_XICS 92
  #define KVM_CAP_ARM_EL1_32BIT 93
  #define KVM_CAP_SPAPR_MULTITCE 94
 +#define KVM_CAP_SPAPR_TCE_IOMMU 95
  
 You do not need capability to check for a device support. Device API
 supports checking for that with KVM_CREATE_DEVICE_TEST flag to
 KVM_CREATE_DEVICE ioctl.

Hm. I copied my device from KVM_DEV_TYPE_XICS and there is a capability for
it - KVM_CAP_IRQ_XICS. Do we not need both capabilities? Or is XICS special
in some way but SPAPR TCE IOMMU is not? I am confused, sorry.


 
  #ifdef KVM_CAP_IRQ_ROUTING
  
 @@ -843,6 +844,7 @@ struct kvm_device_attr {
  #define KVM_DEV_TYPE_FSL_MPIC_20 1
  #define KVM_DEV_TYPE_FSL_MPIC_42 2
  #define KVM_DEV_TYPE_XICS 3
 +#define KVM_DEV_TYPE_SPAPR_TCE_IOMMU 4
  
  /*
   * ioctls for VM fds
 -- 
 1.8.4.rc4
 
 --
   Gleb.
 


-- 
Alexey


Re: [PATCH v9 04/13] KVM: PPC: reserve a capability and KVM device type for realmode VFIO

2013-09-01 Thread Gleb Natapov
On Sun, Sep 01, 2013 at 09:39:23PM +1000, Alexey Kardashevskiy wrote:
 On 09/01/2013 09:27 PM, Gleb Natapov wrote:
  On Wed, Aug 28, 2013 at 06:37:41PM +1000, Alexey Kardashevskiy wrote:
  This reserves a capability number for upcoming support
  of VFIO-IOMMU DMA operations in real mode.
 
  This reserves a number for a new SPAPR TCE IOMMU KVM device
  which is going to manage lifetime of SPAPR TCE IOMMU object.
 
  This defines an attribute of the SPAPR TCE IOMMU KVM device
  which is going to be used for initialization.
 
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
  ---
  Changes:
  v9:
  * KVM ioctl is replaced with SPAPR TCE IOMMU KVM device type with
  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute
 
  2013/08/15:
  * fixed mistype in comments
  * fixed commit message which says what uses ioctls 0xad and 0xae
 
  2013/07/16:
  * changed the number
 
  2013/07/11:
  * changed order in a file, added comment about a gap in ioctl number
  ---
   arch/powerpc/include/uapi/asm/kvm.h | 8 
   include/uapi/linux/kvm.h| 2 ++
   2 files changed, 10 insertions(+)
 
  diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
  b/arch/powerpc/include/uapi/asm/kvm.h
  index 0fb1a6e..c1ae1e5 100644
  --- a/arch/powerpc/include/uapi/asm/kvm.h
  +++ b/arch/powerpc/include/uapi/asm/kvm.h
  @@ -511,4 +511,12 @@ struct kvm_get_htab_header {
   #define  KVM_XICS_MASKED  (1ULL << 41)
   #define  KVM_XICS_PENDING (1ULL << 42)
   
  +/* SPAPR TCE IOMMU device specification */
  +struct kvm_create_spapr_tce_iommu_linkage {
  +  __u64 liobn;
  +  __u32 fd;
  +  __u32 flags;
  +};
  +#define KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE  0
  +
   #endif /* __LINUX_KVM_POWERPC_H */
  diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
  index 99c2533..9d20630 100644
  --- a/include/uapi/linux/kvm.h
  +++ b/include/uapi/linux/kvm.h
  @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
   #define KVM_CAP_IRQ_XICS 92
   #define KVM_CAP_ARM_EL1_32BIT 93
   #define KVM_CAP_SPAPR_MULTITCE 94
  +#define KVM_CAP_SPAPR_TCE_IOMMU 95
   
  You do not need capability to check for a device support. Device API
  supports checking for that with KVM_CREATE_DEVICE_TEST flag to
  KVM_CREATE_DEVICE ioctl.
 
 Hm. I copied my device from KVM_DEV_TYPE_XICS and there is a capability for
 it - KVM_CAP_IRQ_XICS. Do we not need both capabilities? Or is XICS special
 in some way but SPAPR TCE IOMMU is not? I am confused, sorry.
 
 
Looking at it, KVM_CAP_IRQ_XICS/KVM_CAP_IRQ_MPIC are not used to detect
device existence, but to link a device to a vcpu. KVM_CAP_IRQ_MPIC was
introduced separately from the MPIC device code.

--
Gleb.


Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-01 Thread Gleb Natapov
On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
 This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
 and H_STUFF_TCE requests targeting an IOMMU TCE table without passing
 them to user space, which saves time on switching to user space and back.
 
 Both real and virtual modes are supported. The kernel tries to
 handle a TCE request in real mode; if that fails, it passes the request
 to virtual mode to complete the operation. If the virtual mode
 handler fails, the request is passed to user space.
 
 The first user of this is VFIO on POWER. Trampolines to the VFIO external
 user API functions are required for this patch.
 
 This adds a SPAPR TCE IOMMU KVM device to associate a logical bus
 number (LIOBN) with an VFIO IOMMU group fd and enable in-kernel handling
 of map/unmap requests. The device supports a single attribute which is
 a struct with LIOBN and IOMMU fd. When the attribute is set, the device
 establishes the connection between KVM and VFIO.
 
 Tests show that this patch increases transmission speed from 220MB/s
 to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
 
 Signed-off-by: Paul Mackerras pau...@samba.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 ---
 
 Changes:
 v9:
 * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with SPAPR TCE IOMMU
 KVM device
 * release_spapr_tce_table() is not shared between different TCE types
 * reduced the patch size by moving VFIO external API
 trampolines to a separate patch
 * moved documentation from Documentation/virtual/kvm/api.txt to
 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 
 v8:
 * fixed warnings from check_patch.pl
 
 2013/07/11:
 * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
 for KVM_BOOK3S_64
 * kvmppc_gpa_to_hva_and_get also returns host phys address. Not much sense
 for this here but the next patch for hugepages support will use it more.
 
 2013/07/06:
 * added realmode arch_spin_lock to protect TCE table from races
 in real and virtual modes
 * POWERPC IOMMU API is changed to support real mode
 * iommu_take_ownership and iommu_release_ownership are protected by
 iommu_table's locks
 * VFIO external user API use rewritten
 * multiple small fixes
 
 2013/06/27:
 * tce_list page is referenced now in order to protect it from accident
 invalidation during H_PUT_TCE_INDIRECT execution
 * added use of the external user VFIO API
 
 2013/06/05:
 * changed capability number
 * changed ioctl number
 * update the doc article number
 
 2013/05/20:
 * removed get_user() from real mode handlers
 * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
 translated TCEs, tries realmode_get_page() on those and if it fails, it
 passes control over the virtual mode handler which tries to finish
 the request handling
 * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
 on a page
 * The only reason to pass the request to user mode now is when the user mode
 did not register TCE table in the kernel, in all other cases the virtual mode
 handler is expected to do the job
 ---
  .../virtual/kvm/devices/spapr_tce_iommu.txt|  37 +++
  arch/powerpc/include/asm/kvm_host.h|   4 +
  arch/powerpc/kvm/book3s_64_vio.c   | 310 
 -
  arch/powerpc/kvm/book3s_64_vio_hv.c| 122 
  arch/powerpc/kvm/powerpc.c |   1 +
  include/linux/kvm_host.h   |   1 +
  virt/kvm/kvm_main.c|   5 +
  7 files changed, 477 insertions(+), 3 deletions(-)
  create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 
 diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt 
 b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 new file mode 100644
 index 000..4bc8fc3
 --- /dev/null
 +++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 @@ -0,0 +1,37 @@
 +SPAPR TCE IOMMU device
 +
 +Capability: KVM_CAP_SPAPR_TCE_IOMMU
 +Architectures: powerpc
 +
 +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
 +
 +Groups:
 +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
 +  Attributes: single attribute with pair { LIOBN, IOMMU fd}
 +
 +This is completely made up device which provides API to link
 +logical bus number (LIOBN) and IOMMU group. The user space has
 +to create a new SPAPR TCE IOMMU device per a logical bus.
 +
Why not have one device that can handle multiple links?

 +LIOBN is a PCI bus identifier from PPC64-server (sPAPR) DMA hypercalls
 +(H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE).
 +IOMMU group is a minimal isolated device set which can be passed to
 +the user space via VFIO.
 +
 +Right after creation the device is in uninitialized state and requires
 +a KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute to be set.
 +The attribute contains liobn, IOMMU fd and flags:
 +
 +struct kvm_create_spapr_tce_iommu_linkage {
 + __u64 liobn;
 + __u32 fd;
 + __u32 flags;
 +};
 +
 

[PATCH] KVM: PPC: fix couple of memory leaks in MPIC/XICS devices

2013-09-01 Thread Gleb Natapov
XICS fails to free the xics structure on the error path. The MPIC destroy
handler forgets to free the kvm_device structure.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 Be warned that this is not even compile-tested.

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 94c1dd4..97adfe8 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -1244,8 +1244,10 @@ static int kvmppc_xics_create(struct kvm_device *dev, u32 type)
 		kvm->arch.xics = xics;
 	mutex_unlock(&kvm->lock);
 
-	if (ret)
+	if (ret) {
+		kfree(xics);
 		return ret;
+	}
 
 	xics_debugfs_init(xics);
 
diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c
index 2861ae9..efbd996 100644
--- a/arch/powerpc/kvm/mpic.c
+++ b/arch/powerpc/kvm/mpic.c
@@ -1635,6 +1635,7 @@ static void mpic_destroy(struct kvm_device *dev)
 
 	dev->kvm->arch.mpic = NULL;
 	kfree(opp);
+	kfree(dev);
 }
 
 static int mpic_set_default_irq_routing(struct openpic *opp)
--
Gleb.
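
The XICS half of the fix follows a common pattern: an object allocated at the top of a create function must be freed on every failure path taken before ownership is handed over. A minimal user-space sketch of that pattern follows. All names here (example_vm, example_xics, counted_alloc and so on) are hypothetical stand-ins rather than kernel APIs, and the allocation counter exists only to make the leak observable.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel objects touched by the patch. */
struct example_xics { int dummy; };
struct example_vm { struct example_xics *xics; };

static int live_allocs;  /* allocations not yet freed, for demonstration */

static void *counted_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Allocate a controller and attach it to the VM unless one already exists. */
int example_create(struct example_vm *vm)
{
	struct example_xics *xics = counted_alloc(sizeof(*xics));
	int ret = 0;

	if (!xics)
		return -ENOMEM;

	if (vm->xics)
		ret = -EEXIST;   /* already there: ownership is not transferred */
	else
		vm->xics = xics;

	if (ret) {
		counted_free(xics);  /* without this, the error path leaks */
		return ret;
	}
	return 0;
}

/* Release the controller owned by the VM. */
void example_destroy(struct example_vm *vm)
{
	counted_free(vm->xics);
	vm->xics = NULL;
}
```

Calling example_create() twice on the same example_vm returns -EEXIST the second time and leaves live_allocs at 1; dropping the counted_free() call in the error branch would leave it at 2.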


Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling

2013-09-01 Thread Alexey Kardashevskiy
On 09/01/2013 10:06 PM, Gleb Natapov wrote:
 On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
 This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
 and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
 them to user space, which saves time on switching to user space and back.

 Both real and virtual modes are supported. The kernel tries to
 handle a TCE request in real mode; if that fails, it passes the request
 to virtual mode to complete the operation. If the virtual mode
 handler fails, the request is passed to user space.

 The first user of this is VFIO on POWER. Trampolines to the VFIO external
 user API functions are required for this patch.

 This adds a SPAPR TCE IOMMU KVM device to associate a logical bus
 number (LIOBN) with a VFIO IOMMU group fd and enable in-kernel handling
 of map/unmap requests. The device supports a single attribute which is
 a struct with LIOBN and IOMMU fd. When the attribute is set, the device
 establishes the connection between KVM and VFIO.

 Tests show that this patch increases transmission speed from 220MB/s
 to 750..1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb ethernet card).

 Signed-off-by: Paul Mackerras pau...@samba.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---

 Changes:
 v9:
 * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with SPAPR TCE IOMMU
 KVM device
 * release_spapr_tce_table() is not shared between different TCE types
 * reduced the patch size by moving VFIO external API
 trampolines to a separate patch
 * moved documentation from Documentation/virtual/kvm/api.txt to
 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt

 v8:
 * fixed warnings from checkpatch.pl

 2013/07/11:
 * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
 for KVM_BOOK3S_64
 * kvmppc_gpa_to_hva_and_get also returns host phys address. Not much sense
 for this here but the next patch for hugepages support will use it more.

 2013/07/06:
 * added realmode arch_spin_lock to protect TCE table from races
 in real and virtual modes
 * POWERPC IOMMU API is changed to support real mode
 * iommu_take_ownership and iommu_release_ownership are protected by
 iommu_table's locks
 * VFIO external user API use rewritten
 * multiple small fixes

 2013/06/27:
 * tce_list page is referenced now in order to protect it from accidental
 invalidation during H_PUT_TCE_INDIRECT execution
 * added use of the external user VFIO API

 2013/06/05:
 * changed capability number
 * changed ioctl number
 * update the doc article number

 2013/05/20:
 * removed get_user() from real mode handlers
 * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
 translated TCEs, tries realmode_get_page() on those and if it fails, it
 passes control over the virtual mode handler which tries to finish
 the request handling
 * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
 on a page
 * The only reason to pass the request to user mode now is when the user mode
 did not register TCE table in the kernel, in all other cases the virtual mode
 handler is expected to do the job
 ---
  .../virtual/kvm/devices/spapr_tce_iommu.txt|  37 +++
  arch/powerpc/include/asm/kvm_host.h|   4 +
  arch/powerpc/kvm/book3s_64_vio.c   | 310 -
  arch/powerpc/kvm/book3s_64_vio_hv.c| 122 
  arch/powerpc/kvm/powerpc.c |   1 +
  include/linux/kvm_host.h   |   1 +
  virt/kvm/kvm_main.c|   5 +
  7 files changed, 477 insertions(+), 3 deletions(-)
  create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt

 diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 new file mode 100644
 index 000..4bc8fc3
 --- /dev/null
 +++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
 @@ -0,0 +1,37 @@
 +SPAPR TCE IOMMU device
 +
 +Capability: KVM_CAP_SPAPR_TCE_IOMMU
 +Architectures: powerpc
 +
 +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
 +
 +Groups:
 +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
 +  Attributes: single attribute with pair { LIOBN, IOMMU fd}
 +
 +This is a completely made-up device which provides an API to link
 +a logical bus number (LIOBN) and an IOMMU group. User space has
 +to create a new SPAPR TCE IOMMU device per logical bus.
 +
 Why not have one device that can handle multiple links?


I can do that. If I do, it won't even look like a device at all, just
some weird interface to KVM, but OK. What bothers me is the question of
what I will have to do next. I can easily predict a suggestion to move
the kvmppc_spapr_tce_table list (the list of links) from
kvm->arch.spapr_tce_tables to that device, but I cannot do that for obvious
compatibility reasons: the list is already used for emulated devices
(for starters, they need mmap()).

Or