Re: [PATCH 1/1] target/ppc: fix ELFv2 signal handler endianness

2020-03-18 Thread David Gibson
On Wed, Mar 18, 2020 at 10:00:20AM -0500, Vincent Fazio wrote:
> David, Laurent,
> 
> On 3/15/20 9:21 PM, David Gibson wrote:
> > On Sun, Mar 15, 2020 at 07:29:04PM -0500, Vincent Fazio wrote:
> > > Laurent,
> > > 
> > > On Sun, Mar 15, 2020 at 1:10 PM Laurent Vivier  wrote:
> > > > > On 15/03/2020 at 16:52, Vincent Fazio wrote:
> > > > > From: Vincent Fazio 
> > > > > 
> > > > > In ELFv2, function pointers are entry points and are in host endianness.
> > > > "host endianness" is misleading here. "target endianness" is better.
> > Yeah, the trouble here is that I think the ELF spec will use "host"
> > and "target" in a quite different sense than qemu.
> > 
> I'll be simplifying the wording in the message to just mention the
> problematic cross-endian scenario
> > > I do want to clarify here. In a mixed endian scenario (my test case
> > > was an x86 host and e5500 PPC BE target), the function pointers are in
> > > host endianness (little endian) so that the virtual address can be
> > > dereferenced by the host for the target instructions to be
> > > translated.
> > This can't be right.  The ELF is operating entirely within the guest,
> > and has no concept of a host (in the qemu sense).  Therefore it's
> > impossible for it to specify anything as "host endian" (again in the
> > qemu sense).
> > 
> > It *is* possible that it's little endian explicitly (in which case
> > we'd need a conditional swap that's different from the one we have
> > now).
> > 
> > But even that seems pretty odd.  AFAICT that target_sigaction
> > structure is copied verbatim from guest memory when the guest makes
> > the sigaction() syscall.  Are we expecting a BE process to put LE
> > parameters into a syscall structure?  That seems unlikely.
> > 
> > I really think you need to put some instrumentation in the sigaction()
> > call that comes before this, to see exactly what the guest process is
> > supplying there.
> > 
> > And then we maybe need to look at your guest side libc and/or a native
> > e5500 BE kernel to see what it expects in that structure.
> As we discussed in the other thread, I missed the endian swap done as part
> of get_user in do_sigaction.

Right, as did I when I looked the first time.

> So while my initial determination for the root
> cause of the problem was wrong, the fix is still the same (drop the `tswapl`
> call). The commit message will be updated.

Right, agreed.
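
A self-contained toy program (not QEMU code; the handler value below is made
up) makes the double swap concrete: get_user() in do_sigaction() has already
converted the guest's big-endian _sa_handler to host byte order, so the
tswapl() that setup_rt_frame() used to apply byte-reverses an already-correct
value on a cross-endian setup:

    #include <stdio.h>
    #include <inttypes.h>

    int main(void)
    {
        /* handler address as held in ka->_sa_handler after get_user() */
        uint64_t sa_handler = 0x0000000010001234ULL;

        /* the old code's extra tswapl() on a cross-endian host */
        uint64_t nip = __builtin_bswap64(sa_handler);

        printf("correct nip:          0x%016" PRIx64 "\n", sa_handler);
        printf("nip after extra swap: 0x%016" PRIx64 "\n", nip);
        return 0;
    }

The second value, 0x3412001000000000, is the kind of garbage entry point the
translator then faults on.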

> > > > > Previously, the signal handler would be swapped if the target CPU was a
> > > > > different endianness than the host. This would cause a SIGSEGV when
> > > > > attempting to translate the opcode pointed to by the swapped address.
> > > > This is correct.
> > > > 
> > > > >   Thread 1 "qemu-ppc64" received signal SIGSEGV, Segmentation fault.
> > > > >   0x600a9257 in ldl_he_p (ptr=0x4c2c0610) at qemu/include/qemu/bswap.h:351
> > > > >   351          __builtin_memcpy(&r, ptr, sizeof(r));
> > > > > 
> > > > >   #0  0x600a9257 in ldl_he_p (ptr=0x4c2c0610) at qemu/include/qemu/bswap.h:351
> > > > >   #1  0x600a92fe in ldl_be_p (ptr=0x4c2c0610) at qemu/include/qemu/bswap.h:449
> > > > >   #2  0x600c0790 in translator_ldl_swap at qemu/include/exec/translator.h:201
> > > > >   #3  0x6011c1ab in ppc_tr_translate_insn at qemu/target/ppc/translate.c:7856
> > > > >   #4  0x6005ae70 in translator_loop at qemu/accel/tcg/translator.c:102
> > > > > 
> > > > > Now, no swap is performed and execution continues properly.
> > > > > 
> > > > > Signed-off-by: Vincent Fazio 
> > > > > ---
> > > > >   linux-user/ppc/signal.c | 10 +++---
> > > > >   1 file changed, 7 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/linux-user/ppc/signal.c b/linux-user/ppc/signal.c
> > > > > index 5b82af6cb6..c7f6455170 100644
> > > > > --- a/linux-user/ppc/signal.c
> > > > > +++ b/linux-user/ppc/signal.c
> > > > > @@ -567,9 +567,13 @@ void setup_rt_frame(int sig, struct target_sigaction *ka,
> > > > >   env->nip = tswapl(handler->entry);
> > > > >   env->gpr[2] = tswapl(handler->toc);
> > > > >   } else {
> > > > > -/* ELFv2 PPC64 function pointers are entry points, but R12
> > > > > - * must also be set */
> > > > > -env->nip = tswapl((target_ulong) ka->_sa_handler);
> > > > > +/*
> > > > > + * ELFv2 PPC64 function pointers are entry points and are in host
> > > > > + * endianness so should not to be swapped.
> > > > "target endianness"
> > > > 
> > > > > + *
> > > > > + * Note: R12 must also be set.
> > > > > + */
> > > > > +env->nip = (target_ulong) ka->_sa_handler;
> > > > The cast is not needed: nip and _sa_handler are abi_ulong.
> > > I'll drop this in v2
> > > 
> > > > >   env->gpr[12] = env->nip;
> > > > >   }
> > > > >   #else
> > > > > 
> > > > If you repost with 

Re: [PATCH 1/2] target/ppc: Fix slbia TLB invalidation gap

2020-03-18 Thread David Gibson
On Wed, Mar 18, 2020 at 02:41:34PM +1000, Nicholas Piggin wrote:
> slbia must invalidate TLBs even if it does not remove a valid SLB
> entry, because slbmte can overwrite valid entries without removing
> their TLBs.
> 
> As the architecture says, slbia invalidates all lookaside information,
> not conditionally based on if it removed valid entries.
> 
> It does not seem possible for POWER8 or earlier Linux kernels to hit
> this bug because it never changes its kernel SLB translations, and it
> should always have valid entries if any accesses are made to userspace
> regions. However other operating systems which may modify SLB entry 0
> or do more fancy things with segments might be affected.
> 
> When POWER9 slbia support is added in the next patch, this becomes a
> real problem because some new slbia variants don't invalidate all
> non-zero entries.
> 
> Signed-off-by: Nicholas Piggin 

Applied to ppc-for-5.0, thanks.

> ---
>  target/ppc/mmu-hash64.c | 21 +++--
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
> index 34f6009b1e..373d44de74 100644
> --- a/target/ppc/mmu-hash64.c
> +++ b/target/ppc/mmu-hash64.c
> @@ -100,20 +100,29 @@ void helper_slbia(CPUPPCState *env)
>  PowerPCCPU *cpu = env_archcpu(env);
>  int n;
>  
> +/*
> + * slbia must always flush all TLB (which is equivalent to ERAT in ppc
> + * architecture). Matching on SLB_ESID_V is not good enough, because slbmte
> + * can overwrite a valid SLB without flushing its lookaside information.
> + *
> + * It would be possible to keep the TLB in synch with the SLB by flushing
> + * when a valid entry is overwritten by slbmte, and therefore slbia would
> + * not have to flush unless it evicts a valid SLB entry. However it is
> + * expected that slbmte is more common than slbia, and slbia is usually
> + * going to evict valid SLB entries, so that tradeoff is unlikely to be a
> + * good one.
> + */
> +
>  /* XXX: Warning: slbia never invalidates the first segment */
>  for (n = 1; n < cpu->hash64_opts->slb_size; n++) {
>  ppc_slb_t *slb = &env->slb[n];
>  
>  if (slb->esid & SLB_ESID_V) {
>  slb->esid &= ~SLB_ESID_V;
> -/*
> - * XXX: given the fact that segment size is 256 MB or 1TB,
> - *  and we still don't have a tlb_flush_mask(env, n, mask)
> - *  in QEMU, we just invalidate all TLBs
> - */
> -env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
>  }
>  }
> +
> +env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
>  }
>  
>  static void __helper_slbie(CPUPPCState *env, target_ulong addr,

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [EXTERNAL] [PATCH 2/2] target/ppc: Fix ISA v3.0 (POWER9) slbia implementation

2020-03-18 Thread David Gibson
On Thu, Mar 19, 2020 at 07:46:54AM +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2020-03-18 at 18:08 +0100, Cédric Le Goater wrote:
> > On 3/18/20 5:41 AM, Nicholas Piggin wrote:
> > > Linux using the hash MMU ("disable_radix" command line) on a POWER9
> > > machine quickly hits translation bugs due to using v3.0 slbia
> > > features that are not implemented in TCG. Add them.
> > 
> > I checked the ISA books and this looks OK but you are also modifying
> > slbie.
> 
> For the same reason, I believe slbie needs to invalidate caches even if
> the entry isn't present.
> 
> The kernel will under some circumstances overwrite SLB entries without
> invalidating (because the translation itself isn't invalid, it's just
> that the SLB is full, so anything cached in the ERAT is still
> technically ok).
> 
> However, when those things get really invalidated, they need to be
> taken out, even if they no longer have a corresponding SLB entry.

Right, the slbie change is certainly correct, but it doesn't match
what the commit message says this is doing.  Nick, can you split that
out please.
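
The shape of the slbie side of the change, then, is to stop gating the
lookaside flush on whether a valid SLB entry was found. An illustrative
QEMU-style sketch only, not the actual patch:

    static void slbie_sketch(CPUPPCState *env, target_ulong addr)
    {
        ppc_slb_t *slb = slb_lookup(env_archcpu(env), addr);

        if (slb && (slb->esid & SLB_ESID_V)) {
            slb->esid &= ~SLB_ESID_V;      /* drop the SLB entry if present */
        }

        /*
         * Flush unconditionally: translations for this EA can still be
         * cached in the TLB/ERAT even when no valid SLB entry remains,
         * e.g. after slbmte silently replaced it while the SLB was full.
         */
        env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
    }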

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2] ppc/spapr: Set the effective address provided flag in mc error log.

2020-03-18 Thread David Gibson
On Wed, Mar 18, 2020 at 01:04:20PM +0530, Mahesh Salgaonkar wrote:
> Per PAPR, it is expected to set effective address provided flag in
> sub_err_type member of mc extended error log (i.e
> rtas_event_log_v6_mc.sub_err_type). This somehow got missed in original
> fwnmi-mce patch series. The current code just updates the effective address
> but does not set the flag to indicate that it is available. Hence guest
> fails to extract effective address from mce rtas log. This patch fixes
> that.
> 
> Without this patch, guest MCE logs fail to print the DAR value:
> 
> [   11.933608] Disabling lock debugging due to kernel taint
> [   11.933773] MCE: CPU0: machine check (Severe) Host TLB Multihit [Recovered]
> [   11.933979] MCE: CPU0: NIP: [c0090b34] radix__flush_tlb_range_psize+0x194/0xf00
> [   11.934223] MCE: CPU0: Initiator CPU
> [   11.934341] MCE: CPU0: Unknown
> 
> After the change:
> 
> [   22.454149] Disabling lock debugging due to kernel taint
> [   22.454316] MCE: CPU0: machine check (Severe) Host TLB Multihit DAR: deadbeefdeadbeef [Recovered]
> [   22.454605] MCE: CPU0: NIP: [c03e5804] kmem_cache_alloc+0x84/0x330
> [   22.454820] MCE: CPU0: Initiator CPU
> [   22.454944] MCE: CPU0: Unknown
> 
> 
> Signed-off-by: Mahesh Salgaonkar 

Applied to ppc-for-5.0, thanks.

> ---
> Change in v2:
> - Fixed coding style issues.
> ---
>  hw/ppc/spapr_events.c |   26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 8b32b7eea5..cb6bfedc53 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -243,6 +243,14 @@ struct rtas_event_log_v6_mc {
>  #define RTAS_LOG_V6_MC_TLB_PARITY1
>  #define RTAS_LOG_V6_MC_TLB_MULTIHIT  2
>  #define RTAS_LOG_V6_MC_TLB_INDETERMINATE 3
> +/*
> + * Per PAPR,
> + * For UE error type, set bit 1 of sub_err_type to indicate effective addr is
> + * provided. For other error types (SLB/ERAT/TLB), set bit 0 to indicate
> + * same.
> + */
> +#define RTAS_LOG_V6_MC_UE_EA_ADDR_PROVIDED   0x40
> +#define RTAS_LOG_V6_MC_EA_ADDR_PROVIDED  0x80
>  uint8_t reserved_1[6];
>  uint64_t effective_address;
>  uint64_t logical_address;
> @@ -726,6 +734,22 @@ void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>  RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
> +static void spapr_mc_set_ea_provided_flag(struct mc_extended_log *ext_elog)
> +{
> +switch (ext_elog->mc.error_type) {
> +case RTAS_LOG_V6_MC_TYPE_UE:
> +ext_elog->mc.sub_err_type |= RTAS_LOG_V6_MC_UE_EA_ADDR_PROVIDED;
> +break;
> +case RTAS_LOG_V6_MC_TYPE_SLB:
> +case RTAS_LOG_V6_MC_TYPE_ERAT:
> +case RTAS_LOG_V6_MC_TYPE_TLB:
> +ext_elog->mc.sub_err_type |= RTAS_LOG_V6_MC_EA_ADDR_PROVIDED;
> +break;
> +default:
> +break;
> +}
> +}
> +
>  static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
>  struct mc_extended_log *ext_elog)
>  {
> @@ -751,6 +775,7 @@ static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
>  ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
>  if (mc_derror_table[i].dar_valid) {
>  ext_elog->mc.effective_address = cpu_to_be64(env->spr[SPR_DAR]);
> +spapr_mc_set_ea_provided_flag(ext_elog);
>  }
>  
>  summary |= mc_derror_table[i].initiator
> @@ -769,6 +794,7 @@ static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
>  ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
>  if (mc_ierror_table[i].nip_valid) {
>  ext_elog->mc.effective_address = cpu_to_be64(env->nip);
> +spapr_mc_set_ea_provided_flag(ext_elog);
>  }
>  
>  summary |= mc_ierror_table[i].initiator
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 1/2] target/ppc: Fix slbia TLB invalidation gap

2020-03-18 Thread David Gibson
On Wed, Mar 18, 2020 at 05:52:32PM +0100, Greg Kurz wrote:
> On Wed, 18 Mar 2020 14:41:34 +1000
> Nicholas Piggin  wrote:
> 
> > slbia must invalidate TLBs even if it does not remove a valid SLB
> > entry, because slbmte can overwrite valid entries without removing
> > their TLBs.
> > 
> > As the architecture says, slbia invalidates all lookaside information,
> > not conditionally based on if it removed valid entries.
> > 
> > It does not seem possible for POWER8 or earlier Linux kernels to hit
> > this bug because it never changes its kernel SLB translations, and it
> > should always have valid entries if any accesses are made to usespace
> 
> s/usespace/userspace

Corrected in my tree, thanks.

> 
> > regions. However other operating systems which may modify SLB entry 0
> > or do more fancy things with segments might be affected.
> > 
> > When POWER9 slbia support is added in the next patch, this becomes a
> > real problem because some new slbia variants don't invalidate all
> > non-zero entries.
> > 
> > Signed-off-by: Nicholas Piggin 
> > ---
> 
> LGTM
> 
> Reviewed-by: Greg Kurz 
> 
> >  target/ppc/mmu-hash64.c | 21 +++--
> >  1 file changed, 15 insertions(+), 6 deletions(-)
> > 
> > diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
> > index 34f6009b1e..373d44de74 100644
> > --- a/target/ppc/mmu-hash64.c
> > +++ b/target/ppc/mmu-hash64.c
> > @@ -100,20 +100,29 @@ void helper_slbia(CPUPPCState *env)
> >  PowerPCCPU *cpu = env_archcpu(env);
> >  int n;
> >  
> > +/*
> > + * slbia must always flush all TLB (which is equivalent to ERAT in ppc
> > + * architecture). Matching on SLB_ESID_V is not good enough, because slbmte
> > + * can overwrite a valid SLB without flushing its lookaside information.
> > + *
> > + * It would be possible to keep the TLB in synch with the SLB by flushing
> > + * when a valid entry is overwritten by slbmte, and therefore slbia would
> > + * not have to flush unless it evicts a valid SLB entry. However it is
> > + * expected that slbmte is more common than slbia, and slbia is usually
> > + * going to evict valid SLB entries, so that tradeoff is unlikely to be a
> > + * good one.
> > + */
> > +
> >  /* XXX: Warning: slbia never invalidates the first segment */
> >  for (n = 1; n < cpu->hash64_opts->slb_size; n++) {
> >  ppc_slb_t *slb = &env->slb[n];
> >  
> >  if (slb->esid & SLB_ESID_V) {
> >  slb->esid &= ~SLB_ESID_V;
> > -/*
> > - * XXX: given the fact that segment size is 256 MB or 1TB,
> > - *  and we still don't have a tlb_flush_mask(env, n, mask)
> > - *  in QEMU, we just invalidate all TLBs
> > - */
> > -env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
> >  }
> >  }
> > +
> > +env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
> >  }
> >  
> >  static void __helper_slbie(CPUPPCState *env, target_ulong addr,
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v14 Kernel 1/7] vfio: KABI for migration interface for device state

2020-03-18 Thread Yan Zhao
On Thu, Mar 19, 2020 at 11:49:26AM +0800, Alex Williamson wrote:
> On Wed, 18 Mar 2020 21:17:03 -0400
> Yan Zhao  wrote:
> 
> > On Thu, Mar 19, 2020 at 03:41:08AM +0800, Kirti Wankhede wrote:
> > > - Defined MIGRATION region type and sub-type.
> > > 
> > > - Defined vfio_device_migration_info structure which will be placed at the
> > >   0th offset of migration region to get/set VFIO device related
> > >   information. Defined members of structure and usage on read/write access.
> > > 
> > > - Defined device states and state transition details.
> > > 
> > > - Defined sequence to be followed while saving and resuming VFIO device.
> > > 
> > > Signed-off-by: Kirti Wankhede 
> > > Reviewed-by: Neo Jia 
> > > ---
> > >  include/uapi/linux/vfio.h | 227 
> > > ++
> > >  1 file changed, 227 insertions(+)
> > > 
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index 9e843a147ead..d0021467af53 100644
> > > --- a/include/uapi/linux/vfio.h
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
> > >  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK (0x)
> > >  #define VFIO_REGION_TYPE_GFX(1)
> > >  #define VFIO_REGION_TYPE_CCW (2)
> > > +#define VFIO_REGION_TYPE_MIGRATION  (3)
> > >  
> > >  /* sub-types for VFIO_REGION_TYPE_PCI_* */
> > >  
> > > @@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {
> > >  /* sub-types for VFIO_REGION_TYPE_CCW */
> > >  #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD(1)
> > >  
> > > +/* sub-types for VFIO_REGION_TYPE_MIGRATION */
> > > +#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
> > > +
> > > +/*
> > > + * The structure vfio_device_migration_info is placed at the 0th offset of
> > > + * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
> > > + * migration information. Field accesses from this structure are only supported
> > > + * at their native width and alignment. Otherwise, the result is undefined and
> > > + * vendor drivers should return an error.
> > > + *
> > > + * device_state: (read/write)
> > > + *  - The user application writes to this field to inform the vendor driver
> > > + *about the device state to be transitioned to.
> > > + *  - The vendor driver should take the necessary actions to change the
> > > + *device state. After successful transition to a given state, the
> > > + *vendor driver should return success on write(device_state, state)
> > > + *system call. If the device state transition fails, the vendor driver
> > > + *should return an appropriate -errno for the fault condition.
> > > + *  - On the user application side, if the device state transition fails,
> > > + * that is, if write(device_state, state) returns an error, read
> > > + * device_state again to determine the current state of the device from
> > > + * the vendor driver.
> > > + *  - The vendor driver should return previous state of the device unless
> > > + *the vendor driver has encountered an internal error, in which case
> > > + *the vendor driver may report the device_state VFIO_DEVICE_STATE_ERROR.
> > > + *  - The user application must use the device reset ioctl to recover the
> > > + *device from VFIO_DEVICE_STATE_ERROR state. If the device is
> > > + *indicated to be in a valid device state by reading device_state, the
> > > + *user application may attempt to transition the device to any valid
> > > + *state reachable from the current state or terminate itself.
> > > + *
> > > + *  device_state consists of 3 bits:
> > > + *  - If bit 0 is set, it indicates the _RUNNING state. If bit 0 is clear,
> > > + *it indicates the _STOP state. When the device state is changed to
> > > + *_STOP, driver should stop the device before write() returns.
> > > + *  - If bit 1 is set, it indicates the _SAVING state, which means that the
> > > + *driver should start gathering device state information that will be
> > > + *provided to the VFIO user application to save the device's state.
> > > + *  - If bit 2 is set, it indicates the _RESUMING state, which means that
> > > + *the driver should prepare to resume the device. Data provided through
> > > + *the migration region should be used to resume the device.
> > > + *  Bits 3 - 31 are reserved for future use. To preserve them, the user
> > > + *  application should perform a read-modify-write operation on this
> > > + *  field when modifying the specified bits.
> > > + *
> > > + *  +--- _RESUMING
> > > + *  |+-- _SAVING
> > > + *  ||+- _RUNNING
> > > + *  |||
> > > + *  

Re: [PATCH v1 1/1] target/riscv: Don't set write permissions on dirty PTEs

2020-03-18 Thread Palmer Dabbelt

On Tue, 03 Mar 2020 17:16:59 PST (-0800), Alistair Francis wrote:

The RISC-V spec specifies that when a write happens and the D bit is
clear the implementation will set the bit in the PTE. It does not
describe that the PTE being dirty means that we should provide write
access. This patch removes the write access granted to pages when the
dirty bit is set.

Following the prot variable we can see that it affects all of these
functions:
 riscv_cpu_tlb_fill()
   tlb_set_page()
 tlb_set_page_with_attrs()
   address_space_translate_for_iotlb()

Looking at the cputlb code (tlb_set_page_with_attrs() and
address_space_translate_for_iotlb()) it looks like the main affect of
setting write permissions is that the page can be marked as TLB_NOTDIRTY.

I don't see any other impacts (related to the dirty bit) for giving a
page write permissions.

Setting write permission on dirty PTEs results in userspace inside a
Hypervisor guest (VU) becoming corrupted. This appears to be because it
ends up with write permission in the second stage translation in cases
where we aren't doing a store.

Signed-off-by: Alistair Francis 
Reviewed-by: Bin Meng 
---
 target/riscv/cpu_helper.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 5ea5d133aa..cc9f20b471 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -572,10 +572,8 @@ restart:
 if ((pte & PTE_X)) {
 *prot |= PAGE_EXEC;
 }
-/* add write permission on stores or if the page is already dirty,
-   so that we TLB miss on later writes to update the dirty bit */
-if ((pte & PTE_W) &&
-(access_type == MMU_DATA_STORE || (pte & PTE_D))) {
+/* add write permission on stores */
+if ((pte & PTE_W) && (access_type == MMU_DATA_STORE)) {
 *prot |= PAGE_WRITE;
 }
 return TRANSLATE_SUCCESS;


I remember having seen this patch before and having some objections, but I feel
like I mistakenly had this backwards before or something because it makes sense
now.

Thanks!
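
To make the behavioural change concrete, a standalone toy (the constants
mirror the RISC-V PTE bits and QEMU's PAGE_WRITE; this is not the actual
cpu_helper.c code):

    #include <stdio.h>

    #define PTE_W       0x04    /* PTE write-permission bit */
    #define PTE_D       0x80    /* PTE dirty bit */
    #define PAGE_WRITE  0x02    /* TLB write permission */

    enum { MMU_DATA_LOAD, MMU_DATA_STORE, MMU_INST_FETCH };

    /* pre-patch: a dirty page got write permission even for a load */
    static int old_prot_w(int pte, int access_type)
    {
        return (pte & PTE_W) &&
               (access_type == MMU_DATA_STORE || (pte & PTE_D));
    }

    /* post-patch: only an actual store grants PAGE_WRITE */
    static int new_prot_w(int pte, int access_type)
    {
        return (pte & PTE_W) && (access_type == MMU_DATA_STORE);
    }

    int main(void)
    {
        int pte = PTE_W | PTE_D;    /* writable and already dirty */

        printf("load : old %#x new %#x\n",
               old_prot_w(pte, MMU_DATA_LOAD) ? PAGE_WRITE : 0,
               new_prot_w(pte, MMU_DATA_LOAD) ? PAGE_WRITE : 0);
        printf("store: old %#x new %#x\n",
               old_prot_w(pte, MMU_DATA_STORE) ? PAGE_WRITE : 0,
               new_prot_w(pte, MMU_DATA_STORE) ? PAGE_WRITE : 0);
        return 0;
    }

With the patch, a load through a writable, already-dirty PTE no longer yields
a writable TLB entry, so the first subsequent store takes a TLB miss and goes
back through riscv_cpu_tlb_fill() rather than silently writing through a
mapping that was only ever established for a read.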



Re: [PATCH v14 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-18 Thread Alex Williamson
On Thu, 19 Mar 2020 00:15:33 -0400
Yan Zhao  wrote:

> On Thu, Mar 19, 2020 at 12:01:00PM +0800, Alex Williamson wrote:
> > On Wed, 18 Mar 2020 23:06:39 -0400
> > Yan Zhao  wrote:
> >   
> > > On Thu, Mar 19, 2020 at 03:41:11AM +0800, Kirti Wankhede wrote:  
> > > > VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> > > > - Start dirty pages tracking while migration is active
> > > > - Stop dirty pages tracking.
> > > > - Get dirty pages bitmap. Its user space application's responsibility to
> > > >   copy content of dirty pages from source to destination during 
> > > > migration.
> > > > 
> > > > To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> > > > structure. Bitmap size is calculated considering smallest supported page
> > > > size. Bitmap is allocated for all vfio_dmas when dirty logging is 
> > > > enabled
> > > > 
> > > > Bitmap is populated for already pinned pages when bitmap is allocated 
> > > > for
> > > > a vfio_dma with the smallest supported page size. Update bitmap from
> > > > pinning functions when tracking is enabled. When user application 
> > > > queries
> > > > bitmap, check if requested page size is same as page size used to
> > > > populated bitmap. If it is equal, copy bitmap, but if not equal, return
> > > > error.
> > > > 
> > > > Signed-off-by: Kirti Wankhede 
> > > > Reviewed-by: Neo Jia 
> > > > ---
> > > >  drivers/vfio/vfio_iommu_type1.c | 205 
> > > > +++-
> > > >  1 file changed, 203 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/vfio/vfio_iommu_type1.c 
> > > > b/drivers/vfio/vfio_iommu_type1.c
> > > > index 70aeab921d0f..d6417fb02174 100644
> > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > @@ -71,6 +71,7 @@ struct vfio_iommu {
> > > > unsigned intdma_avail;
> > > > boolv2;
> > > > boolnesting;
> > > > +   booldirty_page_tracking;
> > > >  };
> > > >  
> > > >  struct vfio_domain {
> > > > @@ -91,6 +92,7 @@ struct vfio_dma {
> > > > boollock_cap;   /* 
> > > > capable(CAP_IPC_LOCK) */
> > > > struct task_struct  *task;
> > > > struct rb_root  pfn_list;   /* Ex-user pinned pfn 
> > > > list */
> > > > +   unsigned long   *bitmap;
> > > >  };
> > > >  
> > > >  struct vfio_group {
> > > > @@ -125,7 +127,10 @@ struct vfio_regions {
> > > >  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
> > > > 
> > > > (!list_empty(>domain_list))
> > > >  
> > > > +#define DIRTY_BITMAP_BYTES(n)  (ALIGN(n, BITS_PER_TYPE(u64)) / 
> > > > BITS_PER_BYTE)
> > > > +
> > > >  static int put_pfn(unsigned long pfn, int prot);
> > > > +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
> > > >  
> > > >  /*
> > > >   * This code handles mapping and unmapping of user data buffers
> > > > @@ -175,6 +180,55 @@ static void vfio_unlink_dma(struct vfio_iommu 
> > > > *iommu, struct vfio_dma *old)
> > > > rb_erase(>node, >dma_list);
> > > >  }
> > > >  
> > > > +static int vfio_dma_bitmap_alloc(struct vfio_iommu *iommu, uint64_t 
> > > > pgsize)
> > > > +{
> > > > +   struct rb_node *n = rb_first(>dma_list);
> > > > +
> > > > +   for (; n; n = rb_next(n)) {
> > > > +   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, 
> > > > node);
> > > > +   struct rb_node *p;
> > > > +   unsigned long npages = dma->size / pgsize;
> > > > +
> > > > +   dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), 
> > > > GFP_KERNEL);
> > > > +   if (!dma->bitmap) {
> > > > +   struct rb_node *p = rb_prev(n);
> > > > +
> > > > +   for (; p; p = rb_prev(p)) {
> > > > +   struct vfio_dma *dma = rb_entry(n,
> > > > +   struct 
> > > > vfio_dma, node);
> > > > +
> > > > +   kfree(dma->bitmap);
> > > > +   dma->bitmap = NULL;
> > > > +   }
> > > > +   return -ENOMEM;
> > > > +   }
> > > > +
> > > > +   if (RB_EMPTY_ROOT(>pfn_list))
> > > > +   continue;
> > > > +
> > > > +   for (p = rb_first(>pfn_list); p; p = rb_next(p)) {
> > > > +   struct vfio_pfn *vpfn = rb_entry(p, struct 
> > > > vfio_pfn,
> > > > +node);
> > > > +
> > > > +   bitmap_set(dma->bitmap,
> > > > +   (vpfn->iova - dma->iova) / 
> > > > pgsize, 1);
> > > > +   }
> > > > +   }
> > > > +   return 0;
> > > > +}
> > > > +
> > > > +static void vfio_dma_bitmap_free(struct vfio_iommu *iommu)
> > > > +{
> > > > 

Re: [PATCH v14 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-18 Thread Yan Zhao
On Thu, Mar 19, 2020 at 12:01:00PM +0800, Alex Williamson wrote:
> On Wed, 18 Mar 2020 23:06:39 -0400
> Yan Zhao  wrote:
> 
> > On Thu, Mar 19, 2020 at 03:41:11AM +0800, Kirti Wankhede wrote:
> > > VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> > > - Start dirty pages tracking while migration is active
> > > - Stop dirty pages tracking.
> > > - Get dirty pages bitmap. Its user space application's responsibility to
> > >   copy content of dirty pages from source to destination during migration.
> > > 
> > > To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> > > structure. Bitmap size is calculated considering smallest supported page
> > > size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled
> > > 
> > > Bitmap is populated for already pinned pages when bitmap is allocated for
> > > a vfio_dma with the smallest supported page size. Update bitmap from
> > > pinning functions when tracking is enabled. When user application queries
> > > bitmap, check if requested page size is same as page size used to
> > > populated bitmap. If it is equal, copy bitmap, but if not equal, return
> > > error.
> > > 
> > > Signed-off-by: Kirti Wankhede 
> > > Reviewed-by: Neo Jia 
> > > ---
> > >  drivers/vfio/vfio_iommu_type1.c | 205 
> > > +++-
> > >  1 file changed, 203 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c 
> > > b/drivers/vfio/vfio_iommu_type1.c
> > > index 70aeab921d0f..d6417fb02174 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -71,6 +71,7 @@ struct vfio_iommu {
> > >   unsigned intdma_avail;
> > >   boolv2;
> > >   boolnesting;
> > > + booldirty_page_tracking;
> > >  };
> > >  
> > >  struct vfio_domain {
> > > @@ -91,6 +92,7 @@ struct vfio_dma {
> > >   boollock_cap;   /* capable(CAP_IPC_LOCK) */
> > >   struct task_struct  *task;
> > >   struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
> > > + unsigned long   *bitmap;
> > >  };
> > >  
> > >  struct vfio_group {
> > > @@ -125,7 +127,10 @@ struct vfio_regions {
> > >  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
> > >   (!list_empty(>domain_list))
> > >  
> > > +#define DIRTY_BITMAP_BYTES(n)(ALIGN(n, BITS_PER_TYPE(u64)) / 
> > > BITS_PER_BYTE)
> > > +
> > >  static int put_pfn(unsigned long pfn, int prot);
> > > +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
> > >  
> > >  /*
> > >   * This code handles mapping and unmapping of user data buffers
> > > @@ -175,6 +180,55 @@ static void vfio_unlink_dma(struct vfio_iommu 
> > > *iommu, struct vfio_dma *old)
> > >   rb_erase(>node, >dma_list);
> > >  }
> > >  
> > > +static int vfio_dma_bitmap_alloc(struct vfio_iommu *iommu, uint64_t 
> > > pgsize)
> > > +{
> > > + struct rb_node *n = rb_first(>dma_list);
> > > +
> > > + for (; n; n = rb_next(n)) {
> > > + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> > > + struct rb_node *p;
> > > + unsigned long npages = dma->size / pgsize;
> > > +
> > > + dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
> > > + if (!dma->bitmap) {
> > > + struct rb_node *p = rb_prev(n);
> > > +
> > > + for (; p; p = rb_prev(p)) {
> > > + struct vfio_dma *dma = rb_entry(n,
> > > + struct vfio_dma, node);
> > > +
> > > + kfree(dma->bitmap);
> > > + dma->bitmap = NULL;
> > > + }
> > > + return -ENOMEM;
> > > + }
> > > +
> > > + if (RB_EMPTY_ROOT(>pfn_list))
> > > + continue;
> > > +
> > > + for (p = rb_first(>pfn_list); p; p = rb_next(p)) {
> > > + struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
> > > +  node);
> > > +
> > > + bitmap_set(dma->bitmap,
> > > + (vpfn->iova - dma->iova) / pgsize, 1);
> > > + }
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +static void vfio_dma_bitmap_free(struct vfio_iommu *iommu)
> > > +{
> > > + struct rb_node *n = rb_first(>dma_list);
> > > +
> > > + for (; n; n = rb_next(n)) {
> > > + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> > > +
> > > + kfree(dma->bitmap);
> > > + dma->bitmap = NULL;
> > > + }
> > > +}
> > > +
> > >  /*
> > >   * Helper Functions for host iova-pfn list
> > >   */
> > > @@ -567,6 +621,14 @@ static int vfio_iommu_type1_pin_pages(void 
> > > *iommu_data,
> > >   vfio_unpin_page_external(dma, iova, do_accounting);
> > >   goto pin_unwind;
> > >   }
> > > +
> > > + if 

[PATCH v2 1/1] device_tree: Add info message when dumping dtb to file

2020-03-18 Thread Leonardo Bras
When dumping dtb to a file, qemu exits silently before starting the VM.

Add an info message so the user can easily track why the process exits.
Add an error message if the dtb dump fails.

Signed-off-by: Leonardo Bras 
---
 device_tree.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/device_tree.c b/device_tree.c
index f8b46b3c73..bba6cc2164 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -530,7 +530,12 @@ void qemu_fdt_dumpdtb(void *fdt, int size)
 
 if (dumpdtb) {
 /* Dump the dtb to a file and quit */
-exit(g_file_set_contents(dumpdtb, fdt, size, NULL) ? 0 : 1);
+if (g_file_set_contents(dumpdtb, fdt, size, NULL)) {
+info_report("dtb dumped to %s. Exiting.", dumpdtb);
+exit(0);
+}
+error_report("%s: Failed dumping dtb to %s", __func__, dumpdtb);
+exit(1);
 }
 }
 
-- 
2.24.1
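
For anyone exercising this path: the code above is reached via the dumpdtb
machine option, e.g. (machine type and paths are just an example):

    qemu-system-ppc64 -machine pseries,dumpdtb=/tmp/guest.dtb ...
    dtc -I dtb -O dts -o /tmp/guest.dts /tmp/guest.dtb

With the patch, the first command now reports where the dtb was written (or
that writing failed) instead of exiting silently.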




Re: [PATCH v14 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-18 Thread Alex Williamson
On Wed, 18 Mar 2020 23:06:39 -0400
Yan Zhao  wrote:

> On Thu, Mar 19, 2020 at 03:41:11AM +0800, Kirti Wankhede wrote:
> > VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> > - Start dirty pages tracking while migration is active
> > - Stop dirty pages tracking.
> > - Get dirty pages bitmap. Its user space application's responsibility to
> >   copy content of dirty pages from source to destination during migration.
> > 
> > To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> > structure. Bitmap size is calculated considering smallest supported page
> > size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled
> > 
> > Bitmap is populated for already pinned pages when bitmap is allocated for
> > a vfio_dma with the smallest supported page size. Update bitmap from
> > pinning functions when tracking is enabled. When user application queries
> > bitmap, check if requested page size is same as page size used to
> > populated bitmap. If it is equal, copy bitmap, but if not equal, return
> > error.
> > 
> > Signed-off-by: Kirti Wankhede 
> > Reviewed-by: Neo Jia 
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 205 
> > +++-
> >  1 file changed, 203 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_type1.c 
> > b/drivers/vfio/vfio_iommu_type1.c
> > index 70aeab921d0f..d6417fb02174 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -71,6 +71,7 @@ struct vfio_iommu {
> > unsigned intdma_avail;
> > boolv2;
> > boolnesting;
> > +   booldirty_page_tracking;
> >  };
> >  
> >  struct vfio_domain {
> > @@ -91,6 +92,7 @@ struct vfio_dma {
> > boollock_cap;   /* capable(CAP_IPC_LOCK) */
> > struct task_struct  *task;
> > struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
> > +   unsigned long   *bitmap;
> >  };
> >  
> >  struct vfio_group {
> > @@ -125,7 +127,10 @@ struct vfio_regions {
> >  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
> > (!list_empty(>domain_list))
> >  
> > +#define DIRTY_BITMAP_BYTES(n)  (ALIGN(n, BITS_PER_TYPE(u64)) / 
> > BITS_PER_BYTE)
> > +
> >  static int put_pfn(unsigned long pfn, int prot);
> > +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
> >  
> >  /*
> >   * This code handles mapping and unmapping of user data buffers
> > @@ -175,6 +180,55 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
> > struct vfio_dma *old)
> > rb_erase(>node, >dma_list);
> >  }
> >  
> > +static int vfio_dma_bitmap_alloc(struct vfio_iommu *iommu, uint64_t pgsize)
> > +{
> > +   struct rb_node *n = rb_first(>dma_list);
> > +
> > +   for (; n; n = rb_next(n)) {
> > +   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> > +   struct rb_node *p;
> > +   unsigned long npages = dma->size / pgsize;
> > +
> > +   dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
> > +   if (!dma->bitmap) {
> > +   struct rb_node *p = rb_prev(n);
> > +
> > +   for (; p; p = rb_prev(p)) {
> > +   struct vfio_dma *dma = rb_entry(n,
> > +   struct vfio_dma, node);
> > +
> > +   kfree(dma->bitmap);
> > +   dma->bitmap = NULL;
> > +   }
> > +   return -ENOMEM;
> > +   }
> > +
> > +   if (RB_EMPTY_ROOT(>pfn_list))
> > +   continue;
> > +
> > +   for (p = rb_first(>pfn_list); p; p = rb_next(p)) {
> > +   struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
> > +node);
> > +
> > +   bitmap_set(dma->bitmap,
> > +   (vpfn->iova - dma->iova) / pgsize, 1);
> > +   }
> > +   }
> > +   return 0;
> > +}
> > +
> > +static void vfio_dma_bitmap_free(struct vfio_iommu *iommu)
> > +{
> > +   struct rb_node *n = rb_first(>dma_list);
> > +
> > +   for (; n; n = rb_next(n)) {
> > +   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> > +
> > +   kfree(dma->bitmap);
> > +   dma->bitmap = NULL;
> > +   }
> > +}
> > +
> >  /*
> >   * Helper Functions for host iova-pfn list
> >   */
> > @@ -567,6 +621,14 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> > vfio_unpin_page_external(dma, iova, do_accounting);
> > goto pin_unwind;
> > }
> > +
> > +   if (iommu->dirty_page_tracking) {
> > +   unsigned long pgshift =
> > +__ffs(vfio_pgsize_bitmap(iommu));
> > +
> > +   bitmap_set(dma->bitmap,
> > +

Re: [PATCH v14 Kernel 1/7] vfio: KABI for migration interface for device state

2020-03-18 Thread Alex Williamson
On Wed, 18 Mar 2020 21:17:03 -0400
Yan Zhao  wrote:

> On Thu, Mar 19, 2020 at 03:41:08AM +0800, Kirti Wankhede wrote:
> > - Defined MIGRATION region type and sub-type.
> > 
> > - Defined vfio_device_migration_info structure which will be placed at the
> >   0th offset of migration region to get/set VFIO device related
> >   information. Defined members of structure and usage on read/write access.
> > 
> > - Defined device states and state transition details.
> > 
> > - Defined sequence to be followed while saving and resuming VFIO device.
> > 
> > Signed-off-by: Kirti Wankhede 
> > Reviewed-by: Neo Jia 
> > ---
> >  include/uapi/linux/vfio.h | 227 
> > ++
> >  1 file changed, 227 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 9e843a147ead..d0021467af53 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
> >  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK   (0x)
> >  #define VFIO_REGION_TYPE_GFX(1)
> >  #define VFIO_REGION_TYPE_CCW   (2)
> > +#define VFIO_REGION_TYPE_MIGRATION  (3)
> >  
> >  /* sub-types for VFIO_REGION_TYPE_PCI_* */
> >  
> > @@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {
> >  /* sub-types for VFIO_REGION_TYPE_CCW */
> >  #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD  (1)
> >  
> > +/* sub-types for VFIO_REGION_TYPE_MIGRATION */
> > +#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
> > +
> > +/*
> > + * The structure vfio_device_migration_info is placed at the 0th offset of
> > + * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
> > + * migration information. Field accesses from this structure are only supported
> > + * at their native width and alignment. Otherwise, the result is undefined and
> > + * vendor drivers should return an error.
> > + *
> > + * device_state: (read/write)
> > + *  - The user application writes to this field to inform the vendor driver
> > + *about the device state to be transitioned to.
> > + *  - The vendor driver should take the necessary actions to change the
> > + *device state. After successful transition to a given state, the
> > + *vendor driver should return success on write(device_state, state)
> > + *system call. If the device state transition fails, the vendor driver
> > + *should return an appropriate -errno for the fault condition.
> > + *  - On the user application side, if the device state transition fails,
> > + *   that is, if write(device_state, state) returns an error, read
> > + *   device_state again to determine the current state of the device from
> > + *   the vendor driver.
> > + *  - The vendor driver should return previous state of the device unless
> > + *the vendor driver has encountered an internal error, in which case
> > + *the vendor driver may report the device_state VFIO_DEVICE_STATE_ERROR.
> > + *  - The user application must use the device reset ioctl to recover the
> > + *device from VFIO_DEVICE_STATE_ERROR state. If the device is
> > + *indicated to be in a valid device state by reading device_state, the
> > + *user application may attempt to transition the device to any valid
> > + *state reachable from the current state or terminate itself.
> > + *
> > + *  device_state consists of 3 bits:
> > + *  - If bit 0 is set, it indicates the _RUNNING state. If bit 0 is clear,
> > + *it indicates the _STOP state. When the device state is changed to
> > + *_STOP, driver should stop the device before write() returns.
> > + *  - If bit 1 is set, it indicates the _SAVING state, which means that the
> > + *driver should start gathering device state information that will be
> > + *provided to the VFIO user application to save the device's state.
> > + *  - If bit 2 is set, it indicates the _RESUMING state, which means that
> > + *the driver should prepare to resume the device. Data provided through
> > + *the migration region should be used to resume the device.
> > + *  Bits 3 - 31 are reserved for future use. To preserve them, the user
> > + *  application should perform a read-modify-write operation on this
> > + *  field when modifying the specified bits.
> > + *
> > + *  +--- _RESUMING
> > + *  |+-- _SAVING
> > + *  ||+- _RUNNING
> > + *  |||
> > + *  000b => Device Stopped, not saving or resuming
> > + *  001b => Device running, which is the default state
> > + *  010b => Stop the device & save the device state, stop-and-copy state
> > + *  011b => Device running and save the device state, pre-copy state
> > + *  100b => Device stopped and the device state is resuming
> > + *  101b => Invalid 

Re: [PATCH 1/1] device_tree: Add info message when dumping dtb to file

2020-03-18 Thread Leonardo Bras
On Thu, 2020-03-19 at 00:32 -0300, Leonardo Bras wrote:
>  error_report("%s: Failed dumping dtb to %s", __func__, dumpdtb)

Sorry, missed ending ';'.
I will send a v2. 




Re: [PATCH v14 Kernel 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap

2020-03-18 Thread Alex Williamson
On Thu, 19 Mar 2020 01:11:12 +0530
Kirti Wankhede  wrote:

> DMA mapped pages, including those pinned by mdev vendor drivers, might
> get unpinned and unmapped while migration is active and device is still
> running. For example, in pre-copy phase while guest driver could access
> those pages, host device or vendor driver can dirty these mapped pages.
> Such pages should be marked dirty so as to maintain memory consistency
> for a user making use of dirty page tracking.
> 
> To get bitmap during unmap, user should set flag
> VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP, bitmap memory should be allocated and
> zeroed by user space application. Bitmap size and page size should be set
> by user application.

Looks good, but as mentioned we no longer require the user to zero the
bitmap.  It's mentioned in the commit log above and in the uapi
comment.  Thanks,

Alex

> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 55 
> ++---
>  include/uapi/linux/vfio.h   | 11 +
>  2 files changed, 62 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index d6417fb02174..aa1ac30f7854 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -939,7 +939,8 @@ static int verify_bitmap_size(uint64_t npages, uint64_t 
> bitmap_size)
>  }
>  
>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> -  struct vfio_iommu_type1_dma_unmap *unmap)
> +  struct vfio_iommu_type1_dma_unmap *unmap,
> +  struct vfio_bitmap *bitmap)
>  {
>   uint64_t mask;
>   struct vfio_dma *dma, *dma_last = NULL;
> @@ -990,6 +991,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>* will be returned if these conditions are not met.  The v2 interface
>* will only return success and a size of zero if there were no
>* mappings within the range.
> +  *
> +  * When VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP flag is set, unmap request
> +  * must be for single mapping. Multiple mappings with this flag set is
> +  * not supported.
>*/
>   if (iommu->v2) {
>   dma = vfio_find_dma(iommu, unmap->iova, 1);
> @@ -997,6 +1002,13 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>   ret = -EINVAL;
>   goto unlock;
>   }
> +
> + if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> + (dma->iova != unmap->iova || dma->size != unmap->size)) {
> + ret = -EINVAL;
> + goto unlock;
> + }
> +
>   dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
>   if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
>   ret = -EINVAL;
> @@ -1014,6 +1026,12 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>   if (dma->task->mm != current->mm)
>   break;
>  
> + if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> +  iommu->dirty_page_tracking)
> + vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
> + bitmap->pgsize,
> + (unsigned char __user *) bitmap->data);
> +
>   if (!RB_EMPTY_ROOT(>pfn_list)) {
>   struct vfio_iommu_type1_dma_unmap nb_unmap;
>  
> @@ -2369,17 +2387,46 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>   } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>   struct vfio_iommu_type1_dma_unmap unmap;
> - long ret;
> + struct vfio_bitmap bitmap = { 0 };
> + int ret;
>  
>   minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
>  
>   if (copy_from_user(, (void __user *)arg, minsz))
>   return -EFAULT;
>  
> - if (unmap.argsz < minsz || unmap.flags)
> + if (unmap.argsz < minsz ||
> + unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
>   return -EINVAL;
>  
> - ret = vfio_dma_do_unmap(iommu, );
> + if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
> + unsigned long pgshift;
> + uint64_t iommu_pgsize =
> +  1 << __ffs(vfio_pgsize_bitmap(iommu));
> +
> + if (unmap.argsz < (minsz + sizeof(bitmap)))
> + return -EINVAL;
> +
> + if (copy_from_user(,
> +(void __user *)(arg + minsz),
> +sizeof(bitmap)))
> + return -EFAULT;
> +
> + /* allow only min supported pgsize */
> +  

Re: [PATCH v6 25/61] target/riscv: vector single-width averaging add and subtract

2020-03-18 Thread LIU Zhiwei



On 2020/3/17 23:06, LIU Zhiwei wrote:

Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  17 
 target/riscv/insn32.decode  |   5 ++
 target/riscv/insn_trans/trans_rvv.inc.c |   7 ++
 target/riscv/vector_helper.c| 100 
 4 files changed, 129 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index fd1c184852..311ce1322c 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -715,3 +715,20 @@ DEF_HELPER_6(vssub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vaadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vaadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vaadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vaadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vasub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vasub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vasub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vasub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vaadd_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vaadd_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vaadd_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vaadd_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vasub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vasub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vasub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vasub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index c9a4050adc..e617d7bd60 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
vssubu_vv   100010 . . . 000 . 1010111 @r_vm
 vssubu_vx   100010 . . . 100 . 1010111 @r_vm
 vssub_vv100011 . . . 000 . 1010111 @r_vm
 vssub_vx100011 . . . 100 . 1010111 @r_vm
+vaadd_vv100100 . . . 000 . 1010111 @r_vm
+vaadd_vx100100 . . . 100 . 1010111 @r_vm
+vaadd_vi100100 . . . 011 . 1010111 @r_vm
+vasub_vv100110 . . . 000 . 1010111 @r_vm
+vasub_vx100110 . . . 100 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm

 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index dd1a508a51..ba2e7d56f4 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1636,3 +1636,10 @@ GEN_OPIVX_TRANS(vssubu_vx,  opivx_check)
 GEN_OPIVX_TRANS(vssub_vx,  opivx_check)
 GEN_OPIVI_TRANS(vsaddu_vi, 1, vsaddu_vx, opivx_check)
 GEN_OPIVI_TRANS(vsadd_vi, 0, vsadd_vx, opivx_check)
+
+/* Vector Single-Width Averaging Add and Subtract */
+GEN_OPIVV_TRANS(vaadd_vv, opivv_check)
+GEN_OPIVV_TRANS(vasub_vv, opivv_check)
+GEN_OPIVX_TRANS(vaadd_vx,  opivx_check)
+GEN_OPIVX_TRANS(vasub_vx,  opivx_check)
+GEN_OPIVI_TRANS(vaadd_vi, 0, vaadd_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index b17cac7fd4..984a8e260f 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -2488,3 +2488,103 @@ GEN_VEXT_VX_RM(vssub_vx_b, 1, 1, clearb)
 GEN_VEXT_VX_RM(vssub_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_RM(vssub_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_RM(vssub_vx_d, 8, 8, clearq)
+
+/* Vector Single-Width Averaging Add and Subtract */
+static inline uint8_t get_round(int vxrm, uint64_t v, uint8_t shift)
+{
+uint8_t d = extract64(v, shift, 1);
+uint8_t d1;
+uint64_t D1, D2;
+
+if (shift == 0 || shift > 64) {
+return 0;
+}
+
+d1 = extract64(v, shift - 1, 1);
+D1 = extract64(v, 0, shift);
+if (vxrm == 0) { /* round-to-nearest-up (add +0.5 LSB) */
+return d1;
+} else if (vxrm == 1) { /* round-to-nearest-even */
+if (shift > 1) {
+D2 = extract64(v, 0, shift - 1);
+return d1 & ((D2 != 0) | d);
+} else {
+return d1 & d;
+}
+} else if (vxrm == 3) { /* round-to-odd (OR bits into LSB, aka "jam") */
+return !d & (D1 != 0);
+}
+return 0; /* round-down (truncate) */
+}
+
+static inline int32_t aadd32(CPURISCVState *env, int vxrm, int32_t a, int32_t b)
+{
+int64_t res = (int64_t)a + b;
+uint8_t round = get_round(vxrm, res, 1);
+
+return (res >> 1) + round;
+}
+
+static inline int64_t aadd64(CPURISCVState *env, int vxrm, int64_t a, int64_t b)
+{
+int64_t res = a + b;
+uint8_t round = get_round(vxrm, res, 1);
+int64_t over = (res ^ a) & (res ^ b) & INT64_MIN;
+
+/* With signed overflow, bit 64 is inverse of bit 63. */
+ 
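
As a quick sanity check of the rounding helper above, take aadd32() with
a = 3 and b = 4, so res = 7 (binary 111), shift = 1 and the exact average
is 3.5:

    vxrm = 0 (round-to-nearest-up):   round = bit 0 of res = 1, result = (7 >> 1) + 1 = 4
    vxrm = 1 (round-to-nearest-even): shift == 1, round = bit 0 & bit 1 = 1, result = 4
    vxrm = 2 (round-down/truncate):   round = 0, result = 7 >> 1 = 3
    vxrm = 3 (round-to-odd):          bit 1 of res is already 1, round = 0, result = 3

Each matches rounding 3.5 under the corresponding mode.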

Re: [PATCH v14 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-18 Thread Alex Williamson
On Thu, 19 Mar 2020 01:11:11 +0530
Kirti Wankhede  wrote:

> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> - Start dirty pages tracking while migration is active
> - Stop dirty pages tracking.
> - Get dirty pages bitmap. Its user space application's responsibility to
>   copy content of dirty pages from source to destination during migration.
> 
> To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> structure. Bitmap size is calculated considering smallest supported page
> size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled
> 
> Bitmap is populated for already pinned pages when bitmap is allocated for
> a vfio_dma with the smallest supported page size. Update bitmap from
> pinning functions when tracking is enabled. When user application queries
> bitmap, check if requested page size is same as page size used to
> populated bitmap. If it is equal, copy bitmap, but if not equal, return
> error.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 205 
> +++-
>  1 file changed, 203 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 70aeab921d0f..d6417fb02174 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -71,6 +71,7 @@ struct vfio_iommu {
>   unsigned intdma_avail;
>   boolv2;
>   boolnesting;
> + booldirty_page_tracking;
>  };
>  
>  struct vfio_domain {
> @@ -91,6 +92,7 @@ struct vfio_dma {
>   boollock_cap;   /* capable(CAP_IPC_LOCK) */
>   struct task_struct  *task;
>   struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
> + unsigned long   *bitmap;

We've made the bitmap a width invariant u64 elsewhere, should be here as well.
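
(For scale: with 4 KiB pages, a 1 GiB vfio_dma has npages = 262144, and the
DIRTY_BITMAP_BYTES() macro below gives ALIGN(262144, 64) / 8 = 32768 bytes,
i.e. 32 KiB of bitmap per GiB of mapping.)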

>  };
>  
>  struct vfio_group {
> @@ -125,7 +127,10 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(>domain_list))
>  
> +#define DIRTY_BITMAP_BYTES(n)(ALIGN(n, BITS_PER_TYPE(u64)) / 
> BITS_PER_BYTE)
> +
>  static int put_pfn(unsigned long pfn, int prot);
> +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
>  
>  /*
>   * This code handles mapping and unmapping of user data buffers
> @@ -175,6 +180,55 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
> struct vfio_dma *old)
>   rb_erase(>node, >dma_list);
>  }
>  
> +static int vfio_dma_bitmap_alloc(struct vfio_iommu *iommu, uint64_t pgsize)
> +{
> + struct rb_node *n = rb_first(>dma_list);
> +
> + for (; n; n = rb_next(n)) {
> + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> + struct rb_node *p;
> + unsigned long npages = dma->size / pgsize;
> +
> + dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
> + if (!dma->bitmap) {
> + struct rb_node *p = rb_prev(n);
> +
> + for (; p; p = rb_prev(p)) {
> + struct vfio_dma *dma = rb_entry(n,
> + struct vfio_dma, node);
> +
> + kfree(dma->bitmap);
> + dma->bitmap = NULL;
> + }
> + return -ENOMEM;
> + }
> +
> + if (RB_EMPTY_ROOT(>pfn_list))
> + continue;
> +
> + for (p = rb_first(>pfn_list); p; p = rb_next(p)) {
> + struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
> +  node);
> +
> + bitmap_set(dma->bitmap,
> + (vpfn->iova - dma->iova) / pgsize, 1);
> + }
> + }
> + return 0;
> +}
> +
> +static void vfio_dma_bitmap_free(struct vfio_iommu *iommu)
> +{
> + struct rb_node *n = rb_first(>dma_list);
> +
> + for (; n; n = rb_next(n)) {
> + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> +
> + kfree(dma->bitmap);
> + dma->bitmap = NULL;
> + }
> +}
> +
>  /*
>   * Helper Functions for host iova-pfn list
>   */
> @@ -567,6 +621,14 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>   vfio_unpin_page_external(dma, iova, do_accounting);
>   goto pin_unwind;
>   }
> +
> + if (iommu->dirty_page_tracking) {
> + unsigned long pgshift =
> +  __ffs(vfio_pgsize_bitmap(iommu));
> +
> + bitmap_set(dma->bitmap,
> +(vpfn->iova - dma->iova) >> pgshift, 1);
> + }
>   }
>  
>   ret = i;
> @@ -801,6 +863,7 @@ static void 

Re: [PATCH v14 Kernel 7/7] vfio: Selective dirty page tracking if IOMMU backed device pins pages

2020-03-18 Thread Alex Williamson
On Thu, 19 Mar 2020 01:11:14 +0530
Kirti Wankhede  wrote:

> Added a check such that only singleton IOMMU groups can pin pages.
> From the point when vendor driver pins any pages, consider IOMMU group
> dirty page scope to be limited to pinned pages.
> 
> To optimize to avoid walking list often, added flag
> pinned_page_dirty_scope to indicate if all of the vfio_groups for each
> vfio_domain in the domain_list dirty page scope is limited to pinned
> pages. This flag is updated on first pinned pages request for that IOMMU
> group and on attaching/detaching group.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio.c | 13 +--
>  drivers/vfio/vfio_iommu_type1.c | 77 
> +++--
>  include/linux/vfio.h|  4 ++-
>  3 files changed, 87 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 210fcf426643..311b5e4e111e 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -85,6 +85,7 @@ struct vfio_group {
>   atomic_topened;
>   wait_queue_head_t   container_q;
>   boolnoiommu;
> + unsigned intdev_counter;
>   struct kvm  *kvm;
>   struct blocking_notifier_head   notifier;
>  };
> @@ -555,6 +556,7 @@ struct vfio_device *vfio_group_create_device(struct 
> vfio_group *group,
>  
>   mutex_lock(>device_lock);
>   list_add(>group_next, >device_list);
> + group->dev_counter++;
>   mutex_unlock(>device_lock);
>  
>   return device;
> @@ -567,6 +569,7 @@ static void vfio_device_release(struct kref *kref)
>   struct vfio_group *group = device->group;
>  
>   list_del(&device->group_next);
> + group->dev_counter--;
>   mutex_unlock(&group->device_lock);
>  
>   dev_set_drvdata(device->dev, NULL);
> @@ -1933,6 +1936,9 @@ int vfio_pin_pages(struct device *dev, unsigned long 
> *user_pfn, int npage,
>   if (!group)
>   return -ENODEV;
>  
> + if (group->dev_counter > 1)
> + return -EINVAL;
> +
>   ret = vfio_group_add_container_user(group);
>   if (ret)
>   goto err_pin_pages;
> @@ -1940,7 +1946,8 @@ int vfio_pin_pages(struct device *dev, unsigned long 
> *user_pfn, int npage,
>   container = group->container;
>   driver = container->iommu_driver;
>   if (likely(driver && driver->ops->pin_pages))
> - ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
> + ret = driver->ops->pin_pages(container->iommu_data,
> +  group->iommu_group, user_pfn,
>npage, prot, phys_pfn);
>   else
>   ret = -ENOTTY;
> @@ -2038,8 +2045,8 @@ int vfio_group_pin_pages(struct vfio_group *group,
>   driver = container->iommu_driver;
>   if (likely(driver && driver->ops->pin_pages))
>   ret = driver->ops->pin_pages(container->iommu_data,
> -  user_iova_pfn, npage,
> -  prot, phys_pfn);
> +  group->iommu_group, user_iova_pfn,
> +  npage, prot, phys_pfn);
>   else
>   ret = -ENOTTY;
>  
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 912629320719..deec09f4b0f6 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -72,6 +72,7 @@ struct vfio_iommu {
>   bool                    v2;
>   bool                    nesting;
>   bool                    dirty_page_tracking;
> + bool                    pinned_page_dirty_scope;
>  };
>  
>  struct vfio_domain {
> @@ -99,6 +100,7 @@ struct vfio_group {
>   struct iommu_group  *iommu_group;
>   struct list_head        next;
>   bool                    mdev_group;     /* An mdev group */
> + bool                    pinned_page_dirty_scope;
>  };
>  
>  struct vfio_iova {
> @@ -132,6 +134,10 @@ struct vfio_regions {
>  static int put_pfn(unsigned long pfn, int prot);
>  static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
>  
> +static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu 
> *iommu,
> +struct iommu_group *iommu_group);
> +
> +static void update_pinned_page_dirty_scope(struct vfio_iommu *iommu);
>  /*
>   * This code handles mapping and unmapping of user data buffers
>   * into DMA'ble space using the IOMMU
> @@ -556,11 +562,13 @@ static int vfio_unpin_page_external(struct vfio_dma 
> *dma, dma_addr_t iova,
>  }
>  
>  static int vfio_iommu_type1_pin_pages(void *iommu_data,
> +   struct iommu_group *iommu_group,
> unsigned long *user_pfn,
> 
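
The rest of this message is cut off here, but the commit message above describes
the aggregation clearly enough to sketch it: the container-level
pinned_page_dirty_scope flag stays set only while every group in every domain
limits its dirty scope to pinned pages. The snippet below is an inference from
that description, for illustration only; it is not the code actually posted in
the (truncated) patch, though the struct and field names follow the hunks quoted
above.

/* Illustrative inference from the commit message, not the posted code. */
static void update_pinned_page_dirty_scope(struct vfio_iommu *iommu)
{
        struct vfio_domain *domain;
        struct vfio_group *group;

        list_for_each_entry(domain, &iommu->domain_list, next) {
                list_for_each_entry(group, &domain->group_list, next) {
                        if (!group->pinned_page_dirty_scope) {
                                iommu->pinned_page_dirty_scope = false;
                                return;
                        }
                }
        }
        /* Every attached group is limited to pinned pages. */
        iommu->pinned_page_dirty_scope = true;
}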

Re: [PATCH v14 Kernel 3/7] vfio iommu: Add ioctl definition for dirty pages tracking.

2020-03-18 Thread Alex Williamson
On Thu, 19 Mar 2020 01:11:10 +0530
Kirti Wankhede  wrote:

> IOMMU container maintains a list of all pages pinned by vfio_pin_pages API.
> All pages pinned by vendor driver through this API should be considered as
> dirty during migration. When container consists of IOMMU capable device and
> all pages are pinned and mapped, then all pages are marked dirty.
> Added support to start/stop pinned and unpinned pages tracking and to get
> bitmap of all dirtied pages for requested IO virtual address range.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  include/uapi/linux/vfio.h | 55 
> +++
>  1 file changed, 55 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index d0021467af53..043e9eafb255 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -995,6 +995,12 @@ struct vfio_iommu_type1_dma_map {
>  
>  #define VFIO_IOMMU_MAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 13)
>  
> +struct vfio_bitmap {
> + __u64        pgsize;        /* page size for bitmap */
> + __u64        size;          /* in bytes */
> + __u64 __user *data; /* one bit per page */
> +};
> +
>  /**
>   * VFIO_IOMMU_UNMAP_DMA - _IOWR(VFIO_TYPE, VFIO_BASE + 14,
>   *   struct vfio_dma_unmap)
> @@ -1021,6 +1027,55 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_ENABLE        _IO(VFIO_TYPE, VFIO_BASE + 15)
>  
> +/**
> + * VFIO_IOMMU_DIRTY_PAGES - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
> + * struct vfio_iommu_type1_dirty_bitmap)
> + * IOCTL is used for dirty pages tracking. Caller sets argsz, which is size 
> of
> + * struct vfio_iommu_type1_dirty_bitmap. Caller set flag depend on which
> + * operation to perform, details as below:
> + *
> + * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_START set, indicates
> + * migration is active and IOMMU module should track pages which are pinned 
> and
> + * could be dirtied by device.

"...should track" pages dirtied or potentially dirtied by devices.

As soon as we add support for Yan's DMA r/w the pinning requirement is
gone, besides pinning is an in-kernel implementation detail, the user
of this interface doesn't know or care which pages are pinned.

> + * Dirty pages are tracked until tracking is stopped by user application by
> + * setting VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP flag.
> + *
> + * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP set, indicates
> + * IOMMU should stop tracking pinned pages.

s/pinned/dirtied/

> + *
> + * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set,
> + * IOCTL returns dirty pages bitmap for IOMMU container during migration for
> + * given IOVA range. User must provide data[] as the structure
> + * vfio_iommu_type1_dirty_bitmap_get through which user provides IOVA range 
> and
> + * pgsize. This interface supports to get bitmap of smallest supported pgsize
> + * only and can be modified in future to get bitmap of specified pgsize.
> + * User must allocate memory for bitmap, zero the bitmap memory and set size
> + * of allocated memory in bitmap_size field. One bit is used to represent one
> + * page consecutively starting from iova offset. User should provide page 
> size
> + * in 'pgsize'. Bit set in bitmap indicates page at that offset from iova is
> + * dirty. Caller must set argsz including size of structure
> + * vfio_iommu_type1_dirty_bitmap_get.
> + *
> + * Only one flag should be set at a time.

"Only one of the flags _START, _STOP, and _GET maybe be specified at a
time."  IOW, let's not presume what yet undefined flags may do.
Hopefully this addresses Dave's concern.

> + *
> + */
> +struct vfio_iommu_type1_dirty_bitmap {
> + __u32        argsz;
> + __u32        flags;
> +#define VFIO_IOMMU_DIRTY_PAGES_FLAG_START      (1 << 0)
> +#define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP (1 << 1)
> +#define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP   (1 << 2)
> + __u8 data[];
> +};
> +
> +struct vfio_iommu_type1_dirty_bitmap_get {
> + __u64        iova;          /* IO virtual address */
> + __u64        size;          /* Size of iova range */
> + struct vfio_bitmap bitmap;
> +};
> +
> +#define VFIO_IOMMU_DIRTY_PAGES _IO(VFIO_TYPE, VFIO_BASE + 17)
> +
>  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
>  
>  /*
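
To make the flag discussion above more concrete, here is a minimal userspace
sketch of the GET_BITMAP operation as it would look with the structures and
ioctl number from the quoted patch. It is illustrative only: it assumes a
<linux/vfio.h> that already carries these proposed definitions (they were not
merged at this point), it assumes tracking was started earlier with the _START
flag through the same ioctl, and error handling is trimmed.

/* Illustrative sketch; assumes the proposed definitions are in <linux/vfio.h>. */
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int get_dirty_bitmap(int container, uint64_t iova, uint64_t size,
                            uint64_t pgsize, void *bitmap)
{
    struct vfio_iommu_type1_dirty_bitmap *db;
    struct vfio_iommu_type1_dirty_bitmap_get *range;
    int ret;

    db = calloc(1, sizeof(*db) + sizeof(*range));
    if (!db)
        return -1;

    db->argsz = sizeof(*db) + sizeof(*range);
    db->flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
    range = (struct vfio_iommu_type1_dirty_bitmap_get *)db->data;
    range->iova = iova;
    range->size = size;
    range->bitmap.pgsize = pgsize;                       /* smallest supported page size */
    range->bitmap.size = (size / pgsize + 63) / 64 * 8;  /* bytes, 64-bit aligned */
    range->bitmap.data = bitmap;                         /* caller-allocated, zeroed buffer */

    ret = ioctl(container, VFIO_IOMMU_DIRTY_PAGES, db);
    free(db);
    return ret;
}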




[PATCH 1/1] device_tree: Add info message when dumping dtb to file

2020-03-18 Thread Leonardo Bras
When dumping dtb to a file, qemu exits silently before starting the VM.

Add an info message so the user can easily track why the process exits.
Add an error message if the dtb dump fails.

Signed-off-by: Leonardo Bras 
---
 device_tree.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/device_tree.c b/device_tree.c
index f8b46b3c73..2e45c18c79 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -530,7 +530,12 @@ void qemu_fdt_dumpdtb(void *fdt, int size)
 
 if (dumpdtb) {
 /* Dump the dtb to a file and quit */
-exit(g_file_set_contents(dumpdtb, fdt, size, NULL) ? 0 : 1);
+if (g_file_set_contents(dumpdtb, fdt, size, NULL)) {
+info_report("dtb dumped to %s. Exiting.", dumpdtb);
+exit(0);
+}
+error_report("%s: Failed dumping dtb to %s", __func__, dumpdtb);
+exit(1);
 }
 }
 
-- 
2.24.1




Re: [PATCH v2] ppc/spapr: Set the effective address provided flag in mc error log.

2020-03-18 Thread Nicholas Piggin
Excerpts from Mahesh Salgaonkar's message of March 18, 2020 5:34 pm:
> Per PAPR, it is expected to set effective address provided flag in
> sub_err_type member of mc extended error log (i.e
> rtas_event_log_v6_mc.sub_err_type). This somehow got missed in original
> fwnmi-mce patch series. The current code just updates the effective address
> but does not set the flag to indicate that it is available. Hence guest
> fails to extract effective address from mce rtas log. This patch fixes
> that.
> 
> Without this patch guest MCE logs fail to print the DAR value:
> 
> [   11.933608] Disabling lock debugging due to kernel taint
> [   11.933773] MCE: CPU0: machine check (Severe) Host TLB Multihit [Recovered]
> [   11.933979] MCE: CPU0: NIP: [c0090b34] 
> radix__flush_tlb_range_psize+0x194/0xf00
> [   11.934223] MCE: CPU0: Initiator CPU
> [   11.934341] MCE: CPU0: Unknown
> 
> After the change:
> 
> [   22.454149] Disabling lock debugging due to kernel taint
> [   22.454316] MCE: CPU0: machine check (Severe) Host TLB Multihit DAR: 
> deadbeefdeadbeef [Recovered]
> [   22.454605] MCE: CPU0: NIP: [c03e5804] kmem_cache_alloc+0x84/0x330
> [   22.454820] MCE: CPU0: Initiator CPU
> [   22.454944] MCE: CPU0: Unknown

Thanks, I was wondering why my MCEs weren't printing a DAR!

Reviewed-by: Nicholas Piggin 

> 
> 
> Signed-off-by: Mahesh Salgaonkar 
> ---
> Change in v2:
> - Fixed coding style issues.
> ---
>  hw/ppc/spapr_events.c |   26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 8b32b7eea5..cb6bfedc53 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -243,6 +243,14 @@ struct rtas_event_log_v6_mc {
>  #define RTAS_LOG_V6_MC_TLB_PARITY1
>  #define RTAS_LOG_V6_MC_TLB_MULTIHIT  2
>  #define RTAS_LOG_V6_MC_TLB_INDETERMINATE 3
> +/*
> + * Per PAPR,
> + * For UE error type, set bit 1 of sub_err_type to indicate effective addr is
> + * provided. For other error types (SLB/ERAT/TLB), set bit 0 to indicate
> + * same.
> + */
> +#define RTAS_LOG_V6_MC_UE_EA_ADDR_PROVIDED   0x40
> +#define RTAS_LOG_V6_MC_EA_ADDR_PROVIDED  0x80
>  uint8_t reserved_1[6];
>  uint64_t effective_address;
>  uint64_t logical_address;
> @@ -726,6 +734,22 @@ void 
> spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
>  RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
> +static void spapr_mc_set_ea_provided_flag(struct mc_extended_log *ext_elog)
> +{
> +switch (ext_elog->mc.error_type) {
> +case RTAS_LOG_V6_MC_TYPE_UE:
> +ext_elog->mc.sub_err_type |= RTAS_LOG_V6_MC_UE_EA_ADDR_PROVIDED;
> +break;
> +case RTAS_LOG_V6_MC_TYPE_SLB:
> +case RTAS_LOG_V6_MC_TYPE_ERAT:
> +case RTAS_LOG_V6_MC_TYPE_TLB:
> +ext_elog->mc.sub_err_type |= RTAS_LOG_V6_MC_EA_ADDR_PROVIDED;
> +break;
> +default:
> +break;
> +}
> +}
> +
>  static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, bool recovered,
>  struct mc_extended_log *ext_elog)
>  {
> @@ -751,6 +775,7 @@ static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, 
> bool recovered,
>  ext_elog->mc.sub_err_type = mc_derror_table[i].error_subtype;
>  if (mc_derror_table[i].dar_valid) {
>  ext_elog->mc.effective_address = 
> cpu_to_be64(env->spr[SPR_DAR]);
> +spapr_mc_set_ea_provided_flag(ext_elog);
>  }
>  
>  summary |= mc_derror_table[i].initiator
> @@ -769,6 +794,7 @@ static uint32_t spapr_mce_get_elog_type(PowerPCCPU *cpu, 
> bool recovered,
>  ext_elog->mc.sub_err_type = mc_ierror_table[i].error_subtype;
>  if (mc_ierror_table[i].nip_valid) {
>  ext_elog->mc.effective_address = cpu_to_be64(env->nip);
> +spapr_mc_set_ea_provided_flag(ext_elog);
>  }
>  
>  summary |= mc_ierror_table[i].initiator
> 
> 



Re: [PATCH v14 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-18 Thread Yan Zhao
On Thu, Mar 19, 2020 at 03:41:11AM +0800, Kirti Wankhede wrote:
> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> - Start dirty pages tracking while migration is active
> - Stop dirty pages tracking.
> - Get dirty pages bitmap. Its user space application's responsibility to
>   copy content of dirty pages from source to destination during migration.
> 
> To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> structure. Bitmap size is calculated considering smallest supported page
> size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled
> 
> Bitmap is populated for already pinned pages when bitmap is allocated for
> a vfio_dma with the smallest supported page size. Update bitmap from
> pinning functions when tracking is enabled. When user application queries
> bitmap, check if requested page size is same as page size used to
> populated bitmap. If it is equal, copy bitmap, but if not equal, return
> error.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 205 
> +++-
>  1 file changed, 203 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 70aeab921d0f..d6417fb02174 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -71,6 +71,7 @@ struct vfio_iommu {
>   unsigned int            dma_avail;
>   bool                    v2;
>   bool                    nesting;
> + bool                    dirty_page_tracking;
>  };
>  
>  struct vfio_domain {
> @@ -91,6 +92,7 @@ struct vfio_dma {
>   bool                    lock_cap;       /* capable(CAP_IPC_LOCK) */
>   struct task_struct  *task;
>   struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
> + unsigned long   *bitmap;
>  };
>  
>  struct vfio_group {
> @@ -125,7 +127,10 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(&iommu->domain_list))
>  
> +#define DIRTY_BITMAP_BYTES(n)   (ALIGN(n, BITS_PER_TYPE(u64)) / 
> BITS_PER_BYTE)
> +
>  static int put_pfn(unsigned long pfn, int prot);
> +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
>  
>  /*
>   * This code handles mapping and unmapping of user data buffers
> @@ -175,6 +180,55 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
> struct vfio_dma *old)
>   rb_erase(&old->node, &iommu->dma_list);
>  }
>  
> +static int vfio_dma_bitmap_alloc(struct vfio_iommu *iommu, uint64_t pgsize)
> +{
> + struct rb_node *n = rb_first(&iommu->dma_list);
> +
> + for (; n; n = rb_next(n)) {
> + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> + struct rb_node *p;
> + unsigned long npages = dma->size / pgsize;
> +
> + dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
> + if (!dma->bitmap) {
> + struct rb_node *p = rb_prev(n);
> +
> + for (; p; p = rb_prev(p)) {
> + struct vfio_dma *dma = rb_entry(n,
> + struct vfio_dma, node);
> +
> + kfree(dma->bitmap);
> + dma->bitmap = NULL;
> + }
> + return -ENOMEM;
> + }
> +
> + if (RB_EMPTY_ROOT(&dma->pfn_list))
> + continue;
> +
> + for (p = rb_first(&dma->pfn_list); p; p = rb_next(p)) {
> + struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
> +  node);
> +
> + bitmap_set(dma->bitmap,
> + (vpfn->iova - dma->iova) / pgsize, 1);
> + }
> + }
> + return 0;
> +}
> +
> +static void vfio_dma_bitmap_free(struct vfio_iommu *iommu)
> +{
> + struct rb_node *n = rb_first(&iommu->dma_list);
> +
> + for (; n; n = rb_next(n)) {
> + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> +
> + kfree(dma->bitmap);
> + dma->bitmap = NULL;
> + }
> +}
> +
>  /*
>   * Helper Functions for host iova-pfn list
>   */
> @@ -567,6 +621,14 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>   vfio_unpin_page_external(dma, iova, do_accounting);
>   goto pin_unwind;
>   }
> +
> + if (iommu->dirty_page_tracking) {
> + unsigned long pgshift =
> +  __ffs(vfio_pgsize_bitmap(iommu));
> +
> + bitmap_set(dma->bitmap,
> +(vpfn->iova - dma->iova) >> pgshift, 1);
> + }
>   }
>  
>   ret = i;
> @@ -801,6 +863,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, 
> struct vfio_dma *dma)
>   
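
As a quick sanity check of the sizing described in the commit message (one bit
per page of the smallest supported page size, rounded up to 64-bit words), here
is a small standalone calculation. The 1 GiB / 4 KiB numbers are only an
example, and the arithmetic paraphrases the DIRTY_BITMAP_BYTES() macro from the
quoted patch.

/* Rough sizing check; mirrors the intent of DIRTY_BITMAP_BYTES() above. */
#include <stdio.h>

int main(void)
{
    unsigned long long size   = 1ULL << 30;   /* one 1 GiB vfio_dma mapping */
    unsigned long long pgsize = 1ULL << 12;   /* 4 KiB smallest supported page */
    unsigned long long npages = size / pgsize;
    unsigned long long bytes  = (npages + 63) / 64 * 8;  /* round up to u64 words */

    /* prints: npages=262144 bitmap_bytes=32768 (32 KiB per 1 GiB mapping) */
    printf("npages=%llu bitmap_bytes=%llu\n", npages, bytes);
    return 0;
}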

Re: [EXTERNAL] [PATCH 2/2] target/ppc: Fix ISA v3.0 (POWER9) slbia implementation

2020-03-18 Thread Nicholas Piggin
Excerpts from Benjamin Herrenschmidt's message of March 19, 2020 6:46 am:
> On Wed, 2020-03-18 at 18:08 +0100, Cédric Le Goater wrote:
>> On 3/18/20 5:41 AM, Nicholas Piggin wrote:
>> > Linux using the hash MMU ("disable_radix" command line) on a POWER9
>> > machine quickly hits translation bugs due to using v3.0 slbia
>> > features that are not implemented in TCG. Add them.
>> 
>> I checked the ISA books and this looks OK but you are also modifying
>> slbie.

That was a mistake that leaked in from debugging the crashes.

> For the same reason, I believe slbie needs to invalidate caches even if
> the entry isn't present.

I don't think it does per the ISA. If we overwrite it then we can only
invalidate with slbia. That's why there is that slb insertion cache for
pre-POWER9 that lets us use slbies to context switch so long as none 
have been overwritten.

> The kernel will under some circumstances overwrite SLB entries without
> invalidating (because the translation itself isn't invalid, it's just
> that the SLB is full, so anything cached in the ERAT is still
> technically ok).
> 
> However, when those things get really invalidated, they need to be
> taken out, even if they no longer have a corresponding SLB entry.

Yeah we track that and do slbia in that case.

Thanks,
Nick



Re: [PATCH 4/5] ppc/spapr: Don't kill the guest if a recovered FWNMI machine check delivery fails

2020-03-18 Thread Nicholas Piggin
Excerpts from Greg Kurz's message of March 18, 2020 2:57 am:
> On Tue, 17 Mar 2020 15:02:14 +1000
> Nicholas Piggin  wrote:
> 
>> Try to be tolerant of errors if the machine check had been recovered
>> by the host.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
> 
> Same comment as previous patch on multi-line error strings and
> warn_report() in the !recovered case.
> 
>>  hw/ppc/spapr_events.c | 25 ++---
>>  1 file changed, 18 insertions(+), 7 deletions(-)
>> 
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index d35151eeb0..3f524cb0ca 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -807,13 +807,20 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, 
>> bool recovered)
>>  /* get rtas addr from fdt */
>>  rtas_addr = spapr_get_rtas_addr();
>>  if (!rtas_addr) {
>> -warn_report("FWNMI: Unable to deliver machine check to guest: "
>> -"rtas_addr not found.");
>> -qemu_system_guest_panicked(NULL);
>> +if (!recovered) {
>> +warn_report("FWNMI: Unable to deliver machine check to guest: "
>> +"rtas_addr not found.");
>> +qemu_system_guest_panicked(NULL);
>> +} else {
>> +warn_report("FWNMI: Unable to deliver machine check to guest: "
>> +"rtas_addr not found. Machine check recovered.");
>> +}
>>  g_free(ext_elog);
>>  return;
>>  }
>>  
>> +spapr->fwnmi_machine_check_interlock = cpu->vcpu_id;
>> +
> 
> I don't understand this change.

If we bail out without delivering the interrupt, we can't take the
interlock otherwise the guest can never release it.

Thanks,
Nick



Re: [PATCH 3/5] ppc/spapr: Add FWNMI machine check delivery warnings

2020-03-18 Thread Nicholas Piggin
Excerpts from Greg Kurz's message of March 17, 2020 10:20 pm:
> On Tue, 17 Mar 2020 15:02:13 +1000
> Nicholas Piggin  wrote:
> 
>> Add some messages which explain problems and guest misbehaviour that
>> may be difficult to diagnose in rare cases of machine checks.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>  hw/ppc/spapr_events.c | 4 
>>  hw/ppc/spapr_rtas.c   | 4 
>>  2 files changed, 8 insertions(+)
>> 
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index 05337f0671..d35151eeb0 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -807,6 +807,8 @@ static void spapr_mce_dispatch_elog(PowerPCCPU *cpu, 
>> bool recovered)
>>  /* get rtas addr from fdt */
>>  rtas_addr = spapr_get_rtas_addr();
>>  if (!rtas_addr) {
>> +warn_report("FWNMI: Unable to deliver machine check to guest: "
>> +"rtas_addr not found.");
> 
> Why a warning and not an error ?
> 
> Also maybe change the string to fit on one line ?

Not sure, I guess it should be.

Thanks,
Nick



Re: [PATCH 1/5] ppc/spapr: KVM FWNMI should not be enabled until guest requests it

2020-03-18 Thread Nicholas Piggin
Excerpts from Greg Kurz's message of March 17, 2020 9:02 pm:
> On Tue, 17 Mar 2020 15:02:11 +1000
> Nicholas Piggin  wrote:
> 
>> The KVM FWNMI capability should be enabled with the "ibm,nmi-register"
>> rtas call. Although MCEs from KVM will be delivered as architected
>> interrupts to the guest before "ibm,nmi-register" is called, KVM has
>> different behaviour depending on whether the guest has enabled FWNMI
>> (it attempts to do more recovery on behalf of a non-FWNMI guest).
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>  hw/ppc/spapr_caps.c  | 5 +++--
>>  hw/ppc/spapr_rtas.c  | 7 +++
>>  target/ppc/kvm.c | 7 +++
>>  target/ppc/kvm_ppc.h | 6 ++
>>  4 files changed, 23 insertions(+), 2 deletions(-)
>> 
>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>> index 679ae7959f..eb5521d0c2 100644
>> --- a/hw/ppc/spapr_caps.c
>> +++ b/hw/ppc/spapr_caps.c
>> @@ -517,9 +517,10 @@ static void cap_fwnmi_apply(SpaprMachineState *spapr, 
>> uint8_t val,
>>  }
>>  
>>  if (kvm_enabled()) {
>> -if (kvmppc_set_fwnmi() < 0) {
>> +if (!kvmppc_get_fwnmi()) {
>>  error_setg(errp, "Firmware Assisted Non-Maskable 
>> Interrupts(FWNMI) "
>> - "not supported by KVM");
>> + "not supported by KVM, "
>> + "try appending -machine cap-fwnmi=off");
> 
> It is usually preferred to keep error message strings on one
> line for easier grepping. Also hints should be specified with
> error_append_hint() because they are treated differently by
> QMP (ie. not printed).
> 
> Something like:
> 
> if (!kvmppc_get_fwnmi()) {
> error_setg(errp,
>"Firmware Assisted Non-Maskable Interrupts(FWNMI) not supported by 
> KVM");
> error_append_hint(errp, "Try appending -machine cap-fwnmi=off\n");
> }

Hmm, okay.

> 
> Note that the current error handling code has an issue that
> prevents hints to be printed when errp == _fatal, which
> is exactly what spapr_caps_apply() does. Since this affects
> a lot of locations in the code base, there's an on-going
> effort to fix that globally:
> 
> https://patchwork.ozlabs.org/project/qemu-devel/list/?series=163907
> 
> I don't know if this will make it for 5.0, but in any case I
> think you should call error_append_hint() in this patch anyway
> and the code will just work at some later point.
> 
> Rest looks good.

Thanks will do,
Nick




Re: [EXTERNAL] [PATCH 1/2] target/ppc: Fix slbia TLB invalidation gap

2020-03-18 Thread Nicholas Piggin
Excerpts from Cédric Le Goater's message of March 19, 2020 2:45 am:
> On 3/18/20 5:41 AM, Nicholas Piggin wrote:
>> slbia must invalidate TLBs even if it does not remove a valid SLB
>> entry, because slbmte can overwrite valid entries without removing
>> their TLBs.
>> 
>> As the architecture says, slbia invalidates all lookaside information,
>> not conditionally based on if it removed valid entries.
>> 
>> It does not seem possible for POWER8 or earlier Linux kernels to hit
>> this bug because it never changes its kernel SLB translations, and it
>> should always have valid entries if any accesses are made to userspace
>> regions. However other operating systems which may modify SLB entry 0
>> or do more fancy things with segments might be affected.
> 
> Did you hit the bug on the other OS ? 

No, hit it when fixing POWER9 hash.

>  
>> When POWER9 slbia support is added in the next patch, this becomes a
>> real problem because some new slbia variants don't invalidate all
>> non-zero entries.
>> 
>> Signed-off-by: Nicholas Piggin 
> 
> Looks correct.
> 
> Reviewed-by: Cédric Le Goater 

Thanks,
Nick



[PATCH v4 1/2] content: define what an exported object is

2020-03-18 Thread David Stevens
Define a mechanism for sharing objects between different virtio
devices.

Signed-off-by: David Stevens 
---
 content.tex  | 12 
 introduction.tex |  4 
 2 files changed, 16 insertions(+)

diff --git a/content.tex b/content.tex
index b1ea9b9..c8a367b 100644
--- a/content.tex
+++ b/content.tex
@@ -373,6 +373,18 @@ \section{Driver Notifications} \label{sec:Virtqueues / 
Driver notifications}
 
 \input{shared-mem.tex}
 
+\section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / 
Exporting Objects}
+
+When an object created by one virtio device needs to be
+shared with a separate virtio device, the first device can
+export the object by generating a UUID which can then
+be passed to the second device to identify the object.
+
+What constitutes an object, how to export objects, and
+how to import objects are defined by the individual device
+types. It is RECOMMENDED that devices generate version 4
+UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
+
 \chapter{General Initialization And Device Operation}\label{sec:General 
Initialization And Device Operation}
 
 We start with an overview of device initialization, then expand on the
diff --git a/introduction.tex b/introduction.tex
index 40f16f8..fc2aa50 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -40,6 +40,10 @@ \section{Normative References}\label{sec:Normative 
References}
\phantomsection\label{intro:rfc2119}\textbf{[RFC2119]} &
 Bradner S., ``Key words for use in RFCs to Indicate Requirement
 Levels'', BCP 14, RFC 2119, March 1997. 
\newline\url{http://www.ietf.org/rfc/rfc2119.txt}\\
+   \phantomsection\label{intro:rfc4122}\textbf{[RFC4122]} &
+Leach, P., Mealling, M., and R. Salz, ``A Universally Unique
+IDentifier (UUID) URN Namespace'', RFC 4122, DOI 10.17487/RFC4122,
+July 2005. \newline\url{http://www.ietf.org/rfc/rfc4122.txt}\\
\phantomsection\label{intro:S390 PoP}\textbf{[S390 PoP]} & 
z/Architecture Principles of Operation, IBM Publication SA22-7832, 
\newline\url{http://publibfi.boulder.ibm.com/epubs/pdf/dz9zr009.pdf}, and any 
future revisions\\
\phantomsection\label{intro:S390 Common I/O}\textbf{[S390 Common I/O]} 
& ESA/390 Common I/O-Device and Self-Description, IBM Publication SA22-7204, 
\newline\url{http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar501/CCONTENTS},
 and any future revisions\\
\phantomsection\label{intro:PCI}\textbf{[PCI]} &
-- 
2.25.1.481.gfbce0eb801-goog




[PATCH v4 2/2] virtio-gpu: add the ability to export resources

2020-03-18 Thread David Stevens
Signed-off-by: David Stevens 
---
 virtio-gpu.tex | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/virtio-gpu.tex b/virtio-gpu.tex
index af4ca61..e75aafa 100644
--- a/virtio-gpu.tex
+++ b/virtio-gpu.tex
@@ -35,6 +35,8 @@ \subsection{Feature bits}\label{sec:Device Types / GPU Device 
/ Feature bits}
 \begin{description}
 \item[VIRTIO_GPU_F_VIRGL (0)] virgl 3D mode is supported.
 \item[VIRTIO_GPU_F_EDID  (1)] EDID is supported.
+\item[VIRTIO_GPU_F_RESOURCE_UUID (2)] assigning resources UUIDs for export
+  to other virtio devices is supported.
 \end{description}
 
 \subsection{Device configuration layout}\label{sec:Device Types / GPU Device / 
Device configuration layout}
@@ -181,6 +183,7 @@ \subsubsection{Device Operation: Request 
header}\label{sec:Device Types / GPU De
 VIRTIO_GPU_CMD_GET_CAPSET_INFO,
 VIRTIO_GPU_CMD_GET_CAPSET,
 VIRTIO_GPU_CMD_GET_EDID,
+VIRTIO_GPU_CMD_RESOURCE_ASSIGN_UUID,
 
 /* cursor commands */
 VIRTIO_GPU_CMD_UPDATE_CURSOR = 0x0300,
@@ -192,6 +195,7 @@ \subsubsection{Device Operation: Request 
header}\label{sec:Device Types / GPU De
 VIRTIO_GPU_RESP_OK_CAPSET_INFO,
 VIRTIO_GPU_RESP_OK_CAPSET,
 VIRTIO_GPU_RESP_OK_EDID,
+VIRTIO_GPU_RESP_OK_RESOURCE_UUID,
 
 /* error responses */
 VIRTIO_GPU_RESP_ERR_UNSPEC = 0x1200,
@@ -454,6 +458,32 @@ \subsubsection{Device Operation: 
controlq}\label{sec:Device Types / GPU Device /
 This detaches any backing pages from a resource, to be used in case of
 guest swapping or object destruction.
 
+\item[VIRTIO_GPU_CMD_RESOURCE_ASSIGN_UUID] Creates an exported object from
+  a resource. Request data is \field{struct
+virtio_gpu_resource_assign_uuid}.  Response type is
+  VIRTIO_GPU_RESP_OK_RESOURCE_UUID, response data is \field{struct
+virtio_gpu_resp_resource_uuid}. Support is optional and negotiated
+using the VIRTIO_GPU_F_RESOURCE_UUID feature flag.
+
+\begin{lstlisting}
+struct virtio_gpu_resource_assign_uuid {
+struct virtio_gpu_ctrl_hdr hdr;
+le32 resource_id;
+le32 padding;
+};
+
+struct virtio_gpu_resp_resource_uuid {
+struct virtio_gpu_ctrl_hdr hdr;
+u8 uuid[16];
+};
+\end{lstlisting}
+
+The response contains a UUID which identifies the exported object created from
+the host private resource. Note that if the resource has an attached backing,
+modifications made to the host private resource through the exported object by
+other devices are not visible in the attached backing until they are 
transferred
+into the backing.
+
 \end{description}
 
 \subsubsection{Device Operation: cursorq}\label{sec:Device Types / GPU Device 
/ Device Operation / Device Operation: cursorq}
-- 
2.25.1.481.gfbce0eb801-goog
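
For readers who have not worked with the virtio-gpu control queue, the sketch
below shows roughly how a guest driver could use the command defined above to
obtain a UUID for a resource. It is illustrative only: the struct layouts follow
the quoted spec text, the numeric command and response codes are assumptions
based on the position of the new entries in the enums, and submit_ctrl_cmd() is
a hypothetical stand-in for the driver's normal controlq submission path.

/* Illustrative only; submit_ctrl_cmd() is a hypothetical placeholder. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct virtio_gpu_ctrl_hdr {
    uint32_t type;
    uint32_t flags;
    uint64_t fence_id;
    uint32_t ctx_id;
    uint32_t padding;
};

struct virtio_gpu_resource_assign_uuid {
    struct virtio_gpu_ctrl_hdr hdr;
    uint32_t resource_id;
    uint32_t padding;
};

struct virtio_gpu_resp_resource_uuid {
    struct virtio_gpu_ctrl_hdr hdr;
    uint8_t uuid[16];
};

/* Assumed values, following the enum ordering in the spec patch. */
#define VIRTIO_GPU_CMD_RESOURCE_ASSIGN_UUID  0x010b
#define VIRTIO_GPU_RESP_OK_RESOURCE_UUID     0x1105

/* Hypothetical helper: sends a request on the controlq and reads the reply. */
int submit_ctrl_cmd(const void *req, size_t req_len, void *resp, size_t resp_len);

int export_resource(uint32_t resource_id, uint8_t uuid_out[16])
{
    struct virtio_gpu_resource_assign_uuid req = { 0 };
    struct virtio_gpu_resp_resource_uuid resp = { 0 };

    req.hdr.type = VIRTIO_GPU_CMD_RESOURCE_ASSIGN_UUID;
    req.resource_id = resource_id;

    if (submit_ctrl_cmd(&req, sizeof(req), &resp, sizeof(resp)))
        return -1;
    if (resp.hdr.type != VIRTIO_GPU_RESP_OK_RESOURCE_UUID)
        return -1;

    /* The UUID can now be handed to another virtio device for import. */
    memcpy(uuid_out, resp.uuid, sizeof(resp.uuid));
    return 0;
}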




[PATCH v4 0/2] Cross-device resource sharing

2020-03-18 Thread David Stevens
Hi all,

This is the next iteration of patches for adding support for sharing
resources between different virtio devices. The corresponding Linux
implementation is [1].

In addition to these patches, the most recent virtio-video patchset
includes a patch for importing objects into that device [2].

[1] https://markmail.org/thread/bfy6uk4q4v4cus7h
[2] https://markmail.org/message/wxdne5re7aaugbjg

Changes v3 -> v4:
 - Add virtio-gpu feature bit
 - Move virtio-gpu assign uuid command into 2d command group
 - Rename virtio-gpu uuid response

David Stevens (2):
  content: define what an exported object is
  virtio-gpu: add the ability to export resources

 content.tex  | 12 
 introduction.tex |  4 
 virtio-gpu.tex   | 29 +
 3 files changed, 45 insertions(+)

-- 
2.25.1.481.gfbce0eb801-goog




Re: [PATCH v14 Kernel 1/7] vfio: KABI for migration interface for device state

2020-03-18 Thread Yan Zhao
On Thu, Mar 19, 2020 at 03:41:08AM +0800, Kirti Wankhede wrote:
> - Defined MIGRATION region type and sub-type.
> 
> - Defined vfio_device_migration_info structure which will be placed at the
>   0th offset of migration region to get/set VFIO device related
>   information. Defined members of structure and usage on read/write access.
> 
> - Defined device states and state transition details.
> 
> - Defined sequence to be followed while saving and resuming VFIO device.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  include/uapi/linux/vfio.h | 227 
> ++
>  1 file changed, 227 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9e843a147ead..d0021467af53 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
>  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK        (0xffff)
>  #define VFIO_REGION_TYPE_GFX                    (1)
>  #define VFIO_REGION_TYPE_CCW (2)
> +#define VFIO_REGION_TYPE_MIGRATION  (3)
>  
>  /* sub-types for VFIO_REGION_TYPE_PCI_* */
>  
> @@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {
>  /* sub-types for VFIO_REGION_TYPE_CCW */
>  #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD       (1)
>  
> +/* sub-types for VFIO_REGION_TYPE_MIGRATION */
> +#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
> +
> +/*
> + * The structure vfio_device_migration_info is placed at the 0th offset of
> + * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device 
> related
> + * migration information. Field accesses from this structure are only 
> supported
> + * at their native width and alignment. Otherwise, the result is undefined 
> and
> + * vendor drivers should return an error.
> + *
> + * device_state: (read/write)
> + *  - The user application writes to this field to inform the vendor 
> driver
> + *about the device state to be transitioned to.
> + *  - The vendor driver should take the necessary actions to change the
> + *device state. After successful transition to a given state, the
> + *vendor driver should return success on write(device_state, state)
> + *system call. If the device state transition fails, the vendor 
> driver
> + *should return an appropriate -errno for the fault condition.
> + *  - On the user application side, if the device state transition fails,
> + * that is, if write(device_state, state) returns an error, read
> + * device_state again to determine the current state of the device from
> + * the vendor driver.
> + *  - The vendor driver should return previous state of the device unless
> + *the vendor driver has encountered an internal error, in which case
> + *the vendor driver may report the device_state 
> VFIO_DEVICE_STATE_ERROR.
> + *  - The user application must use the device reset ioctl to recover the
> + *device from VFIO_DEVICE_STATE_ERROR state. If the device is
> + *indicated to be in a valid device state by reading device_state, 
> the
> + *user application may attempt to transition the device to any valid
> + *state reachable from the current state or terminate itself.
> + *
> + *  device_state consists of 3 bits:
> + *  - If bit 0 is set, it indicates the _RUNNING state. If bit 0 is 
> clear,
> + *it indicates the _STOP state. When the device state is changed to
> + *_STOP, driver should stop the device before write() returns.
> + *  - If bit 1 is set, it indicates the _SAVING state, which means that 
> the
> + *driver should start gathering device state information that will be
> + *provided to the VFIO user application to save the device's state.
> + *  - If bit 2 is set, it indicates the _RESUMING state, which means that
> + *the driver should prepare to resume the device. Data provided 
> through
> + *the migration region should be used to resume the device.
> + *  Bits 3 - 31 are reserved for future use. To preserve them, the user
> + *  application should perform a read-modify-write operation on this
> + *  field when modifying the specified bits.
> + *
> + *  +--- _RESUMING
> + *  |+-- _SAVING
> + *  ||+- _RUNNING
> + *  |||
> + *  000b => Device Stopped, not saving or resuming
> + *  001b => Device running, which is the default state
> + *  010b => Stop the device & save the device state, stop-and-copy state
> + *  011b => Device running and save the device state, pre-copy state
> + *  100b => Device stopped and the device state is resuming
> + *  101b => Invalid state
> + *  110b => Error state
> + *  111b => Invalid state
> + *
> + * State transitions:
> + *
> + *  _RESUMING  _RUNNINGPre-copyStop-and-copy   _STOP
> + *(100b) (001b) (011b)(010b)   (000b)
> + * 0. Running or default 
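
Since the thread goes on to debate the state transitions, a compact illustration
of the read-modify-write the documentation above asks for may help. It is a
sketch only: it assumes device_state is a 32-bit field at offset 0 of the
migration region, that the region's file offset was already discovered through
VFIO_DEVICE_GET_REGION_INFO, and that the bit values follow the quoted
documentation rather than any merged header.

/* Illustrative pre-copy transition (_RUNNING | _SAVING), preserving bits 3-31. */
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>

#define DEVICE_STATE_RUNNING  (1u << 0)   /* bit 0 per the quoted documentation */
#define DEVICE_STATE_SAVING   (1u << 1)   /* bit 1 per the quoted documentation */

static int enter_precopy(int device_fd, off_t region_offset)
{
    uint32_t state;

    /* device_state sits at offset 0 of the migration region */
    if (pread(device_fd, &state, sizeof(state), region_offset) != sizeof(state))
        return -1;

    state &= ~0x7u;                                   /* touch only bits 0-2 */
    state |= DEVICE_STATE_RUNNING | DEVICE_STATE_SAVING;

    if (pwrite(device_fd, &state, sizeof(state), region_offset) != sizeof(state))
        return -1;                                    /* on failure, re-read device_state */

    return 0;
}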

Re: [PATCH v9 0/4] linux-user: generate syscall_nr.sh for RISC-V

2020-03-18 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/cover.1584571250.git.alistair.fran...@wdc.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v9 0/4]  linux-user: generate syscall_nr.sh for RISC-V
Message-id: cover.1584571250.git.alistair.fran...@wdc.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
cd20a33 linux-user/riscv: Update the syscall_nr's to the 5.5 kernel
e06767d linux-user: Support futex_time64
965d406 linux-user/syscall: Add support for clock_gettime64/clock_settime64
21e4b07 linux-user: Protect more syscalls

=== OUTPUT BEGIN ===
1/4 Checking commit 21e4b07f78b8 (linux-user: Protect more syscalls)
2/4 Checking commit 965d40690148 (linux-user/syscall: Add support for 
clock_gettime64/clock_settime64)
3/4 Checking commit e06767d54c9c (linux-user: Support futex_time64)
WARNING: architecture specific defines should be avoided
#23: FILE: linux-user/syscall.c:248:
+#if defined(__NR_futex)

WARNING: architecture specific defines should be avoided
#26: FILE: linux-user/syscall.c:251:
+#if defined(__NR_futex_time64)

WARNING: architecture specific defines should be avoided
#37: FILE: linux-user/syscall.c:303:
+#if (defined(TARGET_NR_futex) && defined(__NR_futex)) || \

WARNING: architecture specific defines should be avoided
#43: FILE: linux-user/syscall.c:309:
+#if defined(__NR_futex_time64)

ERROR: space required after that ',' (ctx:VxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
  ^

ERROR: space required after that ',' (ctx:VxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
   ^

ERROR: space required after that ',' (ctx:OxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
 ^

ERROR: space required after that ',' (ctx:VxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
   ^

ERROR: space required after that ',' (ctx:VxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
   ^

ERROR: space required after that ',' (ctx:VxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
  ^

ERROR: space required after that ',' (ctx:VxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
  ^

ERROR: space required after that ',' (ctx:OxV)
#45: FILE: linux-user/syscall.c:311:
+  const struct timespec *,timeout,int *,uaddr2,int,val3)
  ^

ERROR: space required after that ',' (ctx:VxV)
#45: FILE: linux-user/syscall.c:311:
+  const struct timespec *,timeout,int *,uaddr2,int,val3)
  ^

ERROR: space required after that ',' (ctx:OxV)
#45: FILE: linux-user/syscall.c:311:
+  const struct timespec *,timeout,int *,uaddr2,int,val3)
^

ERROR: space required after that ',' (ctx:VxV)
#45: FILE: linux-user/syscall.c:311:
+  const struct timespec *,timeout,int *,uaddr2,int,val3)
   ^

ERROR: space required after that ',' (ctx:VxV)
#45: FILE: linux-user/syscall.c:311:
+  const struct timespec *,timeout,int *,uaddr2,int,val3)
   ^

WARNING: architecture specific defines should be avoided
#55: FILE: linux-user/syscall.c:776:
+#if defined(__NR_futex)

WARNING: architecture specific defines should be avoided
#59: FILE: linux-user/syscall.c:780:
+#if defined(__NR_futex_time64)

ERROR: space required after that ',' (ctx:VxV)
#60: FILE: linux-user/syscall.c:781:
+safe_syscall6(int,futex_time64,int *,uaddr,int,op,int,val, \
  ^

ERROR: space required after that ',' (ctx:VxV)
#60: FILE: linux-user/syscall.c:781:
+safe_syscall6(int,futex_time64,int *,uaddr,int,op,int,val, \
   ^

ERROR: space required after that ',' (ctx:OxV)
#60: FILE: linux-user/syscall.c:781:
+safe_syscall6(int,futex_time64,int *,uaddr,int,op,int,val, \
 ^

ERROR: space required after that ',' (ctx:VxV)
#60: FILE: linux-user/syscall.c:781:
+safe_syscall6(int,futex_time64,int *,uaddr,int,op,int,val, \
   ^

ERROR: space required after that ',' (ctx:VxV)
#60: FILE: 

Re: [PATCH v9 0/4] linux-user: generate syscall_nr.sh for RISC-V

2020-03-18 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/cover.1584571250.git.alistair.fran...@wdc.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v9 0/4]  linux-user: generate syscall_nr.sh for RISC-V
Message-id: cover.1584571250.git.alistair.fran...@wdc.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
1d09c6a linux-user/riscv: Update the syscall_nr's to the 5.5 kernel
9d49c03 linux-user: Support futex_time64
d25d933 linux-user/syscall: Add support for clock_gettime64/clock_settime64
df4b9e6 linux-user: Protect more syscalls

=== OUTPUT BEGIN ===
1/4 Checking commit df4b9e6b1f0f (linux-user: Protect more syscalls)
2/4 Checking commit d25d933726c3 (linux-user/syscall: Add support for 
clock_gettime64/clock_settime64)
3/4 Checking commit 9d49c030a0f2 (linux-user: Support futex_time64)
WARNING: architecture specific defines should be avoided
#23: FILE: linux-user/syscall.c:248:
+#if defined(__NR_futex)

WARNING: architecture specific defines should be avoided
#26: FILE: linux-user/syscall.c:251:
+#if defined(__NR_futex_time64)

WARNING: architecture specific defines should be avoided
#37: FILE: linux-user/syscall.c:303:
+#if (defined(TARGET_NR_futex) && defined(__NR_futex)) || \

WARNING: architecture specific defines should be avoided
#43: FILE: linux-user/syscall.c:309:
+#if defined(__NR_futex_time64)

ERROR: space required after that ',' (ctx:VxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
  ^

ERROR: space required after that ',' (ctx:VxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
   ^

ERROR: space required after that ',' (ctx:OxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
 ^

ERROR: space required after that ',' (ctx:VxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
   ^

ERROR: space required after that ',' (ctx:VxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
   ^

ERROR: space required after that ',' (ctx:VxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
  ^

ERROR: space required after that ',' (ctx:VxV)
#44: FILE: linux-user/syscall.c:310:
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
  ^

ERROR: space required after that ',' (ctx:OxV)
#45: FILE: linux-user/syscall.c:311:
+  const struct timespec *,timeout,int *,uaddr2,int,val3)
  ^

ERROR: space required after that ',' (ctx:VxV)
#45: FILE: linux-user/syscall.c:311:
+  const struct timespec *,timeout,int *,uaddr2,int,val3)
  ^

ERROR: space required after that ',' (ctx:OxV)
#45: FILE: linux-user/syscall.c:311:
+  const struct timespec *,timeout,int *,uaddr2,int,val3)
^

ERROR: space required after that ',' (ctx:VxV)
#45: FILE: linux-user/syscall.c:311:
+  const struct timespec *,timeout,int *,uaddr2,int,val3)
   ^

ERROR: space required after that ',' (ctx:VxV)
#45: FILE: linux-user/syscall.c:311:
+  const struct timespec *,timeout,int *,uaddr2,int,val3)
   ^

WARNING: architecture specific defines should be avoided
#55: FILE: linux-user/syscall.c:776:
+#if defined(__NR_futex)

WARNING: architecture specific defines should be avoided
#59: FILE: linux-user/syscall.c:780:
+#if defined(__NR_futex_time64)

ERROR: space required after that ',' (ctx:VxV)
#60: FILE: linux-user/syscall.c:781:
+safe_syscall6(int,futex_time64,int *,uaddr,int,op,int,val, \
  ^

ERROR: space required after that ',' (ctx:VxV)
#60: FILE: linux-user/syscall.c:781:
+safe_syscall6(int,futex_time64,int *,uaddr,int,op,int,val, \
   ^

ERROR: space required after that ',' (ctx:OxV)
#60: FILE: linux-user/syscall.c:781:
+safe_syscall6(int,futex_time64,int *,uaddr,int,op,int,val, \
 ^

ERROR: space required after that ',' (ctx:VxV)
#60: FILE: linux-user/syscall.c:781:
+safe_syscall6(int,futex_time64,int *,uaddr,int,op,int,val, \
   ^

ERROR: space required 

Re: [PATCH v2 0/4] travis-ci: Add a KVM-only s390x job

2020-03-18 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200318222717.24676-1-phi...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TESTcheck-unit: tests/test-rcu-simpleq
  TESTiotest-qcow2: 024
**
ERROR:/tmp/qemu-test/src/tests/qtest/migration-test.c:1249:test_migrate_auto_converge:
 assertion failed (remaining < (expected_threshold + expected_threshold / 
100)): (103809024 < 10100)
ERROR - Bail out! 
ERROR:/tmp/qemu-test/src/tests/qtest/migration-test.c:1249:test_migrate_auto_converge:
 assertion failed (remaining < (expected_threshold + expected_threshold / 
100)): (103809024 < 10100)
make: *** [check-qtest-aarch64] Error 1
make: *** Waiting for unfinished jobs
  TESTcheck-unit: tests/test-rcu-tailq
  TESTiotest-qcow2: 025
---
qemu-system-x86_64: -accel kvm: failed to initialize kvm: No such file or 
directory
qemu-system-x86_64: falling back to tcg
**
ERROR:/tmp/qemu-test/src/tests/qtest/migration-test.c:1249:test_migrate_auto_converge:
 assertion failed (remaining < (expected_threshold + expected_threshold / 
100)): (103809024 < 10100)
ERROR - Bail out! 
ERROR:/tmp/qemu-test/src/tests/qtest/migration-test.c:1249:test_migrate_auto_converge:
 assertion failed (remaining < (expected_threshold + expected_threshold / 
100)): (103809024 < 10100)
make: *** [check-qtest-x86_64] Error 1
  TESTiotest-qcow2: 041
  TESTiotest-qcow2: 042
  TESTiotest-qcow2: 043
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=ff24ee2953f1454290305113cefa8f7e', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-qj2x29ta/src/docker-src.2020-03-18-20.12.13.29783:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=ff24ee2953f1454290305113cefa8f7e
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-qj2x29ta/src'
make: *** [docker-run-test-quick@centos7] Error 2

real11m32.251s
user0m8.589s


The full log is available at
http://patchew.org/logs/20200318222717.24676-1-phi...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v6 06/61] target/riscv: add vector stride load and store instructions

2020-03-18 Thread Alistair Francis
On Tue, Mar 17, 2020 at 8:19 AM LIU Zhiwei  wrote:
>
> Vector strided operations access the first memory element at the base address,
> and then access subsequent elements at address increments given by the byte
> offset contained in the x register specified by rs2.
>
> Vector unit-stride operations access elements stored contiguously in memory
> starting from the base effective address. It can been seen as a special
> case of strided operations.
>
> Signed-off-by: LIU Zhiwei 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/helper.h   | 105 ++
>  target/riscv/insn32.decode  |  32 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 346 
>  target/riscv/internals.h|   5 +
>  target/riscv/translate.c|   7 +
>  target/riscv/vector_helper.c| 406 
>  6 files changed, 901 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 3c28c7e407..87dfa90609 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -78,3 +78,108 @@ DEF_HELPER_1(tlb_flush, void, env)
>  #endif
>  /* Vector functions */
>  DEF_HELPER_3(vsetvl, tl, env, tl, tl)
> +DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlb_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlh_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlw_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlw_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlw_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlw_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_b_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vle_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_b_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlbu_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlhu_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwu_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwu_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwu_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlwu_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_b_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsb_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_h, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_h_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsh_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsw_v_w, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsw_v_w_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsw_v_d, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsw_v_d_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_b, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_b_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vse_v_h, void, ptr, ptr, tl, 

Re: [PATCH v6 05/61] target/riscv: add an internals.h header

2020-03-18 Thread Alistair Francis
On Tue, Mar 17, 2020 at 8:17 AM LIU Zhiwei  wrote:
>
> The internals.h keeps things that are not relevant to the actual architecture,
> only to the implementation, separate.
>
> Signed-off-by: LIU Zhiwei 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/internals.h | 24 
>  1 file changed, 24 insertions(+)
>  create mode 100644 target/riscv/internals.h
>
> diff --git a/target/riscv/internals.h b/target/riscv/internals.h
> new file mode 100644
> index 00..cabea18e1d
> --- /dev/null
> +++ b/target/riscv/internals.h
> @@ -0,0 +1,24 @@
> +/*
> + * QEMU RISC-V CPU -- internal functions and types
> + *
> + * Copyright (c) 2020 C-SKY Limited. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along 
> with
> + * this program.  If not, see .
> + */
> +
> +#ifndef RISCV_CPU_INTERNALS_H
> +#define RISCV_CPU_INTERNALS_H
> +
> +#include "hw/registerfields.h"
> +
> +#endif
> --
> 2.23.0
>



Re: [PATCH v9 2/4] linux-user/syscall: Add support for clock_gettime64/clock_settime64

2020-03-18 Thread Philippe Mathieu-Daudé

On 3/18/20 11:46 PM, Alistair Francis wrote:

Add support for the clock_gettime64/clock_settime64 syscalls.

If your host is 64-bit or is 32-bit with the *_time64 syscall then the
timespec will correctly be a 64-bit time_t. Otherwise the host will
return a 32-bit time_t which will be rounded to 64-bits. This will be
incorrect after y2038.

Signed-off-by: Alistair Francis 
Reviewed-by: Laurent Vivier 
---
  linux-user/syscall.c | 39 +++
  1 file changed, 39 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 909bec94a5..60fd775d9c 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1229,6 +1229,22 @@ static inline abi_long target_to_host_timespec(struct 
timespec *host_ts,
  }
  #endif
  
+#if defined(TARGET_NR_clock_settime64)

+static inline abi_long target_to_host_timespec64(struct timespec *host_ts,
+ abi_ulong target_addr)
+{
+struct target__kernel_timespec *target_ts;
+
+if (!lock_user_struct(VERIFY_READ, target_ts, target_addr, 1)) {
+return -TARGET_EFAULT;
+}
+__get_user(host_ts->tv_sec, &target_ts->tv_sec);
+__get_user(host_ts->tv_nsec, &target_ts->tv_nsec);
+unlock_user_struct(target_ts, target_addr, 0);
+return 0;
+}
+#endif
+
  static inline abi_long host_to_target_timespec(abi_ulong target_addr,
 struct timespec *host_ts)
  {
@@ -11458,6 +11474,18 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
  return ret;
  }
  #endif
+#ifdef TARGET_NR_clock_settime64
+case TARGET_NR_clock_settime64:
+{
+struct timespec ts;
+
+ret = target_to_host_timespec64(&ts, arg2);
+if (!is_error(ret)) {
+ret = get_errno(clock_settime(arg1, &ts));
+}
+return ret;
+}
+#endif
  #ifdef TARGET_NR_clock_gettime
  case TARGET_NR_clock_gettime:
  {
@@ -11469,6 +11497,17 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
  return ret;
  }
  #endif
+#ifdef TARGET_NR_clock_gettime64
+case TARGET_NR_clock_gettime64:
+{
+struct timespec ts;
+ret = get_errno(clock_gettime(arg1, &ts));
+if (!is_error(ret)) {
+ret = host_to_target_timespec64(arg2, &ts);
+}
+return ret;
+}
+#endif
  #ifdef TARGET_NR_clock_getres
  case TARGET_NR_clock_getres:
  {



Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v9 1/4] linux-user: Protect more syscalls

2020-03-18 Thread Philippe Mathieu-Daudé

On 3/18/20 11:46 PM, Alistair Francis wrote:

New y2038 safe 32-bit architectures (like RISC-V) don't support old
syscalls with a 32-bit time_t. The kernel defines new *_time64 versions
of these syscalls. Add some more #ifdefs to syscall.c in linux-user to
allow us to compile without these old syscalls.

Signed-off-by: Alistair Francis 
Reviewed-by: Laurent Vivier 
---
  linux-user/strace.c  |  2 ++
  linux-user/syscall.c | 68 ++--


This patch is easier to review with 'git-diff --function-context'.


  2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 4f7130b2ff..6420ccd97b 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -775,6 +775,7 @@ print_syscall_ret_newselect(const struct syscallname *name, 
abi_long ret)
  #define TARGET_TIME_OOP  3   /* leap second in progress */
  #define TARGET_TIME_WAIT 4   /* leap second has occurred */
  #define TARGET_TIME_ERROR5   /* clock not synchronized */
+#ifdef TARGET_NR_adjtimex
  static void
  print_syscall_ret_adjtimex(const struct syscallname *name, abi_long ret)
  {
@@ -813,6 +814,7 @@ print_syscall_ret_adjtimex(const struct syscallname *name, 
abi_long ret)
  
  qemu_log("\n");

  }
+#endif
  
  UNUSED static struct flags access_flags[] = {

  FLAG_GENERIC(F_OK),
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8d27d10807..909bec94a5 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -742,21 +742,30 @@ safe_syscall3(ssize_t, read, int, fd, void *, buff, 
size_t, count)
  safe_syscall3(ssize_t, write, int, fd, const void *, buff, size_t, count)
  safe_syscall4(int, openat, int, dirfd, const char *, pathname, \
int, flags, mode_t, mode)
+#if defined(TARGET_NR_wait4) || defined(TARGET_NR_waitpid)
  safe_syscall4(pid_t, wait4, pid_t, pid, int *, status, int, options, \
struct rusage *, rusage)
+#endif
  safe_syscall5(int, waitid, idtype_t, idtype, id_t, id, siginfo_t *, infop, \
int, options, struct rusage *, rusage)
  safe_syscall3(int, execve, const char *, filename, char **, argv, char **, 
envp)
+#if defined(TARGET_NR_select) || defined(TARGET_NR__newselect) || \
+defined(TARGET_NR_pselect6)
  safe_syscall6(int, pselect6, int, nfds, fd_set *, readfds, fd_set *, 
writefds, \
fd_set *, exceptfds, struct timespec *, timeout, void *, sig)
+#endif
+#if defined(TARGET_NR_ppoll) || defined(TARGET_NR_poll)
  safe_syscall5(int, ppoll, struct pollfd *, ufds, unsigned int, nfds,
struct timespec *, tsp, const sigset_t *, sigmask,
size_t, sigsetsize)
+#endif
  safe_syscall6(int, epoll_pwait, int, epfd, struct epoll_event *, events,
int, maxevents, int, timeout, const sigset_t *, sigmask,
size_t, sigsetsize)
+#ifdef TARGET_NR_futex
  safe_syscall6(int,futex,int *,uaddr,int,op,int,val, \
const struct timespec *,timeout,int *,uaddr2,int,val3)
+#endif
  safe_syscall2(int, rt_sigsuspend, sigset_t *, newset, size_t, sigsetsize)
  safe_syscall2(int, kill, pid_t, pid, int, sig)
  safe_syscall2(int, tkill, int, tid, int, sig)
@@ -776,12 +785,16 @@ safe_syscall6(ssize_t, recvfrom, int, fd, void *, buf, 
size_t, len,
  safe_syscall3(ssize_t, sendmsg, int, fd, const struct msghdr *, msg, int, 
flags)
  safe_syscall3(ssize_t, recvmsg, int, fd, struct msghdr *, msg, int, flags)
  safe_syscall2(int, flock, int, fd, int, operation)
+#ifdef TARGET_NR_rt_sigtimedwait
  safe_syscall4(int, rt_sigtimedwait, const sigset_t *, these, siginfo_t *, 
uinfo,
const struct timespec *, uts, size_t, sigsetsize)
+#endif
  safe_syscall4(int, accept4, int, fd, struct sockaddr *, addr, socklen_t *, 
len,
int, flags)
+#if defined(TARGET_NR_nanosleep)
  safe_syscall2(int, nanosleep, const struct timespec *, req,
struct timespec *, rem)
+#endif
  #ifdef TARGET_NR_clock_nanosleep
  safe_syscall4(int, clock_nanosleep, const clockid_t, clock, int, flags,
const struct timespec *, req, struct timespec *, rem)
@@ -802,9 +815,11 @@ safe_syscall5(int, msgrcv, int, msgid, void *, msgp, 
size_t, sz,
  safe_syscall4(int, semtimedop, int, semid, struct sembuf *, tsops,
unsigned, nsops, const struct timespec *, timeout)
  #endif
-#if defined(TARGET_NR_mq_open) && defined(__NR_mq_open)
+#ifdef TARGET_NR_mq_timedsend
  safe_syscall5(int, mq_timedsend, int, mqdes, const char *, msg_ptr,
size_t, len, unsigned, prio, const struct timespec *, timeout)
+#endif
+#ifdef TARGET_NR_mq_timedreceive
  safe_syscall5(int, mq_timedreceive, int, mqdes, char *, msg_ptr,
size_t, len, unsigned *, prio, const struct timespec *, timeout)
  #endif
@@ -946,6 +961,8 @@ abi_long do_brk(abi_ulong new_brk)
  return target_brk;
  }
  
+#if defined(TARGET_NR_select) || defined(TARGET_NR__newselect) || \
+defined(TARGET_NR_pselect6)

Re: [PATCH] gdbstub: add support to Xfer:auxv:read: packet

2020-03-18 Thread Lirong Yuan
On Fri, Mar 6, 2020 at 5:01 PM Lirong Yuan  wrote:

> This allows gdb to access the target’s auxiliary vector,
> which can be helpful for telling system libraries important details
> about the hardware, operating system, and process.
>
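(As background for the commit message above, and not part of the patch
itself: the auxiliary vector is the table of AT_* entries the kernel hands
a process alongside argv/envp. Below is a minimal stand-alone sketch,
assuming a Linux host with glibc, of the kind of information it carries;
in a gdb session against this stub the same data is displayed by
"info auxv".)

#include <stdio.h>
#include <sys/auxv.h>    /* getauxval() and the AT_* constants */

int main(void)
{
    /* a few of the entries gdb's qXfer:auxv:read also exposes */
    printf("AT_PAGESZ = %lu\n", getauxval(AT_PAGESZ));
    printf("AT_CLKTCK = %lu\n", getauxval(AT_CLKTCK));
    printf("AT_HWCAP  = %#lx\n", getauxval(AT_HWCAP));
    printf("AT_PHNUM  = %lu\n", getauxval(AT_PHNUM));
    return 0;
}
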
> Signed-off-by: Lirong Yuan 
> ---
>  gdbstub.c | 55 +++
>  1 file changed, 55 insertions(+)
>
> diff --git a/gdbstub.c b/gdbstub.c
> index 22a2d630cd..a946af7007 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -2105,6 +2105,12 @@ static void handle_query_supported(GdbCmdContext
> *gdb_ctx, void *user_ctx)
>  pstrcat(gdb_ctx->str_buf, sizeof(gdb_ctx->str_buf),
>  ";qXfer:features:read+");
>  }
> +#ifdef CONFIG_USER_ONLY
> +if (gdb_ctx->s->c_cpu->opaque) {
> +pstrcat(gdb_ctx->str_buf, sizeof(gdb_ctx->str_buf),
> +";qXfer:auxv:read+");
> +}
> +#endif
>
>  if (gdb_ctx->num_params &&
>  strstr(gdb_ctx->params[0].data, "multiprocess+")) {
> @@ -2166,6 +2172,47 @@ static void
> handle_query_xfer_features(GdbCmdContext *gdb_ctx, void *user_ctx)
>  put_packet_binary(gdb_ctx->s, gdb_ctx->str_buf, len + 1, true);
>  }
>
> +#ifdef CONFIG_USER_ONLY
> +static void handle_query_xfer_auxv(GdbCmdContext *gdb_ctx, void *user_ctx)
> +{
> +TaskState *ts;
> +unsigned long offset, len, saved_auxv, auxv_len;
> +const char *mem;
> +
> +if (gdb_ctx->num_params < 2) {
> +put_packet(gdb_ctx->s, "E22");
> +return;
> +}
> +
> +offset = gdb_ctx->params[0].val_ul;
> +len = gdb_ctx->params[1].val_ul;
> +
> +ts = gdb_ctx->s->c_cpu->opaque;
> +saved_auxv = ts->info->saved_auxv;
> +auxv_len = ts->info->auxv_len;
> +mem = (const char *)(saved_auxv + offset);
> +
> +if (offset >= auxv_len) {
> +put_packet(gdb_ctx->s, "E22");
> +return;
> +}
> +
> +if (len > (MAX_PACKET_LENGTH - 5) / 2) {
> +len = (MAX_PACKET_LENGTH - 5) / 2;
> +}
> +
> +if (len < auxv_len - offset) {
> +gdb_ctx->str_buf[0] = 'm';
> +len = memtox(gdb_ctx->str_buf + 1, mem, len);
> +} else {
> +gdb_ctx->str_buf[0] = 'l';
> +len = memtox(gdb_ctx->str_buf + 1, mem, auxv_len - offset);
> +}
> +
> +put_packet_binary(gdb_ctx->s, gdb_ctx->str_buf, len + 1, true);
> +}
> +#endif
> +
>  static void handle_query_attached(GdbCmdContext *gdb_ctx, void *user_ctx)
>  {
>  put_packet(gdb_ctx->s, GDB_ATTACHED);
> @@ -2271,6 +2318,14 @@ static GdbCmdParseEntry gdb_gen_query_table[] = {
>  .cmd_startswith = 1,
>  .schema = "s:l,l0"
>  },
> +#ifdef CONFIG_USER_ONLY
> +{
> +.handler = handle_query_xfer_auxv,
> +.cmd = "Xfer:auxv:read:",
> +.cmd_startswith = 1,
> +.schema = "l,l0"
> +},
> +#endif
>  {
>  .handler = handle_query_attached,
>  .cmd = "Attached:",
> --
> 2.25.1.481.gfbce0eb801-goog
>
>
Friendly ping~

Link to the patchwork page:
http://patchwork.ozlabs.org/patch/1250727/


Re: [PATCH v2] slirp: update submodule to v4.2.0 + mingw-fix

2020-03-18 Thread Samuel Thibault
Marc-André Lureau, on Tue 17 Mar 2020 19:13:36 +0100, wrote:
> git shortlog
> 126c04acbabd7ad32c2b018fe10dfac2a3bc1210..7012a2c62e5b54eab88c119383022ec7ce86e9b2
> 
> 5eraph (1):
>   Use specific outbound IP address
> 
> Akihiro Suda (8):
>   remove confusing comment that exists from ancient slirp
>   add slirp_new(SlirpConfig *, SlirpCb *, void *)
>   allow custom MTU
>   add disable_host_loopback (prohibit connections to 127.0.0.1)
>   add SlirpConfig version
>   emu: remove dead code
>   emu: disable by default
>   fix a typo in a comment
> 
> Anders Waldenborg (1):
>   state: fix loading of guestfwd state
> 
> Giuseppe Scrivano (1):
>   socket: avoid getpeername after shutdown(SHUT_WR)
> 
> Jindrich Novy (1):
>   Don't leak memory when reallocation fails.
> 
> Jordi Pujol Palomer (1):
>   fork_exec: correctly parse command lines that contain spaces
> 
> Marc-André Lureau (54):
>   Merge branch 'AkihiroSuda/libslirp-slirp4netns'
>   Merge branch 'fix-typo' into 'master'
>   meson: make it subproject friendly
>   Merge branch 'meson' into 'master'
>   misc: fix compilation warnings
>   Merge branch 'fix-shutdown-wr' into 'master'
>   sbuf: remove unused and undefined sbcopy() path
>   sbuf: check more strictly sbcopy() bounds with offset
>   sbuf: replace a comment with a runtime warning
>   Replace remaining malloc/free user with glib
>   tcp_attach() can no longer fail
>   state: can't ENOMEM
>   sbuf: use unsigned types
>   sbuf: simplify sbreserve()
>   dnssearch: use g_strv_length()
>   vmstate: silence scan-build warning
>   gitlab-ci: run scan-build
>   Merge branch 'mem-cleanups' into 'master'
>   libslirp.map: bind slirp_new to SLIRP_4.1 version
>   meson: fix libtool versioning
>   Release v4.1.0
>   Merge branch '4.1.0' into 'master'
>   CHANGELOG: start unreleased section
>   Merge branch 'add-unix' into 'master'
>   util: add G_SIZEOF_MEMBER() macro
>   Check bootp_filename is not going to be truncated
>   bootp: remove extra cast
>   bootp: replace simple snprintf() with strcpy()
>   tftp: clarify what is actually OACK m_len
>   tcp_emu: add more fixme/warnings comments
>   util: add slirp_fmt() helpers
>   dhcpv6: use slirp_fmt()
>   misc: use slirp_fmt0()
>   tftp: use slirp_fmt0()
>   tcp_ctl: use slirp_fmt()
>   tcp_emu: fix unsafe snprintf() usages
>   misc: improve error report
>   Use g_snprintf()
>   util: add gnuc format function attribute to slirp_fmt*
>   Merge branch 'aw-guestfwd-state' into 'master'
>   Merge branch 'slirp-fmt' into 'master'
>   socket: remove extra label and variable
>   socket: factor out sotranslate ipv4/ipv6 handling
>   socket: remove need for extra scope_id variable
>   socket: do not fallback on host loopback if get_dns_addr() failed
>   socket: do not fallback on loopback addr for addresses in our 
> mask/prefix
>   Prepare for v4.2.0 release
>   Merge branch 'translate-fix' into 'master'
>   Merge branch 'release-v4.2.0' into 'master'
>   changelog: post-release
>   changelog: fix link
>   .gitlab-ci: add --werror, treat CI build warnings as errors
>   Revert "socket: remove need for extra scope_id variable"
>   Merge branch 'mingw-fix' into 'master'
> 
> PanNengyuan (1):
>   libslirp: fix NULL pointer dereference in tcp_sockclosed
> 
> Philippe Mathieu-Daudé (1):
>   Add a git-publish configuration file
> 
> Prasad J Pandit (4):
>   slirp: ncsi: compute checksum for valid data length
>   slirp: use correct size while emulating IRC commands
>   slirp: use correct size while emulating commands
>   slirp: tftp: restrict relative path access
> 
> Renzo Davoli (2):
>   Add slirp_remove_guestfwd()
>   Add slirp_add_unix()
> 
> Samuel Thibault (14):
>   ip_reass: explain why we should not always update the q pointer
>   Merge branch 'comment' into 'master'
>   Merge branch 'no-emu' into 'master'
>   Fix bogus indent, no source change
>   ip_reass: Fix use after free
>   Merge branch 'reass2' into 'master'
>   Make host receive broadcast packets
>   arp: Allow 0.0.0.0 destination address
>   Merge branch 'warnings' into 'master'
>   Merge branch 'arp_0' into 'master'
>   Merge branch 'broadcast' into 'master'
>   tcp_emu: Fix oob access
>   Merge branch 'oob' into 'master'
>   Merge branch 'master' into 'master'
> 
> Cc: Samuel Thibault 
> Signed-off-by: Marc-André Lureau 

Reviewed-by: Samuel Thibault 

> ---
>  slirp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/slirp b/slirp
> index 126c04acba..7012a2c62e 160000
> --- a/slirp
> +++ b/slirp
> @@ -1 +1 @@
> -Subproject commit 126c04acbabd7ad32c2b018fe10dfac2a3bc1210
> +Subproject commit 7012a2c62e5b54eab88c119383022ec7ce86e9b2

[PATCH v9 3/4] linux-user: Support futex_time64

2020-03-18 Thread Alistair Francis
Add support for host and target futex_time64. If futex_time64 exists on
the host, we try that first before falling back to the standard futex
syscall.

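(A stand-alone sketch of the idea, not the patch's code: prefer the
64-bit-time_t futex syscall when the kernel headers provide it, and fall
back to the classic futex call on ENOSYS. The patch itself selects the
call from the timespec layout and HOST_LONG_BITS, as its diff shows;
__NR_futex_time64 only exists on 32-bit hosts with recent kernel headers.)

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

static long futex_wake(uint32_t *uaddr, int nr)
{
#ifdef __NR_futex_time64
    /* 64-bit time_t variant first, if this host defines it */
    long r = syscall(__NR_futex_time64, uaddr, FUTEX_WAKE, nr, NULL, NULL, 0);
    if (r >= 0 || errno != ENOSYS) {
        return r;
    }
#endif
#ifdef __NR_futex
    /* classic futex syscall as the fallback */
    return syscall(__NR_futex, uaddr, FUTEX_WAKE, nr, NULL, NULL, 0);
#else
    errno = ENOSYS;
    return -1;
#endif
}

int main(void)
{
    uint32_t word = 0;

    printf("woke %ld waiters\n", futex_wake(&word, 1));
    return 0;
}
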
Signed-off-by: Alistair Francis 
---
 linux-user/syscall.c | 144 +++
 1 file changed, 131 insertions(+), 13 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 60fd775d9c..3354f41bb2 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -245,7 +245,12 @@ static type name (type1 arg1,type2 arg2,type3 arg3,type4 
arg4,type5 arg5,  \
 #define __NR_sys_rt_sigqueueinfo __NR_rt_sigqueueinfo
 #define __NR_sys_rt_tgsigqueueinfo __NR_rt_tgsigqueueinfo
 #define __NR_sys_syslog __NR_syslog
-#define __NR_sys_futex __NR_futex
+#if defined(__NR_futex)
+# define __NR_sys_futex __NR_futex
+#endif
+#if defined(__NR_futex_time64)
+# define __NR_sys_futex_time64 __NR_futex_time64
+#endif
 #define __NR_sys_inotify_init __NR_inotify_init
 #define __NR_sys_inotify_add_watch __NR_inotify_add_watch
 #define __NR_sys_inotify_rm_watch __NR_inotify_rm_watch
@@ -295,10 +300,16 @@ _syscall1(int,exit_group,int,error_code)
 #if defined(TARGET_NR_set_tid_address) && defined(__NR_set_tid_address)
 _syscall1(int,set_tid_address,int *,tidptr)
 #endif
-#if defined(TARGET_NR_futex) && defined(__NR_futex)
+#if (defined(TARGET_NR_futex) && defined(__NR_futex)) || \
+(defined(TARGET_NR_futex_time64) && \
+(HOST_LONG_BITS == 64 && defined(__NR_futex)))
 _syscall6(int,sys_futex,int *,uaddr,int,op,int,val,
   const struct timespec *,timeout,int *,uaddr2,int,val3)
 #endif
+#if defined(__NR_futex_time64)
+_syscall6(int,sys_futex_time64,int *,uaddr,int,op,int,val,
+  const struct timespec *,timeout,int *,uaddr2,int,val3)
+#endif
 #define __NR_sys_sched_getaffinity __NR_sched_getaffinity
 _syscall3(int, sys_sched_getaffinity, pid_t, pid, unsigned int, len,
   unsigned long *, user_mask_ptr);
@@ -762,10 +773,14 @@ safe_syscall5(int, ppoll, struct pollfd *, ufds, unsigned 
int, nfds,
 safe_syscall6(int, epoll_pwait, int, epfd, struct epoll_event *, events,
   int, maxevents, int, timeout, const sigset_t *, sigmask,
   size_t, sigsetsize)
-#ifdef TARGET_NR_futex
+#if defined(__NR_futex)
 safe_syscall6(int,futex,int *,uaddr,int,op,int,val, \
   const struct timespec *,timeout,int *,uaddr2,int,val3)
 #endif
+#if defined(__NR_futex_time64)
+safe_syscall6(int,futex_time64,int *,uaddr,int,op,int,val, \
+  const struct timespec *,timeout,int *,uaddr2,int,val3)
+#endif
 safe_syscall2(int, rt_sigsuspend, sigset_t *, newset, size_t, sigsetsize)
 safe_syscall2(int, kill, pid_t, pid, int, sig)
 safe_syscall2(int, tkill, int, tid, int, sig)
@@ -1229,7 +1244,7 @@ static inline abi_long target_to_host_timespec(struct 
timespec *host_ts,
 }
 #endif
 
-#if defined(TARGET_NR_clock_settime64)
+#if defined(TARGET_NR_clock_settime64) || defined(TARGET_NR_futex_time64)
 static inline abi_long target_to_host_timespec64(struct timespec *host_ts,
  abi_ulong target_addr)
 {
@@ -6890,6 +6905,55 @@ static inline abi_long host_to_target_statx(struct 
target_statx *host_stx,
 }
 #endif
 
+static int do_sys_futex(int *uaddr, int op, int val,
+ const struct timespec *timeout, int *uaddr2,
+ int val3)
+{
+#if HOST_LONG_BITS == 64
+#if defined(__NR_futex)
+/* always a 64-bit time_t, it doesn't define _time64 version  */
+return sys_futex(uaddr, op, val, timeout, uaddr2, val3);
+
+#endif
+#else /* HOST_LONG_BITS == 64 */
+#if defined(__NR_futex_time64)
+if (sizeof(timeout->tv_sec) == 8) {
+/* _time64 function on 32bit arch */
+return sys_futex_time64(uaddr, op, val, timeout, uaddr2, val3);
+}
+#endif
+#if defined(__NR_futex)
+/* old function on 32bit arch */
+return sys_futex(uaddr, op, val, timeout, uaddr2, val3);
+#endif
+#endif /* HOST_LONG_BITS == 64 */
+g_assert_not_reached();
+}
+
+static int do_safe_futex(int *uaddr, int op, int val,
+ const struct timespec *timeout, int *uaddr2,
+ int val3)
+{
+#if HOST_LONG_BITS == 64
+#if defined(__NR_futex)
+/* always a 64-bit time_t, it doesn't define _time64 version  */
+return get_errno(safe_futex(uaddr, op, val, timeout, uaddr2, val3));
+#endif
+#else /* HOST_LONG_BITS == 64 */
+#if defined(__NR_futex_time64)
+if (sizeof(timeout->tv_sec) == 8) {
+/* _time64 function on 32bit arch */
+return get_errno(safe_futex_time64(uaddr, op, val, timeout, uaddr2,
+   val3));
+}
+#endif
+#if defined(__NR_futex)
+/* old function on 32bit arch */
+return get_errno(safe_futex(uaddr, op, val, timeout, uaddr2, val3));
+#endif
+#endif /* HOST_LONG_BITS == 64 */
+return -TARGET_ENOSYS;
+}
 
 /* ??? Using host futex calls even when target atomic operations
are not really atomic 

[PATCH v9 0/4] linux-user: generate syscall_nr.sh for RISC-V

2020-03-18 Thread Alistair Francis
This series updates the RISC-V syscall_nr.sh based on the 5.5 kernel.

There are two parts to this. One is just adding the new syscalls; the
other is updating the RV32 syscalls to reflect the fact that RV32 is a
64-bit time_t (y2038-safe) architecture.
We need to make some changes to syscall.c to avoid warnings/errors
while compiling with the new syscalls.

I did some RV32 user space testing after applying these patches. I ran the
glibc testsuite in userspace and I don't see any regressions.

v9:
 - Fix futex patch compile error
v8:
 - Add a g_assert_not_reached() in do_sys_futex
v7:
 - Update futex_time64 support to work correctly
v6:
 - Split out futex patch and make it more robust
v5:
 - Address comments raised on v4
   - Don't require 64-bit host for *_time64 functions

Alistair Francis (4):
  linux-user: Protect more syscalls
  linux-user/syscall: Add support for clock_gettime64/clock_settime64
  linux-user: Support futex_time64
  linux-user/riscv: Update the syscall_nr's to the 5.5 kernel

 linux-user/riscv/syscall32_nr.h | 295 +++
 linux-user/riscv/syscall64_nr.h | 301 
 linux-user/riscv/syscall_nr.h   | 294 +--
 linux-user/strace.c |   2 +
 linux-user/syscall.c| 247 --
 5 files changed, 834 insertions(+), 305 deletions(-)
 create mode 100644 linux-user/riscv/syscall32_nr.h
 create mode 100644 linux-user/riscv/syscall64_nr.h

-- 
2.25.1




[PATCH v9 4/4] linux-user/riscv: Update the syscall_nr's to the 5.5 kernel

2020-03-18 Thread Alistair Francis
Signed-off-by: Alistair Francis 
Reviewed-by: Laurent Vivier 
---
 linux-user/riscv/syscall32_nr.h | 295 +++
 linux-user/riscv/syscall64_nr.h | 301 
 linux-user/riscv/syscall_nr.h   | 294 +--
 3 files changed, 598 insertions(+), 292 deletions(-)
 create mode 100644 linux-user/riscv/syscall32_nr.h
 create mode 100644 linux-user/riscv/syscall64_nr.h

diff --git a/linux-user/riscv/syscall32_nr.h b/linux-user/riscv/syscall32_nr.h
new file mode 100644
index 00..4fef73e954
--- /dev/null
+++ b/linux-user/riscv/syscall32_nr.h
@@ -0,0 +1,295 @@
+/*
+ * This file contains the system call numbers.
+ */
+#ifndef LINUX_USER_RISCV_SYSCALL32_NR_H
+#define LINUX_USER_RISCV_SYSCALL32_NR_H
+
+#define TARGET_NR_io_setup 0
+#define TARGET_NR_io_destroy 1
+#define TARGET_NR_io_submit 2
+#define TARGET_NR_io_cancel 3
+#define TARGET_NR_setxattr 5
+#define TARGET_NR_lsetxattr 6
+#define TARGET_NR_fsetxattr 7
+#define TARGET_NR_getxattr 8
+#define TARGET_NR_lgetxattr 9
+#define TARGET_NR_fgetxattr 10
+#define TARGET_NR_listxattr 11
+#define TARGET_NR_llistxattr 12
+#define TARGET_NR_flistxattr 13
+#define TARGET_NR_removexattr 14
+#define TARGET_NR_lremovexattr 15
+#define TARGET_NR_fremovexattr 16
+#define TARGET_NR_getcwd 17
+#define TARGET_NR_lookup_dcookie 18
+#define TARGET_NR_eventfd2 19
+#define TARGET_NR_epoll_create1 20
+#define TARGET_NR_epoll_ctl 21
+#define TARGET_NR_epoll_pwait 22
+#define TARGET_NR_dup 23
+#define TARGET_NR_dup3 24
+#define TARGET_NR_fcntl64 25
+#define TARGET_NR_inotify_init1 26
+#define TARGET_NR_inotify_add_watch 27
+#define TARGET_NR_inotify_rm_watch 28
+#define TARGET_NR_ioctl 29
+#define TARGET_NR_ioprio_set 30
+#define TARGET_NR_ioprio_get 31
+#define TARGET_NR_flock 32
+#define TARGET_NR_mknodat 33
+#define TARGET_NR_mkdirat 34
+#define TARGET_NR_unlinkat 35
+#define TARGET_NR_symlinkat 36
+#define TARGET_NR_linkat 37
+#define TARGET_NR_umount2 39
+#define TARGET_NR_mount 40
+#define TARGET_NR_pivot_root 41
+#define TARGET_NR_nfsservctl 42
+#define TARGET_NR_statfs64 43
+#define TARGET_NR_fstatfs64 44
+#define TARGET_NR_truncate64 45
+#define TARGET_NR_ftruncate64 46
+#define TARGET_NR_fallocate 47
+#define TARGET_NR_faccessat 48
+#define TARGET_NR_chdir 49
+#define TARGET_NR_fchdir 50
+#define TARGET_NR_chroot 51
+#define TARGET_NR_fchmod 52
+#define TARGET_NR_fchmodat 53
+#define TARGET_NR_fchownat 54
+#define TARGET_NR_fchown 55
+#define TARGET_NR_openat 56
+#define TARGET_NR_close 57
+#define TARGET_NR_vhangup 58
+#define TARGET_NR_pipe2 59
+#define TARGET_NR_quotactl 60
+#define TARGET_NR_getdents64 61
+#define TARGET_NR_llseek 62
+#define TARGET_NR_read 63
+#define TARGET_NR_write 64
+#define TARGET_NR_readv 65
+#define TARGET_NR_writev 66
+#define TARGET_NR_pread64 67
+#define TARGET_NR_pwrite64 68
+#define TARGET_NR_preadv 69
+#define TARGET_NR_pwritev 70
+#define TARGET_NR_sendfile64 71
+#define TARGET_NR_signalfd4 74
+#define TARGET_NR_vmsplice 75
+#define TARGET_NR_splice 76
+#define TARGET_NR_tee 77
+#define TARGET_NR_readlinkat 78
+#define TARGET_NR_fstatat64 79
+#define TARGET_NR_fstat64 80
+#define TARGET_NR_sync 81
+#define TARGET_NR_fsync 82
+#define TARGET_NR_fdatasync 83
+#define TARGET_NR_sync_file_range 84
+#define TARGET_NR_timerfd_create 85
+#define TARGET_NR_acct 89
+#define TARGET_NR_capget 90
+#define TARGET_NR_capset 91
+#define TARGET_NR_personality 92
+#define TARGET_NR_exit 93
+#define TARGET_NR_exit_group 94
+#define TARGET_NR_waitid 95
+#define TARGET_NR_set_tid_address 96
+#define TARGET_NR_unshare 97
+#define TARGET_NR_set_robust_list 99
+#define TARGET_NR_get_robust_list 100
+#define TARGET_NR_getitimer 102
+#define TARGET_NR_setitimer 103
+#define TARGET_NR_kexec_load 104
+#define TARGET_NR_init_module 105
+#define TARGET_NR_delete_module 106
+#define TARGET_NR_timer_create 107
+#define TARGET_NR_timer_getoverrun 109
+#define TARGET_NR_timer_delete 111
+#define TARGET_NR_syslog 116
+#define TARGET_NR_ptrace 117
+#define TARGET_NR_sched_setparam 118
+#define TARGET_NR_sched_setscheduler 119
+#define TARGET_NR_sched_getscheduler 120
+#define TARGET_NR_sched_getparam 121
+#define TARGET_NR_sched_setaffinity 122
+#define TARGET_NR_sched_getaffinity 123
+#define TARGET_NR_sched_yield 124
+#define TARGET_NR_sched_get_priority_max 125
+#define TARGET_NR_sched_get_priority_min 126
+#define TARGET_NR_restart_syscall 128
+#define TARGET_NR_kill 129
+#define TARGET_NR_tkill 130
+#define TARGET_NR_tgkill 131
+#define TARGET_NR_sigaltstack 132
+#define TARGET_NR_rt_sigsuspend 133
+#define TARGET_NR_rt_sigaction 134
+#define TARGET_NR_rt_sigprocmask 135
+#define TARGET_NR_rt_sigpending 136
+#define TARGET_NR_rt_sigqueueinfo 138
+#define TARGET_NR_rt_sigreturn 139
+#define TARGET_NR_setpriority 140
+#define TARGET_NR_getpriority 141
+#define TARGET_NR_reboot 142
+#define TARGET_NR_setregid 143
+#define TARGET_NR_setgid 144
+#define TARGET_NR_setreuid 145
+#define 

[PATCH v9 2/4] linux-user/syscall: Add support for clock_gettime64/clock_settime64

2020-03-18 Thread Alistair Francis
Add support for the clock_gettime64/clock_settime64 syscalls.

If your host is 64-bit, or is 32-bit with the *_time64 syscalls, then the
timespec will correctly be a 64-bit time_t. Otherwise the host will
return a 32-bit time_t, which is then widened to 64 bits; this will be
incorrect after y2038.

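(Illustration only, not QEMU code: a stand-alone reminder of why the
64-bit tv_sec matters. The struct below mirrors the layout of the
kernel's __kernel_timespec that the *_time64 syscalls use.)

#include <stdint.h>
#include <stdio.h>

struct kernel_timespec64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

int main(void)
{
    int64_t last_32bit_second = 0x7fffffff;     /* 2038-01-19T03:14:07Z */
    int32_t legacy_sec = (int32_t)(last_32bit_second + 1);   /* wraps */
    struct kernel_timespec64 ts = { last_32bit_second + 1, 0 };

    printf("32-bit time_t one second later: %d\n", legacy_sec);
    printf("64-bit tv_sec one second later: %lld\n", (long long)ts.tv_sec);
    return 0;
}
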
Signed-off-by: Alistair Francis 
Reviewed-by: Laurent Vivier 
---
 linux-user/syscall.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 909bec94a5..60fd775d9c 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1229,6 +1229,22 @@ static inline abi_long target_to_host_timespec(struct 
timespec *host_ts,
 }
 #endif
 
+#if defined(TARGET_NR_clock_settime64)
+static inline abi_long target_to_host_timespec64(struct timespec *host_ts,
+ abi_ulong target_addr)
+{
+struct target__kernel_timespec *target_ts;
+
+if (!lock_user_struct(VERIFY_READ, target_ts, target_addr, 1)) {
+return -TARGET_EFAULT;
+}
+__get_user(host_ts->tv_sec, &target_ts->tv_sec);
+__get_user(host_ts->tv_nsec, &target_ts->tv_nsec);
+unlock_user_struct(target_ts, target_addr, 0);
+return 0;
+}
+#endif
+
 static inline abi_long host_to_target_timespec(abi_ulong target_addr,
struct timespec *host_ts)
 {
@@ -11458,6 +11474,18 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 return ret;
 }
 #endif
+#ifdef TARGET_NR_clock_settime64
+case TARGET_NR_clock_settime64:
+{
+struct timespec ts;
+
+ret = target_to_host_timespec64(&ts, arg2);
+if (!is_error(ret)) {
+ret = get_errno(clock_settime(arg1, &ts));
+}
+return ret;
+}
+#endif
 #ifdef TARGET_NR_clock_gettime
 case TARGET_NR_clock_gettime:
 {
@@ -11469,6 +11497,17 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 return ret;
 }
 #endif
+#ifdef TARGET_NR_clock_gettime64
+case TARGET_NR_clock_gettime64:
+{
+struct timespec ts;
+ret = get_errno(clock_gettime(arg1, &ts));
+if (!is_error(ret)) {
+ret = host_to_target_timespec64(arg2, &ts);
+}
+return ret;
+}
+#endif
 #ifdef TARGET_NR_clock_getres
 case TARGET_NR_clock_getres:
 {
-- 
2.25.1




[PATCH v9 1/4] linux-user: Protect more syscalls

2020-03-18 Thread Alistair Francis
New y2038 safe 32-bit architectures (like RISC-V) don't support old
syscalls with a 32-bit time_t. The kernel defines new *_time64 versions
of these syscalls. Add some more #ifdefs to syscall.c in linux-user to
allow us to compile without these old syscalls.

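(A stand-alone probe of the point above, not part of the patch: on a
y2038-safe 32-bit ABI such as riscv32, the legacy 32-bit-time_t syscall
numbers are simply absent from the kernel headers, which is what makes
the extra #ifdef guards necessary.)

#include <stdio.h>
#include <sys/syscall.h>

int main(void)
{
#ifdef __NR_clock_gettime
    printf("legacy clock_gettime: syscall %d\n", (int)__NR_clock_gettime);
#else
    printf("legacy clock_gettime: not provided by this ABI\n");
#endif
#ifdef __NR_clock_gettime64
    printf("clock_gettime64:      syscall %d\n", (int)__NR_clock_gettime64);
#else
    printf("clock_gettime64:      not provided by this ABI\n");
#endif
    return 0;
}
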
Signed-off-by: Alistair Francis 
Reviewed-by: Laurent Vivier 
---
 linux-user/strace.c  |  2 ++
 linux-user/syscall.c | 68 ++--
 2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 4f7130b2ff..6420ccd97b 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -775,6 +775,7 @@ print_syscall_ret_newselect(const struct syscallname *name, 
abi_long ret)
 #define TARGET_TIME_OOP  3   /* leap second in progress */
 #define TARGET_TIME_WAIT 4   /* leap second has occurred */
 #define TARGET_TIME_ERROR 5   /* clock not synchronized */
+#ifdef TARGET_NR_adjtimex
 static void
 print_syscall_ret_adjtimex(const struct syscallname *name, abi_long ret)
 {
@@ -813,6 +814,7 @@ print_syscall_ret_adjtimex(const struct syscallname *name, 
abi_long ret)
 
 qemu_log("\n");
 }
+#endif
 
 UNUSED static struct flags access_flags[] = {
 FLAG_GENERIC(F_OK),
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8d27d10807..909bec94a5 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -742,21 +742,30 @@ safe_syscall3(ssize_t, read, int, fd, void *, buff, 
size_t, count)
 safe_syscall3(ssize_t, write, int, fd, const void *, buff, size_t, count)
 safe_syscall4(int, openat, int, dirfd, const char *, pathname, \
   int, flags, mode_t, mode)
+#if defined(TARGET_NR_wait4) || defined(TARGET_NR_waitpid)
 safe_syscall4(pid_t, wait4, pid_t, pid, int *, status, int, options, \
   struct rusage *, rusage)
+#endif
 safe_syscall5(int, waitid, idtype_t, idtype, id_t, id, siginfo_t *, infop, \
   int, options, struct rusage *, rusage)
 safe_syscall3(int, execve, const char *, filename, char **, argv, char **, 
envp)
+#if defined(TARGET_NR_select) || defined(TARGET_NR__newselect) || \
+defined(TARGET_NR_pselect6)
 safe_syscall6(int, pselect6, int, nfds, fd_set *, readfds, fd_set *, writefds, 
\
   fd_set *, exceptfds, struct timespec *, timeout, void *, sig)
+#endif
+#if defined(TARGET_NR_ppoll) || defined(TARGET_NR_poll)
 safe_syscall5(int, ppoll, struct pollfd *, ufds, unsigned int, nfds,
   struct timespec *, tsp, const sigset_t *, sigmask,
   size_t, sigsetsize)
+#endif
 safe_syscall6(int, epoll_pwait, int, epfd, struct epoll_event *, events,
   int, maxevents, int, timeout, const sigset_t *, sigmask,
   size_t, sigsetsize)
+#ifdef TARGET_NR_futex
 safe_syscall6(int,futex,int *,uaddr,int,op,int,val, \
   const struct timespec *,timeout,int *,uaddr2,int,val3)
+#endif
 safe_syscall2(int, rt_sigsuspend, sigset_t *, newset, size_t, sigsetsize)
 safe_syscall2(int, kill, pid_t, pid, int, sig)
 safe_syscall2(int, tkill, int, tid, int, sig)
@@ -776,12 +785,16 @@ safe_syscall6(ssize_t, recvfrom, int, fd, void *, buf, 
size_t, len,
 safe_syscall3(ssize_t, sendmsg, int, fd, const struct msghdr *, msg, int, 
flags)
 safe_syscall3(ssize_t, recvmsg, int, fd, struct msghdr *, msg, int, flags)
 safe_syscall2(int, flock, int, fd, int, operation)
+#ifdef TARGET_NR_rt_sigtimedwait
 safe_syscall4(int, rt_sigtimedwait, const sigset_t *, these, siginfo_t *, 
uinfo,
   const struct timespec *, uts, size_t, sigsetsize)
+#endif
 safe_syscall4(int, accept4, int, fd, struct sockaddr *, addr, socklen_t *, len,
   int, flags)
+#if defined(TARGET_NR_nanosleep)
 safe_syscall2(int, nanosleep, const struct timespec *, req,
   struct timespec *, rem)
+#endif
 #ifdef TARGET_NR_clock_nanosleep
 safe_syscall4(int, clock_nanosleep, const clockid_t, clock, int, flags,
   const struct timespec *, req, struct timespec *, rem)
@@ -802,9 +815,11 @@ safe_syscall5(int, msgrcv, int, msgid, void *, msgp, 
size_t, sz,
 safe_syscall4(int, semtimedop, int, semid, struct sembuf *, tsops,
   unsigned, nsops, const struct timespec *, timeout)
 #endif
-#if defined(TARGET_NR_mq_open) && defined(__NR_mq_open)
+#ifdef TARGET_NR_mq_timedsend
 safe_syscall5(int, mq_timedsend, int, mqdes, const char *, msg_ptr,
   size_t, len, unsigned, prio, const struct timespec *, timeout)
+#endif
+#ifdef TARGET_NR_mq_timedreceive
 safe_syscall5(int, mq_timedreceive, int, mqdes, char *, msg_ptr,
   size_t, len, unsigned *, prio, const struct timespec *, timeout)
 #endif
@@ -946,6 +961,8 @@ abi_long do_brk(abi_ulong new_brk)
 return target_brk;
 }
 
+#if defined(TARGET_NR_select) || defined(TARGET_NR__newselect) || \
+defined(TARGET_NR_pselect6)
 static inline abi_long copy_from_user_fdset(fd_set *fds,
 abi_ulong target_fds_addr,
 

[Bug 1835865] Re: piix crashes on mips when accessing acpi-pci-hotplug

2020-03-18 Thread Philippe Mathieu-Daudé
Proposed fix:
https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg06080.html

** Changed in: qemu
   Status: New => In Progress

** Changed in: qemu
 Assignee: (unassigned) => Philippe Mathieu-Daudé (philmd)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1835865

Title:
  piix crashes on mips when accessing acpi-pci-hotplug

Status in QEMU:
  In Progress

Bug description:
  $ qemu-system-mips --version
  QEMU emulator version 4.0.50 (v4.0.0-1975-gf34edbc760)

  $ qemu-system-mips -machine malta -bios /dev/null -nodefaults -monitor stdio 
-S
  (qemu) o 0xaf00 0
  qemu-system-mips: hw/acpi/cpu.c:197: cpu_hotplug_hw_init: Assertion 
`mc->possible_cpu_arch_ids' failed.
  Aborted (core dumped)

  (gdb) bt
  #0  0x7f6fd748957f in raise () at /lib64/libc.so.6
  #1  0x7f6fd7473895 in abort () at /lib64/libc.so.6
  #2  0x7f6fd7473769 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
  #3  0x7f6fd7481a26 in .annobin_assert.c_end () at /lib64/libc.so.6
  #4  0x5646d58ca7bd in cpu_hotplug_hw_init (as=0x5646d6ae3300, 
owner=0x5646d6fd5b10, state=0x5646d6fd7a30, base_addr=44800) at 
hw/acpi/cpu.c:197
  #5  0x5646d58c5284 in acpi_switch_to_modern_cphp (gpe_cpu=0x5646d6fd7910, 
cpuhp_state=0x5646d6fd7a30, io_port=44800) at hw/acpi/cpu_hotplug.c:107
  #6  0x5646d58c3431 in piix4_set_cpu_hotplug_legacy (obj=0x5646d6fd5b10, 
value=false, errp=0x5646d61cdb28 ) at hw/acpi/piix4.c:617
  #7  0x5646d5b00c70 in property_set_bool (obj=0x5646d6fd5b10, 
v=0x5646d7697d30, name=0x5646d5cf3a90 "cpu-hotplug-legacy", 
opaque=0x5646d707d110, errp=0x5646d61cdb28 ) at qom/object.c:2076
  #8  0x5646d5afeee6 in object_property_set (obj=0x5646d6fd5b10, 
v=0x5646d7697d30, name=0x5646d5cf3a90 "cpu-hotplug-legacy", errp=0x5646d61cdb28 
) at qom/object.c:1268
  #9  0x5646d5b01fb8 in object_property_set_qobject (obj=0x5646d6fd5b10, 
value=0x5646d75b5450, name=0x5646d5cf3a90 "cpu-hotplug-legacy", 
errp=0x5646d61cdb28 ) at qom/qom-qobject.c:26
  #10 0x5646d5aff1cb in object_property_set_bool (obj=0x5646d6fd5b10, 
value=false, name=0x5646d5cf3a90 "cpu-hotplug-legacy", errp=0x5646d61cdb28 
) at qom/object.c:1334
  #11 0x5646d58c4fce in cpu_status_write (opaque=0x5646d6fd7910, addr=0, 
data=0, size=1) at hw/acpi/cpu_hotplug.c:44
  #12 0x5646d569c707 in memory_region_write_accessor (mr=0x5646d6fd7920, 
addr=0, value=0x7ffc18053068, size=1, shift=0, mask=255, attrs=...) at 
memory.c:503
  #13 0x5646d569c917 in access_with_adjusted_size (addr=0, 
value=0x7ffc18053068, size=1, access_size_min=1, access_size_max=4, 
access_fn=0x5646d569c61e , mr=0x5646d6fd7920, 
attrs=...)
  at memory.c:569
  #14 0x5646d569f8f3 in memory_region_dispatch_write (mr=0x5646d6fd7920, 
addr=0, data=0, size=1, attrs=...) at memory.c:1497
  #15 0x5646d563e5c5 in flatview_write_continue (fv=0x5646d751b000, 
addr=44800, attrs=..., buf=0x7ffc180531d4 "", len=4, addr1=0, l=1, 
mr=0x5646d6fd7920) at exec.c:3324
  #16 0x5646d563e70a in flatview_write (fv=0x5646d751b000, addr=44800, 
attrs=..., buf=0x7ffc180531d4 "", len=4) at exec.c:3363
  #17 0x5646d563ea0f in address_space_write (as=0x5646d618abc0 
, addr=44800, attrs=..., buf=0x7ffc180531d4 "", len=4) at 
exec.c:3453
  #18 0x5646d5696ee5 in cpu_outl (addr=44800, val=0) at ioport.c:80
  #19 0x5646d57585d0 in hmp_ioport_write (mon=0x5646d6bc70e0, 
qdict=0x5646d6cf7140) at monitor/misc.c:1058
  #20 0x5646d5a77b99 in handle_hmp_command (mon=0x5646d6bc70e0, 
cmdline=0x5646d6bc2542 "0xaf00 0") at monitor/hmp.c:1082
  #21 0x5646d5a7540a in monitor_command_cb (opaque=0x5646d6bc70e0, 
cmdline=0x5646d6bc2540 "o 0xaf00 0", readline_opaque=0x0) at monitor/hmp.c:47
  #22 0x5646d5c71450 in readline_handle_byte (rs=0x5646d6bc2540, ch=13) at 
util/readline.c:408
  #23 0x5646d5a7858f in monitor_read (opaque=0x5646d6bc70e0, 
buf=0x7ffc180533d0 "\rtc\327FV", size=1) at monitor/hmp.c:1312
  #24 0x5646d5bc8d17 in qemu_chr_be_write_impl (s=0x5646d6add000, 
buf=0x7ffc180533d0 "\rtc\327FV", len=1) at chardev/char.c:177
  #25 0x5646d5bc8d7b in qemu_chr_be_write (s=0x5646d6add000, 
buf=0x7ffc180533d0 "\rtc\327FV", len=1) at chardev/char.c:189
  #26 0x5646d5bcb6bf in fd_chr_read (chan=0x5646d6a80d60, cond=G_IO_IN, 
opaque=0x5646d6add000) at chardev/char-fd.c:68
  #27 0x5646d5bec485 in qio_channel_fd_source_dispatch 
(source=0x5646d765a480, callback=0x5646d5bcb561 , 
user_data=0x5646d6add000) at io/channel-watch.c:84
  #28 0x7f6fd9c1606d in g_main_context_dispatch () at 
/lib64/libglib-2.0.so.0
  #29 0x5646d5c5323a in glib_pollfds_poll () at util/main-loop.c:213
  #30 0x5646d5c532b4 in os_host_main_loop_wait (timeout=29821719) at 
util/main-loop.c:236
  #31 0x5646d5c533b9 in main_loop_wait (nonblocking=0) at 
util/main-loop.c:512
  #32 0x5646d581d1a1 in main_loop () at vl.c:1791
  #33 0x5646d582485f in main (argc=11, 

Re: [PATCH 5/4] scripts/simplebench: fix python script #! headers

2020-03-18 Thread Philippe Mathieu-Daudé

On 3/18/20 6:02 PM, Vladimir Sementsov-Ogievskiy wrote:

- simplebench.py is not meant to be executed by itself, so drop the shebang header
- in bench_block_job.py, fix the shebang from python to python3

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  scripts/simplebench/bench_block_job.py | 2 +-
  scripts/simplebench/simplebench.py | 2 --
  2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/scripts/simplebench/bench_block_job.py 
b/scripts/simplebench/bench_block_job.py
index 9808d696cf..a0dda1dc4e 100755
--- a/scripts/simplebench/bench_block_job.py
+++ b/scripts/simplebench/bench_block_job.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
  #
  # Benchmark block jobs
  #
diff --git a/scripts/simplebench/simplebench.py 
b/scripts/simplebench/simplebench.py
index 59e7314ff6..7e25f3590b 100644
--- a/scripts/simplebench/simplebench.py
+++ b/scripts/simplebench/simplebench.py
@@ -1,5 +1,3 @@
-#!/usr/bin/env python
-#
  # Simple benchmarking framework
  #
  # Copyright (c) 2019 Virtuozzo International GmbH.



I'd rather see this squashed in patches 1 and 2, if not:
Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3] fdc/i8257: implement verify transfer mode

2020-03-18 Thread John Snow



On 11/1/19 12:55 PM, Sven Schnelle wrote:
> While working on the Tulip driver I tried to write some Teledisk images to
> a floppy image, which didn't work. It turned out that Teledisk checks the
> written data by issuing a READ command to the FDC but running the DMA
> controller in VERIFY mode. As we ignored the DMA request in that case, the
> DMA transfer never finished, and Teledisk reported an error.
> 
> The i8257 spec says about verify transfers:
> 
> 3) DMA verify, which does not actually involve the transfer of data. When an
> 8257 channel is in the DMA verify mode, it will respond the same as described
> for transfer operations, except that no memory or I/O read/write control 
> signals
> will be generated.
> 
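(A toy sketch of what the quoted verify mode amounts to, not the QEMU
i8257 model: the channel sequences through its count and reports terminal
count, but drives no memory or I/O strobes, so no data moves.)

#include <stddef.h>
#include <stdio.h>

enum dma_mode { DMA_MODE_WRITE, DMA_MODE_READ, DMA_MODE_VERIFY };

/* toy channel: runs 'len' transfer cycles, moving data only when the
 * mode calls for memory/IO strobes */
static void dma_run(enum dma_mode mode, unsigned char *mem,
                    const unsigned char *dev, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (mode == DMA_MODE_WRITE) {
            mem[i] = dev[i];        /* device -> memory strobe */
        }
        /* DMA_MODE_READ would drive memory -> device here;
         * DMA_MODE_VERIFY drives neither and only counts down */
    }
    printf("mode %d: terminal count after %zu cycles\n", (int)mode, len);
}

int main(void)
{
    unsigned char mem[8] = { 0 };
    const unsigned char dev[8] = "payload";

    dma_run(DMA_MODE_VERIFY, mem, dev, sizeof(dev));
    printf("memory untouched by verify: %s\n", mem[0] ? "no" : "yes");
    return 0;
}
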
> Hervé proposed to remove all the dma_mode_ok stuff from fdc to have a more
> clear boundary between DMA and FDC, so this patch also does that.
> 
> Suggested-by: Hervé Poussineau 
> Signed-off-by: Sven Schnelle 

I know it's many many moons later; I was out of the country for a month
when this patch arrived, and I lost track of it under my email backlog.

It looks reviewed and good to go, so I am staging it locally and testing it.

Thanks,
--js




[PATCH v2 4/4] .travis.yml: Add a KVM-only s390x job

2020-03-18 Thread Philippe Mathieu-Daudé
Add a job to build QEMU on s390x with TCG disabled, so
this configuration won't bitrot over time.

This job is quick, running check-unit: Ran for 4 min 48 sec
https://travis-ci.org/github/philmd/qemu/jobs/663659486

Signed-off-by: Philippe Mathieu-Daudé 
---
 .travis.yml | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/.travis.yml b/.travis.yml
index b92798ac3b..0109f02ed4 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -524,6 +524,48 @@ jobs:
   $(exit $BUILD_RC);
   fi
 
+- name: "[s390x] GCC check (KVM)"
+  arch: s390x
+  dist: bionic
+  addons:
+apt_packages:
+  - libaio-dev
+  - libattr1-dev
+  - libbrlapi-dev
+  - libcap-ng-dev
+  - libgcrypt20-dev
+  - libgnutls28-dev
+  - libgtk-3-dev
+  - libiscsi-dev
+  - liblttng-ust-dev
+  - libncurses5-dev
+  - libnfs-dev
+  - libnss3-dev
+  - libpixman-1-dev
+  - libpng-dev
+  - librados-dev
+  - libsdl2-dev
+  - libseccomp-dev
+  - liburcu-dev
+  - libusb-1.0-0-dev
+  - libvdeplug-dev
+  - libvte-2.91-dev
+  # Tests dependencies
+  - genisoimage
+  env:
+- TEST_CMD="make check-unit"
+- CONFIG="--disable-containers --disable-tcg --enable-kvm 
--disable-tools"
+  script:
+- ( cd ${SRC_DIR} ; git submodule update --init roms/SLOF )
+- BUILD_RC=0 && make -j${JOBS} || BUILD_RC=$?
+- |
+  if [ "$BUILD_RC" -eq 0 ] ; then
+  mv pc-bios/s390-ccw/*.img pc-bios/ ;
+  ${TEST_CMD} ;
+  else
+  $(exit $BUILD_RC);
+  fi
+
 # Release builds
 # The make-release script expect a QEMU version, so our tag must start 
with a 'v'.
 # This is the case when release candidate tags are created.
-- 
2.21.1




[PATCH-for-5.0 v2 3/4] tests/migration: Reduce autoconverge initial bandwidth

2020-03-18 Thread Philippe Mathieu-Daudé
When using max-bandwidth=~100Mb/s, this test fails on Travis-CI
s390x when configured with --disable-tcg:

  $ make check-qtest
TESTcheck-qtest-s390x: tests/qtest/boot-serial-test
  qemu-system-s390x: -accel tcg: invalid accelerator tcg
  qemu-system-s390x: falling back to KVM
TESTcheck-qtest-s390x: tests/qtest/pxe-test
TESTcheck-qtest-s390x: tests/qtest/test-netfilter
TESTcheck-qtest-s390x: tests/qtest/test-filter-mirror
TESTcheck-qtest-s390x: tests/qtest/test-filter-redirector
TESTcheck-qtest-s390x: tests/qtest/drive_del-test
TESTcheck-qtest-s390x: tests/qtest/device-plug-test
TESTcheck-qtest-s390x: tests/qtest/virtio-ccw-test
TESTcheck-qtest-s390x: tests/qtest/cpu-plug-test
TESTcheck-qtest-s390x: tests/qtest/migration-test
  **
  ERROR:tests/qtest/migration-test.c:1229:test_migrate_auto_converge: 
'got_stop' should be FALSE
  ERROR - Bail out! 
ERROR:tests/qtest/migration-test.c:1229:test_migrate_auto_converge: 'got_stop' 
should be FALSE
  make: *** [tests/Makefile.include:633: check-qtest-s390x] Error 1

Per David Gilbert, "it could just be the writing is slow on s390
and the migration thread fast; in which case the autocomplete
wouldn't be needed. Perhaps we just need to reduce the bandwidth
limit."

Tuning the threshold by reducing the initial bandwidth makes the
autoconverge test pass.

Suggested-by: Dr. David Alan Gilbert 
Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/qtest/migration-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 3d6cc83b88..727a97cf87 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1211,7 +1211,7 @@ static void test_migrate_auto_converge(void)
  * without throttling.
  */
 migrate_set_parameter_int(from, "downtime-limit", 1);
-migrate_set_parameter_int(from, "max-bandwidth", 1); /* ~100Mb/s */
+migrate_set_parameter_int(from, "max-bandwidth", 100); /* ~10Gb/s 
*/
 
 /* To check remaining size after precopy */
 migrate_set_capability(from, "pause-before-switchover", true);
-- 
2.21.1




[PATCH-for-5.0 v2 2/4] tests/test-util-sockets: Skip test on non-x86 Travis containers

2020-03-18 Thread Philippe Mathieu-Daudé
Similarly to commit 4f370b1098, test-util-sockets fails in
restricted non-x86 Travis containers since they apparently
blacklisted some required system calls there.
Let's simply skip the test if we detect such an environment.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/test-util-sockets.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tests/test-util-sockets.c b/tests/test-util-sockets.c
index 5fd947c7bf..046ebec8ba 100644
--- a/tests/test-util-sockets.c
+++ b/tests/test-util-sockets.c
@@ -231,11 +231,18 @@ static void test_socket_fd_pass_num_nocli(void)
 int main(int argc, char **argv)
 {
 bool has_ipv4, has_ipv6;
+char *travis_arch;
 
 socket_init();
 
 g_test_init(&argc, &argv, NULL);
 
+travis_arch = getenv("TRAVIS_CPU_ARCH");
+if (travis_arch && !g_str_equal(travis_arch, "x86_64")) {
+g_printerr("Test does not work on non-x86 Travis containers.");
+goto end;
+}
+
 /* We're creating actual IPv4/6 sockets, so we should
  * check if the host running tests actually supports
  * each protocol to avoid breaking tests on machines
-- 
2.21.1




[PATCH v2 0/4] travis-ci: Add a KVM-only s390x job

2020-03-18 Thread Philippe Mathieu-Daudé
Add a Travis job to build a KVM-only QEMU on s390x.

This series also contains few fixes for Travis/s390x.

Since v1:
- Do not disable autoconverge on s390x, but reduce the test
  initial bandwidth (dgilbert)
- Added danpb R-b tags

Philippe Mathieu-Daudé (4):
  tests/test-util-filemonitor: Fix Travis-CI $ARCH env variable name
  tests/test-util-sockets: Skip test on non-x86 Travis containers
  tests/migration: Reduce autoconverge initial bandwidth
  .travis.yml: Add a KVM-only s390x job

 tests/qtest/migration-test.c  |  2 +-
 tests/test-util-filemonitor.c |  2 +-
 tests/test-util-sockets.c |  7 ++
 .travis.yml   | 42 +++
 4 files changed, 51 insertions(+), 2 deletions(-)

-- 
2.21.1




[PATCH-for-5.0 v2 1/4] tests/test-util-filemonitor: Fix Travis-CI $ARCH env variable name

2020-03-18 Thread Philippe Mathieu-Daudé
While we can find reference of a 'TRAVIS_ARCH' variable in
the environment and source [1], per the Travis-CI multi-arch
documentation [2] the variable is named TRAVIS_CPU_ARCH.

[1] 
https://github.com/travis-ci/travis-build/blob/v10.0.0/lib/travis/build/bash/travis_setup_env.bash#L39
[2] 
https://docs.travis-ci.com/user/multi-cpu-architectures/#identifying-cpu-architecture-of-build-jobs

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/test-util-filemonitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/test-util-filemonitor.c b/tests/test-util-filemonitor.c
index 45009c69f4..e703a7f8fc 100644
--- a/tests/test-util-filemonitor.c
+++ b/tests/test-util-filemonitor.c
@@ -415,7 +415,7 @@ test_file_monitor_events(void)
  * This test does not work on Travis LXD containers since some
  * syscalls are blocked in that environment.
  */
-travis_arch = getenv("TRAVIS_ARCH");
+travis_arch = getenv("TRAVIS_CPU_ARCH");
 if (travis_arch && !g_str_equal(travis_arch, "x86_64")) {
 g_test_skip("Test does not work on non-x86 Travis containers.");
 return;
-- 
2.21.1




[PATCH-for-5.0] block: Assert BlockDriver::format_name is not NULL

2020-03-18 Thread Philippe Mathieu-Daudé
bdrv_do_find_format() calls strcmp() using BlockDriver::format_name
as argument, which must not be NULL. Assert this field is not null
when we register a block driver in bdrv_register().

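(Stand-alone illustration of the failure mode, not the QEMU code: a
driver registered without a name would otherwise only blow up much later,
inside strcmp() during format lookup, so asserting at registration time
fails fast.)

#include <assert.h>
#include <stdio.h>
#include <string.h>

struct driver { const char *format_name; };

static struct driver *drivers[16];
static int ndrivers;

static void register_driver(struct driver *d)
{
    assert(d->format_name);              /* fail fast, at startup */
    drivers[ndrivers++] = d;
}

static struct driver *find_driver(const char *name)
{
    for (int i = 0; i < ndrivers; i++) {
        /* strcmp() would be undefined behaviour on a NULL name */
        if (!strcmp(drivers[i]->format_name, name)) {
            return drivers[i];
        }
    }
    return NULL;
}

int main(void)
{
    static struct driver qcow2 = { "qcow2" };

    register_driver(&qcow2);
    printf("found: %s\n", find_driver("qcow2")->format_name);
    return 0;
}
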
Reported-by: Mansour Ahmadi 
Signed-off-by: Philippe Mathieu-Daudé 
---
 block.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block.c b/block.c
index a2542c977b..6b984dc883 100644
--- a/block.c
+++ b/block.c
@@ -363,6 +363,7 @@ char *bdrv_get_full_backing_filename(BlockDriverState *bs, 
Error **errp)
 
 void bdrv_register(BlockDriver *bdrv)
 {
+assert(bdrv->format_name);
 QLIST_INSERT_HEAD(&bdrv_drivers, bdrv, list);
 }
 
-- 
2.21.1




[PATCH-for-5.0 2/2] hw/acpi/piix4: Restrict system-hotplug-support to x86 i440fx PC machine

2020-03-18 Thread Philippe Mathieu-Daudé
The PC (i440fx) machine is the only one using the PIIX4 PM
specific system-hotplug-support feature.

Enable this feature in pc_init1(), and let the callers of
piix4_create(), such as the MIPS Malta board, get a plain PIIX4
device.

Doing so fixes a bug on the Malta where a guest writing to
I/O port 0xaf00 crashes QEMU:

  qemu-system-mips: hw/acpi/cpu.c:197: cpu_hotplug_hw_init: Assertion 
`mc->possible_cpu_arch_ids' failed.
  Aborted (core dumped)
  (gdb) bt
  #0 0x7f6fd748957f in raise () at /lib64/libc.so.6
  #1 0x7f6fd7473895 in abort () at /lib64/libc.so.6
  #2 0x7f6fd7473769 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
  #3 0x7f6fd7481a26 in .annobin_assert.c_end () at /lib64/libc.so.6
  #4 0x5646d58ca7bd in cpu_hotplug_hw_init (as=0x5646d6ae3300, 
owner=0x5646d6fd5b10, state=0x5646d6fd7a30, base_addr=44800) at 
hw/acpi/cpu.c:197
  #5 0x5646d58c5284 in acpi_switch_to_modern_cphp (gpe_cpu=0x5646d6fd7910, 
cpuhp_state=0x5646d6fd7a30, io_port=44800) at hw/acpi/cpu_hotplug.c:107
  #6 0x5646d58c3431 in piix4_set_cpu_hotplug_legacy (obj=0x5646d6fd5b10, 
value=false, errp=0x5646d61cdb28 ) at hw/acpi/piix4.c:617
  #7 0x5646d5b00c70 in property_set_bool (obj=0x5646d6fd5b10, 
v=0x5646d7697d30, name=0x5646d5cf3a90 "cpu-hotplug-legacy", 
opaque=0x5646d707d110, errp=0x5646d61cdb28 ) at qom/object.c:2076
  #8 0x5646d5afeee6 in object_property_set (obj=0x5646d6fd5b10, 
v=0x5646d7697d30, name=0x5646d5cf3a90 "cpu-hotplug-legacy", errp=0x5646d61cdb28 
) at qom/object.c:1268
  #9 0x5646d5b01fb8 in object_property_set_qobject (obj=0x5646d6fd5b10, 
value=0x5646d75b5450, name=0x5646d5cf3a90 "cpu-hotplug-legacy", 
errp=0x5646d61cdb28 ) at qom/qom-qobject.c:26
  #10 0x5646d5aff1cb in object_property_set_bool (obj=0x5646d6fd5b10, 
value=false, name=0x5646d5cf3a90 "cpu-hotplug-legacy", errp=0x5646d61cdb28 
) at qom/object.c:1334
  #11 0x5646d58c4fce in cpu_status_write (opaque=0x5646d6fd7910, addr=0, 
data=0, size=1) at hw/acpi/cpu_hotplug.c:44
  #12 0x5646d569c707 in memory_region_write_accessor (mr=0x5646d6fd7920, 
addr=0, value=0x7ffc18053068, size=1, shift=0, mask=255, attrs=...) at 
memory.c:503
  #13 0x5646d569c917 in access_with_adjusted_size (addr=0, 
value=0x7ffc18053068, size=1, access_size_min=1, access_size_max=4, 
access_fn=0x5646d569c61e , mr=0x5646d6fd7920, 
attrs=...) at memory.c:569
  #14 0x5646d569f8f3 in memory_region_dispatch_write (mr=0x5646d6fd7920, 
addr=0, data=0, size=1, attrs=...) at memory.c:1497
  #15 0x5646d563e5c5 in flatview_write_continue (fv=0x5646d751b000, 
addr=44800, attrs=..., buf=0x7ffc180531d4 "", len=4, addr1=0, l=1, 
mr=0x5646d6fd7920) at exec.c:3324
  #16 0x5646d563e70a in flatview_write (fv=0x5646d751b000, addr=44800, 
attrs=..., buf=0x7ffc180531d4 "", len=4) at exec.c:3363
  #17 0x5646d563ea0f in address_space_write (as=0x5646d618abc0 
, addr=44800, attrs=..., buf=0x7ffc180531d4 "", len=4) at 
exec.c:3453
  #18 0x5646d5696ee5 in cpu_outl (addr=44800, val=0) at ioport.c:80

Buglink: https://bugs.launchpad.net/qemu/+bug/1835865
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/southbridge/piix.h | 3 ++-
 hw/acpi/piix4.c   | 4 +++-
 hw/i386/pc_piix.c | 1 +
 hw/isa/piix4.c| 2 +-
 4 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/hw/southbridge/piix.h b/include/hw/southbridge/piix.h
index 152628c6d9..3a54409cab 100644
--- a/include/hw/southbridge/piix.h
+++ b/include/hw/southbridge/piix.h
@@ -18,7 +18,8 @@
 
 I2CBus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base,
   qemu_irq sci_irq, qemu_irq smi_irq,
-  int smm_enabled, DeviceState **piix4_pm);
+  int smm_enabled, bool system_hotplug_enabled,
+  DeviceState **piix4_pm);
 
 /* PIRQRC[A:D]: PIRQx Route Control Registers */
 #define PIIX_PIRQCA 0x60
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 9c970336ac..ec4869452b 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -515,7 +515,8 @@ static void piix4_pm_realize(PCIDevice *dev, Error **errp)
 
 I2CBus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base,
   qemu_irq sci_irq, qemu_irq smi_irq,
-  int smm_enabled, DeviceState **piix4_pm)
+  int smm_enabled, bool system_hotplug_enabled,
+  DeviceState **piix4_pm)
 {
 DeviceState *dev;
 PIIX4PMState *s;
@@ -533,6 +534,7 @@ I2CBus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t 
smb_io_base,
 if (xen_enabled()) {
 s->use_acpi_pci_hotplug = false;
 }
+s->use_acpi_system_hotplug = system_hotplug_enabled;
 
 qdev_init_nofail(dev);
 
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index e2d98243bc..8441f44a14 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -283,6 +283,7 @@ else {
 pcms->smbus = piix4_pm_init(pci_bus, piix3_devfn + 3, 0xb100,

[PATCH-for-5.0 0/2] hw/acpi/piix4: Restrict 'system hotplug' feature to i440fx PC machine

2020-03-18 Thread Philippe Mathieu-Daudé
This feature is not documented in the PIIX4 datasheet. It appears
to be used only on the i440fx PC machine. Add a property to the
PIIX4_PM device to restrict its use. This fixes a MIPS guest
crashing QEMU when accessing ioport 0xaf00.

Philippe Mathieu-Daudé (2):
  hw/acpi/piix4: Add 'system-hotplug-support' property
  hw/acpi/piix4: Restrict system-hotplug-support to x86 i440fx PC
machine

 include/hw/southbridge/piix.h |  3 ++-
 hw/acpi/piix4.c   | 13 ++---
 hw/i386/pc_piix.c |  1 +
 hw/isa/piix4.c|  2 +-
 4 files changed, 14 insertions(+), 5 deletions(-)

-- 
2.21.1




[PATCH-for-5.0 1/2] hw/acpi/piix4: Add 'system-hotplug-support' property

2020-03-18 Thread Philippe Mathieu-Daudé
The I/O ranges registered by the piix4_acpi_system_hot_add_init()
function are not documented in the PIIX4 datasheet.
This appears to be a PC-only feature, added in commit 5e3cb5347e
("initialize hot add system / acpi gpe") and then moved to the
PIIX4 device model in commit 9d5e77a22f ("make
qemu_system_device_hot_add piix independent").
Add a property (default enabled, so the current behavior is not
modified) to allow machines that want to model a plain PIIX4 to
disable this feature.

Signed-off-by: Philippe Mathieu-Daudé 
---
Should I squash this with the next patch and start with
default=false, which is closer to the hardware model?
---
 hw/acpi/piix4.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 964d6f5990..9c970336ac 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -78,6 +78,7 @@ typedef struct PIIX4PMState {
 
 AcpiPciHpState acpi_pci_hotplug;
 bool use_acpi_pci_hotplug;
+bool use_acpi_system_hotplug;
 
 uint8_t disable_s3;
 uint8_t disable_s4;
@@ -503,8 +504,10 @@ static void piix4_pm_realize(PCIDevice *dev, Error **errp)
 s->machine_ready.notify = piix4_pm_machine_ready;
 qemu_add_machine_init_done_notifier(&s->machine_ready);
 
-piix4_acpi_system_hot_add_init(pci_address_space_io(dev),
-   pci_get_bus(dev), s);
+if (s->use_acpi_system_hotplug) {
+piix4_acpi_system_hot_add_init(pci_address_space_io(dev),
+   pci_get_bus(dev), s);
+}
 qbus_set_hotplug_handler(BUS(pci_get_bus(dev)), OBJECT(s), &error_abort);
 
 piix4_pm_add_propeties(s);
@@ -635,6 +638,8 @@ static Property piix4_pm_properties[] = {
  use_acpi_pci_hotplug, true),
 DEFINE_PROP_BOOL("memory-hotplug-support", PIIX4PMState,
  acpi_memory_hotplug.is_enabled, true),
+DEFINE_PROP_BOOL("system-hotplug-support", PIIX4PMState,
+ use_acpi_system_hotplug, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.21.1




Re: [PULL 00/28 for 5.0] testing and gdbstub updates

2020-03-18 Thread Peter Maydell
On Tue, 17 Mar 2020 at 17:50, Alex Bennée  wrote:
>
> The following changes since commit 6fb1603aa24d9212493e4819d7b685be9c9aad7a:
>
>   Merge remote-tracking branch 
> 'remotes/pmaydell/tags/pull-target-arm-20200317' into staging (2020-03-17 
> 14:44:50 +)
>
> are available in the Git repository at:
>
>   https://github.com/stsquad/qemu.git tags/pull-testing-and-gdbstub-170320-1
>
> for you to fetch changes up to 3bc2609d478779be600fd66744eb4e3730ec5e33:
>
>   gdbstub: Fix single-step issue by confirming 'vContSupported+' feature to 
> gdb (2020-03-17 17:38:52 +)
>
> 
> Testing and gdbstub updates:
>
>   - docker updates for VirGL
>   - re-factor gdbstub for static GDBState
>   - re-factor gdbstub for dynamic arrays
>   - add SVE support to arm gdbstub
>   - add some guest debug tests to check-tcg
>   - add aarch64 userspace register tests
>   - remove packet size limit to gdbstub
>   - simplify gdbstub monitor code
>   - report vContSupported in gdbstub to use proper single-step
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM



Re: [EXTERNAL] [PATCH 2/2] target/ppc: Fix ISA v3.0 (POWER9) slbia implementation

2020-03-18 Thread Benjamin Herrenschmidt
On Wed, 2020-03-18 at 18:08 +0100, Cédric Le Goater wrote:
> On 3/18/20 5:41 AM, Nicholas Piggin wrote:
> > Linux using the hash MMU ("disable_radix" command line) on a POWER9
> > machine quickly hits translation bugs due to using v3.0 slbia
> > features that are not implemented in TCG. Add them.
> 
> I checked the ISA books and this looks OK but you are also modifying
> slbie.

For the same reason, I believe slbie needs to invalidate caches even if
the entry isn't present.

The kernel will under some circumstances overwrite SLB entries without
invalidating (because the translation itself isn't invalid, it's just
that the SLB is full, so anything cached in the ERAT is still
technically ok).

However, when those things get really invalidated, they need to be
taken out, even if they no longer have a corresponding SLB entry.

Cheers,
Ben.

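(A toy model of the point above, not the Power ISA nor QEMU's
implementation: software may reuse an SLB slot without invalidating it
first, so a later slbie must still evict the cached translation from the
lookaside structure even though no valid SLB entry matches.)

#include <stdbool.h>
#include <stdio.h>

#define SLB_SLOTS  4
#define ERAT_SLOTS 8

struct slb_entry  { unsigned esid; bool valid; };
struct erat_entry { unsigned esid; bool valid; };   /* cached translation */

static struct slb_entry  slb[SLB_SLOTS];
static struct erat_entry erat[ERAT_SLOTS];
static int erat_next;

static void slbmte(int slot, unsigned esid)     /* (re)write an SLB slot */
{
    slb[slot] = (struct slb_entry){ esid, true };
}

static void touch(unsigned esid)    /* an access caches a translation */
{
    erat[erat_next++ % ERAT_SLOTS] = (struct erat_entry){ esid, true };
}

static void slbie(unsigned esid)
{
    int i;

    /* evict cached translations unconditionally ... */
    for (i = 0; i < ERAT_SLOTS; i++) {
        if (erat[i].valid && erat[i].esid == esid) {
            erat[i].valid = false;
        }
    }
    /* ... and drop a matching SLB entry only if one is still there */
    for (i = 0; i < SLB_SLOTS; i++) {
        if (slb[i].valid && slb[i].esid == esid) {
            slb[i].valid = false;
        }
    }
}

int main(void)
{
    slbmte(0, 0x10);    /* segment 0x10 installed ... */
    touch(0x10);        /* ... and its translation cached */
    slbmte(0, 0x20);    /* slot reused without invalidating 0x10 first */

    slbie(0x10);        /* no valid SLB entry matches, yet the cached
                           translation for 0x10 must still be evicted */

    printf("stale 0x10 translation still cached: %s\n",
           (erat[0].valid && erat[0].esid == 0x10) ? "yes" : "no");
    return 0;
}
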
> Thanks,
> 
> C. 
> 
> 
> > Signed-off-by: Nicholas Piggin 
> > ---
> >  target/ppc/helper.h |  2 +-
> >  target/ppc/mmu-hash64.c | 57 -
> > 
> >  target/ppc/translate.c  |  5 +++-
> >  3 files changed, 55 insertions(+), 9 deletions(-)
> > 
> > diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> > index ee1498050d..2dfa1c6942 100644
> > --- a/target/ppc/helper.h
> > +++ b/target/ppc/helper.h
> > @@ -615,7 +615,7 @@ DEF_HELPER_FLAGS_3(store_slb, TCG_CALL_NO_RWG,
> > void, env, tl, tl)
> >  DEF_HELPER_2(load_slb_esid, tl, env, tl)
> >  DEF_HELPER_2(load_slb_vsid, tl, env, tl)
> >  DEF_HELPER_2(find_slb_vsid, tl, env, tl)
> > -DEF_HELPER_FLAGS_1(slbia, TCG_CALL_NO_RWG, void, env)
> > +DEF_HELPER_FLAGS_2(slbia, TCG_CALL_NO_RWG, void, env, i32)
> >  DEF_HELPER_FLAGS_2(slbie, TCG_CALL_NO_RWG, void, env, tl)
> >  DEF_HELPER_FLAGS_2(slbieg, TCG_CALL_NO_RWG, void, env, tl)
> >  #endif
> > diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
> > index 373d44de74..deb1c13a66 100644
> > --- a/target/ppc/mmu-hash64.c
> > +++ b/target/ppc/mmu-hash64.c
> > @@ -95,9 +95,10 @@ void dump_slb(PowerPCCPU *cpu)
> >  }
> >  }
> > 
> > -void helper_slbia(CPUPPCState *env)
> > +void helper_slbia(CPUPPCState *env, uint32_t ih)
> >  {
> >  PowerPCCPU *cpu = env_archcpu(env);
> > +int starting_entry;
> >  int n;
> > 
> >  /*
> > @@ -111,18 +112,59 @@ void helper_slbia(CPUPPCState *env)
> >   * expected that slbmte is more common than slbia, and slbia
> > is usually
> >   * going to evict valid SLB entries, so that tradeoff is
> > unlikely to be a
> >   * good one.
> > + *
> > + * ISA v2.05 introduced IH field with values 0,1,2,6. These
> > all invalidate
> > + * the same SLB entries (everything but entry 0), but differ
> > in what
> > + * "lookaside information" is invalidated. TCG can ignore this
> > and flush
> > + * everything.
> > + *
> > + * ISA v3.0 introduced additional values 3,4,7, which change
> > what SLBs are
> > + * invalidated.
> >   */
> > 
> > -/* XXX: Warning: slbia never invalidates the first segment */
> > -for (n = 1; n < cpu->hash64_opts->slb_size; n++) {
> > -ppc_slb_t *slb = &env->slb[n];
> > +env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
> > +
> > +starting_entry = 1; /* default for IH=0,1,2,6 */
> > +
> > +if (env->mmu_model == POWERPC_MMU_3_00) {
> > +switch (ih) {
> > +case 0x7:
> > +/* invalidate no SLBs, but all lookaside information
> > */
> > +return;
> > 
> > -if (slb->esid & SLB_ESID_V) {
> > -slb->esid &= ~SLB_ESID_V;
> > +case 0x3:
> > +case 0x4:
> > +/* also considers SLB entry 0 */
> > +starting_entry = 0;
> > +break;
> > +
> > +case 0x5:
> > +/* treat undefined values as ih==0, and warn */
> > +qemu_log_mask(LOG_GUEST_ERROR,
> > +  "slbia undefined IH field %u.\n", ih);
> > +break;
> > +
> > +default:
> > +/* 0,1,2,6 */
> > +break;
> >  }
> >  }
> > 
> > -env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
> > +for (n = starting_entry; n < cpu->hash64_opts->slb_size; n++)
> > {
> > +ppc_slb_t *slb = &env->slb[n];
> > +
> > +if (!(slb->esid & SLB_ESID_V)) {
> > +continue;
> > +}
> > +if (env->mmu_model == POWERPC_MMU_3_00) {
> > +if (ih == 0x3 && (slb->vsid & SLB_VSID_C) == 0) {
> > +/* preserves entries with a class value of 0 */
> > +continue;
> > +}
> > +}
> > +
> > +slb->esid &= ~SLB_ESID_V;
> > +}
> >  }
> > 
> >  static void __helper_slbie(CPUPPCState *env, target_ulong addr,
> > @@ -136,6 +178,7 @@ static void __helper_slbie(CPUPPCState *env,
> > target_ulong addr,
> >  return;
> >  }
> > 
> > +env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
> >  if (slb->esid & SLB_ESID_V) {
> >  slb->esid &= ~SLB_ESID_V;
> > 
> > diff --git a/target/ppc/translate.c 

Re: [PULL v2 00/37] Linux user for 5.0 patches

2020-03-18 Thread Richard Henderson
On 3/18/20 1:23 PM, Laurent Vivier wrote:
> Le 18/03/2020 à 21:17, Richard Henderson a écrit :
>> On 3/18/20 12:58 PM, Laurent Vivier wrote:
 However, from the error message above, it's clear that cpu_loop.o has not 
 been
 rebuilt properly.

>>>
>>> In the series merged here syscall_nr.h are moved from source directory
>>> to build directory.
>>>
>>> The include path of the files is based on the dependency files (*.d), and
>>> to force the update of this path PATCH 13 removes all the .d files that
>>> have a dependency on the syscall_nr.h file in the source path.
>>>
>>> This is added in configure:
>>>
>>> --- a/configure
>>> +++ b/configure
>>> @@ -1887,6 +1887,17 @@ fi
>>>  # Remove old dependency files to make sure that they get properly
>>> regenerated
>>>  rm -f */config-devices.mak.d
>>>
>>> +# Remove syscall_nr.h to be sure they will be regenerated in the build
>>> +# directory, not in the source directory
>>> +for arch in ; do
>>> +# remove the file if it has been generated in the source directory
>>> +rm -f "${source_path}/linux-user/${arch}/syscall_nr.h"
>>> +# remove the dependency files
>>> +find . -name "*.d" \
>>> +   -exec grep -q
>>> "${source_path}/linux-user/${arch}/syscall_nr.h" {} \; \
>>> +   -exec rm {} \;
>>> +done
>> ...
>>> Perhaps it removes a dependency that should trigger the rebuild of
>>> cpu_loop.o?
>>
>> Ah, yes indeed. It removes *all* dependencies for cpu_loop.o, so unless we
>> touch the cpu_loop.c source file, nothing gets done.
>>
>> I think you're trying to be too fine grained here, since the *.o file has to 
>> go
>> away with the *.d file.  Why not just
>>
>>   make ${arch}-linux-user/clean
>>
>> ?
> 
> The idea was to be able to bisect the series as the syscall_nr.h were
> added incrementally without rebuilding all the files.
> 
> If I remove the loop in configure, where should I add the "make
> ${arch}-linux-user/clean"?

I don't know.  Can you get an exit status out of the find?

Another option might be

for f in $(find ${arch}-linux-user -name '*.d' \
           -exec grep -q ${arch_syscall} {} \; \
           -print); do
  rm -f $(basename $f .d).*
done

But frankly I don't care if all of every file gets rebuilt while bisecting, it
just needs to work.


r~



[PULL v2 10/11] nbd/server: use bdrv_dirty_bitmap_next_dirty_area

2020-03-18 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

Use bdrv_dirty_bitmap_next_dirty_area for bitmap_to_extents. Since
bdrv_dirty_bitmap_next_dirty_area is precise about its bounds, the last
chunk will never exceed the requested region. So we don't need
dont_fragment, and the bitmap_to_extents() interface becomes clean enough
not to require any comment.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Message-id: 20200205112041.6003-10-vsement...@virtuozzo.com
Signed-off-by: John Snow 
---
 nbd/server.c | 59 +---
 1 file changed, 19 insertions(+), 40 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index f90bb33a75..02b1ed0801 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2068,57 +2068,36 @@ static int nbd_co_send_block_status(NBDClient *client, 
uint64_t handle,
 return nbd_co_send_extents(client, handle, ea, last, context_id, errp);
 }
 
-/*
- * Populate @ea from a dirty bitmap. Unless @dont_fragment, the
- * final extent may exceed the original @length.
- */
+/* Populate @ea from a dirty bitmap. */
 static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
   uint64_t offset, uint64_t length,
-  NBDExtentArray *ea, bool dont_fragment)
+  NBDExtentArray *es)
 {
-uint64_t begin = offset, end = offset;
-uint64_t overall_end = offset + length;
-BdrvDirtyBitmapIter *it;
-bool dirty;
+int64_t start, dirty_start, dirty_count;
+int64_t end = offset + length;
+bool full = false;
 
 bdrv_dirty_bitmap_lock(bitmap);
 
-it = bdrv_dirty_iter_new(bitmap);
-dirty = bdrv_dirty_bitmap_get_locked(bitmap, offset);
-
-while (begin < overall_end) {
-bool next_dirty = !dirty;
-
-if (dirty) {
-end = bdrv_dirty_bitmap_next_zero(bitmap, begin, INT64_MAX);
-} else {
-bdrv_set_dirty_iter(it, begin);
-end = bdrv_dirty_iter_next(it);
-}
-if (end == -1 || end - begin > UINT32_MAX) {
-/* Cap to an aligned value < 4G beyond begin. */
-end = MIN(bdrv_dirty_bitmap_size(bitmap),
-  begin + UINT32_MAX + 1 -
-  bdrv_dirty_bitmap_granularity(bitmap));
-next_dirty = dirty;
-}
-if (dont_fragment && end > overall_end) {
-end = overall_end;
-}
-
-if (nbd_extent_array_add(ea, end - begin,
- dirty ? NBD_STATE_DIRTY : 0) < 0) {
+for (start = offset;
+ bdrv_dirty_bitmap_next_dirty_area(bitmap, start, end, INT32_MAX,
+   _start, _count);
+ start = dirty_start + dirty_count)
+{
+if ((nbd_extent_array_add(es, dirty_start - start, 0) < 0) ||
+(nbd_extent_array_add(es, dirty_count, NBD_STATE_DIRTY) < 0))
+{
+full = true;
 break;
 }
-begin = end;
-dirty = next_dirty;
 }
 
-bdrv_dirty_iter_free(it);
+if (!full) {
+/* last non dirty extent */
+nbd_extent_array_add(es, end - start, 0);
+}
 
 bdrv_dirty_bitmap_unlock(bitmap);
-
-assert(offset < end);
 }
 
 static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
@@ -2129,7 +2108,7 @@ static int nbd_co_send_bitmap(NBDClient *client, uint64_t 
handle,
 unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
 g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
 
-bitmap_to_extents(bitmap, offset, length, ea, dont_fragment);
+bitmap_to_extents(bitmap, offset, length, ea);
 
 return nbd_co_send_extents(client, handle, ea, last, context_id, errp);
 }
-- 
2.21.1




[PULL v2 09/11] nbd/server: introduce NBDExtentArray

2020-03-18 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

Introduce the NBDExtentArray class to handle extents list creation in a
more controlled way and with fewer OUT parameters in functions.
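
For readers skimming the diff, a minimal usage sketch of the helpers added
below (nbd_extent_array_new(), nbd_extent_array_add(),
nbd_extent_array_convert_to_be()). It is not part of the patch and assumes
the QEMU tree; the helpers are static to nbd/server.c, and NBD_STATE_DIRTY
is the existing flag used there.

/* Illustrative sketch only -- not part of this patch. */
static void example_build_extents(void)
{
    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(16);

    /* Adjacent extents with identical flags are merged into one. */
    nbd_extent_array_add(ea, 65536, 0);
    nbd_extent_array_add(ea, 65536, 0);               /* extends previous */
    nbd_extent_array_add(ea, 4096, NBD_STATE_DIRTY);  /* starts a new one */

    /*
     * Convert lengths/flags to big-endian once, right before sending;
     * any further nbd_extent_array_add() call would trip the
     * assert(ea->can_add) in that helper.
     */
    nbd_extent_array_convert_to_be(ea);
    /* ... hand ea->extents / ea->count to the send path ... */
}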

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Message-id: 20200205112041.6003-9-vsement...@virtuozzo.com
Signed-off-by: John Snow 
---
 nbd/server.c | 210 +--
 1 file changed, 118 insertions(+), 92 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 3106aaf3b4..f90bb33a75 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1909,27 +1909,98 @@ static int coroutine_fn 
nbd_co_send_sparse_read(NBDClient *client,
 return ret;
 }
 
+typedef struct NBDExtentArray {
+NBDExtent *extents;
+unsigned int nb_alloc;
+unsigned int count;
+uint64_t total_length;
+bool can_add;
+bool converted_to_be;
+} NBDExtentArray;
+
+static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc)
+{
+NBDExtentArray *ea = g_new0(NBDExtentArray, 1);
+
+ea->nb_alloc = nb_alloc;
+ea->extents = g_new(NBDExtent, nb_alloc);
+ea->can_add = true;
+
+return ea;
+}
+
+static void nbd_extent_array_free(NBDExtentArray *ea)
+{
+g_free(ea->extents);
+g_free(ea);
+}
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(NBDExtentArray, nbd_extent_array_free);
+
+/* Further modifications of the array after conversion are abandoned */
+static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
+{
+int i;
+
+assert(!ea->converted_to_be);
+ea->can_add = false;
+ea->converted_to_be = true;
+
+for (i = 0; i < ea->count; i++) {
+ea->extents[i].flags = cpu_to_be32(ea->extents[i].flags);
+ea->extents[i].length = cpu_to_be32(ea->extents[i].length);
+}
+}
+
 /*
- * Populate @extents from block status. Update @bytes to be the actual
- * length encoded (which may be smaller than the original), and update
- * @nb_extents to the number of extents used.
- *
- * Returns zero on success and -errno on bdrv_block_status_above failure.
+ * Add extent to NBDExtentArray. If extent can't be added (no available space),
+ * return -1.
+ * For safety, when returning -1 for the first time, .can_add is set to false,
+ * further call to nbd_extent_array_add() will crash.
+ * (to avoid the situation, when after failing to add an extent (returned -1),
+ * user miss this failure and add another extent, which is successfully added
+ * (array is full, but new extent may be squashed into the last one), then we
+ * have invalid array with skipped extent)
  */
+static int nbd_extent_array_add(NBDExtentArray *ea,
+uint32_t length, uint32_t flags)
+{
+assert(ea->can_add);
+
+if (!length) {
+return 0;
+}
+
+/* Extend previous extent if flags are the same */
+if (ea->count > 0 && flags == ea->extents[ea->count - 1].flags) {
+uint64_t sum = (uint64_t)length + ea->extents[ea->count - 1].length;
+
+if (sum <= UINT32_MAX) {
+ea->extents[ea->count - 1].length = sum;
+ea->total_length += length;
+return 0;
+}
+}
+
+if (ea->count >= ea->nb_alloc) {
+ea->can_add = false;
+return -1;
+}
+
+ea->total_length += length;
+ea->extents[ea->count] = (NBDExtent) {.length = length, .flags = flags};
+ea->count++;
+
+return 0;
+}
+
 static int blockstatus_to_extents(BlockDriverState *bs, uint64_t offset,
-  uint64_t *bytes, NBDExtent *extents,
-  unsigned int *nb_extents)
+  uint64_t bytes, NBDExtentArray *ea)
 {
-uint64_t remaining_bytes = *bytes;
-NBDExtent *extent = extents, *extents_end = extents + *nb_extents;
-bool first_extent = true;
-
-assert(*nb_extents);
-while (remaining_bytes) {
+while (bytes) {
 uint32_t flags;
 int64_t num;
-int ret = bdrv_block_status_above(bs, NULL, offset, remaining_bytes,
-  , NULL, NULL);
+int ret = bdrv_block_status_above(bs, NULL, offset, bytes, ,
+  NULL, NULL);
 
 if (ret < 0) {
 return ret;
@@ -1938,60 +2009,37 @@ static int blockstatus_to_extents(BlockDriverState *bs, 
uint64_t offset,
 flags = (ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
 (ret & BDRV_BLOCK_ZERO  ? NBD_STATE_ZERO : 0);
 
-if (first_extent) {
-extent->flags = flags;
-extent->length = num;
-first_extent = false;
-} else if (flags == extent->flags) {
-/* extend current extent */
-extent->length += num;
-} else {
-if (extent + 1 == extents_end) {
-break;
-}
-
-/* start new extent */
-extent++;
-extent->flags = flags;
-extent->length = num;
+if 

[PULL v2 11/11] block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty

2020-03-18 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

The store_bitmap_data() loop does bdrv_set_dirty_iter() on each iteration,
which means that we don't actually need the iterator itself and can use
the simpler bitmap API.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-id: 20200205112041.6003-11-vsement...@virtuozzo.com
Signed-off-by: John Snow 
---
 block/qcow2-bitmap.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 82c9f3..cb06954b4a 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -1288,7 +1288,6 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
 uint64_t bm_size = bdrv_dirty_bitmap_size(bitmap);
 const char *bm_name = bdrv_dirty_bitmap_name(bitmap);
 uint8_t *buf = NULL;
-BdrvDirtyBitmapIter *dbi;
 uint64_t *tb;
 uint64_t tb_size =
 size_to_clusters(s,
@@ -1307,12 +1306,14 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
 return NULL;
 }
 
-dbi = bdrv_dirty_iter_new(bitmap);
 buf = g_malloc(s->cluster_size);
 limit = bytes_covered_by_bitmap_cluster(s, bitmap);
 assert(DIV_ROUND_UP(bm_size, limit) == tb_size);
 
-while ((offset = bdrv_dirty_iter_next(dbi)) >= 0) {
+offset = 0;
+while ((offset = bdrv_dirty_bitmap_next_dirty(bitmap, offset, INT64_MAX))
+   >= 0)
+{
 uint64_t cluster = offset / limit;
 uint64_t end, write_size;
 int64_t off;
@@ -1355,23 +1356,17 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
 goto fail;
 }
 
-if (end >= bm_size) {
-break;
-}
-
-bdrv_set_dirty_iter(dbi, end);
+offset = end;
 }
 
 *bitmap_table_size = tb_size;
 g_free(buf);
-bdrv_dirty_iter_free(dbi);
 
 return tb;
 
 fail:
 clear_bitmap_table(bs, tb, tb_size);
 g_free(buf);
-bdrv_dirty_iter_free(dbi);
 g_free(tb);
 
 return NULL;
-- 
2.21.1




[PULL v2 08/11] block/dirty-bitmap: improve _next_dirty_area API

2020-03-18 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

Firstly, _next_dirty_area is for scenarios where we repeatedly search for
the next dirty area inside some limited region, so it is more convenient
to specify an "end" that does not need to be recalculated on each
iteration.

Secondly, add the possibility to limit the size of the resulting area
without limiting the search area. This will be used in the NBD code in a
further commit. (Note that bdrv_dirty_bitmap_next_dirty_area is now
unused.)
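
As a quick illustration of the new calling convention (a sketch, not part
of the patch; it assumes the QEMU headers "block/dirty-bitmap.h" and
"qemu/units.h" for the MiB constant):

/* Sketch: visit every dirty area inside [offset, offset + length),
 * capping each reported area at 1 MiB. */
static void example_walk_dirty_areas(BdrvDirtyBitmap *bitmap,
                                     int64_t offset, int64_t length)
{
    int64_t start = offset, end = offset + length;
    int64_t dirty_start, dirty_count;

    while (bdrv_dirty_bitmap_next_dirty_area(bitmap, start, end, 1 * MiB,
                                             &dirty_start, &dirty_count)) {
        /* [dirty_start, dirty_start + dirty_count) is dirty, lies within
         * [start, end), and dirty_count <= 1 MiB. */
        start = dirty_start + dirty_count;
    }
}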

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-id: 20200205112041.6003-8-vsement...@virtuozzo.com
Signed-off-by: John Snow 
---
 include/block/dirty-bitmap.h |  3 ++-
 include/qemu/hbitmap.h   | 23 ++
 block/dirty-bitmap.c |  6 +++--
 tests/test-hbitmap.c | 45 +++-
 util/hbitmap.c   | 44 +--
 5 files changed, 75 insertions(+), 46 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index b1f0de12db..8a10029418 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -110,7 +110,8 @@ int64_t bdrv_dirty_bitmap_next_dirty(BdrvDirtyBitmap 
*bitmap, int64_t offset,
 int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, int64_t offset,
 int64_t bytes);
 bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
-   int64_t *offset, int64_t *bytes);
+int64_t start, int64_t end, int64_t max_dirty_count,
+int64_t *dirty_start, int64_t *dirty_count);
 BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
   Error **errp);
 
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 6e9ae51ed3..5e71b6d6f7 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -324,18 +324,21 @@ int64_t hbitmap_next_zero(const HBitmap *hb, int64_t 
start, int64_t count);
 
 /* hbitmap_next_dirty_area:
  * @hb: The HBitmap to operate on
- * @start: in-out parameter.
- * in: the offset to start from
- * out: (if area found) start of found area
- * @count: in-out parameter.
- * in: length of requested region
- * out: length of found area
+ * @start: the offset to start from
+ * @end: end of requested area
+ * @max_dirty_count: limit for out parameter dirty_count
+ * @dirty_start: on success: start of found area
+ * @dirty_count: on success: length of found area
  *
- * If dirty area found within [@start, @start + @count), returns true and sets
- * @offset and @bytes appropriately. Otherwise returns false and leaves @offset
- * and @bytes unchanged.
+ * If dirty area found within [@start, @end), returns true and sets
+ * @dirty_start and @dirty_count appropriately. @dirty_count will not exceed
+ * @max_dirty_count.
+ * If dirty area was not found, returns false and leaves @dirty_start and
+ * @dirty_count unchanged.
  */
-bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t *start, int64_t 
*count);
+bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t start, int64_t end,
+ int64_t max_dirty_count,
+ int64_t *dirty_start, int64_t *dirty_count);
 
 /**
  * hbitmap_iter_next:
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 1b14c8eb26..063793e316 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -873,9 +873,11 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap 
*bitmap, int64_t offset,
 }
 
 bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
-   int64_t *offset, int64_t *bytes)
+int64_t start, int64_t end, int64_t max_dirty_count,
+int64_t *dirty_start, int64_t *dirty_count)
 {
-return hbitmap_next_dirty_area(bitmap->bitmap, offset, bytes);
+return hbitmap_next_dirty_area(bitmap->bitmap, start, end, max_dirty_count,
+   dirty_start, dirty_count);
 }
 
 /**
diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index 8905b8a351..b6726cf76b 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -920,18 +920,19 @@ static void 
test_hbitmap_next_x_after_truncate(TestHBitmapData *data,
 test_hbitmap_next_x_check(data, 0);
 }
 
-static void test_hbitmap_next_dirty_area_check(TestHBitmapData *data,
-   int64_t offset,
-   int64_t count)
+static void test_hbitmap_next_dirty_area_check_limited(TestHBitmapData *data,
+   int64_t offset,
+   int64_t count,
+   int64_t max_dirty)
 {
 int64_t off1, off2;
 int64_t len1 = 0, len2;
 bool ret1, ret2;
 int64_t end;
 
-off1 = offset;
-len1 = count;

Re: [PATCH v10 01/16] s390x: Move diagnose 308 subcodes and rcs into ipl.h

2020-03-18 Thread Christian Borntraeger



On 18.03.20 15:30, Janosch Frank wrote:
> They are part of the IPL process, so let's put them into the ipl
> header.
> 
> Signed-off-by: Janosch Frank 

Reviewed-by: Christian Borntraeger 
> ---
>  hw/s390x/ipl.h  | 11 +++
>  target/s390x/diag.c | 11 ---
>  2 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
> index 3e44abe1c651d8a0..a5665e6bfde2e8cf 100644
> --- a/hw/s390x/ipl.h
> +++ b/hw/s390x/ipl.h
> @@ -159,6 +159,17 @@ struct S390IPLState {
>  typedef struct S390IPLState S390IPLState;
>  QEMU_BUILD_BUG_MSG(offsetof(S390IPLState, iplb) & 3, "alignment of iplb 
> wrong");
>  
> +#define DIAG_308_RC_OK  0x0001
> +#define DIAG_308_RC_NO_CONF 0x0102
> +#define DIAG_308_RC_INVALID 0x0402
> +
> +#define DIAG308_RESET_MOD_CLR   0
> +#define DIAG308_RESET_LOAD_NORM 1
> +#define DIAG308_LOAD_CLEAR  3
> +#define DIAG308_LOAD_NORMAL_DUMP4
> +#define DIAG308_SET 5
> +#define DIAG308_STORE   6
> +
>  #define S390_IPL_TYPE_FCP 0x00
>  #define S390_IPL_TYPE_CCW 0x02
>  #define S390_IPL_TYPE_QEMU_SCSI 0xff
> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
> index 54e5670b3fd6d960..8aba6341f94848e1 100644
> --- a/target/s390x/diag.c
> +++ b/target/s390x/diag.c
> @@ -49,17 +49,6 @@ int handle_diag_288(CPUS390XState *env, uint64_t r1, 
> uint64_t r3)
>  return diag288_class->handle_timer(diag288, func, timeout);
>  }
>  
> -#define DIAG_308_RC_OK  0x0001
> -#define DIAG_308_RC_NO_CONF 0x0102
> -#define DIAG_308_RC_INVALID 0x0402
> -
> -#define DIAG308_RESET_MOD_CLR   0
> -#define DIAG308_RESET_LOAD_NORM 1
> -#define DIAG308_LOAD_CLEAR  3
> -#define DIAG308_LOAD_NORMAL_DUMP4
> -#define DIAG308_SET 5
> -#define DIAG308_STORE   6
> -
>  static int diag308_parm_check(CPUS390XState *env, uint64_t r1, uint64_t addr,
>uintptr_t ra, bool write)
>  {
> 




[PULL v2 04/11] hbitmap: unpublish hbitmap_iter_skip_words

2020-03-18 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

The function is internal and even commented as internal. Drop its
declaration from the .h file.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-id: 20200205112041.6003-4-vsement...@virtuozzo.com
Signed-off-by: John Snow 
---
 include/qemu/hbitmap.h | 7 ---
 util/hbitmap.c | 2 +-
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index ab227b117f..15837a0e2d 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -297,13 +297,6 @@ void hbitmap_free(HBitmap *hb);
  */
 void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap *hb, uint64_t first);
 
-/* hbitmap_iter_skip_words:
- * @hbi: HBitmapIter to operate on.
- *
- * Internal function used by hbitmap_iter_next and hbitmap_iter_next_word.
- */
-unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi);
-
 /* hbitmap_next_zero:
  *
  * Find next not dirty bit within selected range. If not found, return -1.
diff --git a/util/hbitmap.c b/util/hbitmap.c
index a368dc5ef7..26145d4b9e 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -104,7 +104,7 @@ struct HBitmap {
 /* Advance hbi to the next nonzero word and return it.  hbi->pos
  * is updated.  Returns zero if we reach the end of the bitmap.
  */
-unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi)
+static unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi)
 {
 size_t pos = hbi->pos;
 const HBitmap *hb = hbi->hb;
-- 
2.21.1




[PULL v2 06/11] block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t

2020-03-18 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

We are going to introduce bdrv_dirty_bitmap_next_dirty so that the same
variable may be used to store its return value and to be its parameter,
so it should be int64_t.

Similarly, we are going to refactor hbitmap_next_dirty_area to use
hbitmap_next_dirty together with hbitmap_next_zero, therefore we want the
hbitmap_next_zero parameter type to be int64_t too.

So, for convenience, update all parameters of *_next_zero and
*_next_dirty_area to be int64_t.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-id: 20200205112041.6003-6-vsement...@virtuozzo.com
Signed-off-by: John Snow 
---
 include/block/dirty-bitmap.h |  6 +++---
 include/qemu/hbitmap.h   |  7 +++
 block/dirty-bitmap.c |  6 +++---
 nbd/server.c |  2 +-
 tests/test-hbitmap.c | 36 ++--
 util/hbitmap.c   | 13 -
 6 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index e2b20ecab9..27c72cc56a 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -105,10 +105,10 @@ for (bitmap = bdrv_dirty_bitmap_first(bs); bitmap; \
  bitmap = bdrv_dirty_bitmap_next(bitmap))
 
 char *bdrv_dirty_bitmap_sha256(const BdrvDirtyBitmap *bitmap, Error **errp);
-int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t offset,
-uint64_t bytes);
+int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, int64_t offset,
+int64_t bytes);
 bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
-   uint64_t *offset, uint64_t *bytes);
+   int64_t *offset, int64_t *bytes);
 BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
   Error **errp);
 
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index df922d8517..b6e85f3d5d 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -304,10 +304,10 @@ void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap 
*hb, uint64_t first);
  * @hb: The HBitmap to operate on
  * @start: The bit to start from.
  * @count: Number of bits to proceed. If @start+@count > bitmap size, the whole
- * bitmap is looked through. You can use UINT64_MAX as @count to search up to
+ * bitmap is looked through. You can use INT64_MAX as @count to search up to
  * the bitmap end.
  */
-int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start, uint64_t count);
+int64_t hbitmap_next_zero(const HBitmap *hb, int64_t start, int64_t count);
 
 /* hbitmap_next_dirty_area:
  * @hb: The HBitmap to operate on
@@ -322,8 +322,7 @@ int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t 
start, uint64_t count);
  * @offset and @bytes appropriately. Otherwise returns false and leaves @offset
  * and @bytes unchanged.
  */
-bool hbitmap_next_dirty_area(const HBitmap *hb, uint64_t *start,
- uint64_t *count);
+bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t *start, int64_t 
*count);
 
 /**
  * hbitmap_iter_next:
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 7039e82520..af9f5411a6 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -860,14 +860,14 @@ char *bdrv_dirty_bitmap_sha256(const BdrvDirtyBitmap 
*bitmap, Error **errp)
 return hbitmap_sha256(bitmap->bitmap, errp);
 }
 
-int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t offset,
-uint64_t bytes)
+int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, int64_t offset,
+int64_t bytes)
 {
 return hbitmap_next_zero(bitmap->bitmap, offset, bytes);
 }
 
 bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
-   uint64_t *offset, uint64_t *bytes)
+   int64_t *offset, int64_t *bytes)
 {
 return hbitmap_next_dirty_area(bitmap->bitmap, offset, bytes);
 }
diff --git a/nbd/server.c b/nbd/server.c
index 11a31094ff..3106aaf3b4 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2055,7 +2055,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap 
*bitmap, uint64_t offset,
 bool next_dirty = !dirty;
 
 if (dirty) {
-end = bdrv_dirty_bitmap_next_zero(bitmap, begin, UINT64_MAX);
+end = bdrv_dirty_bitmap_next_zero(bitmap, begin, INT64_MAX);
 } else {
 bdrv_set_dirty_iter(it, begin);
 end = bdrv_dirty_iter_next(it);
diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index aeaa0b3f22..9d210dc18c 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -817,8 +817,8 @@ static void test_hbitmap_iter_and_reset(TestHBitmapData 
*data,
 }
 
 static void 

[PULL v2 01/11] build: Silence clang warning on older glib autoptr usage

2020-03-18 Thread John Snow
From: Eric Blake 

glib's G_DEFINE_AUTOPTR_CLEANUP_FUNC() macro defines several static
inline functions, often with some of them unused, but prior to 2.57.2
did not mark the functions as such.  As a result, clang (but not gcc)
fails to build with older glib unless -Wno-unused-function is enabled.

Reported-by: Peter Maydell 
Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Message-id: 20200317175534.196295-1-ebl...@redhat.com
Signed-off-by: John Snow 
---
 configure | 20 
 1 file changed, 20 insertions(+)

diff --git a/configure b/configure
index 06fcd070fb..479336bf6e 100755
--- a/configure
+++ b/configure
@@ -3851,6 +3851,26 @@ if ! compile_prog "$glib_cflags -Werror" "$glib_libs" ; 
then
 fi
 fi
 
+# Silence clang warnings triggered by glib < 2.57.2
+cat > $TMPC << EOF
+#include 
+typedef struct Foo {
+int i;
+} Foo;
+static void foo_free(Foo *f)
+{
+g_free(f);
+}
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(Foo, foo_free);
+int main(void) { return 0; }
+EOF
+if ! compile_prog "$glib_cflags -Werror" "$glib_libs" ; then
+if cc_has_warning_flag "-Wno-unused-function"; then
+glib_cflags="$glib_cflags -Wno-unused-function"
+CFLAGS="$CFLAGS -Wno-unused-function"
+fi
+fi
+
 #
 # zlib check
 
-- 
2.21.1




[PULL v2 05/11] hbitmap: drop meta bitmaps as they are unused

2020-03-18 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-id: 20200205112041.6003-5-vsement...@virtuozzo.com
Signed-off-by: John Snow 
---
 include/qemu/hbitmap.h |  21 
 tests/test-hbitmap.c   | 115 -
 util/hbitmap.c |  16 --
 3 files changed, 152 deletions(-)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 15837a0e2d..df922d8517 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -325,27 +325,6 @@ int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t 
start, uint64_t count);
 bool hbitmap_next_dirty_area(const HBitmap *hb, uint64_t *start,
  uint64_t *count);
 
-/* hbitmap_create_meta:
- * Create a "meta" hbitmap to track dirtiness of the bits in this HBitmap.
- * The caller owns the created bitmap and must call hbitmap_free_meta(hb) to
- * free it.
- *
- * Currently, we only guarantee that if a bit in the hbitmap is changed it
- * will be reflected in the meta bitmap, but we do not yet guarantee the
- * opposite.
- *
- * @hb: The HBitmap to operate on.
- * @chunk_size: How many bits in @hb does one bit in the meta track.
- */
-HBitmap *hbitmap_create_meta(HBitmap *hb, int chunk_size);
-
-/* hbitmap_free_meta:
- * Free the meta bitmap of @hb.
- *
- * @hb: The HBitmap whose meta bitmap should be freed.
- */
-void hbitmap_free_meta(HBitmap *hb);
-
 /**
  * hbitmap_iter_next:
  * @hbi: HBitmapIter to operate on.
diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index e1f867085f..aeaa0b3f22 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -22,7 +22,6 @@
 
 typedef struct TestHBitmapData {
 HBitmap   *hb;
-HBitmap   *meta;
 unsigned long *bits;
 size_t size;
 size_t old_size;
@@ -94,14 +93,6 @@ static void hbitmap_test_init(TestHBitmapData *data,
 }
 }
 
-static void hbitmap_test_init_meta(TestHBitmapData *data,
-   uint64_t size, int granularity,
-   int meta_chunk)
-{
-hbitmap_test_init(data, size, granularity);
-data->meta = hbitmap_create_meta(data->hb, meta_chunk);
-}
-
 static inline size_t hbitmap_test_array_size(size_t bits)
 {
 size_t n = DIV_ROUND_UP(bits, BITS_PER_LONG);
@@ -144,9 +135,6 @@ static void hbitmap_test_teardown(TestHBitmapData *data,
   const void *unused)
 {
 if (data->hb) {
-if (data->meta) {
-hbitmap_free_meta(data->hb);
-}
 hbitmap_free(data->hb);
 data->hb = NULL;
 }
@@ -648,96 +636,6 @@ static void 
test_hbitmap_truncate_shrink_large(TestHBitmapData *data,
 hbitmap_test_truncate(data, size, -diff, 0);
 }
 
-static void hbitmap_check_meta(TestHBitmapData *data,
-   int64_t start, int count)
-{
-int64_t i;
-
-for (i = 0; i < data->size; i++) {
-if (i >= start && i < start + count) {
-g_assert(hbitmap_get(data->meta, i));
-} else {
-g_assert(!hbitmap_get(data->meta, i));
-}
-}
-}
-
-static void hbitmap_test_meta(TestHBitmapData *data,
-  int64_t start, int count,
-  int64_t check_start, int check_count)
-{
-hbitmap_reset_all(data->hb);
-hbitmap_reset_all(data->meta);
-
-/* Test "unset" -> "unset" will not update meta. */
-hbitmap_reset(data->hb, start, count);
-hbitmap_check_meta(data, 0, 0);
-
-/* Test "unset" -> "set" will update meta */
-hbitmap_set(data->hb, start, count);
-hbitmap_check_meta(data, check_start, check_count);
-
-/* Test "set" -> "set" will not update meta */
-hbitmap_reset_all(data->meta);
-hbitmap_set(data->hb, start, count);
-hbitmap_check_meta(data, 0, 0);
-
-/* Test "set" -> "unset" will update meta */
-hbitmap_reset_all(data->meta);
-hbitmap_reset(data->hb, start, count);
-hbitmap_check_meta(data, check_start, check_count);
-}
-
-static void hbitmap_test_meta_do(TestHBitmapData *data, int chunk_size)
-{
-uint64_t size = chunk_size * 100;
-hbitmap_test_init_meta(data, size, 0, chunk_size);
-
-hbitmap_test_meta(data, 0, 1, 0, chunk_size);
-hbitmap_test_meta(data, 0, chunk_size, 0, chunk_size);
-hbitmap_test_meta(data, chunk_size - 1, 1, 0, chunk_size);
-hbitmap_test_meta(data, chunk_size - 1, 2, 0, chunk_size * 2);
-hbitmap_test_meta(data, chunk_size - 1, chunk_size + 1, 0, chunk_size * 2);
-hbitmap_test_meta(data, chunk_size - 1, chunk_size + 2, 0, chunk_size * 3);
-hbitmap_test_meta(data, 7 * chunk_size - 1, chunk_size + 2,
-  6 * chunk_size, chunk_size * 3);
-hbitmap_test_meta(data, size - 1, 1, size - chunk_size, chunk_size);
-hbitmap_test_meta(data, 0, size, 0, size);
-}
-
-static void test_hbitmap_meta_byte(TestHBitmapData *data, const void *unused)
-{
-

[PULL v2 02/11] hbitmap: assert that we don't create bitmap larger than INT64_MAX

2020-03-18 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

We have APIs which return signed int64_t in order to be able to return
errors. Therefore we can't handle bitmaps with an absolute size larger
than (INT64_MAX+1). Still, keep the maximum at INT64_MAX, which is a bit
safer.

Note that bitmaps are used to represent disk images, which can't exceed
INT64_MAX anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: John Snow 
Message-id: 20200205112041.6003-2-vsement...@virtuozzo.com
Signed-off-by: John Snow 
---
 util/hbitmap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/util/hbitmap.c b/util/hbitmap.c
index 242c6e519c..7f9b3e0cd7 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -716,6 +716,7 @@ HBitmap *hbitmap_alloc(uint64_t size, int granularity)
 HBitmap *hb = g_new0(struct HBitmap, 1);
 unsigned i;
 
+assert(size <= INT64_MAX);
 hb->orig_size = size;
 
 assert(granularity >= 0 && granularity < 64);
@@ -746,6 +747,7 @@ void hbitmap_truncate(HBitmap *hb, uint64_t size)
 uint64_t num_elements = size;
 uint64_t old;
 
+assert(size <= INT64_MAX);
 hb->orig_size = size;
 
 /* Size comes in as logical elements, adjust for granularity. */
-- 
2.21.1




[PULL v2 07/11] block/dirty-bitmap: add _next_dirty API

2020-03-18 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

We have bdrv_dirty_bitmap_next_zero; let's add the corresponding
bdrv_dirty_bitmap_next_dirty, which is more convenient to use than
bitmap iterators in some cases.

For testing, modify test_hbitmap_next_zero_check_range to check both
next_zero and next_dirty, and add some new checks.
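
A minimal usage sketch of the new call (illustrative only, not part of the
patch; assumes the QEMU header "block/dirty-bitmap.h"):

/* Sketch: walk all dirty offsets of a bitmap without an iterator. */
static void example_walk_dirty_bits(BdrvDirtyBitmap *bitmap)
{
    int64_t offset = 0;

    while ((offset = bdrv_dirty_bitmap_next_dirty(bitmap, offset,
                                                  INT64_MAX)) >= 0) {
        /* offset is the byte offset of the next dirty bit */
        offset += bdrv_dirty_bitmap_granularity(bitmap);
    }
}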

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-id: 20200205112041.6003-7-vsement...@virtuozzo.com
Signed-off-by: John Snow 
---
 include/block/dirty-bitmap.h |   2 +
 include/qemu/hbitmap.h   |  13 
 block/dirty-bitmap.c |   6 ++
 tests/test-hbitmap.c | 130 ---
 util/hbitmap.c   |  60 
 5 files changed, 126 insertions(+), 85 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 27c72cc56a..b1f0de12db 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -105,6 +105,8 @@ for (bitmap = bdrv_dirty_bitmap_first(bs); bitmap; \
  bitmap = bdrv_dirty_bitmap_next(bitmap))
 
 char *bdrv_dirty_bitmap_sha256(const BdrvDirtyBitmap *bitmap, Error **errp);
+int64_t bdrv_dirty_bitmap_next_dirty(BdrvDirtyBitmap *bitmap, int64_t offset,
+ int64_t bytes);
 int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, int64_t offset,
 int64_t bytes);
 bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index b6e85f3d5d..6e9ae51ed3 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -297,6 +297,19 @@ void hbitmap_free(HBitmap *hb);
  */
 void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap *hb, uint64_t first);
 
+/*
+ * hbitmap_next_dirty:
+ *
+ * Find next dirty bit within selected range. If not found, return -1.
+ *
+ * @hb: The HBitmap to operate on
+ * @start: The bit to start from.
+ * @count: Number of bits to proceed. If @start+@count > bitmap size, the whole
+ * bitmap is looked through. You can use INT64_MAX as @count to search up to
+ * the bitmap end.
+ */
+int64_t hbitmap_next_dirty(const HBitmap *hb, int64_t start, int64_t count);
+
 /* hbitmap_next_zero:
  *
  * Find next not dirty bit within selected range. If not found, return -1.
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index af9f5411a6..1b14c8eb26 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -860,6 +860,12 @@ char *bdrv_dirty_bitmap_sha256(const BdrvDirtyBitmap 
*bitmap, Error **errp)
 return hbitmap_sha256(bitmap->bitmap, errp);
 }
 
+int64_t bdrv_dirty_bitmap_next_dirty(BdrvDirtyBitmap *bitmap, int64_t offset,
+ int64_t bytes)
+{
+return hbitmap_next_dirty(bitmap->bitmap, offset, bytes);
+}
+
 int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, int64_t offset,
 int64_t bytes)
 {
diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index 9d210dc18c..8905b8a351 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -816,92 +816,108 @@ static void test_hbitmap_iter_and_reset(TestHBitmapData 
*data,
 hbitmap_iter_next();
 }
 
-static void test_hbitmap_next_zero_check_range(TestHBitmapData *data,
-   int64_t start,
-   int64_t count)
+static void test_hbitmap_next_x_check_range(TestHBitmapData *data,
+int64_t start,
+int64_t count)
 {
-int64_t ret1 = hbitmap_next_zero(data->hb, start, count);
-int64_t ret2 = start;
+int64_t next_zero = hbitmap_next_zero(data->hb, start, count);
+int64_t next_dirty = hbitmap_next_dirty(data->hb, start, count);
+int64_t next;
 int64_t end = start >= data->size || data->size - start < count ?
 data->size : start + count;
+bool first_bit = hbitmap_get(data->hb, start);
 
-for ( ; ret2 < end && hbitmap_get(data->hb, ret2); ret2++) {
+for (next = start;
+ next < end && hbitmap_get(data->hb, next) == first_bit;
+ next++)
+{
 ;
 }
-if (ret2 == end) {
-ret2 = -1;
+
+if (next == end) {
+next = -1;
 }
 
-g_assert_cmpint(ret1, ==, ret2);
+g_assert_cmpint(next_dirty, ==, first_bit ? start : next);
+g_assert_cmpint(next_zero, ==, first_bit ? next : start);
 }
 
-static void test_hbitmap_next_zero_check(TestHBitmapData *data, int64_t start)
+static void test_hbitmap_next_x_check(TestHBitmapData *data, int64_t start)
 {
-test_hbitmap_next_zero_check_range(data, start, INT64_MAX);
+test_hbitmap_next_x_check_range(data, start, INT64_MAX);
 }
 
-static void test_hbitmap_next_zero_do(TestHBitmapData *data, int granularity)
+static void test_hbitmap_next_x_do(TestHBitmapData *data, int granularity)
 {
 hbitmap_test_init(data, L3, 

[PULL v2 03/11] hbitmap: move hbitmap_iter_next_word to hbitmap.c

2020-03-18 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

The function is definitely internal (it's not used by third parties and
it has a complicated interface). Move it to the .c file.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-id: 20200205112041.6003-3-vsement...@virtuozzo.com
Signed-off-by: John Snow 
---
 include/qemu/hbitmap.h | 30 --
 util/hbitmap.c | 29 +
 2 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 1bf944ca3d..ab227b117f 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -362,34 +362,4 @@ void hbitmap_free_meta(HBitmap *hb);
  */
 int64_t hbitmap_iter_next(HBitmapIter *hbi);
 
-/**
- * hbitmap_iter_next_word:
- * @hbi: HBitmapIter to operate on.
- * @p_cur: Location where to store the next non-zero word.
- *
- * Return the index of the next nonzero word that is set in @hbi's
- * associated HBitmap, and set *p_cur to the content of that word
- * (bits before the index that was passed to hbitmap_iter_init are
- * trimmed on the first call).  Return -1, and set *p_cur to zero,
- * if all remaining words are zero.
- */
-static inline size_t hbitmap_iter_next_word(HBitmapIter *hbi, unsigned long 
*p_cur)
-{
-unsigned long cur = hbi->cur[HBITMAP_LEVELS - 1];
-
-if (cur == 0) {
-cur = hbitmap_iter_skip_words(hbi);
-if (cur == 0) {
-*p_cur = 0;
-return -1;
-}
-}
-
-/* The next call will resume work from the next word.  */
-hbi->cur[HBITMAP_LEVELS - 1] = 0;
-*p_cur = cur;
-return hbi->pos;
-}
-
-
 #endif
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 7f9b3e0cd7..a368dc5ef7 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -298,6 +298,35 @@ uint64_t hbitmap_count(const HBitmap *hb)
 return hb->count << hb->granularity;
 }
 
+/**
+ * hbitmap_iter_next_word:
+ * @hbi: HBitmapIter to operate on.
+ * @p_cur: Location where to store the next non-zero word.
+ *
+ * Return the index of the next nonzero word that is set in @hbi's
+ * associated HBitmap, and set *p_cur to the content of that word
+ * (bits before the index that was passed to hbitmap_iter_init are
+ * trimmed on the first call).  Return -1, and set *p_cur to zero,
+ * if all remaining words are zero.
+ */
+static size_t hbitmap_iter_next_word(HBitmapIter *hbi, unsigned long *p_cur)
+{
+unsigned long cur = hbi->cur[HBITMAP_LEVELS - 1];
+
+if (cur == 0) {
+cur = hbitmap_iter_skip_words(hbi);
+if (cur == 0) {
+*p_cur = 0;
+return -1;
+}
+}
+
+/* The next call will resume work from the next word.  */
+hbi->cur[HBITMAP_LEVELS - 1] = 0;
+*p_cur = cur;
+return hbi->pos;
+}
+
 /* Count the number of set bits between start and end, not accounting for
  * the granularity.  Also an example of how to use hbitmap_iter_next_word.
  */
-- 
2.21.1




Re: [PULL 0/4] Error reporting patches for 2020-03-17

2020-03-18 Thread Peter Maydell
On Tue, 17 Mar 2020 at 16:38, Markus Armbruster  wrote:
>
> The following changes since commit 40c67636f67c2a89745f2e698522fe917326a952:
>
>   Merge remote-tracking branch 
> 'remotes/kraxel/tags/usb-20200317-pull-request' into staging (2020-03-17 
> 14:00:56 +)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/armbru.git tags/pull-error-2020-03-17
>
> for you to fetch changes up to 709dfb64925ed2b2978bf4c17ab98b2c7a9a05c1:
>
>   hw/sd/ssi-sd: fix error handling in ssi_sd_realize (2020-03-17 17:30:03 
> +0100)
>
> 
> Error reporting patches for 2020-03-17
>
> 
> Markus Armbruster (3):
>   Use _abort instead of separate assert()
>   hw/misc/ivshmem: Use one Error * variable instead of two
>   xen-block: Use one Error * variable instead of two
>
> Vladimir Sementsov-Ogievskiy (1):
>   hw/sd/ssi-sd: fix error handling in ssi_sd_realize
>

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM



[PULL v2 00/11] Bitmaps patches

2020-03-18 Thread John Snow
The following changes since commit d649689a8ecb2e276cc20d3af6d416e3c299cb17:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2020-03-17 18:33:05 +)

are available in the Git repository at:

  https://github.com/jnsnow/qemu.git tags/bitmaps-pull-request

for you to fetch changes up to 2d00cbd8e222a4adc08f415c399e84590ee8ff9a:

  block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty (2020-03-18 14:03:46 
-0400)


Pull request



Eric Blake (1):
  build: Silence clang warning on older glib autoptr usage

Vladimir Sementsov-Ogievskiy (10):
  hbitmap: assert that we don't create bitmap larger than INT64_MAX
  hbitmap: move hbitmap_iter_next_word to hbitmap.c
  hbitmap: unpublish hbitmap_iter_skip_words
  hbitmap: drop meta bitmaps as they are unused
  block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t
  block/dirty-bitmap: add _next_dirty API
  block/dirty-bitmap: improve _next_dirty_area API
  nbd/server: introduce NBDExtentArray
  nbd/server: use bdrv_dirty_bitmap_next_dirty_area
  block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty

 configure|  20 +++
 include/block/dirty-bitmap.h |   9 +-
 include/qemu/hbitmap.h   |  95 +++
 block/dirty-bitmap.c |  16 +-
 block/qcow2-bitmap.c |  15 +-
 nbd/server.c | 251 ++--
 tests/test-hbitmap.c | 316 +--
 util/hbitmap.c   | 134 +--
 8 files changed, 395 insertions(+), 461 deletions(-)

-- 
2.21.1




Re: [PULL v2 00/37] Linux user for 5.0 patches

2020-03-18 Thread Laurent Vivier
Le 18/03/2020 à 21:17, Richard Henderson a écrit :
> On 3/18/20 12:58 PM, Laurent Vivier wrote:
>>> However, from the error message above, it's clear that cpu_loop.o has not 
>>> been
>>> rebuilt properly.
>>>
>>
>> In the series merged here syscall_nr.h are moved from source directory
>> to build directory.
>>
>> The include path of the files is based on the dependency files (*.d), and
>> to force the update of this path PATCH 13 removes all the .d files that
>> have a dependency on the syscall_nr.h file in the source path.
>>
>> This is added in configure:
>>
>> --- a/configure
>> +++ b/configure
>> @@ -1887,6 +1887,17 @@ fi
>>  # Remove old dependency files to make sure that they get properly
>> regenerated
>>  rm -f */config-devices.mak.d
>>
>> +# Remove syscall_nr.h to be sure they will be regenerated in the build
>> +# directory, not in the source directory
>> +for arch in ; do
>> +# remove the file if it has been generated in the source directory
>> +rm -f "${source_path}/linux-user/${arch}/syscall_nr.h"
>> +# remove the dependency files
>> +find . -name "*.d" \
>> +   -exec grep -q
>> "${source_path}/linux-user/${arch}/syscall_nr.h" {} \; \
>> +   -exec rm {} \;
>> +done
> ...
>> Perhaps it removes a dependency that should trigger the rebuild of
>> cpu_loop.o?
> 
> Ah, yes indeed. It removes *all* dependencies for cpu_loop.o, so unless we
> touch the cpu_loop.c source file, nothing gets done.
> 
> I think you're trying to be too fine grained here, since the *.o file has to 
> go
> away with the *.d file.  Why not just
> 
>   make ${arch}-linux-user/clean
> 
> ?

The idea was to be able to bisect the series as the syscall_nr.h were
added incrementally without rebuilding all the files.

If I remove the loop in configure, where should I add the "make
${arch}-linux-user/clean"?

Thanks,
Laurent




Re: [Bug 1743191] Re: Interacting with NetBSD serial console boot blocks no longer works

2020-03-18 Thread Ottavio Caruso
On Fri, 6 Mar 2020 at 13:24, Gerd Hoffmann <1743...@bugs.launchpad.net> wrote:
> So one option is to turn off seabios sercon: "qemu -nographic -machine
> graphics=on".

This works for me, but only if I turn off "q35", therefore changing
from a sata disk to a plain ide:

qemu-system-x86_64 \
-drive if=virtio,file=/home/oc/VM/img/netbsd.image,index=0,media=disk \
-drive if=virtio,file=/home/oc/VM/img/newdisk2.img,index=1,media=disk \
-m 300M -cpu host -smp $(nproc) \
-nic user,hostfwd=tcp::6665-:22,model=virtio-net-pci,ipv6=off \
-nographic -machine accel=kvm,graphics=on

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1743191

Title:
  Interacting with NetBSD serial console boot blocks no longer works

Status in QEMU:
  New

Bug description:
  The NetBSD boot blocks display a menu allowing the user to make a
  selection using the keyboard.  For example, when booting a NetBSD
  installation CD-ROM, the menu looks like this:

   1. Install NetBSD
   2. Install NetBSD (no ACPI)
   3. Install NetBSD (no ACPI, no SMP)
   4. Drop to boot prompt

  Choose an option; RETURN for default; SPACE to stop countdown.
  Option 1 will be chosen in 30 seconds.

  When booting NetBSD in a recent qemu using an emulated serial console,
  making this menu selection no longer works: when you type the selected
  number, the keyboard input is ignored, and the 30-second countdown
  continues.  In older versions of qemu, it works.

  To reproduce the problem, run:

 wget 
http://ftp.netbsd.org/pub/NetBSD/NetBSD-7.1.1/amd64/installation/cdrom/boot-com.iso
 qemu-system-x86_64 -nographic -cdrom boot-com.iso

  During the 30-second countdown, press 4

  Expected behavior: The countdown stops and you get a ">" prompt

  Incorrect behavior: The countdown continues

  There may also be some corruption of the terminal output; for example,
  "Option 1 will be chosen in 30 seconds" may be displayed as "Option 1
  will be chosen in p0 seconds".

  Using bisection, I have determined that the problem appeared with qemu
  commit 083fab0290f2c40d3d04f7f22eed9c8f2d5b6787, in which seabios was
  updated to 1.11 prerelease, and the problem is still there as of
  commit 7398166ddf7c6dbbc9cae6ac69bb2feda14b40ac.  The host operating
  system used for the tests was Debian 9 x86_64.

  Credit for discovering this bug goes to Paul Goyette.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1743191/+subscriptions



[PATCH v14 Kernel 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap

2020-03-18 Thread Kirti Wankhede
DMA mapped pages, including those pinned by mdev vendor drivers, might
get unpinned and unmapped while migration is active and the device is
still running. For example, in the pre-copy phase, while the guest driver
can still access those pages, the host device or vendor driver can dirty
these mapped pages. Such pages should be marked dirty so as to maintain
memory consistency for a user making use of dirty page tracking.

To get the bitmap during unmap, the user should set the
VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP flag; the bitmap memory should be
allocated and zeroed by the user space application, and the bitmap size
and page size should be set by the user application.
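
For illustration only, a rough user space sketch of that calling
convention. It is not part of the patch and assumes the updated
linux/vfio.h from this series: a struct vfio_bitmap with the pgsize, size
and data members referenced in the code below, placed immediately after
the fixed part of struct vfio_iommu_type1_dma_unmap, which is where the
kernel reads it from (arg + minsz).

/* Sketch only: unmap a single mapping and fetch its dirty bitmap. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int unmap_and_get_dirty(int container, uint64_t iova, uint64_t size,
                               uint64_t pgsize, void *bitmap_buf,
                               uint64_t bitmap_bytes)
{
    size_t argsz = sizeof(struct vfio_iommu_type1_dma_unmap) +
                   sizeof(struct vfio_bitmap);
    struct vfio_iommu_type1_dma_unmap *unmap = calloc(1, argsz);
    struct vfio_bitmap *bitmap;
    int ret;

    if (!unmap)
        return -1;

    /* vfio_bitmap follows the fixed unmap header */
    bitmap = (struct vfio_bitmap *)(unmap + 1);

    memset(bitmap_buf, 0, bitmap_bytes);   /* must be zeroed by user space */

    unmap->argsz   = argsz;
    unmap->flags   = VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP;
    unmap->iova    = iova;                 /* must match a single mapping */
    unmap->size    = size;
    bitmap->pgsize = pgsize;               /* smallest supported IOMMU page size */
    bitmap->size   = bitmap_bytes;         /* one bit per page, rounded up to bytes */
    bitmap->data   = bitmap_buf;

    ret = ioctl(container, VFIO_IOMMU_UNMAP_DMA, unmap);
    free(unmap);
    return ret;
}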

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 55 ++---
 include/uapi/linux/vfio.h   | 11 +
 2 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d6417fb02174..aa1ac30f7854 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -939,7 +939,8 @@ static int verify_bitmap_size(uint64_t npages, uint64_t 
bitmap_size)
 }
 
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
-struct vfio_iommu_type1_dma_unmap *unmap)
+struct vfio_iommu_type1_dma_unmap *unmap,
+struct vfio_bitmap *bitmap)
 {
uint64_t mask;
struct vfio_dma *dma, *dma_last = NULL;
@@ -990,6 +991,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 * will be returned if these conditions are not met.  The v2 interface
 * will only return success and a size of zero if there were no
 * mappings within the range.
+*
+* When VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP flag is set, unmap request
+* must be for single mapping. Multiple mappings with this flag set is
+* not supported.
 */
if (iommu->v2) {
dma = vfio_find_dma(iommu, unmap->iova, 1);
@@ -997,6 +1002,13 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
ret = -EINVAL;
goto unlock;
}
+
+   if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
+   (dma->iova != unmap->iova || dma->size != unmap->size)) {
+   ret = -EINVAL;
+   goto unlock;
+   }
+
dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
ret = -EINVAL;
@@ -1014,6 +1026,12 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
if (dma->task->mm != current->mm)
break;
 
+   if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
+iommu->dirty_page_tracking)
+   vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
+   bitmap->pgsize,
+   (unsigned char __user *) bitmap->data);
+
if (!RB_EMPTY_ROOT(>pfn_list)) {
struct vfio_iommu_type1_dma_unmap nb_unmap;
 
@@ -2369,17 +2387,46 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
struct vfio_iommu_type1_dma_unmap unmap;
-   long ret;
+   struct vfio_bitmap bitmap = { 0 };
+   int ret;
 
minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
 
if (copy_from_user(, (void __user *)arg, minsz))
return -EFAULT;
 
-   if (unmap.argsz < minsz || unmap.flags)
+   if (unmap.argsz < minsz ||
+   unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
return -EINVAL;
 
-   ret = vfio_dma_do_unmap(iommu, );
+   if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
+   unsigned long pgshift;
+   uint64_t iommu_pgsize =
+1 << __ffs(vfio_pgsize_bitmap(iommu));
+
+   if (unmap.argsz < (minsz + sizeof(bitmap)))
+   return -EINVAL;
+
+   if (copy_from_user(,
+  (void __user *)(arg + minsz),
+  sizeof(bitmap)))
+   return -EFAULT;
+
+   /* allow only min supported pgsize */
+   if (bitmap.pgsize != iommu_pgsize)
+   return -EINVAL;
+   if (!access_ok((void __user *)bitmap.data, bitmap.size))
+   return -EINVAL;
+
+   pgshift = __ffs(bitmap.pgsize);
+  

[PATCH v14 Kernel 7/7] vfio: Selective dirty page tracking if IOMMU backed device pins pages

2020-03-18 Thread Kirti Wankhede
Added a check such that only singleton IOMMU groups can pin pages.
From the point when the vendor driver pins any pages, consider the IOMMU
group's dirty page scope to be limited to pinned pages.

To avoid walking the list often, add a pinned_page_dirty_scope flag to
indicate whether the dirty page scope of all vfio_groups, for each
vfio_domain in the domain_list, is limited to pinned pages. This flag is
updated on the first pinned pages request for that IOMMU group and on
attaching/detaching a group.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio.c | 13 +--
 drivers/vfio/vfio_iommu_type1.c | 77 +++--
 include/linux/vfio.h|  4 ++-
 3 files changed, 87 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 210fcf426643..311b5e4e111e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -85,6 +85,7 @@ struct vfio_group {
atomic_topened;
wait_queue_head_t   container_q;
boolnoiommu;
+   unsigned intdev_counter;
struct kvm  *kvm;
struct blocking_notifier_head   notifier;
 };
@@ -555,6 +556,7 @@ struct vfio_device *vfio_group_create_device(struct 
vfio_group *group,
 
mutex_lock(>device_lock);
list_add(>group_next, >device_list);
+   group->dev_counter++;
mutex_unlock(>device_lock);
 
return device;
@@ -567,6 +569,7 @@ static void vfio_device_release(struct kref *kref)
struct vfio_group *group = device->group;
 
list_del(>group_next);
+   group->dev_counter--;
mutex_unlock(>device_lock);
 
dev_set_drvdata(device->dev, NULL);
@@ -1933,6 +1936,9 @@ int vfio_pin_pages(struct device *dev, unsigned long 
*user_pfn, int npage,
if (!group)
return -ENODEV;
 
+   if (group->dev_counter > 1)
+   return -EINVAL;
+
ret = vfio_group_add_container_user(group);
if (ret)
goto err_pin_pages;
@@ -1940,7 +1946,8 @@ int vfio_pin_pages(struct device *dev, unsigned long 
*user_pfn, int npage,
container = group->container;
driver = container->iommu_driver;
if (likely(driver && driver->ops->pin_pages))
-   ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
+   ret = driver->ops->pin_pages(container->iommu_data,
+group->iommu_group, user_pfn,
 npage, prot, phys_pfn);
else
ret = -ENOTTY;
@@ -2038,8 +2045,8 @@ int vfio_group_pin_pages(struct vfio_group *group,
driver = container->iommu_driver;
if (likely(driver && driver->ops->pin_pages))
ret = driver->ops->pin_pages(container->iommu_data,
-user_iova_pfn, npage,
-prot, phys_pfn);
+group->iommu_group, user_iova_pfn,
+npage, prot, phys_pfn);
else
ret = -ENOTTY;
 
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 912629320719..deec09f4b0f6 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -72,6 +72,7 @@ struct vfio_iommu {
boolv2;
boolnesting;
booldirty_page_tracking;
+   boolpinned_page_dirty_scope;
 };
 
 struct vfio_domain {
@@ -99,6 +100,7 @@ struct vfio_group {
struct iommu_group  *iommu_group;
struct list_headnext;
boolmdev_group; /* An mdev group */
+   boolpinned_page_dirty_scope;
 };
 
 struct vfio_iova {
@@ -132,6 +134,10 @@ struct vfio_regions {
 static int put_pfn(unsigned long pfn, int prot);
 static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
 
+static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
+  struct iommu_group *iommu_group);
+
+static void update_pinned_page_dirty_scope(struct vfio_iommu *iommu);
 /*
  * This code handles mapping and unmapping of user data buffers
  * into DMA'ble space using the IOMMU
@@ -556,11 +562,13 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, 
dma_addr_t iova,
 }
 
 static int vfio_iommu_type1_pin_pages(void *iommu_data,
+ struct iommu_group *iommu_group,
  unsigned long *user_pfn,
  int npage, int prot,
  unsigned long *phys_pfn)
 {
struct vfio_iommu *iommu = iommu_data;
+   struct vfio_group *group;
int i, j, ret;
 

Re: [PULL v2 00/37] Linux user for 5.0 patches

2020-03-18 Thread Richard Henderson
On 3/18/20 12:58 PM, Laurent Vivier wrote:
>> However, from the error message above, it's clear that cpu_loop.o has not 
>> been
>> rebuilt properly.
>>
> 
> In the series merged here syscall_nr.h are moved from source directory
> to build directory.
> 
> The include path of the files is based on the dependency files (*.d), and
> to force the update of this path PATCH 13 removes all the .d files that
> have a dependency on the syscall_nr.h file in the source path.
> 
> This is added in configure:
> 
> --- a/configure
> +++ b/configure
> @@ -1887,6 +1887,17 @@ fi
>  # Remove old dependency files to make sure that they get properly
> regenerated
>  rm -f */config-devices.mak.d
> 
> +# Remove syscall_nr.h to be sure they will be regenerated in the build
> +# directory, not in the source directory
> +for arch in ; do
> +# remove the file if it has been generated in the source directory
> +rm -f "${source_path}/linux-user/${arch}/syscall_nr.h"
> +# remove the dependency files
> +find . -name "*.d" \
> +   -exec grep -q
> "${source_path}/linux-user/${arch}/syscall_nr.h" {} \; \
> +   -exec rm {} \;
> +done
...
> Perhaps it removes a dependency that should trigger the rebuild of
> cpu_loop.o?

Ah, yes indeed. It removes *all* dependencies for cpu_loop.o, so unless we
touch the cpu_loop.c source file, nothing gets done.

I think you're trying to be too fine grained here, since the *.o file has to go
away with the *.d file.  Why not just

  make ${arch}-linux-user/clean

?

r~



[PATCH v14 Kernel 2/7] vfio iommu: Remove atomicity of ref_count of pinned pages

2020-03-18 Thread Kirti Wankhede
vfio_pfn.ref_count is always updated while holding iommu->lock, so using
an atomic variable is overkill.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9fdfae1cb17a..70aeab921d0f 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -112,7 +112,7 @@ struct vfio_pfn {
struct rb_node  node;
dma_addr_t  iova;   /* Device address */
unsigned long   pfn;/* Host pfn */
-   atomic_tref_count;
+   unsigned intref_count;
 };
 
 struct vfio_regions {
@@ -233,7 +233,7 @@ static int vfio_add_to_pfn_list(struct vfio_dma *dma, 
dma_addr_t iova,
 
vpfn->iova = iova;
vpfn->pfn = pfn;
-   atomic_set(>ref_count, 1);
+   vpfn->ref_count = 1;
vfio_link_pfn(dma, vpfn);
return 0;
 }
@@ -251,7 +251,7 @@ static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct 
vfio_dma *dma,
struct vfio_pfn *vpfn = vfio_find_vpfn(dma, iova);
 
if (vpfn)
-   atomic_inc(>ref_count);
+   vpfn->ref_count++;
return vpfn;
 }
 
@@ -259,7 +259,8 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, 
struct vfio_pfn *vpfn)
 {
int ret = 0;
 
-   if (atomic_dec_and_test(>ref_count)) {
+   vpfn->ref_count--;
+   if (!vpfn->ref_count) {
ret = put_pfn(vpfn->pfn, dma->prot);
vfio_remove_from_pfn_list(dma, vpfn);
}
-- 
2.7.0




[PATCH v14 Kernel 6/7] vfio iommu: Adds flag to indicate dirty pages tracking capability support

2020-03-18 Thread Kirti Wankhede
Flag VFIO_IOMMU_INFO_DIRTY_PGS in VFIO_IOMMU_GET_INFO indicates that the
driver supports dirty pages tracking.
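
For illustration (not part of the patch), a user-space probe for the new flag
could look like the sketch below. It assumes container_fd is an already opened
and configured VFIO container, and that linux/vfio.h carries the
VFIO_IOMMU_INFO_DIRTY_PGS define added here.

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Returns 1 if the type1 IOMMU reports dirty page tracking support,
 * 0 if not, -1 on ioctl failure. */
static int supports_dirty_tracking(int container_fd)
{
    struct vfio_iommu_type1_info info;

    memset(&info, 0, sizeof(info));
    info.argsz = sizeof(info);

    if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, &info) < 0) {
        perror("VFIO_IOMMU_GET_INFO");
        return -1;
    }
    return !!(info.flags & VFIO_IOMMU_INFO_DIRTY_PGS);
}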

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 3 ++-
 include/uapi/linux/vfio.h   | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index aa1ac30f7854..912629320719 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2340,7 +2340,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
info.cap_offset = 0; /* output, no-recopy necessary */
}
 
-   info.flags = VFIO_IOMMU_INFO_PGSIZES;
+   info.flags = VFIO_IOMMU_INFO_PGSIZES |
+VFIO_IOMMU_INFO_DIRTY_PGS;
 
info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index a704e5380f04..893ae7517735 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -947,8 +947,9 @@ struct vfio_device_ioeventfd {
 struct vfio_iommu_type1_info {
__u32   argsz;
__u32   flags;
-#define VFIO_IOMMU_INFO_PGSIZES (1 << 0)   /* supported page sizes info */
-#define VFIO_IOMMU_INFO_CAPS   (1 << 1)/* Info supports caps */
+#define VFIO_IOMMU_INFO_PGSIZES   (1 << 0) /* supported page sizes info */
+#define VFIO_IOMMU_INFO_CAPS  (1 << 1) /* Info supports caps */
+#define VFIO_IOMMU_INFO_DIRTY_PGS (1 << 2) /* supports dirty page tracking */
__u64   iova_pgsizes;   /* Bitmap of supported page sizes */
__u32   cap_offset; /* Offset within info struct of first cap */
 };
-- 
2.7.0




[PATCH v14 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-18 Thread Kirti Wankhede
The VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start dirty pages tracking while migration is active.
- Stop dirty pages tracking.
- Get the dirty pages bitmap. It is the user space application's
  responsibility to copy the content of dirty pages from source to
  destination during migration.

To prevent DoS attacks, memory for the bitmap is allocated per vfio_dma
structure. The bitmap size is calculated from the smallest supported page
size. The bitmap is allocated for all vfio_dmas when dirty logging is
enabled.

The bitmap is populated for already pinned pages when it is allocated for a
vfio_dma, using the smallest supported page size. The bitmap is updated from
the pinning functions when tracking is enabled. When the user application
queries the bitmap, check whether the requested page size is the same as the
page size used to populate the bitmap. If it is equal, copy the bitmap; if
not, return an error.
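
A worked sketch of the sizing rule above, with one bit per page of the
smallest supported page size and the allocation rounded up to whole u64 words;
this is standalone arithmetic mirroring the kernel's DIRTY_BITMAP_BYTES()
macro, not the kernel code itself:

#include <stdint.h>
#include <stdio.h>

/* One bit per page, rounded up to whole 64-bit words, then converted to
 * bytes -- the same result as DIRTY_BITMAP_BYTES() in the patch. */
static uint64_t dirty_bitmap_bytes(uint64_t dma_size, uint64_t pgsize)
{
    uint64_t npages = dma_size / pgsize;

    return (npages + 63) / 64 * 8;
}

int main(void)
{
    /* Example: a 1 GiB mapping tracked at 4 KiB granularity. */
    printf("%llu bytes\n",
           (unsigned long long)dirty_bitmap_bytes(1ULL << 30, 4096));
    return 0;
}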

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 205 +++-
 1 file changed, 203 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 70aeab921d0f..d6417fb02174 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -71,6 +71,7 @@ struct vfio_iommu {
unsigned intdma_avail;
boolv2;
boolnesting;
+   booldirty_page_tracking;
 };
 
 struct vfio_domain {
@@ -91,6 +92,7 @@ struct vfio_dma {
boollock_cap;   /* capable(CAP_IPC_LOCK) */
struct task_struct  *task;
struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
+   unsigned long   *bitmap;
 };
 
 struct vfio_group {
@@ -125,7 +127,10 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
(!list_empty(>domain_list))
 
+#define DIRTY_BITMAP_BYTES(n)  (ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
+
 static int put_pfn(unsigned long pfn, int prot);
+static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
 
 /*
  * This code handles mapping and unmapping of user data buffers
@@ -175,6 +180,55 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
struct vfio_dma *old)
rb_erase(>node, >dma_list);
 }
 
+static int vfio_dma_bitmap_alloc(struct vfio_iommu *iommu, uint64_t pgsize)
+{
+   struct rb_node *n = rb_first(>dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+   struct rb_node *p;
+   unsigned long npages = dma->size / pgsize;
+
+   dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
+   if (!dma->bitmap) {
+   struct rb_node *p = rb_prev(n);
+
+   for (; p; p = rb_prev(p)) {
+   struct vfio_dma *dma = rb_entry(p,
+   struct vfio_dma, node);
+
+   kfree(dma->bitmap);
+   dma->bitmap = NULL;
+   }
+   return -ENOMEM;
+   }
+
+   if (RB_EMPTY_ROOT(>pfn_list))
+   continue;
+
+   for (p = rb_first(>pfn_list); p; p = rb_next(p)) {
+   struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
+node);
+
+   bitmap_set(dma->bitmap,
+   (vpfn->iova - dma->iova) / pgsize, 1);
+   }
+   }
+   return 0;
+}
+
+static void vfio_dma_bitmap_free(struct vfio_iommu *iommu)
+{
+   struct rb_node *n = rb_first(>dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+   kfree(dma->bitmap);
+   dma->bitmap = NULL;
+   }
+}
+
 /*
  * Helper Functions for host iova-pfn list
  */
@@ -567,6 +621,14 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
vfio_unpin_page_external(dma, iova, do_accounting);
goto pin_unwind;
}
+
+   if (iommu->dirty_page_tracking) {
+   unsigned long pgshift =
+__ffs(vfio_pgsize_bitmap(iommu));
+
+   bitmap_set(dma->bitmap,
+  (vpfn->iova - dma->iova) >> pgshift, 1);
+   }
}
 
ret = i;
@@ -801,6 +863,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, 
struct vfio_dma *dma)
vfio_unmap_unpin(iommu, dma, true);
vfio_unlink_dma(iommu, dma);
put_task_struct(dma->task);
+   kfree(dma->bitmap);
kfree(dma);
iommu->dma_avail++;
 }
@@ -831,6 +894,50 @@ static unsigned 

[PATCH v14 Kernel 1/7] vfio: KABI for migration interface for device state

2020-03-18 Thread Kirti Wankhede
- Defined MIGRATION region type and sub-type.

- Defined vfio_device_migration_info structure which will be placed at the
  0th offset of migration region to get/set VFIO device related
  information. Defined members of structure and usage on read/write access.

- Defined device states and state transition details.

- Defined sequence to be followed while saving and resuming VFIO device.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 include/uapi/linux/vfio.h | 227 ++
 1 file changed, 227 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..d0021467af53 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK   (0x)
 #define VFIO_REGION_TYPE_GFX(1)
 #define VFIO_REGION_TYPE_CCW   (2)
+#define VFIO_REGION_TYPE_MIGRATION  (3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {
 /* sub-types for VFIO_REGION_TYPE_CCW */
 #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD  (1)
 
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
+
+/*
+ * The structure vfio_device_migration_info is placed at the 0th offset of
+ * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
+ * migration information. Field accesses from this structure are only supported
+ * at their native width and alignment. Otherwise, the result is undefined and
+ * vendor drivers should return an error.
+ *
+ * device_state: (read/write)
+ *  - The user application writes to this field to inform the vendor driver
+ *about the device state to be transitioned to.
+ *  - The vendor driver should take the necessary actions to change the
+ *device state. After successful transition to a given state, the
+ *vendor driver should return success on write(device_state, state)
+ *system call. If the device state transition fails, the vendor driver
+ *should return an appropriate -errno for the fault condition.
+ *  - On the user application side, if the device state transition fails,
+ *   that is, if write(device_state, state) returns an error, read
+ *   device_state again to determine the current state of the device from
+ *   the vendor driver.
+ *  - The vendor driver should return previous state of the device unless
+ *the vendor driver has encountered an internal error, in which case
+ *the vendor driver may report the device_state 
VFIO_DEVICE_STATE_ERROR.
+ *  - The user application must use the device reset ioctl to recover the
+ *device from VFIO_DEVICE_STATE_ERROR state. If the device is
+ *indicated to be in a valid device state by reading device_state, the
+ *user application may attempt to transition the device to any valid
+ *state reachable from the current state or terminate itself.
+ *
+ *  device_state consists of 3 bits:
+ *  - If bit 0 is set, it indicates the _RUNNING state. If bit 0 is clear,
+ *it indicates the _STOP state. When the device state is changed to
+ *_STOP, driver should stop the device before write() returns.
+ *  - If bit 1 is set, it indicates the _SAVING state, which means that the
+ *driver should start gathering device state information that will be
+ *provided to the VFIO user application to save the device's state.
+ *  - If bit 2 is set, it indicates the _RESUMING state, which means that
+ *the driver should prepare to resume the device. Data provided through
+ *the migration region should be used to resume the device.
+ *  Bits 3 - 31 are reserved for future use. To preserve them, the user
+ *  application should perform a read-modify-write operation on this
+ *  field when modifying the specified bits.
+ *
+ *  +--- _RESUMING
+ *  |+-- _SAVING
+ *  ||+- _RUNNING
+ *  |||
+ *  000b => Device Stopped, not saving or resuming
+ *  001b => Device running, which is the default state
+ *  010b => Stop the device & save the device state, stop-and-copy state
+ *  011b => Device running and save the device state, pre-copy state
+ *  100b => Device stopped and the device state is resuming
+ *  101b => Invalid state
+ *  110b => Error state
+ *  111b => Invalid state
+ *
+ * State transitions:
+ *
+ *  _RESUMING  _RUNNINGPre-copyStop-and-copy   _STOP
+ *(100b) (001b) (011b)(010b)   (000b)
+ * 0. Running or default state
+ * |
+ *
+ * 1. Normal Shutdown (optional)
+ * |->|
+ *
+ * 2. Save the state or suspend
+ * |->|-->|
+ *
+ * 3. 
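
To make the device_state bit layout documented above concrete, here is a
hedged user-space sketch of a read-modify-write of the field; the region_fd
and state_off parameters, the macro names, and the error handling are
assumptions of this example, while the bit values follow the table in the
comment.

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit positions as documented for device_state; names are illustrative. */
#define DEV_STATE_RUNNING   (1u << 0)
#define DEV_STATE_SAVING    (1u << 1)
#define DEV_STATE_RESUMING  (1u << 2)

/* Enter the stop-and-copy state (010b) while preserving reserved bits 3..31,
 * assuming region_fd/state_off locate the device_state field of the
 * migration region. */
static int enter_stop_and_copy(int region_fd, off_t state_off)
{
    uint32_t state;

    if (pread(region_fd, &state, sizeof(state), state_off) != sizeof(state)) {
        return -1;
    }

    state &= ~(DEV_STATE_RUNNING | DEV_STATE_SAVING | DEV_STATE_RESUMING);
    state |= DEV_STATE_SAVING;          /* 010b: stop the device and save */

    if (pwrite(region_fd, &state, sizeof(state), state_off) != sizeof(state)) {
        return -1;                      /* on error, re-read device_state */
    }
    return 0;
}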

[PATCH v14 Kernel 3/7] vfio iommu: Add ioctl definition for dirty pages tracking.

2020-03-18 Thread Kirti Wankhede
The IOMMU container maintains a list of all pages pinned by the
vfio_pin_pages API. All pages pinned by the vendor driver through this API
should be considered dirty during migration. When the container consists of
an IOMMU capable device and all pages are pinned and mapped, then all pages
are marked dirty.
Added support to start/stop tracking of pinned and unpinned pages and to get
the bitmap of all dirtied pages for a requested IO virtual address range.
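
For illustration (not part of the patch), the start/get/stop sequence could be
driven from user space roughly as in the sketch below. It relies on the
structures added in the hunk that follows; container_fd, the IOVA range, and
the minimal error handling are assumptions of the example.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Query the dirty bitmap for [iova, iova + size) at pgsize granularity,
 * assuming tracking was started earlier with
 * VFIO_IOMMU_DIRTY_PAGES_FLAG_START. Returns a caller-owned bitmap buffer,
 * or NULL on failure. */
static void *get_dirty_bitmap(int container_fd, uint64_t iova,
                              uint64_t size, uint64_t pgsize)
{
    size_t argsz = sizeof(struct vfio_iommu_type1_dirty_bitmap) +
                   sizeof(struct vfio_iommu_type1_dirty_bitmap_get);
    struct vfio_iommu_type1_dirty_bitmap *db = calloc(1, argsz);
    struct vfio_iommu_type1_dirty_bitmap_get *range = (void *)db->data;
    uint64_t bitmap_bytes = (size / pgsize + 63) / 64 * 8;
    void *bitmap = calloc(1, bitmap_bytes);

    db->argsz = argsz;
    db->flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
    range->iova = iova;
    range->size = size;
    range->bitmap.pgsize = pgsize;
    range->bitmap.size = bitmap_bytes;
    range->bitmap.data = bitmap;

    if (ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, db) < 0) {
        free(bitmap);
        bitmap = NULL;
    }
    free(db);
    return bitmap;
}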

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 include/uapi/linux/vfio.h | 55 +++
 1 file changed, 55 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index d0021467af53..043e9eafb255 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -995,6 +995,12 @@ struct vfio_iommu_type1_dma_map {
 
 #define VFIO_IOMMU_MAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 13)
 
+struct vfio_bitmap {
+   __u64pgsize;/* page size for bitmap */
+   __u64size;  /* in bytes */
+   __u64 __user *data; /* one bit per page */
+};
+
 /**
  * VFIO_IOMMU_UNMAP_DMA - _IOWR(VFIO_TYPE, VFIO_BASE + 14,
  * struct vfio_dma_unmap)
@@ -1021,6 +1027,55 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_DIRTY_PAGES - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
+ * struct vfio_iommu_type1_dirty_bitmap)
+ * IOCTL is used for dirty pages tracking. Caller sets argsz, which is size of
+ * struct vfio_iommu_type1_dirty_bitmap. Caller set flag depend on which
+ * operation to perform, details as below:
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_START set, indicates
+ * migration is active and IOMMU module should track pages which are pinned and
+ * could be dirtied by device.
+ * Dirty pages are tracked until tracking is stopped by user application by
+ * setting VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP flag.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP set, indicates
+ * IOMMU should stop tracking pinned pages.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set,
+ * IOCTL returns dirty pages bitmap for IOMMU container during migration for
+ * given IOVA range. User must provide data[] as the structure
+ * vfio_iommu_type1_dirty_bitmap_get through which user provides IOVA range and
+ * pgsize. This interface supports to get bitmap of smallest supported pgsize
+ * only and can be modified in future to get bitmap of specified pgsize.
+ * User must allocate memory for bitmap, zero the bitmap memory and set size
+ * of allocated memory in bitmap_size field. One bit is used to represent one
+ * page consecutively starting from iova offset. User should provide page size
+ * in 'pgsize'. Bit set in bitmap indicates page at that offset from iova is
+ * dirty. Caller must set argsz including size of structure
+ * vfio_iommu_type1_dirty_bitmap_get.
+ *
+ * Only one flag should be set at a time.
+ *
+ */
+struct vfio_iommu_type1_dirty_bitmap {
+   __u32argsz;
+   __u32flags;
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_START  (1 << 0)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP   (1 << 1)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP (1 << 2)
+   __u8 data[];
+};
+
+struct vfio_iommu_type1_dirty_bitmap_get {
+   __u64  iova;/* IO virtual address */
+   __u64  size;/* Size of iova range */
+   struct vfio_bitmap bitmap;
+};
+
+#define VFIO_IOMMU_DIRTY_PAGES _IO(VFIO_TYPE, VFIO_BASE + 17)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
2.7.0




[PATCH v14 Kernel 0/7] KABIs to support migration for VFIO devices

2020-03-18 Thread Kirti Wankhede
Hi,

This patch set adds:
* New IOCTL VFIO_IOMMU_DIRTY_PAGES to get the dirty pages bitmap with
  respect to the IOMMU container rather than per device. All pages pinned
  by the vendor driver through the vfio_pin_pages external API have to be
  marked as dirty during migration. When an IOMMU capable device is present
  in the container and all pages are pinned and mapped, then all pages are
  marked dirty.
  When there are CPU writes, CPU dirty page tracking can identify dirtied
  pages, but any page pinned by the vendor driver can also be written by
  the device. As of now there is no device with hardware support for dirty
  page tracking, so all pages which are pinned should be considered dirty.
  This ioctl is also used to start/stop dirty pages tracking for pinned and
  unpinned pages while migration is active.

* Updated IOCTL VFIO_IOMMU_UNMAP_DMA to get the dirty pages bitmap before
  unmapping an IO virtual address range.
  With vIOMMU, during the pre-copy phase of migration, while CPUs are still
  running, an IO virtual address unmap can happen while the device still
  keeps a reference to guest pfns. Those pages should be reported as dirty
  before the unmap, so that the VFIO user space application can copy the
  content of those pages from source to destination.

* Patch 7 detects whether an IOMMU capable device driver is smart enough to
  report pages to be marked dirty by pinning them using the
  vfio_pin_pages() API.


Yet TODO:
Since there is no device with hardware support for system memory dirty
bitmap tracking, right now there is no other API from the vendor driver to
the VFIO IOMMU module to report dirty pages. In the future, when such
hardware support is implemented, an API will be required so that the vendor
driver can report dirty pages to the VFIO module during the migration phases.

Adding revision history from previous QEMU patch set to understand KABI
changes done till now

v13 -> v14
- Added struct vfio_bitmap to the KABI. Updated the structures
  vfio_iommu_type1_dirty_bitmap_get and vfio_iommu_type1_dma_unmap.
- All small changes suggested by Alex.
- Patches are on tag: next-20200318 and 1-3 patches from Yan's series
  https://lkml.org/lkml/2020/3/12/1255

v12 -> v13
- Changed bitmap allocation in vfio_iommu_type1 to per vfio_dma
- Changed VFIO_IOMMU_DIRTY_PAGES ioctl behaviour to be per vfio_dma range.
- Changed vfio_iommu_type1_dirty_bitmap structure to have separate data
  field.

v11 -> v12
- Changed bitmap allocation in vfio_iommu_type1.
- Remove atomicity of ref_count.
- Updated comments for migration device state structure about error
  reporting.
- Nit picks from v11 reviews

v10 -> v11
- Fix pin pages API to free vpfn if it is marked as unpinned tracking page.
- Added proposal to detect if IOMMU capable device calls external pin pages
  API to mark pages dirty.
- Nit picks from v10 reviews

v9 -> v10:
- Updated existing VFIO_IOMMU_UNMAP_DMA ioctl to get dirty pages bitmap
  during unmap while migration is active
- Added a flag in VFIO_IOMMU_GET_INFO to indicate that the driver supports
  dirty page tracking.
- If iommu_mapped, mark all pages dirty.
- Added unpinned pages tracking while migration is active.
- Updated comments for migration device state structure with bit
  combination table and state transition details.

v8 -> v9:
- Split patch set in 2 sets, Kernel and QEMU.
- Dirty pages bitmap is queried from IOMMU container rather than from
  vendor driver for per device. Added 2 ioctls to achieve this.

v7 -> v8:
- Updated comments for KABI
- Added BAR address validation check during PCI device's config space load
  as suggested by Dr. David Alan Gilbert.
- Changed vfio_migration_set_state() to set or clear device state flags.
- Some nit fixes.

v6 -> v7:
- Fix build failures.

v5 -> v6:
- Fix build failure.

v4 -> v5:
- Added a descriptive comment about the sequence of access of members of
  the structure vfio_device_migration_info to be followed, based on Alex's
  suggestion.
- Updated get dirty pages sequence.
- As per Cornelia Huck's suggestion, added callbacks to VFIODeviceOps to
  get_object, save_config and load_config.
- Fixed multiple nit picks.
- Tested live migration with multiple vfio device assigned to a VM.

v3 -> v4:
- Added one more bit for _RESUMING flag to be set explicitly.
- data_offset field is read-only for user space application.
- data_size is read on every iteration before reading data from the
  migration region, removing the assumption that data extends to the end
  of the migration region.
- If the vendor driver supports mappable sparse regions, map those regions
  during the setup state of save/load and similarly unmap them in the
  cleanup routines.
- Handled a race condition that caused data corruption in the migration
  region during save of the device state by adding a mutex and serializing
  the save_buffer and get_dirty_pages routines.
- Skip calling the get_dirty_pages routine for the device's mapped MMIO
  region.
- Added trace events.
- Split into multiple functional patches.

v2 -> v3:
- Removed enum of VFIO device states. Defined VFIO device 

[PATCH v3] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-18 Thread Andrzej Jakowski
This patch introduces support for the Persistent Memory Region (PMR) defined
in the NVMe 1.4 spec. The user can now specify a pmrdev option that points to
a HostMemoryBackend. The pmrdev memory region is then exposed as PCI BAR 2 of
the emulated NVMe device. The guest OS can perform MMIO reads and writes to
the PMR region, and its contents stay persistent across system reboots.

Signed-off-by: Andrzej Jakowski 
---
v2:
 - reworked PMR to use HostMemoryBackend instead of directly mapping PMR
   backend file into qemu [1] (Stefan)

v1:
 - provided support for Bit 1 from PMRWBM register instead of Bit 0 to ensure
   improved performance in virtualized environment [2] (Stefan)

 - added a check that the pmr size is a power of two [3] (David)

 - addressed cross compilation build problems reported by CI environment

[1]: 
https://lore.kernel.org/qemu-devel/20200306223853.37958-1-andrzej.jakow...@linux.intel.com/
[2]: 
https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf
[3]: 
https://lore.kernel.org/qemu-devel/20200218224811.30050-1-andrzej.jakow...@linux.intel.com/
---
The Persistent Memory Region (PMR) is a new optional feature in the NVMe 1.4
specification. This patch implements initial support for it in the NVMe
device model.
---
 hw/block/nvme.c   | 117 +++-
 hw/block/nvme.h   |   2 +
 hw/block/trace-events |   5 ++
 include/block/nvme.h  | 172 ++
 4 files changed, 294 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d28335cbf3..70fd09d293 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -19,10 +19,18 @@
  *  -drive file=,if=none,id=
  *  -device nvme,drive=,serial=,id=, \
  *  cmb_size_mb=, \
+ *  [pmrdev=,] \
  *  num_queues=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
+ *
+ * Either cmb or pmr - due to limitation in available BAR indexes.
+ * pmr_file file needs to be power of two in size.
+ * Enabling pmr emulation can be achieved by pointing to memory-backend-file.
+ * For example:
+ * -object memory-backend-file,id=,share=on,mem-path=, \
+ *  size=  -device nvme,...,pmrdev=
  */
 
 #include "qemu/osdep.h"
@@ -35,7 +43,9 @@
 #include "sysemu/sysemu.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
+#include "sysemu/hostmem.h"
 #include "sysemu/block-backend.h"
+#include "exec/ramblock.h"
 
 #include "qemu/log.h"
 #include "qemu/module.h"
@@ -1141,6 +1151,26 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 NVME_GUEST_ERR(nvme_ub_mmiowr_cmbsz_readonly,
"invalid write to read only CMBSZ, ignored");
 return;
+case 0xE00: /* PMRCAP */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrcap_readonly,
+   "invalid write to PMRCAP register, ignored");
+return;
+case 0xE04: /* TODO PMRCTL */
+break;
+case 0xE08: /* PMRSTS */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrsts_readonly,
+   "invalid write to PMRSTS register, ignored");
+return;
+case 0xE0C: /* PMREBS */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrebs_readonly,
+   "invalid write to PMREBS register, ignored");
+return;
+case 0xE10: /* PMRSWTP */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrswtp_readonly,
+   "invalid write to PMRSWTP register, ignored");
+return;
+case 0xE14: /* TODO PMRMSC */
+ break;
 default:
 NVME_GUEST_ERR(nvme_ub_mmiowr_invalid,
"invalid MMIO write,"
@@ -1169,6 +1199,23 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr 
addr, unsigned size)
 }
 
 if (addr < sizeof(n->bar)) {
+/*
+ * When PMRWBM bit 1 is set then read from
+ * from PMRSTS should ensure prior writes
+ * made it to persistent media
+ */
+if (addr == 0xE08 &&
+(NVME_PMRCAP_PMRWBM(n->bar.pmrcap) & 0x02) >> 1) {
+int status;
+
+status = qemu_msync((void *)n->pmrdev->mr.ram_block->host,
+n->pmrdev->size,
+n->pmrdev->mr.ram_block->fd);
+if (!status) {
+NVME_GUEST_ERR(nvme_ub_mmiord_pmrread_barrier,
+   "error while persisting data");
+}
+}
 memcpy(, ptr + addr, size);
 } else {
 NVME_GUEST_ERR(nvme_ub_mmiord_invalid_ofs,
@@ -1332,6 +1379,23 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 error_setg(errp, "serial property not set");
 return;
 }
+
+if (!n->cmb_size_mb && n->pmrdev) {
+if (host_memory_backend_is_mapped(n->pmrdev)) {
+char *path = 
object_get_canonical_path_component(OBJECT(n->pmrdev));
+error_setg(errp, "can't use already busy memdev: %s", 

Re: [PULL v2 00/37] Linux user for 5.0 patches

2020-03-18 Thread Laurent Vivier
Le 18/03/2020 à 20:46, Richard Henderson a écrit :
> On 3/18/20 6:57 AM, Peter Maydell wrote:
>> My set of "run ls for various architectures" linux-user tests
>> https://people.linaro.org/~peter.maydell/linux-user-test-pmm-20200114.tgz
>> fails with this pullreq:
>>
>> e104462:bionic:linux-user-test-0.3$
>> /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/x86_64-linux-user/qemu-x86_64
>> -L ./gnemul/qemu-x86_64 x86_64/ls -l dummyfile
>> qemu: 0x40008117e9: unhandled CPU exception 0x101 - aborting
> 
> 
> I replicated this on aarch64 host, with an existing build tree and merging in
> the pull request.  It does not occur when building the same merged tree from
> scratch.
> 
> I have no idea what the reason for this is.  Laurent suggested a file in the
> build tree that is shadowed by one in the source tree, but to me that makes no
> sense for this case:
> 
> It's target/i386/cpu.h that defines EXCP_SYSCALL (renumbered in this series
> from 0x100 to 0x101), which is not in the build tree.  It is
> linux-user/i386/cpu_loop.c that consumes EXCP_SYSCALL, and it is also not in
> the build tree.
> 
> However, from the error message above, it's clear that cpu_loop.o has not been
> rebuilt properly.
> 

In the series merged here syscall_nr.h are moved from source directory
to build directory.

The include path of the files is based on the dependency files (*.d), and
to force the update of this path PATCH 13 removes all the .d files that
have a dependency on the syscall_nr.h file in the source path.

This is added in configure:

--- a/configure
+++ b/configure
@@ -1887,6 +1887,17 @@ fi
 # Remove old dependency files to make sure that they get properly
regenerated
 rm -f */config-devices.mak.d

+# Remove syscall_nr.h to be sure they will be regenerated in the build
+# directory, not in the source directory
+for arch in ; do
+# remove the file if it has been generated in the source directory
+rm -f "${source_path}/linux-user/${arch}/syscall_nr.h"
+# remove the dependency files
+find . -name "*.d" \
+   -exec grep -q
"${source_path}/linux-user/${arch}/syscall_nr.h" {} \; \
+   -exec rm {} \;
+done
+
 if test -z "$python"
 then
 error_exit "Python not found. Use --python=/path/to/python"

For the use of the dependency see for instance PATCH 14:

--- a/configure
+++ b/configure
@@ -1889,7 +1889,7 @@ rm -f */config-devices.mak.d

 # Remove syscall_nr.h to be sure they will be regenerated in the build
 # directory, not in the source directory
-for arch in ; do
+for arch in alpha ; do
 # remove the file if it has been generated in the source directory
 rm -f "${source_path}/linux-user/${arch}/syscall_nr.h"
 # remove the dependency file
+++ b/linux-user/alpha/Makefile.objs
@@ -0,0 +1,5 @@
+generated-files-y += linux-user/alpha/syscall_nr.h
+
+syshdr := $(SRC_PATH)/linux-user/alpha/syscallhdr.sh
+%/syscall_nr.h: $(SRC_PATH)/linux-user/alpha/syscall.tbl $(syshdr)
+   $(call quiet-command, sh $(syshdr) $< $@ 
$(TARGET_SYSTBL_ABI),"GEN","$@")


%/syscall_nr.h is expanded with the absolute path found in the .d file.

Perhaps it removes a dependency that should trigger the rebuild of
cpu_loop.o?

Thanks,
Laurent



Re: [PATCH v2 1/1] target/ppc: don't byte swap ELFv2 signal handler

2020-03-18 Thread Richard Henderson
On 3/18/20 10:01 AM, Vincent Fazio wrote:
> From: Vincent Fazio 
> 
> Previously, the signal handler would be byte swapped if the target and
> host CPU used different endianness. This would cause a SIGSEGV when
> attempting to translate the opcode pointed to by the swapped address.
> 
>  Thread 1 "qemu-ppc64" received signal SIGSEGV, Segmentation fault.
>  0x600a9257 in ldl_he_p (ptr=0x4c2c0610) at 
> qemu/include/qemu/bswap.h:351
>  351__builtin_memcpy(, ptr, sizeof(r));
> 
>  #0  0x600a9257 in ldl_he_p (ptr=0x4c2c0610) at 
> qemu/include/qemu/bswap.h:351
>  #1  0x600a92fe in ldl_be_p (ptr=0x4c2c0610) at 
> qemu/include/qemu/bswap.h:449
>  #2  0x600c0790 in translator_ldl_swap at 
> qemu/include/exec/translator.h:201
>  #3  0x6011c1ab in ppc_tr_translate_insn at 
> qemu/target/ppc/translate.c:7856
>  #4  0x6005ae70 in translator_loop at qemu/accel/tcg/translator.c:102
> 
> Now, no swap is performed and execution continues properly.
> 
> Signed-off-by: Vincent Fazio 
> Reviewed-by: Laurent Vivier 
> ---
> Changes since v1:
> - Drop host/target endianness callouts
> - Drop unnecessary pointer cast
> - Clarify commit message

Reviewed-by: Richard Henderson 


r~



Re: Qemu API documentation

2020-03-18 Thread John Snow



On 3/18/20 7:09 AM, Peter Maydell wrote:
> On Wed, 18 Mar 2020 at 09:55, Priyamvad Acharya
>  wrote:
>>
>> Hello developer community,
>>
>> I am working on implementing a custom device in Qemu, so to implement it I 
>> need documentation of functions which are used to emulate a hardware model 
>> in Qemu.
>>
>> What are the references to get it ?
> 
> QEMU has very little documentation of its internals;
> the usual practice is to figure things out by
> reading the source code. What we do have is in
> docs/devel. There are also often documentation comments
> for specific functions in the include files where
> those functions are declared, which form the API
> documentation for them.
> 

^ Unfortunately true. One thing you can do is try to pick an existing
device that's close to yours -- some donor PCI, USB, or similar device --
and start using that as a reference.

If you can share (broad) details of what device you are trying to
implement, we might be able to point you to relevant examples to use as
a reference.

--js




Re: Missing Null check

2020-03-18 Thread Mansour Ahmadi
Thanks for the fix!

Best,
Mansour


On Wed, Mar 18, 2020 at 4:14 AM Philippe Mathieu-Daudé 
wrote:

> On 3/17/20 9:40 PM, Mansour Ahmadi wrote:
> > Is a NULL check on 'drv1->format_name' missing here?
> >
> https://github.com/qemu/qemu/blob/cc818a2148c5f321bdeb8e5564bdb2914e824600/block.c#L400-L403
> >
> > if(!strcmp(drv1->format_name, format_name)) {
>
> This could be NULL indeed. I'd rather add assertions in the entry function,
> bdrv_register():
>
> -- >8 --
> diff --git a/block.c b/block.c
> index a2542c977b..6b984dc883 100644
> --- a/block.c
> +++ b/block.c
> @@ -363,6 +363,7 @@ char
> *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
>
>   void bdrv_register(BlockDriver *bdrv)
>   {
> +assert(bdrv->format_name);
>   QLIST_INSERT_HEAD(_drivers, bdrv, list);
>   }
>
> ---
>
> > While it is checked in similar case:
> >
> https://github.com/qemu/qemu/blob/cc818a2148c5f321bdeb8e5564bdb2914e824600/block.c#L797-L800
> >
> > if(drv1->protocol_name&& !strcmp(drv1->protocol_name, protocol)) {
>
> Because 'protocol_name' is optional.
>
> Regards,
>
> Phil.
>
>


Re: [PULL v2 00/37] Linux user for 5.0 patches

2020-03-18 Thread Richard Henderson
On 3/18/20 6:57 AM, Peter Maydell wrote:
> My set of "run ls for various architectures" linux-user tests
> https://people.linaro.org/~peter.maydell/linux-user-test-pmm-20200114.tgz
> fails with this pullreq:
> 
> e104462:bionic:linux-user-test-0.3$
> /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/x86_64-linux-user/qemu-x86_64
> -L ./gnemul/qemu-x86_64 x86_64/ls -l dummyfile
> qemu: 0x40008117e9: unhandled CPU exception 0x101 - aborting


I replicated this on aarch64 host, with an existing build tree and merging in
the pull request.  It does not occur when building the same merged tree from
scratch.

I have no idea what the reason for this is.  Laurent suggested a file in the
build tree that is shadowed by one in the source tree, but to me that makes no
sense for this case:

It's target/i386/cpu.h that defines EXCP_SYSCALL (renumbered in this series
from 0x100 to 0x101), which is not in the build tree.  It is
linux-user/i386/cpu_loop.c that consumes EXCP_SYSCALL, and it is also not in
the build tree.

However, from the error message above, it's clear that cpu_loop.o has not been
rebuilt properly.


r~



[Bug 1866870] Re: KVM Guest pauses after upgrade to Ubuntu 20.04

2020-03-18 Thread Christian Ehrhardt 
I've tested some more combinations and found that I can have v4.2 work on focal.
Eventually I realized that when I install and start the qemu from Ubuntu, not
only that one but also the formerly working build of v4.2.0 from git starts to
fail (without rebuilding).

A bit of package bisection later, I found seabios to be related.
Focal is at 1.13.0-1
Eoan is at 1.12.0-1

Once I knew that, I verified it and found that it really only triggers on
seabios 1.13.0.

With 1.13 I was also able to break the qemu v4.0.0 git build on eoan,
as well as the packaged qemu in Eoan.

So it seems we are actually looking for a problem in seabios (rather than
qemu) with the Penryn chip.

I'll look at their changelog and bisect that tomorrow as time permits.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1866870

Title:
  KVM Guest pauses after upgrade to Ubuntu 20.04

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Incomplete

Bug description:
  Symptom:
  Error unpausing domain: internal error: unable to execute QEMU command 
'cont': Resetting the Virtual Machine is required

  Traceback (most recent call last):
File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in 
cb_wrapper
  callback(asyncjob, *args, **kwargs)
File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
  callback(*args, **kwargs)
File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 
66, in newfn
  ret = fn(self, *args, **kwargs)
File "/usr/share/virt-manager/virtManager/object/domain.py", line 1311, in 
resume
  self._backend.resume()
File "/usr/lib/python3/dist-packages/libvirt.py", line 2174, in resume
  if ret == -1: raise libvirtError ('virDomainResume() failed', dom=self)
  libvirt.libvirtError: internal error: unable to execute QEMU command 'cont': 
Resetting the Virtual Machine is required

  
  ---

  As outlined here:
  https://bugs.launchpad.net/qemu/+bug/1813165/comments/15

  After upgrade, all KVM guests are in a default pause state. Even after
  forcing them off via virsh, and restarting them the guests are paused.

  These Guests are not nested.

  A lot of diganostic information are outlined in the previous bug
  report link provided. The solution mentioned in previous report had
  been allegedly integrated into the downstream updates.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1866870/+subscriptions



[Bug 1867786] Re: Qemu PPC64 freezes with multi-core CPU

2020-03-18 Thread carlosedp
Hi Laurent, I'm on macOS Mojave running QEMU installed by Homebrew
from the master branch on the day I opened the bug.

The option to install was: `brew install --HEAD qemu -s --verbose`.

Maybe it's a Mac related problem?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1867786

Title:
  Qemu PPC64 freezes with multi-core CPU

Status in QEMU:
  New

Bug description:
  I installed Debian 10 on a Qemu PPC64 VM running with the following
  flags:

  qemu-system-ppc64 \
   -nographic -nodefaults -monitor pty -serial stdio \
   -M pseries -cpu POWER9 -smp cores=4,threads=1 -m 4G \
   -drive file=debian-ppc64el-qemu.qcow2,format=qcow2,if=virtio \
   -netdev user,id=network01,$ports -device rtl8139,netdev=network01 \

  
  Within a couple minutes on any operation (could be a Go application or simply 
changing the hostname with hostnamectl, the VM freezes and prints this on the 
console:

  ```
  root@debian:~# [  950.428255] rcu: INFO: rcu_sched self-detected stall on CPU
  [  950.428453] rcu: 3-: (5318 ticks this GP) 
idle=8e2/1/0x4004 softirq=5957/5960 fqs=2544
  [  976.244481] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [zsh:462]

  Message from syslogd@debian at Mar 17 11:35:24 ...
   kernel:[  976.244481] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! 
[zsh:462]
  [  980.110018] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: 
{ 3-... } 5276 jiffies s: 93 root: 0x8/.
  [  980.77] rcu: blocking rcu_node structures:
  [ 1013.442268] rcu: INFO: rcu_sched self-detected stall on CPU
  [ 1013.442365] rcu: 3-: (21071 ticks this GP) 
idle=8e2/1/0x4004 softirq=5957/5960 fqs=9342
  ```

  If I change to 1 core on the command line, I haven't seen these
  freezes.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1867786/+subscriptions



Re: [PATCH 04/11] MAINTAINERS: Add an entry for the HVF accelerator

2020-03-18 Thread Cameron Esfahani via
Please add me to the HVF maintainers as well.

Cameron Esfahani
di...@apple.com

"In the elder days of Art, Builders wrought with greatest care each minute and 
unseen part; For the gods see everywhere."

"The Builders", H. W. Longfellow



> On Mar 16, 2020, at 5:00 AM, Philippe Mathieu-Daudé  wrote:
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Reviewed-by: Nikita Leshenko 
> Cc: Sergio Andres Gomez Del Real 
> Cc: Roman Bolshakov 
> Cc: Patrick Colp 
> Cc: Cameron Esfahani 
> Cc: Liran Alon 
> Cc: Heiher 
> ---
> MAINTAINERS | 6 ++
> 1 file changed, 6 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7ec42a18f7..bcf40afb85 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -420,6 +420,12 @@ F: accel/stubs/hax-stub.c
> F: target/i386/hax-all.c
> F: include/sysemu/hax.h
> 
> +HVF Accelerator
> +S: Orphan
> +F: accel/stubs/hvf-stub.c
> +F: target/i386/hvf/hvf.c
> +F: include/sysemu/hvf.h
> +
> WHPX CPUs
> M: Sunil Muthuswamy 
> S: Supported
> -- 
> 2.21.1
> 
> 




Re: [PATCH v10 10/16] s390x: protvirt: Set guest IPL PSW

2020-03-18 Thread Cornelia Huck
On Wed, 18 Mar 2020 10:30:41 -0400
Janosch Frank  wrote:

> Handling of CPU reset and setting of the IPL psw from guest storage at
> offset 0 is done by an Ultravisor call. Let's only fetch it if
> necessary.
> 
> Signed-off-by: Janosch Frank 
> Reviewed-by: Thomas Huth 
> Reviewed-by: David Hildenbrand 
> Reviewed-by: Christian Borntraeger 
> Reviewed-by: Claudio Imbrenda 
> ---
>  target/s390x/cpu.c | 26 +-
>  1 file changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
> index 84029f14814b4980..3ec7d4b2ec1e938f 100644
> --- a/target/s390x/cpu.c
> +++ b/target/s390x/cpu.c
> @@ -78,16 +78,24 @@ static bool s390_cpu_has_work(CPUState *cs)
>  static void s390_cpu_load_normal(CPUState *s)
>  {
>  S390CPU *cpu = S390_CPU(s);
> -uint64_t spsw = ldq_phys(s->as, 0);
> -
> -cpu->env.psw.mask = spsw & PSW_MASK_SHORT_CTRL;
> -/*
> - * Invert short psw indication, so SIE will report a specification
> - * exception if it was not set.
> - */
> -cpu->env.psw.mask ^= PSW_MASK_SHORTPSW;
> -cpu->env.psw.addr = spsw & PSW_MASK_SHORT_ADDR;
> +uint64_t spsw;
>  
> +if (!s390_is_pv()) {
> +spsw = ldq_phys(s->as, 0);
> +cpu->env.psw.mask = spsw & PSW_MASK_SHORT_CTRL;
> +/*
> + * Invert short psw indication, so SIE will report a specification
> + * exception if it was not set.
> + */
> +cpu->env.psw.mask ^= PSW_MASK_SHORTPSW;
> +cpu->env.psw.addr = spsw & PSW_MASK_SHORT_ADDR;
> +} else {
> +/*
> + * Firmware requires us to set the load state before we set
> + * the cpu to operating on protected guests.
> + */
> +s390_cpu_set_state(S390_CPU_STATE_LOAD, cpu);

We probably could do that unconditionally, but this is fine.

> +}
>  s390_cpu_set_state(S390_CPU_STATE_OPERATING, cpu);
>  }
>  #endif

Reviewed-by: Cornelia Huck 




Re: [PATCH v10 08/16] s390x: protvirt: Move STSI data over SIDAD

2020-03-18 Thread Cornelia Huck
On Wed, 18 Mar 2020 10:30:39 -0400
Janosch Frank  wrote:

> For protected guests, we need to put the STSI emulation results into
> the SIDA, so SIE will write them into the guest at the next entry.
> 
> Signed-off-by: Janosch Frank 
> Reviewed-by: David Hildenbrand 
> Reviewed-by: Claudio Imbrenda 
> ---
>  target/s390x/kvm.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index cfca4c58df60eb85..462a1d70ee78104c 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -50,6 +50,7 @@
>  #include "exec/memattrs.h"
>  #include "hw/s390x/s390-virtio-ccw.h"
>  #include "hw/s390x/s390-virtio-hcall.h"
> +#include "hw/s390x/pv.h"
>  
>  #ifndef DEBUG_KVM
>  #define DEBUG_KVM  0
> @@ -1806,7 +1807,9 @@ static void insert_stsi_3_2_2(S390CPU *cpu, __u64 addr, 
> uint8_t ar)
>  SysIB_322 sysib;
>  int del;
>  
> -if (s390_cpu_virt_mem_read(cpu, addr, ar, , sizeof(sysib))) {
> +if (s390_is_pv()) {
> +s390_cpu_pv_mem_read(cpu, 0, , sizeof(sysib));

The only minor issue I have here is that it is not obvious that this
function either succeeds or aborts, as we only call it in the pv case.
But it probably does not make that much sense to sprinkle comments
everywhere, either.

> +} else if (s390_cpu_virt_mem_read(cpu, addr, ar, , sizeof(sysib))) 
> {
>  return;
>  }
>  /* Shift the stack of Extended Names to prepare for our own data */
> @@ -1846,7 +1849,11 @@ static void insert_stsi_3_2_2(S390CPU *cpu, __u64 
> addr, uint8_t ar)
>  /* Insert UUID */
>  memcpy(sysib.vm[0].uuid, _uuid, sizeof(sysib.vm[0].uuid));
>  
> -s390_cpu_virt_mem_write(cpu, addr, ar, , sizeof(sysib));
> +if (s390_is_pv()) {
> +s390_cpu_pv_mem_write(cpu, 0, , sizeof(sysib));
> +} else {
> +s390_cpu_virt_mem_write(cpu, addr, ar, , sizeof(sysib));
> +}
>  }
>  
>  static int handle_stsi(S390CPU *cpu)

Reviewed-by: Cornelia Huck 




Re: [PULL 00/45] ppc-for-5.0 queue 20200317

2020-03-18 Thread Peter Maydell
On Tue, 17 Mar 2020 at 10:04, David Gibson  wrote:
>
> The following changes since commit a98135f727595382e200d04c2996e868b7925a01:
>
>   Merge remote-tracking branch 
> 'remotes/kraxel/tags/vga-20200316-pull-request' into staging (2020-03-16 
> 14:55:59 +)
>
> are available in the Git repository at:
>
>   git://github.com/dgibson/qemu.git tags/ppc-for-5.0-20200317
>
> for you to fetch changes up to 6961eae79f58385482775dc0a6c3d553f633662d:
>
>   pseries: Update SLOF firmware image (2020-03-17 17:00:22 +1100)
>
> 
> ppc patch queue 2020-03-17
>
> Here's my final pull request for the qemu-5.0 soft freeze.  Sorry this
> is just under the wire - I hit some last minute problems that took a
> while to fix up and retest.
>
> Highlights are:
>  * Numerous fixes for the FWNMI feature
>  * A handful of cleanups to the device tree construction code
>  * Numerous fixes for the spapr-vscsi device
>  * A number of fixes and cleanups for real mode (MMU off) softmmu
>handling
>  * Fixes for handling of the PAPR RMA
>  * Better handling of hotplug/unplug events during boot
>  * Assorted other fixes
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM



[Bug 1867786] Re: Qemu PPC64 freezes with multi-core CPU

2020-03-18 Thread Laurent Vivier
I'm not able to reproduce (kernel 4.19.0-8-powerpc64le, qemu id
d649689a8ecb)

What is the kernel version in the guest?
What is the QEMU commit id you used to test with 4.2.50?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1867786

Title:
  Qemu PPC64 freezes with multi-core CPU

Status in QEMU:
  New

Bug description:
  I installed Debian 10 on a Qemu PPC64 VM running with the following
  flags:

  qemu-system-ppc64 \
   -nographic -nodefaults -monitor pty -serial stdio \
   -M pseries -cpu POWER9 -smp cores=4,threads=1 -m 4G \
   -drive file=debian-ppc64el-qemu.qcow2,format=qcow2,if=virtio \
   -netdev user,id=network01,$ports -device rtl8139,netdev=network01 \

  
  Within a couple minutes on any operation (could be a Go application or simply 
changing the hostname with hostnamectl, the VM freezes and prints this on the 
console:

  ```
  root@debian:~# [  950.428255] rcu: INFO: rcu_sched self-detected stall on CPU
  [  950.428453] rcu: 3-: (5318 ticks this GP) 
idle=8e2/1/0x4004 softirq=5957/5960 fqs=2544
  [  976.244481] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [zsh:462]

  Message from syslogd@debian at Mar 17 11:35:24 ...
   kernel:[  976.244481] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! 
[zsh:462]
  [  980.110018] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: 
{ 3-... } 5276 jiffies s: 93 root: 0x8/.
  [  980.77] rcu: blocking rcu_node structures:
  [ 1013.442268] rcu: INFO: rcu_sched self-detected stall on CPU
  [ 1013.442365] rcu: 3-: (21071 ticks this GP) 
idle=8e2/1/0x4004 softirq=5957/5960 fqs=9342
  ```

  If I change to 1 core on the command line, I haven't seen these
  freezes.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1867786/+subscriptions



Re: [PATCH v10 05/16] s390x: protvirt: Inhibit balloon when switching to protected mode

2020-03-18 Thread Cornelia Huck
On Wed, 18 Mar 2020 10:30:36 -0400
Janosch Frank  wrote:

> Ballooning in protected VMs can only be done when the guest shares the
> pages it gives to the host. If pages are not shared, the integrity
> checks will fail once those pages have been altered and are given back
> to the guest.
> 
> As we currently do not yet have a solution for this we will continue
> like this:
> 
> 1. We block ballooning now in QEMU (with this patch).
> 
> 2. Later we will provide a change to virtio that removes the blocker
> and adds VIRTIO_F_IOMMU_PLATFORM automatically by QEMU when doing the
> protvirt switch. This is OK, as the balloon driver in Linux (the only
> supported guest) will refuse to work with the IOMMU_PLATFORM feature
> bit set.
> 
> 3. Later, we can fix the guest balloon driver to accept the IOMMU
> feature bit and correctly exercise sharing and unsharing of balloon
> pages.
> 
> Signed-off-by: Janosch Frank 
> Reviewed-by: David Hildenbrand 
> Reviewed-by: Christian Borntraeger 
> Reviewed-by: Claudio Imbrenda 
> ---
>  hw/s390x/s390-virtio-ccw.c | 11 +++
>  1 file changed, 11 insertions(+)

Reviewed-by: Cornelia Huck 
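
For context (this is not the patch hunk, which is elided above), the blocking
described in step 1 amounts to flipping QEMU's global balloon inhibitor around
the switch to protected mode. A hedged sketch, assuming the
qemu_balloon_inhibit() helper from include/sysemu/balloon.h:

/* Sketch only; the real change lives in hw/s390x/s390-virtio-ccw.c. */
#include "qemu/osdep.h"
#include "sysemu/balloon.h"

static void example_protect_guest(void)
{
    /* Pages given to a protected guest are not shared with the host, so
     * deflating them from the host would trip the integrity checks;
     * refuse ballooning for as long as the guest runs protected. */
    qemu_balloon_inhibit(true);

    /* ... perform the Ultravisor transition here ... */
}

static void example_unprotect_guest(void)
{
    /* ... tear down the protected configuration ... */

    qemu_balloon_inhibit(false);
}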




Re: [PATCH] build: Silence clang warning on older glib autoptr usage

2020-03-18 Thread John Snow



On 3/18/20 8:55 AM, Eric Blake wrote:
> On 3/17/20 1:11 PM, Peter Maydell wrote:
>> On Tue, 17 Mar 2020 at 17:55, Eric Blake  wrote:
>>>
>>> glib's G_DEFINE_AUTOPTR_CLEANUP_FUNC() macro defines several static
>>> inline functions, often with some of them unused, but prior to 2.57.2
>>> did not mark the functions as such.  As a result, clang (but not gcc)
>>> fails to build with older glib unless -Wno-unused-function is enabled.
>>>
>>> Reported-by: Peter Maydell 
>>> Signed-off-by: Eric Blake 
>>> ---
>>>
>>> Half-tested: I proved to myself that this does NOT enable
>>> -Wno-unused-function on my setup of glib 2.62.5 and gcc 9.2.1 (Fedora
>>> 31), but would do so if I introduced an intentional compile error into
>>> the sample program; but I was unable to test that it would prevent the
>>> build failure encountered by Peter on John's pull request (older glib
>>> but exact version unknown, clang, on NetBSD).
>>
>> This wasn't a NetBSD failure. I hit it on my clang-on-x86-64-Ubuntu
>> setup, and also on FreeBSD. (The latter is just the tests/vm
>> FreeBSD config, so you can repro that if you need to.)
> 
> Not sure where I got NetBSD from (maybe because the build failure
> happened in a file with 'nbd' in the name and I gravitated to the 'n'?).
>  But now that I've re-read your replies to the pull request, I'm glad to
> state that my mistake on reproduction platform is confined to the part
> after the ---; the commit message itself is accurate as-is.
> 

OK, thanks -- I'm going to take this patch and re-send the PR.
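
For reference, a minimal reproducer sketch of the warning class being
silenced; the Foo type is hypothetical, and the effect only shows up with
glib older than 2.57.2 when compiling with clang and -Wunused-function
enabled:

/* Build (sketch): cc -Wall -Wunused-function repro.c \
 *                 $(pkg-config --cflags --libs glib-2.0) */
#include <glib.h>

typedef struct Foo {
    int value;
} Foo;

static void foo_free(Foo *foo)
{
    g_free(foo);
}

/* Expands to several static inline helper functions; glib before 2.57.2
 * did not mark them as unused, so any helper this file never calls makes
 * clang emit -Wunused-function. */
G_DEFINE_AUTOPTR_CLEANUP_FUNC(Foo, foo_free)

int main(void)
{
    g_autoptr(Foo) foo = g_new0(Foo, 1);

    foo->value = 42;
    return foo->value == 42 ? 0 : 1;
}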




Re: [PATCH] MAINTAINERS: Upgrade myself as 9pfs co-maintainer

2020-03-18 Thread Philippe Mathieu-Daudé

On 3/18/20 2:33 PM, Christian Schoenebeck wrote:

As suggested by Greg, let's upgrade myself as co-maintainer of 9pfs.

Signed-off-by: Christian Schoenebeck 
---
  MAINTAINERS | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7364af0d8b..8d9cd04ab5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1630,7 +1630,7 @@ F: include/hw/virtio/
  
  virtio-9p

  M: Greg Kurz 
-R: Christian Schoenebeck 
+M: Christian Schoenebeck 
  S: Odd Fixes
  F: hw/9pfs/
  X: hw/9pfs/xen-9p*



FWIW:
Reviewed-by: Philippe Mathieu-Daudé 



