[PATCH] iotests/055: Use cache.no-flush for vmdk target

2020-05-04 Thread Kevin Wolf
055 uses the backup block job to create a compressed backup of an
$IMGFMT image with both qcow2 and vmdk targets. However, cluster
allocation in vmdk is very slow because it flushes the image file after
each L2 update.

There is no reason why we need this level of safety in this test, so
let's disable flushes for vmdk. For the blockdev-backup tests this is
achieved by simply adding cache.no-flush=on to the drive_add() call for
the target. For drive-backup, the caching flags are copied from the
source node, so we'll also add the flag to the source node, even though
it is not vmdk.

This can make the test run significantly faster (though it doesn't make
a difference on tmpfs). In my usual setup it goes from ~45s to ~15s.

Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/055 | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
index ab90062b99..002706c114 100755
--- a/tests/qemu-iotests/055
+++ b/tests/qemu-iotests/055
@@ -490,7 +490,7 @@ class TestSingleTransaction(iotests.QMPTestCase):
 
 class TestCompressedToQcow2(iotests.QMPTestCase):
 image_len = 64 * 1024 * 1024 # MB
-target_fmt = {'type': 'qcow2', 'args': ()}
+target_fmt = {'type': 'qcow2', 'args': (), 'drive-opts': ''}
 
 def tearDown(self):
 self.vm.shutdown()
@@ -501,13 +501,15 @@ class TestCompressedToQcow2(iotests.QMPTestCase):
 pass
 
 def do_prepare_drives(self, attach_target):
-self.vm = iotests.VM().add_drive('blkdebug::' + test_img)
+self.vm = iotests.VM().add_drive('blkdebug::' + test_img,
+ opts=self.target_fmt['drive-opts'])
 
 qemu_img('create', '-f', self.target_fmt['type'], blockdev_target_img,
  str(self.image_len), *self.target_fmt['args'])
 if attach_target:
 self.vm.add_drive(blockdev_target_img,
-  format=self.target_fmt['type'], interface="none")
+  format=self.target_fmt['type'], interface="none",
+  opts=self.target_fmt['drive-opts'])
 
 self.vm.launch()
 
@@ -601,7 +603,8 @@ class TestCompressedToQcow2(iotests.QMPTestCase):
 
 
 class TestCompressedToVmdk(TestCompressedToQcow2):
-target_fmt = {'type': 'vmdk', 'args': ('-o', 'subformat=streamOptimized')}
+target_fmt = {'type': 'vmdk', 'args': ('-o', 'subformat=streamOptimized'),
+  'drive-opts': 'cache.no-flush=on'}
 
 @iotests.skip_if_unsupported(['vmdk'])
 def setUp(self):
-- 
2.25.3




Re: [PATCH qemu] spapr: Add PVR setting capability

2020-05-04 Thread Alexey Kardashevskiy



On 05/05/2020 15:50, David Gibson wrote:
> On Tue, May 05, 2020 at 10:56:17AM +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 04/05/2020 21:30, Greg Kurz wrote:
>>> On Fri, 17 Apr 2020 14:11:05 +1000
>>> Alexey Kardashevskiy  wrote:
>>>
 At the moment the VCPU init sequence includes setting the PVR, which in
 the case of KVM-HV only checks whether it matches the hardware PVR mask,
 as the PVR cannot be virtualized by the hardware. In order to cope with
 various CPU revisions, only the top 16 bits of the PVR are checked, which
 works for minor revision updates.

 However, in every CPU generation starting with POWER7 (at least) there
 have been CPUs supporting the (almost) same POWER ISA level but having a
 different top 16 bits of PVR - POWER7+, POWER8E, POWER8NVL; this time we
 got POWER9+ with a new PVR family. We would normally add the PVR mask for
 the new one too; the problem is that although the physical machines
 exist, P9+ is not going to be released as a product, and this situation
 is likely to repeat in the future.

 Instead of adding every new CPU family in QEMU, this adds a new sPAPR
 machine capability to force PVR setting/checking. It is "on" by default
 to preserve the existing behavior. When "off", it is the user's
 responsibility to specify the correct CPU.

>>>
>>> I don't quite understand the motivation for this... what does this
>>> buy us ?
>>
>> I answered that part in another mail in this thread; in short, this is to
>> make QEMU work with HV KVM on an unknown-to-QEMU CPU family (0x004f).
>>
>>
>>>
 Signed-off-by: Alexey Kardashevskiy 
 ---
  include/hw/ppc/spapr.h |  5 -
  hw/ppc/spapr.c |  1 +
  hw/ppc/spapr_caps.c| 18 ++
  target/ppc/kvm.c   | 16 ++--
  4 files changed, 37 insertions(+), 3 deletions(-)

 diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
 index e579eaf28c05..5ccac4d56871 100644
 --- a/include/hw/ppc/spapr.h
 +++ b/include/hw/ppc/spapr.h
 @@ -81,8 +81,10 @@ typedef enum {
  #define SPAPR_CAP_CCF_ASSIST0x09
  /* Implements PAPR FWNMI option */
  #define SPAPR_CAP_FWNMI 0x0A
 +/* Implements PAPR PVR option */
 +#define SPAPR_CAP_PVR   0x0B
  /* Num Caps */
 -#define SPAPR_CAP_NUM   (SPAPR_CAP_FWNMI + 1)
 +#define SPAPR_CAP_NUM   (SPAPR_CAP_PVR + 1)
  
  /*
   * Capability Values
 @@ -912,6 +914,7 @@ extern const VMStateDescription 
 vmstate_spapr_cap_nested_kvm_hv;
  extern const VMStateDescription vmstate_spapr_cap_large_decr;
  extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
  extern const VMStateDescription vmstate_spapr_cap_fwnmi;
 +extern const VMStateDescription vmstate_spapr_cap_pvr;
  
  static inline uint8_t spapr_get_cap(SpaprMachineState *spapr, int cap)
  {
 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
 index 841b5ec59b12..ecc74c182b9f 100644
 --- a/hw/ppc/spapr.c
 +++ b/hw/ppc/spapr.c
 @@ -4535,6 +4535,7 @@ static void spapr_machine_class_init(ObjectClass 
 *oc, void *data)
  smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
  smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
  smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
 +smc->default_caps.caps[SPAPR_CAP_PVR] = SPAPR_CAP_ON;
  spapr_caps_add_properties(smc, &error_abort);
  smc->irq = &spapr_irq_dual;
  smc->dr_phb_enabled = true;
 diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
 index eb54f9422722..398b72b77f9f 100644
 --- a/hw/ppc/spapr_caps.c
 +++ b/hw/ppc/spapr_caps.c
 @@ -525,6 +525,14 @@ static void cap_fwnmi_apply(SpaprMachineState *spapr, 
 uint8_t val,
  }
  }
  
 +static void cap_pvr_apply(SpaprMachineState *spapr, uint8_t val, Error 
 **errp)
 +{
 +if (val) {
 +return;
 +}
 +warn_report("If you're uing kvm-hv.ko, only \"-cpu host\" is 
 supported");
 +}
 +
  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
  [SPAPR_CAP_HTM] = {
  .name = "htm",
 @@ -633,6 +641,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = 
 {
  .type = "bool",
  .apply = cap_fwnmi_apply,
  },
 +[SPAPR_CAP_PVR] = {
 +.name = "pvr",
 +.description = "Enforce PVR in KVM",
 +.index = SPAPR_CAP_PVR,
 +.get = spapr_cap_get_bool,
 +.set = spapr_cap_set_bool,
 +.type = "bool",
 +.apply = cap_pvr_apply,
 +},
  };
  
  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
 @@ -773,6 +790,7 @@ SPAPR_CAP_MIG_STATE(nested_kvm_hv, 
 SPAPR_CAP_NESTED_KVM_HV);
  SPAPR_CAP_MIG_STATE(lar
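
The target/ppc/kvm.c part of the patch is not quoted above, but the behaviour
the capability selects can be sketched roughly as follows. This is
illustrative only: spapr_get_cap() and SPAPR_CAP_PVR come from the hunks
above, SPAPR_CAP_OFF is the existing "off" capability value, and
spapr_init_vcpu_pvr()/kvmppc_set_pvr() are hypothetical names, not functions
from the patch:

    static void spapr_init_vcpu_pvr(SpaprMachineState *spapr, PowerPCCPU *cpu,
                                    Error **errp)
    {
        if (spapr_get_cap(spapr, SPAPR_CAP_PVR) == SPAPR_CAP_OFF) {
            /*
             * cap-pvr=off: skip setting/checking the PVR through KVM; the
             * user has taken responsibility for passing a -cpu the host
             * can actually run (in practice "-cpu host" for kvm-hv.ko).
             */
            return;
        }

        /*
         * cap-pvr=on (default): keep the existing behaviour and hand the
         * guest PVR to KVM, which for KVM-HV only accepts it when the top
         * 16 bits match the host PVR family.
         */
        kvmppc_set_pvr(cpu, errp);  /* hypothetical helper, not from the patch */
    }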

Re: [PATCH v18 QEMU 02/18] vfio: Add function to unmap VFIO region

2020-05-04 Thread Philippe Mathieu-Daudé

Hi Kirti,

On 5/5/20 12:44 AM, Kirti Wankhede wrote:

This function will be used for the migration region.
The migration region is mmapped when migration starts and will be unmapped
when migration is complete.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Reviewed-by: Cornelia Huck 
---
  hw/vfio/common.c  | 20 
  hw/vfio/trace-events  |  1 +
  include/hw/vfio/vfio-common.h |  1 +
  3 files changed, 22 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 0b3593b3c0c4..4a2f0d6a2233 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -983,6 +983,26 @@ int vfio_region_mmap(VFIORegion *region)
  return 0;
  }
  
+void vfio_region_unmap(VFIORegion *region)

+{
+int i;
+
+if (!region->mem) {
+return;
+}
+
+for (i = 0; i < region->nr_mmaps; i++) {


I'd refactor this block <...

+trace_vfio_region_unmap(memory_region_name(&region->mmaps[i].mem),
+region->mmaps[i].offset,
+region->mmaps[i].offset +
+region->mmaps[i].size - 1);
+memory_region_del_subregion(region->mem, &region->mmaps[i].mem);
+munmap(region->mmaps[i].mmap, region->mmaps[i].size);
+object_unparent(OBJECT(&region->mmaps[i].mem));
+region->mmaps[i].mmap = NULL;


...> into a helper and reuse it in vfio_region_mmap(). Well, actually 
I'd factor it out from vfio_region_mmap() then reuse it here. Anyway 
this is v18 so can be done later on top.


Reviewed-by: Philippe Mathieu-Daudé 


+}
+}
+
  void vfio_region_exit(VFIORegion *region)
  {
  int i;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index b1ef55a33ffd..8cdc27946cb8 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -111,6 +111,7 @@ vfio_region_mmap(const char *name, unsigned long offset, 
unsigned long end) "Reg
  vfio_region_exit(const char *name, int index) "Device %s, region %d"
  vfio_region_finalize(const char *name, int index) "Device %s, region %d"
  vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps 
enabled: %d"
+vfio_region_unmap(const char *name, unsigned long offset, unsigned long end) 
"Region %s unmap [0x%lx - 0x%lx]"
  vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device 
%s region %d: %d sparse mmap entries"
  vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) 
"sparse entry %d [0x%lx - 0x%lx]"
  vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) 
"%s index %d, %08x/%0x8"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index fd564209ac71..8d7a0fbb1046 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -171,6 +171,7 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, 
VFIORegion *region,
int index, const char *name);
  int vfio_region_mmap(VFIORegion *region);
  void vfio_region_mmaps_set_enabled(VFIORegion *region, bool enabled);
+void vfio_region_unmap(VFIORegion *region);
  void vfio_region_exit(VFIORegion *region);
  void vfio_region_finalize(VFIORegion *region);
  void vfio_reset_handler(void *opaque);
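
A minimal sketch of the helper suggested above, not part of the posted patch;
vfio_subregion_unmap() is a hypothetical name. The per-index teardown moves
into the helper, which vfio_region_unmap() (and eventually the rollback path
of vfio_region_mmap()) can then call per index:

    static void vfio_subregion_unmap(VFIORegion *region, int index)
    {
        trace_vfio_region_unmap(memory_region_name(&region->mmaps[index].mem),
                                region->mmaps[index].offset,
                                region->mmaps[index].offset +
                                region->mmaps[index].size - 1);
        memory_region_del_subregion(region->mem, &region->mmaps[index].mem);
        munmap(region->mmaps[index].mmap, region->mmaps[index].size);
        object_unparent(OBJECT(&region->mmaps[index].mem));
        region->mmaps[index].mmap = NULL;
    }

    void vfio_region_unmap(VFIORegion *region)
    {
        int i;

        if (!region->mem) {
            return;
        }

        for (i = 0; i < region->nr_mmaps; i++) {
            vfio_subregion_unmap(region, i);
        }
    }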






Re: [PATCH v3 3/3] target/arm: Use clear_vec_high more effectively

2020-05-04 Thread Philippe Mathieu-Daudé

On 5/4/20 9:23 PM, Richard Henderson wrote:

Do not explicitly store zero to the NEON high part
when we can pass !is_q to clear_vec_high.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---


Patch easier to review with 'git-diff --function-context'.


  target/arm/translate-a64.c | 59 +++---
  1 file changed, 36 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 729e746e25..d1c9150c4f 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -939,11 +939,10 @@ static void do_fp_ld(DisasContext *s, int destidx, 
TCGv_i64 tcg_addr, int size)
  {
  /* This always zero-extends and writes to a full 128 bit wide vector */
  TCGv_i64 tmplo = tcg_temp_new_i64();
-TCGv_i64 tmphi;
+TCGv_i64 tmphi = NULL;
  
  if (size < 4) {

  MemOp memop = s->be_data + size;
-tmphi = tcg_const_i64(0);
  tcg_gen_qemu_ld_i64(tmplo, tcg_addr, get_mem_index(s), memop);
  } else {
  bool be = s->be_data == MO_BE;
@@ -961,12 +960,13 @@ static void do_fp_ld(DisasContext *s, int destidx, 
TCGv_i64 tcg_addr, int size)
  }
  
  tcg_gen_st_i64(tmplo, cpu_env, fp_reg_offset(s, destidx, MO_64));

-tcg_gen_st_i64(tmphi, cpu_env, fp_reg_hi_offset(s, destidx));
-
  tcg_temp_free_i64(tmplo);
-tcg_temp_free_i64(tmphi);
  
-clear_vec_high(s, true, destidx);

+if (tmphi) {
+tcg_gen_st_i64(tmphi, cpu_env, fp_reg_hi_offset(s, destidx));
+tcg_temp_free_i64(tmphi);
+}
+clear_vec_high(s, tmphi != NULL, destidx);


OK.


  }
  
  /*

@@ -6960,8 +6960,8 @@ static void disas_simd_ext(DisasContext *s, uint32_t insn)
  return;
  }
  
-tcg_resh = tcg_temp_new_i64();

  tcg_resl = tcg_temp_new_i64();
+tcg_resh = NULL;
  
  /* Vd gets bits starting at pos bits into Vm:Vn. This is

   * either extracting 128 bits from a 128:128 concatenation, or
@@ -6973,7 +6973,6 @@ static void disas_simd_ext(DisasContext *s, uint32_t insn)
  read_vec_element(s, tcg_resh, rm, 0, MO_64);


   but then  tcg_resh is NULL...


  do_ext64(s, tcg_resh, tcg_resl, pos);
  }
-tcg_gen_movi_i64(tcg_resh, 0);
  } else {
  TCGv_i64 tcg_hh;
  typedef struct {
@@ -6988,6 +6987,7 @@ static void disas_simd_ext(DisasContext *s, uint32_t insn)
  pos -= 64;
  }
  
+tcg_resh = tcg_temp_new_i64();

  read_vec_element(s, tcg_resl, elt->reg, elt->elt, MO_64);
  elt++;
  read_vec_element(s, tcg_resh, elt->reg, elt->elt, MO_64);
@@ -7003,9 +7003,12 @@ static void disas_simd_ext(DisasContext *s, uint32_t 
insn)
  
  write_vec_element(s, tcg_resl, rd, 0, MO_64);

  tcg_temp_free_i64(tcg_resl);
-write_vec_element(s, tcg_resh, rd, 1, MO_64);
-tcg_temp_free_i64(tcg_resh);
-clear_vec_high(s, true, rd);
+
+if (is_q) {
+write_vec_element(s, tcg_resh, rd, 1, MO_64);
+tcg_temp_free_i64(tcg_resh);
+}
+clear_vec_high(s, is_q, rd);
  }
  
  /* TBL/TBX

@@ -7042,17 +7045,21 @@ static void disas_simd_tb(DisasContext *s, uint32_t 
insn)
   * the input.
   */
  tcg_resl = tcg_temp_new_i64();
-tcg_resh = tcg_temp_new_i64();
+tcg_resh = NULL;
  
  if (is_tblx) {

  read_vec_element(s, tcg_resl, rd, 0, MO_64);
  } else {
  tcg_gen_movi_i64(tcg_resl, 0);
  }
-if (is_tblx && is_q) {
-read_vec_element(s, tcg_resh, rd, 1, MO_64);
-} else {
-tcg_gen_movi_i64(tcg_resh, 0);
+
+if (is_q) {
+tcg_resh = tcg_temp_new_i64();
+if (is_tblx) {
+read_vec_element(s, tcg_resh, rd, 1, MO_64);
+} else {
+tcg_gen_movi_i64(tcg_resh, 0);
+}
  }
  
  tcg_idx = tcg_temp_new_i64();

@@ -7072,9 +7079,12 @@ static void disas_simd_tb(DisasContext *s, uint32_t insn)
  
  write_vec_element(s, tcg_resl, rd, 0, MO_64);

  tcg_temp_free_i64(tcg_resl);
-write_vec_element(s, tcg_resh, rd, 1, MO_64);
-tcg_temp_free_i64(tcg_resh);
-clear_vec_high(s, true, rd);
+
+if (is_q) {
+write_vec_element(s, tcg_resh, rd, 1, MO_64);
+tcg_temp_free_i64(tcg_resh);
+}
+clear_vec_high(s, is_q, rd);


OK.


  }
  
  /* ZIP/UZP/TRN

@@ -7111,7 +7121,7 @@ static void disas_simd_zip_trn(DisasContext *s, uint32_t 
insn)
  }
  
  tcg_resl = tcg_const_i64(0);

-tcg_resh = tcg_const_i64(0);
+tcg_resh = is_q ? tcg_const_i64(0) : NULL;
  tcg_res = tcg_temp_new_i64();
  
  for (i = 0; i < elements; i++) {

@@ -7162,9 +7172,12 @@ static void disas_simd_zip_trn(DisasContext *s, uint32_t 
insn)


More context:

   ...
   ofs = i * esize;
   if (ofs < 64) {
   tcg_gen_shli_i64(tcg_res, tcg_res, ofs);
   tcg_gen_or_i64(tcg_resl, tcg_resl, tcg_res);
   } else {
   tcg_gen_shl

Re: [PATCH] aspeed: Support AST2600A1 silicon revision

2020-05-04 Thread Cédric Le Goater
On 5/4/20 11:37 AM, Joel Stanley wrote:
> There are minimal differences from QEMU's point of view between the A0
> and A1 silicon revisions.
> 
> As the A1 exercises different code paths in u-boot it is desirable to
> emulate that instead.
> 
> Signed-off-by: Joel Stanley 

Reviewed-by: Cédric Le Goater 

Thanks,

C.

> ---
>  hw/arm/aspeed.c  |  8 
>  hw/arm/aspeed_ast2600.c  |  6 +++---
>  hw/misc/aspeed_scu.c | 11 +--
>  include/hw/misc/aspeed_scu.h |  1 +
>  4 files changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index 99a0f3fcf36e..91301efab32d 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -93,7 +93,7 @@ struct AspeedBoardState {
>  
>  /* Tacoma hardware value */
>  #define TACOMA_BMC_HW_STRAP1  0x
> -#define TACOMA_BMC_HW_STRAP2  0x
> +#define TACOMA_BMC_HW_STRAP2  0x0040
>  
>  /*
>   * The max ram region is for firmwares that scan the address space
> @@ -585,7 +585,7 @@ static void 
> aspeed_machine_ast2600_evb_class_init(ObjectClass *oc, void *data)
>  AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
>  
>  mc->desc   = "Aspeed AST2600 EVB (Cortex A7)";
> -amc->soc_name  = "ast2600-a0";
> +amc->soc_name  = "ast2600-a1";
>  amc->hw_strap1 = AST2600_EVB_HW_STRAP1;
>  amc->hw_strap2 = AST2600_EVB_HW_STRAP2;
>  amc->fmc_model = "w25q512jv";
> @@ -600,8 +600,8 @@ static void aspeed_machine_tacoma_class_init(ObjectClass 
> *oc, void *data)
>  MachineClass *mc = MACHINE_CLASS(oc);
>  AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
>  
> -mc->desc   = "Aspeed AST2600 EVB (Cortex A7)";
> -amc->soc_name  = "ast2600-a0";
> +mc->desc   = "OpenPOWER Tacoma BMC (Cortex A7)";
> +amc->soc_name  = "ast2600-a1";
>  amc->hw_strap1 = TACOMA_BMC_HW_STRAP1;
>  amc->hw_strap2 = TACOMA_BMC_HW_STRAP2;
>  amc->fmc_model = "mx66l1g45g";
> diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c
> index 1a869e09b96a..c6e0ab84ac86 100644
> --- a/hw/arm/aspeed_ast2600.c
> +++ b/hw/arm/aspeed_ast2600.c
> @@ -557,9 +557,9 @@ static void aspeed_soc_ast2600_class_init(ObjectClass 
> *oc, void *data)
>  
>  dc->realize  = aspeed_soc_ast2600_realize;
>  
> -sc->name = "ast2600-a0";
> +sc->name = "ast2600-a1";
>  sc->cpu_type = ARM_CPU_TYPE_NAME("cortex-a7");
> -sc->silicon_rev  = AST2600_A0_SILICON_REV;
> +sc->silicon_rev  = AST2600_A1_SILICON_REV;
>  sc->sram_size= 0x1;
>  sc->spis_num = 2;
>  sc->ehcis_num= 2;
> @@ -571,7 +571,7 @@ static void aspeed_soc_ast2600_class_init(ObjectClass 
> *oc, void *data)
>  }
>  
>  static const TypeInfo aspeed_soc_ast2600_type_info = {
> -.name   = "ast2600-a0",
> +.name   = "ast2600-a1",
>  .parent = TYPE_ASPEED_SOC,
>  .instance_size  = sizeof(AspeedSoCState),
>  .instance_init  = aspeed_soc_ast2600_init,
> diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
> index 9d7482a9df19..ec4fef900e27 100644
> --- a/hw/misc/aspeed_scu.c
> +++ b/hw/misc/aspeed_scu.c
> @@ -431,6 +431,7 @@ static uint32_t aspeed_silicon_revs[] = {
>  AST2500_A0_SILICON_REV,
>  AST2500_A1_SILICON_REV,
>  AST2600_A0_SILICON_REV,
> +AST2600_A1_SILICON_REV,
>  };
>  
>  bool is_supported_silicon_rev(uint32_t silicon_rev)
> @@ -649,12 +650,10 @@ static const MemoryRegionOps aspeed_ast2600_scu_ops = {
>  .valid.unaligned = false,
>  };
>  
> -static const uint32_t ast2600_a0_resets[ASPEED_AST2600_SCU_NR_REGS] = {
> -[AST2600_SILICON_REV]   = AST2600_SILICON_REV,
> -[AST2600_SILICON_REV2]  = AST2600_SILICON_REV,
> -[AST2600_SYS_RST_CTRL]  = 0xF7CFFEDC | 0x100,
> +static const uint32_t ast2600_a1_resets[ASPEED_AST2600_SCU_NR_REGS] = {
> +[AST2600_SYS_RST_CTRL]  = 0xF7C3FED8,
>  [AST2600_SYS_RST_CTRL2] = 0xFFFC,
> -[AST2600_CLK_STOP_CTRL] = 0xEFF43E8B,
> +[AST2600_CLK_STOP_CTRL] = 0x7F8A,
>  [AST2600_CLK_STOP_CTRL2]= 0xFFF0FFF0,
>  [AST2600_SDRAM_HANDSHAKE]   = 0x0040,  /* SoC completed DRAM init */
>  [AST2600_HPLL_PARAM]= 0x1000405F,
> @@ -684,7 +683,7 @@ static void aspeed_2600_scu_class_init(ObjectClass 
> *klass, void *data)
>  
>  dc->desc = "ASPEED 2600 System Control Unit";
>  dc->reset = aspeed_ast2600_scu_reset;
> -asc->resets = ast2600_a0_resets;
> +asc->resets = ast2600_a1_resets;
>  asc->calc_hpll = aspeed_2500_scu_calc_hpll; /* No change since AST2500 */
>  asc->apb_divider = 4;
>  asc->nr_regs = ASPEED_AST2600_SCU_NR_REGS;
> diff --git a/include/hw/misc/aspeed_scu.h b/include/hw/misc/aspeed_scu.h
> index 1d7f7ffc1598..a6739bb846b6 100644
> --- a/include/hw/misc/aspeed_scu.h
> +++ b/include/hw/misc/aspeed_scu.h
> @@ -41,6 +41,7 @@ typedef struct AspeedSCUState {
>  #define AST2500_A0_SILICON_REV   0x04000303U
>  #define AST2500_A1_SILICON_REV   0x04

Re: [PATCH qemu] spapr: Add PVR setting capability

2020-05-04 Thread David Gibson
On Tue, May 05, 2020 at 10:56:17AM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 04/05/2020 21:30, Greg Kurz wrote:
> > On Fri, 17 Apr 2020 14:11:05 +1000
> > Alexey Kardashevskiy  wrote:
> > 
> >> At the moment the VCPU init sequence includes setting the PVR, which in
> >> the case of KVM-HV only checks whether it matches the hardware PVR mask,
> >> as the PVR cannot be virtualized by the hardware. In order to cope with
> >> various CPU revisions, only the top 16 bits of the PVR are checked, which
> >> works for minor revision updates.
> >>
> >> However, in every CPU generation starting with POWER7 (at least) there
> >> have been CPUs supporting the (almost) same POWER ISA level but having a
> >> different top 16 bits of PVR - POWER7+, POWER8E, POWER8NVL; this time we
> >> got POWER9+ with a new PVR family. We would normally add the PVR mask for
> >> the new one too; the problem is that although the physical machines
> >> exist, P9+ is not going to be released as a product, and this situation
> >> is likely to repeat in the future.
> >>
> >> Instead of adding every new CPU family in QEMU, this adds a new sPAPR
> >> machine capability to force PVR setting/checking. It is "on" by default
> >> to preserve the existing behavior. When "off", it is the user's
> >> responsibility to specify the correct CPU.
> >>
> > 
> > I don't quite understand the motivation for this... what does this
> > buy us ?
> 
> I answered that part in another mail in this thread; in short, this is to
> make QEMU work with HV KVM on an unknown-to-QEMU CPU family (0x004f).
> 
> 
> > 
> >> Signed-off-by: Alexey Kardashevskiy 
> >> ---
> >>  include/hw/ppc/spapr.h |  5 -
> >>  hw/ppc/spapr.c |  1 +
> >>  hw/ppc/spapr_caps.c| 18 ++
> >>  target/ppc/kvm.c   | 16 ++--
> >>  4 files changed, 37 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index e579eaf28c05..5ccac4d56871 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -81,8 +81,10 @@ typedef enum {
> >>  #define SPAPR_CAP_CCF_ASSIST0x09
> >>  /* Implements PAPR FWNMI option */
> >>  #define SPAPR_CAP_FWNMI 0x0A
> >> +/* Implements PAPR PVR option */
> >> +#define SPAPR_CAP_PVR   0x0B
> >>  /* Num Caps */
> >> -#define SPAPR_CAP_NUM   (SPAPR_CAP_FWNMI + 1)
> >> +#define SPAPR_CAP_NUM   (SPAPR_CAP_PVR + 1)
> >>  
> >>  /*
> >>   * Capability Values
> >> @@ -912,6 +914,7 @@ extern const VMStateDescription 
> >> vmstate_spapr_cap_nested_kvm_hv;
> >>  extern const VMStateDescription vmstate_spapr_cap_large_decr;
> >>  extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
> >>  extern const VMStateDescription vmstate_spapr_cap_fwnmi;
> >> +extern const VMStateDescription vmstate_spapr_cap_pvr;
> >>  
> >>  static inline uint8_t spapr_get_cap(SpaprMachineState *spapr, int cap)
> >>  {
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 841b5ec59b12..ecc74c182b9f 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -4535,6 +4535,7 @@ static void spapr_machine_class_init(ObjectClass 
> >> *oc, void *data)
> >>  smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >>  smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
> >>  smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
> >> +smc->default_caps.caps[SPAPR_CAP_PVR] = SPAPR_CAP_ON;
> >>  spapr_caps_add_properties(smc, &error_abort);
> >>  smc->irq = &spapr_irq_dual;
> >>  smc->dr_phb_enabled = true;
> >> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> >> index eb54f9422722..398b72b77f9f 100644
> >> --- a/hw/ppc/spapr_caps.c
> >> +++ b/hw/ppc/spapr_caps.c
> >> @@ -525,6 +525,14 @@ static void cap_fwnmi_apply(SpaprMachineState *spapr, 
> >> uint8_t val,
> >>  }
> >>  }
> >>  
> >> +static void cap_pvr_apply(SpaprMachineState *spapr, uint8_t val, Error 
> >> **errp)
> >> +{
> >> +if (val) {
> >> +return;
> >> +}
> >> +warn_report("If you're using kvm-hv.ko, only \"-cpu host\" is supported");
> >> +}
> >> +
> >>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >>  [SPAPR_CAP_HTM] = {
> >>  .name = "htm",
> >> @@ -633,6 +641,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = 
> >> {
> >>  .type = "bool",
> >>  .apply = cap_fwnmi_apply,
> >>  },
> >> +[SPAPR_CAP_PVR] = {
> >> +.name = "pvr",
> >> +.description = "Enforce PVR in KVM",
> >> +.index = SPAPR_CAP_PVR,
> >> +.get = spapr_cap_get_bool,
> >> +.set = spapr_cap_set_bool,
> >> +.type = "bool",
> >> +.apply = cap_pvr_apply,
> >> +},
> >>  };
> >>  
> >>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
> >> @@ -773,6 +790,7 @@ SPAPR_CAP_MIG_STATE(nested_kvm_hv, 
> >> SPAPR_CAP_NESTED_KVM_HV);
> >>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
> >>  S

Re: [PATCH v5 06/18] nvme: refactor nvme_addr_read

2020-05-04 Thread Klaus Jensen
On May  5 07:48, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Pull the controller memory buffer check to its own function. The check
> will be used on its own in later patches.
> 
> Signed-off-by: Klaus Jensen 
> Reviewed-by: Philippe Mathieu-Daudé 
> Reviewed-by: Maxim Levitsky 
> Reviewed-by: Keith Busch 
> ---
>  hw/block/nvme.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 

Woops, noticed that Maxim's R-b is wrong here, please update it to

Reviewed-by: Maxim Levitsky 

(without the 'y') when merging.



[PATCH v5 16/18] nvme: factor out pmr setup

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
Reviewed-by: Maxim Levitsky 
---
 hw/block/nvme.c | 95 ++---
 1 file changed, 51 insertions(+), 44 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index bd255a5c711a..b0b3d3ffb75f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -58,6 +58,7 @@
 #define NVME_REG_SIZE 0x1000
 #define NVME_DB_SIZE  4
 #define NVME_CMB_BIR 2
+#define NVME_PMR_BIR 2
 
 #define NVME_GUEST_ERR(trace, fmt, ...) \
 do { \
@@ -1459,6 +1460,55 @@ static void nvme_init_cmb(NvmeCtrl *n, PCIDevice 
*pci_dev)
  PCI_BASE_ADDRESS_MEM_PREFETCH, &n->ctrl_mem);
 }
 
+static void nvme_init_pmr(NvmeCtrl *n, PCIDevice *pci_dev)
+{
+/* Controller Capabilities register */
+NVME_CAP_SET_PMRS(n->bar.cap, 1);
+
+/* PMR Capabities register */
+n->bar.pmrcap = 0;
+NVME_PMRCAP_SET_RDS(n->bar.pmrcap, 0);
+NVME_PMRCAP_SET_WDS(n->bar.pmrcap, 0);
+NVME_PMRCAP_SET_BIR(n->bar.pmrcap, NVME_PMR_BIR);
+NVME_PMRCAP_SET_PMRTU(n->bar.pmrcap, 0);
+/* Turn on bit 1 support */
+NVME_PMRCAP_SET_PMRWBM(n->bar.pmrcap, 0x02);
+NVME_PMRCAP_SET_PMRTO(n->bar.pmrcap, 0);
+NVME_PMRCAP_SET_CMSS(n->bar.pmrcap, 0);
+
+/* PMR Control register */
+n->bar.pmrctl = 0;
+NVME_PMRCTL_SET_EN(n->bar.pmrctl, 0);
+
+/* PMR Status register */
+n->bar.pmrsts = 0;
+NVME_PMRSTS_SET_ERR(n->bar.pmrsts, 0);
+NVME_PMRSTS_SET_NRDY(n->bar.pmrsts, 0);
+NVME_PMRSTS_SET_HSTS(n->bar.pmrsts, 0);
+NVME_PMRSTS_SET_CBAI(n->bar.pmrsts, 0);
+
+/* PMR Elasticity Buffer Size register */
+n->bar.pmrebs = 0;
+NVME_PMREBS_SET_PMRSZU(n->bar.pmrebs, 0);
+NVME_PMREBS_SET_RBB(n->bar.pmrebs, 0);
+NVME_PMREBS_SET_PMRWBZ(n->bar.pmrebs, 0);
+
+/* PMR Sustained Write Throughput register */
+n->bar.pmrswtp = 0;
+NVME_PMRSWTP_SET_PMRSWTU(n->bar.pmrswtp, 0);
+NVME_PMRSWTP_SET_PMRSWTV(n->bar.pmrswtp, 0);
+
+/* PMR Memory Space Control register */
+n->bar.pmrmsc = 0;
+NVME_PMRMSC_SET_CMSE(n->bar.pmrmsc, 0);
+NVME_PMRMSC_SET_CBA(n->bar.pmrmsc, 0);
+
+pci_register_bar(pci_dev, NVME_PMRCAP_BIR(n->bar.pmrcap),
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_TYPE_64 |
+ PCI_BASE_ADDRESS_MEM_PREFETCH, &n->pmrdev->mr);
+}
+
 static void nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev)
 {
 uint8_t *pci_conf = pci_dev->config;
@@ -1537,50 +1587,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 if (n->params.cmb_size_mb) {
 nvme_init_cmb(n, pci_dev);
 } else if (n->pmrdev) {
-/* Controller Capabilities register */
-NVME_CAP_SET_PMRS(n->bar.cap, 1);
-
-/* PMR Capabities register */
-n->bar.pmrcap = 0;
-NVME_PMRCAP_SET_RDS(n->bar.pmrcap, 0);
-NVME_PMRCAP_SET_WDS(n->bar.pmrcap, 0);
-NVME_PMRCAP_SET_BIR(n->bar.pmrcap, 2);
-NVME_PMRCAP_SET_PMRTU(n->bar.pmrcap, 0);
-/* Turn on bit 1 support */
-NVME_PMRCAP_SET_PMRWBM(n->bar.pmrcap, 0x02);
-NVME_PMRCAP_SET_PMRTO(n->bar.pmrcap, 0);
-NVME_PMRCAP_SET_CMSS(n->bar.pmrcap, 0);
-
-/* PMR Control register */
-n->bar.pmrctl = 0;
-NVME_PMRCTL_SET_EN(n->bar.pmrctl, 0);
-
-/* PMR Status register */
-n->bar.pmrsts = 0;
-NVME_PMRSTS_SET_ERR(n->bar.pmrsts, 0);
-NVME_PMRSTS_SET_NRDY(n->bar.pmrsts, 0);
-NVME_PMRSTS_SET_HSTS(n->bar.pmrsts, 0);
-NVME_PMRSTS_SET_CBAI(n->bar.pmrsts, 0);
-
-/* PMR Elasticity Buffer Size register */
-n->bar.pmrebs = 0;
-NVME_PMREBS_SET_PMRSZU(n->bar.pmrebs, 0);
-NVME_PMREBS_SET_RBB(n->bar.pmrebs, 0);
-NVME_PMREBS_SET_PMRWBZ(n->bar.pmrebs, 0);
-
-/* PMR Sustained Write Throughput register */
-n->bar.pmrswtp = 0;
-NVME_PMRSWTP_SET_PMRSWTU(n->bar.pmrswtp, 0);
-NVME_PMRSWTP_SET_PMRSWTV(n->bar.pmrswtp, 0);
-
-/* PMR Memory Space Control register */
-n->bar.pmrmsc = 0;
-NVME_PMRMSC_SET_CMSE(n->bar.pmrmsc, 0);
-NVME_PMRMSC_SET_CBA(n->bar.pmrmsc, 0);
-
-pci_register_bar(pci_dev, NVME_PMRCAP_BIR(n->bar.pmrcap),
-PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64 |
-PCI_BASE_ADDRESS_MEM_PREFETCH, &n->pmrdev->mr);
+nvme_init_pmr(n, pci_dev);
 }
 
 for (i = 0; i < n->num_namespaces; i++) {
-- 
2.26.2




[PATCH v5 17/18] nvme: do cmb/pmr init as part of pci init

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
Reviewed-by: Maxim Levitsky 
---
 hw/block/nvme.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index b0b3d3ffb75f..6454f3810e5b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1523,6 +1523,12 @@ static void nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev)
 pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
  PCI_BASE_ADDRESS_MEM_TYPE_64, &n->iomem);
 msix_init_exclusive_bar(pci_dev, n->params.max_ioqpairs + 1, 4, NULL);
+
+if (n->params.cmb_size_mb) {
+nvme_init_cmb(n, pci_dev);
+} else if (n->pmrdev) {
+nvme_init_pmr(n, pci_dev);
+}
 }
 
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
@@ -1584,12 +1590,6 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 n->bar.vs = 0x00010200;
 n->bar.intmc = n->bar.intms = 0;
 
-if (n->params.cmb_size_mb) {
-nvme_init_cmb(n, pci_dev);
-} else if (n->pmrdev) {
-nvme_init_pmr(n, pci_dev);
-}
-
 for (i = 0; i < n->num_namespaces; i++) {
 nvme_init_namespace(n, &n->namespaces[i], &local_err);
 if (local_err) {
-- 
2.26.2




[PATCH v5 14/18] nvme: factor out pci setup

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index c05124676e5e..2e65a780f4f0 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1436,6 +1436,22 @@ static void nvme_init_namespace(NvmeCtrl *n, 
NvmeNamespace *ns, Error **errp)
 id_ns->nuse = id_ns->ncap;
 }
 
+static void nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev)
+{
+uint8_t *pci_conf = pci_dev->config;
+
+pci_conf[PCI_INTERRUPT_PIN] = 1;
+pci_config_set_prog_interface(pci_conf, 0x2);
+pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
+pcie_endpoint_cap_init(pci_dev, 0x80);
+
+memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
+  n->reg_size);
+pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_TYPE_64, &n->iomem);
+msix_init_exclusive_bar(pci_dev, n->params.max_ioqpairs + 1, 4, NULL);
+}
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
@@ -1459,19 +1475,9 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 return;
 }
 
+nvme_init_pci(n, pci_dev);
+
 pci_conf = pci_dev->config;
-pci_conf[PCI_INTERRUPT_PIN] = 1;
-pci_config_set_prog_interface(pci_dev->config, 0x2);
-pci_config_set_class(pci_dev->config, PCI_CLASS_STORAGE_EXPRESS);
-pcie_endpoint_cap_init(pci_dev, 0x80);
-
-memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
-  "nvme", n->reg_size);
-pci_register_bar(pci_dev, 0,
-PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
-&n->iomem);
-msix_init_exclusive_bar(pci_dev, n->params.max_ioqpairs + 1, 4, NULL);
-
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
 strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
-- 
2.26.2




[PATCH v5 12/18] nvme: add namespace helpers

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Introduce some small helpers to make the next patches easier on the eye.

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c |  3 +--
 hw/block/nvme.h | 17 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index acdd735e0aca..720cc91bcb6a 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1571,8 +1571,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 id_ns->dps = 0;
 id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
 id_ns->ncap  = id_ns->nuse = id_ns->nsze =
-cpu_to_le64(n->ns_size >>
-id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas)].ds);
+cpu_to_le64(nvme_ns_nlbas(n, ns));
 }
 }
 
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 6714616e376e..345eb7bf3a51 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -61,6 +61,17 @@ typedef struct NvmeNamespace {
 NvmeIdNsid_ns;
 } NvmeNamespace;
 
+static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
+{
+NvmeIdNs *id_ns = &ns->id_ns;
+return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
+}
+
+static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
+{
+return nvme_ns_lbaf(ns)->ds;
+}
+
 #define TYPE_NVME "nvme"
 #define NVME(obj) \
 OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
@@ -97,4 +108,10 @@ typedef struct NvmeCtrl {
 NvmeIdCtrl  id_ctrl;
 } NvmeCtrl;
 
+/* calculate the number of LBAs that the namespace can accommodate */
+static inline uint64_t nvme_ns_nlbas(NvmeCtrl *n, NvmeNamespace *ns)
+{
+return n->ns_size >> nvme_ns_lbads(ns);
+}
+
 #endif /* HW_NVME_H */
-- 
2.26.2
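
A quick worked example of what the new helper computes (the numbers are
hypothetical, not from the patch): with ds = 9 (BDRV_SECTOR_BITS, i.e.
512-byte LBAs) and a 64 MiB namespace size, nvme_ns_nlbas() reports 131072
LBAs. A standalone illustration of the same arithmetic:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t ns_size = 64 * 1024 * 1024; /* hypothetical backing size, bytes */
        uint8_t lbads = 9;                   /* BDRV_SECTOR_BITS: 2^9 = 512-byte LBAs */

        /* mirrors nvme_ns_nlbas(): ns_size >> lbads */
        printf("%llu\n", (unsigned long long)(ns_size >> lbads)); /* 131072 */
        return 0;
    }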




[PATCH v5 11/18] nvme: factor out block backend setup

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 72e838a476af..acdd735e0aca 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1408,6 +1408,13 @@ static void nvme_init_state(NvmeCtrl *n)
 n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
 }
 
+static void nvme_init_blk(NvmeCtrl *n, Error **errp)
+{
+blkconf_blocksizes(&n->conf);
+blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk),
+  false, errp);
+}
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
@@ -1432,9 +1439,9 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 return;
 }
 
-blkconf_blocksizes(&n->conf);
-if (!blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk),
-   false, errp)) {
+nvme_init_blk(n, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
 return;
 }
 
-- 
2.26.2




[PATCH v5 13/18] nvme: factor out namespace setup

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Maxim Levitsky  
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 46 ++
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 720cc91bcb6a..c05124676e5e 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1415,6 +1415,27 @@ static void nvme_init_blk(NvmeCtrl *n, Error **errp)
   false, errp);
 }
 
+static void nvme_init_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
+{
+int64_t bs_size;
+NvmeIdNs *id_ns = &ns->id_ns;
+
+bs_size = blk_getlength(n->conf.blk);
+if (bs_size < 0) {
+error_setg_errno(errp, -bs_size, "could not get backing file size");
+return;
+}
+
+n->ns_size = bs_size;
+
+id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
+id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(n, ns));
+
+/* no thin provisioning */
+id_ns->ncap = id_ns->nsze;
+id_ns->nuse = id_ns->ncap;
+}
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
@@ -1422,7 +1443,6 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 Error *local_err = NULL;
 
 int i;
-int64_t bs_size;
 uint8_t *pci_conf;
 
 nvme_check_constraints(n, &local_err);
@@ -1433,12 +1453,6 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 
 nvme_init_state(n);
 
-bs_size = blk_getlength(n->conf.blk);
-if (bs_size < 0) {
-error_setg(errp, "could not get backing file size");
-return;
-}
-
 nvme_init_blk(n, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
@@ -1451,8 +1465,6 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 pci_config_set_class(pci_dev->config, PCI_CLASS_STORAGE_EXPRESS);
 pcie_endpoint_cap_init(pci_dev, 0x80);
 
-n->ns_size = bs_size / (uint64_t)n->num_namespaces;
-
 memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
   "nvme", n->reg_size);
 pci_register_bar(pci_dev, 0,
@@ -1561,17 +1573,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 }
 
 for (i = 0; i < n->num_namespaces; i++) {
-NvmeNamespace *ns = &n->namespaces[i];
-NvmeIdNs *id_ns = &ns->id_ns;
-id_ns->nsfeat = 0;
-id_ns->nlbaf = 0;
-id_ns->flbas = 0;
-id_ns->mc = 0;
-id_ns->dpc = 0;
-id_ns->dps = 0;
-id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
-id_ns->ncap  = id_ns->nuse = id_ns->nsze =
-cpu_to_le64(nvme_ns_nlbas(n, ns));
+nvme_init_namespace(n, &n->namespaces[i], &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
 }
 }
 
-- 
2.26.2




[PATCH v5 18/18] nvme: factor out controller identify setup

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 49 ++---
 1 file changed, 26 insertions(+), 23 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6454f3810e5b..73489a1e0eb6 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1531,32 +1531,11 @@ static void nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev)
 }
 }
 
-static void nvme_realize(PCIDevice *pci_dev, Error **errp)
+static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 {
-NvmeCtrl *n = NVME(pci_dev);
 NvmeIdCtrl *id = &n->id_ctrl;
-Error *local_err = NULL;
+uint8_t *pci_conf = pci_dev->config;
 
-int i;
-uint8_t *pci_conf;
-
-nvme_check_constraints(n, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-
-nvme_init_state(n);
-
-nvme_init_blk(n, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-
-nvme_init_pci(n, pci_dev);
-
-pci_conf = pci_dev->config;
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
 strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
@@ -1589,6 +1568,30 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 
 n->bar.vs = 0x00010200;
 n->bar.intmc = n->bar.intms = 0;
+}
+
+static void nvme_realize(PCIDevice *pci_dev, Error **errp)
+{
+NvmeCtrl *n = NVME(pci_dev);
+Error *local_err = NULL;
+
+int i;
+
+nvme_check_constraints(n, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+nvme_init_state(n);
+nvme_init_blk(n, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+nvme_init_pci(n, pci_dev);
+nvme_init_ctrl(n, pci_dev);
 
 for (i = 0; i < n->num_namespaces; i++) {
 nvme_init_namespace(n, &n->namespaces[i], &local_err);
-- 
2.26.2




[PATCH v5 15/18] nvme: factor out cmb setup

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 43 ---
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 2e65a780f4f0..bd255a5c711a 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -57,6 +57,7 @@
 
 #define NVME_REG_SIZE 0x1000
 #define NVME_DB_SIZE  4
+#define NVME_CMB_BIR 2
 
 #define NVME_GUEST_ERR(trace, fmt, ...) \
 do { \
@@ -1436,6 +1437,28 @@ static void nvme_init_namespace(NvmeCtrl *n, 
NvmeNamespace *ns, Error **errp)
 id_ns->nuse = id_ns->ncap;
 }
 
+static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
+{
+NVME_CMBLOC_SET_BIR(n->bar.cmbloc, NVME_CMB_BIR);
+NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
+
+NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
+NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
+NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
+NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
+NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
+NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2); /* MBs */
+NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->params.cmb_size_mb);
+
+n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
+memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
+  "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
+pci_register_bar(pci_dev, NVME_CMBLOC_BIR(n->bar.cmbloc),
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_TYPE_64 |
+ PCI_BASE_ADDRESS_MEM_PREFETCH, &n->ctrl_mem);
+}
+
 static void nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev)
 {
 uint8_t *pci_conf = pci_dev->config;
@@ -1512,25 +1535,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 n->bar.intmc = n->bar.intms = 0;
 
 if (n->params.cmb_size_mb) {
-
-NVME_CMBLOC_SET_BIR(n->bar.cmbloc, 2);
-NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
-
-NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
-NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
-NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
-NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
-NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
-NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2); /* MBs */
-NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->params.cmb_size_mb);
-
-n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
-memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
-  "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
-pci_register_bar(pci_dev, NVME_CMBLOC_BIR(n->bar.cmbloc),
-PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64 |
-PCI_BASE_ADDRESS_MEM_PREFETCH, &n->ctrl_mem);
-
+nvme_init_cmb(n, pci_dev);
 } else if (n->pmrdev) {
 /* Controller Capabilities register */
 NVME_CAP_SET_PMRS(n->bar.cap, 1);
-- 
2.26.2




[PATCH v5 09/18] nvme: factor out property/constraint checks

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 48 ++--
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index cc6d3059ff7f..13fb90c77e90 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1352,24 +1352,19 @@ static const MemoryRegionOps nvme_cmb_ops = {
 },
 };
 
-static void nvme_realize(PCIDevice *pci_dev, Error **errp)
+static void nvme_check_constraints(NvmeCtrl *n, Error **errp)
 {
-NvmeCtrl *n = NVME(pci_dev);
-NvmeIdCtrl *id = &n->id_ctrl;
+NvmeParams *params = &n->params;
 
-int i;
-int64_t bs_size;
-uint8_t *pci_conf;
-
-if (n->params.num_queues) {
+if (params->num_queues) {
 warn_report("num_queues is deprecated; please use max_ioqpairs "
 "instead");
 
-n->params.max_ioqpairs = n->params.num_queues - 1;
+params->max_ioqpairs = params->num_queues - 1;
 }
 
-if (n->params.max_ioqpairs < 1 ||
-n->params.max_ioqpairs > PCI_MSIX_FLAGS_QSIZE) {
+if (params->max_ioqpairs < 1 ||
+params->max_ioqpairs > PCI_MSIX_FLAGS_QSIZE) {
 error_setg(errp, "max_ioqpairs must be between 1 and %d",
PCI_MSIX_FLAGS_QSIZE);
 return;
@@ -1380,13 +1375,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 return;
 }
 
-bs_size = blk_getlength(n->conf.blk);
-if (bs_size < 0) {
-error_setg(errp, "could not get backing file size");
-return;
-}
-
-if (!n->params.serial) {
+if (!params->serial) {
 error_setg(errp, "serial property not set");
 return;
 }
@@ -1406,6 +1395,29 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 
 host_memory_backend_set_mapped(n->pmrdev, true);
 }
+}
+
+static void nvme_realize(PCIDevice *pci_dev, Error **errp)
+{
+NvmeCtrl *n = NVME(pci_dev);
+NvmeIdCtrl *id = &n->id_ctrl;
+Error *local_err = NULL;
+
+int i;
+int64_t bs_size;
+uint8_t *pci_conf;
+
+nvme_check_constraints(n, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+bs_size = blk_getlength(n->conf.blk);
+if (bs_size < 0) {
+error_setg(errp, "could not get backing file size");
+return;
+}
 
 blkconf_blocksizes(&n->conf);
 if (!blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk),
-- 
2.26.2




[PATCH v5 10/18] nvme: factor out device state setup

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 13fb90c77e90..72e838a476af 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1397,6 +1397,17 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 }
 }
 
+static void nvme_init_state(NvmeCtrl *n)
+{
+n->num_namespaces = 1;
+/* add one to max_ioqpairs to account for the admin queue pair */
+n->reg_size = pow2ceil(NVME_REG_SIZE +
+   2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE);
+n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
+n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
+n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
+}
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
@@ -1413,6 +1424,8 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 return;
 }
 
+nvme_init_state(n);
+
 bs_size = blk_getlength(n->conf.blk);
 if (bs_size < 0) {
 error_setg(errp, "could not get backing file size");
@@ -1431,17 +1444,8 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 pci_config_set_class(pci_dev->config, PCI_CLASS_STORAGE_EXPRESS);
 pcie_endpoint_cap_init(pci_dev, 0x80);
 
-n->num_namespaces = 1;
-
-/* add one to max_ioqpairs to account for the admin queue pair */
-n->reg_size = pow2ceil(NVME_REG_SIZE +
-   2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE);
 n->ns_size = bs_size / (uint64_t)n->num_namespaces;
 
-n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
-n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
-n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
-
 memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
   "nvme", n->reg_size);
 pci_register_bar(pci_dev, 0,
-- 
2.26.2




[PATCH v5 02/18] nvme: rename trace events to pci_nvme

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Change the prefix of all nvme device related trace events to 'pci_nvme'
to not clash with trace events from the nvme block driver.

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c   | 198 +-
 hw/block/trace-events | 180 +++---
 2 files changed, 188 insertions(+), 190 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 1d7d7fb3c67a..6702812ecdd1 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -126,16 +126,16 @@ static void nvme_irq_assert(NvmeCtrl *n, NvmeCQueue *cq)
 {
 if (cq->irq_enabled) {
 if (msix_enabled(&(n->parent_obj))) {
-trace_nvme_irq_msix(cq->vector);
+trace_pci_nvme_irq_msix(cq->vector);
 msix_notify(&(n->parent_obj), cq->vector);
 } else {
-trace_nvme_irq_pin();
+trace_pci_nvme_irq_pin();
 assert(cq->cqid < 64);
 n->irq_status |= 1 << cq->cqid;
 nvme_irq_check(n);
 }
 } else {
-trace_nvme_irq_masked();
+trace_pci_nvme_irq_masked();
 }
 }
 
@@ -160,7 +160,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector 
*iov, uint64_t prp1,
 int num_prps = (len >> n->page_bits) + 1;
 
 if (unlikely(!prp1)) {
-trace_nvme_err_invalid_prp();
+trace_pci_nvme_err_invalid_prp();
 return NVME_INVALID_FIELD | NVME_DNR;
 } else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
@@ -174,7 +174,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector 
*iov, uint64_t prp1,
 len -= trans_len;
 if (len) {
 if (unlikely(!prp2)) {
-trace_nvme_err_invalid_prp2_missing();
+trace_pci_nvme_err_invalid_prp2_missing();
 goto unmap;
 }
 if (len > n->page_size) {
@@ -190,7 +190,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector 
*iov, uint64_t prp1,
 
 if (i == n->max_prp_ents - 1 && len > n->page_size) {
 if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
-trace_nvme_err_invalid_prplist_ent(prp_ent);
+trace_pci_nvme_err_invalid_prplist_ent(prp_ent);
 goto unmap;
 }
 
@@ -203,7 +203,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector 
*iov, uint64_t prp1,
 }
 
 if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
-trace_nvme_err_invalid_prplist_ent(prp_ent);
+trace_pci_nvme_err_invalid_prplist_ent(prp_ent);
 goto unmap;
 }
 
@@ -218,7 +218,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector 
*iov, uint64_t prp1,
 }
 } else {
 if (unlikely(prp2 & (n->page_size - 1))) {
-trace_nvme_err_invalid_prp2_align(prp2);
+trace_pci_nvme_err_invalid_prp2_align(prp2);
 goto unmap;
 }
 if (qsg->nsg) {
@@ -266,20 +266,20 @@ static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t 
*ptr, uint32_t len,
 QEMUIOVector iov;
 uint16_t status = NVME_SUCCESS;
 
-trace_nvme_dma_read(prp1, prp2);
+trace_pci_nvme_dma_read(prp1, prp2);
 
 if (nvme_map_prp(&qsg, &iov, prp1, prp2, len, n)) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 if (qsg.nsg > 0) {
 if (unlikely(dma_buf_read(ptr, len, &qsg))) {
-trace_nvme_err_invalid_dma();
+trace_pci_nvme_err_invalid_dma();
 status = NVME_INVALID_FIELD | NVME_DNR;
 }
 qemu_sglist_destroy(&qsg);
 } else {
 if (unlikely(qemu_iovec_from_buf(&iov, 0, ptr, len) != len)) {
-trace_nvme_err_invalid_dma();
+trace_pci_nvme_err_invalid_dma();
 status = NVME_INVALID_FIELD | NVME_DNR;
 }
 qemu_iovec_destroy(&iov);
@@ -368,7 +368,7 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace 
*ns, NvmeCmd *cmd,
 uint32_t count = nlb << data_shift;
 
 if (unlikely(slba + nlb > ns->id_ns.nsze)) {
-trace_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
+trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
 return NVME_LBA_RANGE | NVME_DNR;
 }
 
@@ -396,11 +396,11 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, 
NvmeCmd *cmd,
 int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0;
 enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
 
-trace_nvme_rw(is_write ? "write" : "read", nlb, data_size, slba);
+trace_pci_nvme_rw(is_write ? "write" : "read", nlb, data_size, slba);
 
 if (unlikely((slba + nlb) > ns->id_ns.nsze)) {
 block_acct_invalid(blk_get_

[PATCH v5 00/18] nvme: refactoring and cleanups

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Changes since v4

No functional changes, just updated Reviewed-by tags. Also, I screwed up
the CC list when sending v4.

Philippe and Keith, please add a Reviewed-by to

  * "nvme: factor out pmr setup" and
  * "do cmb/pmr init as part of pci init"

since the first one was added and the second one was changed in v4 when
rebasing on Kevin's block-next tree, which had the PMR work that was not
in master at the time.

With those in place, it should be ready for Kevin to merge.

Klaus Jensen (18):
  nvme: fix pci doorbell size calculation
  nvme: rename trace events to pci_nvme
  nvme: remove superfluous breaks
  nvme: move device parameters to separate struct
  nvme: use constants in identify
  nvme: refactor nvme_addr_read
  nvme: add max_ioqpairs device parameter
  nvme: remove redundant cmbloc/cmbsz members
  nvme: factor out property/constraint checks
  nvme: factor out device state setup
  nvme: factor out block backend setup
  nvme: add namespace helpers
  nvme: factor out namespace setup
  nvme: factor out pci setup
  nvme: factor out cmb setup
  nvme: factor out pmr setup
  nvme: do cmb/pmr init as part of pci init
  nvme: factor out controller identify setup

 hw/block/nvme.c   | 543 --
 hw/block/nvme.h   |  31 ++-
 hw/block/trace-events | 180 +++---
 include/block/nvme.h  |   8 +
 4 files changed, 429 insertions(+), 333 deletions(-)

-- 
2.26.2




[PATCH v5 05/18] nvme: use constants in identify

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c  | 8 
 include/block/nvme.h | 8 
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index e26db7591574..4058f2c79796 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -693,7 +693,7 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeIdentify 
*c)
 
 static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeIdentify *c)
 {
-static const int data_len = 4 * KiB;
+static const int data_len = NVME_IDENTIFY_DATA_SIZE;
 uint32_t min_nsid = le32_to_cpu(c->nsid);
 uint64_t prp1 = le64_to_cpu(c->prp1);
 uint64_t prp2 = le64_to_cpu(c->prp2);
@@ -723,11 +723,11 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd *cmd)
 NvmeIdentify *c = (NvmeIdentify *)cmd;
 
 switch (le32_to_cpu(c->cns)) {
-case 0x00:
+case NVME_ID_CNS_NS:
 return nvme_identify_ns(n, c);
-case 0x01:
+case NVME_ID_CNS_CTRL:
 return nvme_identify_ctrl(n, c);
-case 0x02:
+case NVME_ID_CNS_NS_ACTIVE_LIST:
 return nvme_identify_nslist(n, c);
 default:
 trace_pci_nvme_err_invalid_identify_cns(le32_to_cpu(c->cns));
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 5525c8e34308..1720ee1d5158 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -705,6 +705,14 @@ typedef struct NvmePSD {
 uint8_t resv[16];
 } NvmePSD;
 
+#define NVME_IDENTIFY_DATA_SIZE 4096
+
+enum {
+NVME_ID_CNS_NS = 0x0,
+NVME_ID_CNS_CTRL   = 0x1,
+NVME_ID_CNS_NS_ACTIVE_LIST = 0x2,
+};
+
 typedef struct NvmeIdCtrl {
 uint16_tvid;
 uint16_tssvid;
-- 
2.26.2




[PATCH v5 07/18] nvme: add max_ioqpairs device parameter

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

The num_queues device parameter has a slightly confusing meaning because
it accounts for the admin queue pair, which is not really optional.
Secondly, it is really the maximum number of queues allowed.

Add a new max_ioqpairs parameter that only accounts for I/O queue pairs,
but keep num_queues for compatibility.

Signed-off-by: Klaus Jensen 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 51 ++---
 hw/block/nvme.h |  3 ++-
 2 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 623a88be93dc..3875a5f3dcbf 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -20,7 +20,7 @@
  *  -device nvme,drive=,serial=,id=, \
  *  cmb_size_mb=, \
  *  [pmrdev=,] \
- *  num_queues=
+ *  max_ioqpairs=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
@@ -36,6 +36,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/units.h"
+#include "qemu/error-report.h"
 #include "hw/block/block.h"
 #include "hw/pci/msix.h"
 #include "hw/pci/pci.h"
@@ -86,12 +87,12 @@ static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void 
*buf, int size)
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
 {
-return sqid < n->params.num_queues && n->sq[sqid] != NULL ? 0 : -1;
+return sqid < n->params.max_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1;
 }
 
 static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
 {
-return cqid < n->params.num_queues && n->cq[cqid] != NULL ? 0 : -1;
+return cqid < n->params.max_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1;
 }
 
 static void nvme_inc_cq_tail(NvmeCQueue *cq)
@@ -653,7 +654,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
 trace_pci_nvme_err_invalid_create_cq_addr(prp1);
 return NVME_INVALID_FIELD | NVME_DNR;
 }
-if (unlikely(vector > n->params.num_queues)) {
+if (unlikely(vector > n->params.max_ioqpairs + 1)) {
 trace_pci_nvme_err_invalid_create_cq_vector(vector);
 return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
 }
@@ -805,8 +806,8 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, 
NvmeRequest *req)
 trace_pci_nvme_getfeat_vwcache(result ? "enabled" : "disabled");
 break;
 case NVME_NUMBER_OF_QUEUES:
-result = cpu_to_le32((n->params.num_queues - 2) |
- ((n->params.num_queues - 2) << 16));
+result = cpu_to_le32((n->params.max_ioqpairs - 1) |
+ ((n->params.max_ioqpairs - 1) << 16));
 trace_pci_nvme_getfeat_numq(result);
 break;
 case NVME_TIMESTAMP:
@@ -850,10 +851,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd 
*cmd, NvmeRequest *req)
 case NVME_NUMBER_OF_QUEUES:
 trace_pci_nvme_setfeat_numq((dw11 & 0x) + 1,
 ((dw11 >> 16) & 0x) + 1,
-n->params.num_queues - 1,
-n->params.num_queues - 1);
-req->cqe.result = cpu_to_le32((n->params.num_queues - 2) |
-  ((n->params.num_queues - 2) << 16));
+n->params.max_ioqpairs,
+n->params.max_ioqpairs);
+req->cqe.result = cpu_to_le32((n->params.max_ioqpairs - 1) |
+  ((n->params.max_ioqpairs - 1) << 16));
 break;
 case NVME_TIMESTAMP:
 return nvme_set_feature_timestamp(n, cmd);
@@ -924,12 +925,12 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
 
 blk_drain(n->conf.blk);
 
-for (i = 0; i < n->params.num_queues; i++) {
+for (i = 0; i < n->params.max_ioqpairs + 1; i++) {
 if (n->sq[i] != NULL) {
 nvme_free_sq(n->sq[i], n);
 }
 }
-for (i = 0; i < n->params.num_queues; i++) {
+for (i = 0; i < n->params.max_ioqpairs + 1; i++) {
 if (n->cq[i] != NULL) {
 nvme_free_cq(n->cq[i], n);
 }
@@ -1360,8 +1361,17 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 int64_t bs_size;
 uint8_t *pci_conf;
 
-if (!n->params.num_queues) {
-error_setg(errp, "num_queues can't be zero");
+if (n->params.num_queues) {
+warn_report("num_queues is deprecated; please use max_ioqpairs "
+"instead");
+
+n->params.max_ioqpairs = n->params.num_queues - 1;
+}
+
+if (n->params.max_ioqpairs < 1 ||
+n->params.max_ioqpairs > PCI_MSIX_FLAGS_QSIZE) {
+error_setg(errp, "max_ioqpairs must be between 1 and %d",
+   PCI_MSIX_FLAGS_QSIZE);
 return;
 }
 
@@ -1411,21 +1421,21 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 
 n->num_namespaces = 1;
 
-/* num_queues is really num

[PATCH v5 06/18] nvme: refactor nvme_addr_read

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Pull the controller memory buffer check to its own function. The check
will be used on its own in later patches.
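
For illustration only, a hypothetical write-side counterpart (not part of
this series as posted) shows how the factored-out check can be reused on
its own:

    static void nvme_addr_write(NvmeCtrl *n, hwaddr addr, void *buf, int size)
    {
        if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
            memcpy((void *)&n->cmbuf[addr - n->ctrl_mem.addr], buf, size);
            return;
        }

        pci_dma_write(&n->parent_obj, addr, buf, size);
    }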

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 4058f2c79796..623a88be93dc 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -66,14 +66,22 @@
 
 static void nvme_process_sq(void *opaque);
 
+static bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
+{
+hwaddr low = n->ctrl_mem.addr;
+hwaddr hi  = n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size);
+
+return addr >= low && addr < hi;
+}
+
 static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 {
-if (n->cmbsz && addr >= n->ctrl_mem.addr &&
-addr < (n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size))) {
+if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
 memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size);
-} else {
-pci_dma_read(&n->parent_obj, addr, buf, size);
+return;
 }
+
+pci_dma_read(&n->parent_obj, addr, buf, size);
 }
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
-- 
2.26.2




[PATCH v5 04/18] nvme: move device parameters to separate struct

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Move the device configuration parameters to a separate struct to make it
explicit what is configurable and what is set internally.
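
A minimal sketch of the new struct, inferred from the n->params.* accesses
in the hunks (the nvme.h hunk itself is not reproduced in full here):

    /* sketch only; the exact layout is in the nvme.h part of the patch */
    typedef struct NvmeParams {
        char     *serial;
        uint32_t num_queues;
        uint32_t cmb_size_mb;
    } NvmeParams;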

Signed-off-by: Klaus Jensen 
Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Maxim Levitsky 
---
 hw/block/nvme.c | 49 ++---
 hw/block/nvme.h | 11 ---
 2 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index f67499d85f3a..e26db7591574 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -78,12 +78,12 @@ static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void 
*buf, int size)
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
 {
-return sqid < n->num_queues && n->sq[sqid] != NULL ? 0 : -1;
+return sqid < n->params.num_queues && n->sq[sqid] != NULL ? 0 : -1;
 }
 
 static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
 {
-return cqid < n->num_queues && n->cq[cqid] != NULL ? 0 : -1;
+return cqid < n->params.num_queues && n->cq[cqid] != NULL ? 0 : -1;
 }
 
 static void nvme_inc_cq_tail(NvmeCQueue *cq)
@@ -645,7 +645,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
 trace_pci_nvme_err_invalid_create_cq_addr(prp1);
 return NVME_INVALID_FIELD | NVME_DNR;
 }
-if (unlikely(vector > n->num_queues)) {
+if (unlikely(vector > n->params.num_queues)) {
 trace_pci_nvme_err_invalid_create_cq_vector(vector);
 return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
 }
@@ -797,7 +797,8 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, 
NvmeRequest *req)
 trace_pci_nvme_getfeat_vwcache(result ? "enabled" : "disabled");
 break;
 case NVME_NUMBER_OF_QUEUES:
-result = cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 
16));
+result = cpu_to_le32((n->params.num_queues - 2) |
+ ((n->params.num_queues - 2) << 16));
 trace_pci_nvme_getfeat_numq(result);
 break;
 case NVME_TIMESTAMP:
@@ -841,9 +842,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd 
*cmd, NvmeRequest *req)
 case NVME_NUMBER_OF_QUEUES:
 trace_pci_nvme_setfeat_numq((dw11 & 0x) + 1,
 ((dw11 >> 16) & 0x) + 1,
-n->num_queues - 1, n->num_queues - 1);
-req->cqe.result =
-cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
+n->params.num_queues - 1,
+n->params.num_queues - 1);
+req->cqe.result = cpu_to_le32((n->params.num_queues - 2) |
+  ((n->params.num_queues - 2) << 16));
 break;
 case NVME_TIMESTAMP:
 return nvme_set_feature_timestamp(n, cmd);
@@ -914,12 +916,12 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
 
 blk_drain(n->conf.blk);
 
-for (i = 0; i < n->num_queues; i++) {
+for (i = 0; i < n->params.num_queues; i++) {
 if (n->sq[i] != NULL) {
 nvme_free_sq(n->sq[i], n);
 }
 }
-for (i = 0; i < n->num_queues; i++) {
+for (i = 0; i < n->params.num_queues; i++) {
 if (n->cq[i] != NULL) {
 nvme_free_cq(n->cq[i], n);
 }
@@ -1350,7 +1352,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 int64_t bs_size;
 uint8_t *pci_conf;
 
-if (!n->num_queues) {
+if (!n->params.num_queues) {
 error_setg(errp, "num_queues can't be zero");
 return;
 }
@@ -1366,12 +1368,12 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 return;
 }
 
-if (!n->serial) {
+if (!n->params.serial) {
 error_setg(errp, "serial property not set");
 return;
 }
 
-if (!n->cmb_size_mb && n->pmrdev) {
+if (!n->params.cmb_size_mb && n->pmrdev) {
 if (host_memory_backend_is_mapped(n->pmrdev)) {
 char *path = 
object_get_canonical_path_component(OBJECT(n->pmrdev));
 error_setg(errp, "can't use already busy memdev: %s", path);
@@ -1402,25 +1404,26 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 n->num_namespaces = 1;
 
 /* num_queues is really number of pairs, so each has two doorbells */
-n->reg_size = pow2ceil(NVME_REG_SIZE + 2 * n->num_queues * NVME_DB_SIZE);
+n->reg_size = pow2ceil(NVME_REG_SIZE +
+   2 * n->params.num_queues * NVME_DB_SIZE);
 n->ns_size = bs_size / (uint64_t)n->num_namespaces;
 
 n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
-n->sq = g_new0(NvmeSQueue *, n->num_queues);
-n->cq = g_new0(NvmeCQueue *, n->num_queues);
+n->sq = g_new0(NvmeSQueue *, n->params.num_queues);
+n->cq = g_new0(NvmeCQueue *, n->params.num_queues);
 
 memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
   "nvme", n->reg_size);
 pci_register_bar(pci_dev, 0,
 PCI_

[PATCH v5 01/18] nvme: fix pci doorbell size calculation

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

The size of the BAR is 0x1000 (main registers) + 8 bytes for each
queue. Currently, the size of the BAR is calculated like so:

n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);

Since the 'num_queues' parameter already accounts for the admin queue,
it should in any case not need to be incremented by one. Also, the base
size should be 0x1000:

n->reg_size = pow2ceil(0x1000 + 2 * n->num_queues * 4);

Thus, with the default value of num_queues (64), we set aside room
for 1 admin queue and 63 I/O queues (4 bytes per doorbell, 2 doorbells
per queue).
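
Worked out for the defaults (a sketch, not taken from the patch):

    /* 64 queue pairs (1 admin + 63 I/O), 2 doorbells each,
     * 4 bytes per doorbell */
    reg_size = pow2ceil(0x1000 + 2 * 64 * 4);  /* = pow2ceil(0x1200)   */
                                               /* = 0x2000, i.e. 8 KiB */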

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 9b453423cf2c..1d7d7fb3c67a 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -54,6 +54,9 @@
 #include "trace.h"
 #include "nvme.h"
 
+#define NVME_REG_SIZE 0x1000
+#define NVME_DB_SIZE  4
+
 #define NVME_GUEST_ERR(trace, fmt, ...) \
 do { \
 (trace_##trace)(__VA_ARGS__); \
@@ -1403,7 +1406,9 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 pcie_endpoint_cap_init(pci_dev, 0x80);
 
 n->num_namespaces = 1;
-n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
+
+/* num_queues is really number of pairs, so each has two doorbells */
+n->reg_size = pow2ceil(NVME_REG_SIZE + 2 * n->num_queues * NVME_DB_SIZE);
 n->ns_size = bs_size / (uint64_t)n->num_namespaces;
 
 n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
-- 
2.26.2




[PATCH v5 08/18] nvme: remove redundant cmbloc/cmbsz members

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 7 ++-
 hw/block/nvme.h | 2 --
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 3875a5f3dcbf..cc6d3059ff7f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -77,7 +77,7 @@ static bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
 
 static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 {
-if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
+if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
 memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size);
 return;
 }
@@ -171,7 +171,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector 
*iov, uint64_t prp1,
 if (unlikely(!prp1)) {
 trace_pci_nvme_err_invalid_prp();
 return NVME_INVALID_FIELD | NVME_DNR;
-} else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
+} else if (n->bar.cmbsz && prp1 >= n->ctrl_mem.addr &&
prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
 qsg->nsg = 0;
 qemu_iovec_init(iov, num_prps);
@@ -1483,9 +1483,6 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2); /* MBs */
 NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->params.cmb_size_mb);
 
-n->cmbloc = n->bar.cmbloc;
-n->cmbsz = n->bar.cmbsz;
-
 n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
 memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
   "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index c4e3edfebe0b..6714616e376e 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -82,8 +82,6 @@ typedef struct NvmeCtrl {
 uint32_tnum_namespaces;
 uint32_tmax_q_ents;
 uint64_tns_size;
-uint32_tcmbsz;
-uint32_tcmbloc;
 uint8_t *cmbuf;
 uint64_tirq_status;
 uint64_thost_timestamp; /* Timestamp sent by the host 
*/
-- 
2.26.2




[PATCH v5 03/18] nvme: remove superfluous breaks

2020-05-04 Thread Klaus Jensen
From: Klaus Jensen 

These break statements were left over when commit 3036a626e9ef ("nvme:
add Get/Set Feature Timestamp support") was merged.

Signed-off-by: Klaus Jensen 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6702812ecdd1..f67499d85f3a 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -802,7 +802,6 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, 
NvmeRequest *req)
 break;
 case NVME_TIMESTAMP:
 return nvme_get_feature_timestamp(n, cmd);
-break;
 default:
 trace_pci_nvme_err_invalid_getfeat(dw10);
 return NVME_INVALID_FIELD | NVME_DNR;
@@ -846,11 +845,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd 
*cmd, NvmeRequest *req)
 req->cqe.result =
 cpu_to_le32((n->num_queues - 2) | ((n->num_queues - 2) << 16));
 break;
-
 case NVME_TIMESTAMP:
 return nvme_set_feature_timestamp(n, cmd);
-break;
-
 default:
 trace_pci_nvme_err_invalid_setfeat(dw10);
 return NVME_INVALID_FIELD | NVME_DNR;
-- 
2.26.2




Re: [PATCH 4/4] hw/i386: Make vmmouse helpers static

2020-05-04 Thread Philippe Mathieu-Daudé

On 5/4/20 7:29 PM, Richard Henderson wrote:

On 5/4/20 1:33 AM, Philippe Mathieu-Daudé wrote:

+++ b/hw/i386/vmport.c
@@ -23,10 +23,10 @@
   */
  #include "qemu/osdep.h"
  #include "hw/isa/isa.h"
-#include "hw/i386/pc.h"
  #include "sysemu/hw_accel.h"
  #include "qemu/log.h"
  #include "vmport.h"
+#include "cpu.h"
  #include "trace.h"
  
  #define VMPORT_CMD_GETVERSION 0x0a

@@ -109,27 +109,6 @@ static uint32_t vmport_cmd_ram_size(void *opaque, uint32_t 
addr)
  return ram_size;
  }
  
-/* vmmouse helpers */

-void vmmouse_get_data(uint32_t *data)
-{
-X86CPU *cpu = X86_CPU(current_cpu);
-CPUX86State *env = &cpu->env;
-
-data[0] = env->regs[R_EAX]; data[1] = env->regs[R_EBX];
-data[2] = env->regs[R_ECX]; data[3] = env->regs[R_EDX];
-data[4] = env->regs[R_ESI]; data[5] = env->regs[R_EDI];
-}


Why are you adding "cpu.h" when removing code?


Because this file still uses the X86 register definitions:

  static uint32_t vmport_cmd_get_version(void *opaque, uint32_t addr)
  {
  X86CPU *cpu = X86_CPU(current_cpu);

  cpu->env.regs[R_EBX] = VMPORT_MAGIC;
  return 6;
  }


Does that mean you don't need to add "cpu.h" to vmmouse.c?


Now both files, vmmouse and vmport, use the X86 register definitions, but
they don't use anything declared in "hw/i386/pc.h".





r~





Re: [PATCH v4 00/18] nvme: factor out cmb/pmr setup

2020-05-04 Thread Philippe Mathieu-Daudé

Hi Klaus,

On 5/5/20 6:31 AM, Klaus Jensen wrote:

On Apr 29 07:40, Klaus Jensen wrote:

On Apr 22 13:01, Klaus Jensen wrote:

From: Klaus Jensen 

Changes since v3

* Remove the addition of a new PROPERTIES macro in "nvme: move device
   parameters to separate struct" (Philippe)

* Add NVME_PMR_BIR constant and use it in PMR setup.

* Split "nvme: factor out cmb/pmr setup" into
   - "nvme: factor out cmb setup",
   - "nvme: factor out pmr setup" and
   - "nvme: do cmb/pmr init as part of pci init"
   (Philippe)


Klaus Jensen (18):
   nvme: fix pci doorbell size calculation
   nvme: rename trace events to pci_nvme
   nvme: remove superfluous breaks
   nvme: move device parameters to separate struct
   nvme: use constants in identify
   nvme: refactor nvme_addr_read
   nvme: add max_ioqpairs device parameter
   nvme: remove redundant cmbloc/cmbsz members
   nvme: factor out property/constraint checks
   nvme: factor out device state setup
   nvme: factor out block backend setup
   nvme: add namespace helpers
   nvme: factor out namespace setup
   nvme: factor out pci setup
   nvme: factor out cmb setup
   nvme: factor out pmr setup
   nvme: do cmb/pmr init as part of pci init
   nvme: factor out controller identify setup

  hw/block/nvme.c   | 543 --
  hw/block/nvme.h   |  31 ++-
  hw/block/trace-events | 180 +++---
  include/block/nvme.h  |   8 +
  4 files changed, 429 insertions(+), 333 deletions(-)

--
2.26.2




Gentle bump on this.

I apparently managed to screw up the git send-email this time, losing a
bunch of CCs in the process. Sorry about that.



Bumping again. I have not received any new comments on this.


My understanding is:
- this series goes via Kevin tree
- Kevin was waiting for Keith review (which occurred)
- Kevin tried to apply and asked for rebase
- Minor cosmetics changes on top (not logical)



I'm missing a couple of Reviewed-by's (they all carry Maxim's) on

   nvme: move device parameters to separate struct
   I think this can also carry Philippe's Reviewed-by, since the only
   change is the removal of the PROPERTIES macro.


I don't have this anymore in my mailbox, meaning I processed your 
series, likely giving a R-b.




   nvme: factor out cmb setup
   nvme: factor out pmr setup
   nvme: do cmb/pmr init as part of pci init
   I think these could also carry Reviewed-by from Keith as well,
   since the only change is also factoring out the PMR setup (which
   was not there when Keith reviewed it) and the splitting into two
   trivial patches per request from Philippe.


If respinning a rebased v5 with all the previous tags added takes you
<5 min, I recommend you do it; this will help Kevin. If you are
comfortable with git-rebase and use git-publish, it can take you only
2 min :)


Looking forward to the next parts, up to the multiple namespace support!

Regards,

Phil.




Thanks,
Klaus






Re: [PATCH 1/1] target-ppc: fix rlwimi, rlwinm, rlwnm for Clang-9

2020-05-04 Thread David Gibson
On Fri, May 01, 2020 at 03:09:13PM -0400, Daniele Buono wrote:
> Starting with Clang v9, -Wtype-limits is implemented and triggers a
> few "result of comparison is always true" errors when compiling PPC32
> targets.
> 
> The comparisons seem to be necessary only on PPC64, since the
> else branch in PPC32 only has a "g_assert_not_reached();" in all cases.
> 
> This patch restructures the code so that PPC32 does not execute the
> check, while PPC64 works like before
> 
> Signed-off-by: Daniele Buono 

Urgh.  #ifdefs intertangled with if statements gets pretty ugly.  But,
then, it's already pretty ugly, so, applied.
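
The shape of the fix, condensed (my paraphrase of the diff below, not a
literal excerpt):

    #if defined(TARGET_PPC64)
        if (mask <= 0xffffffffu) {      /* can be false: mask is 64-bit */
    #endif
            /* 32-bit rotate path -- the only path on PPC32, so the
             * always-true comparison (and Clang 9's -Wtype-limits
             * warning) disappears from 32-bit builds */
    #if defined(TARGET_PPC64)
        } else {
            /* 64-bit rotate path */
        }
    #endif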

> ---
>  target/ppc/translate.c | 34 +++---
>  1 file changed, 19 insertions(+), 15 deletions(-)
> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 807d14faaa..9400fa2c7c 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -1882,6 +1882,7 @@ static void gen_rlwimi(DisasContext *ctx)
>  tcg_gen_deposit_tl(t_ra, t_ra, t_rs, sh, me - mb + 1);
>  } else {
>  target_ulong mask;
> +TCGv_i32 t0;
>  TCGv t1;
>  
>  #if defined(TARGET_PPC64)
> @@ -1891,20 +1892,20 @@ static void gen_rlwimi(DisasContext *ctx)
>  mask = MASK(mb, me);
>  
>  t1 = tcg_temp_new();
> +#if defined(TARGET_PPC64)
>  if (mask <= 0xu) {
> -TCGv_i32 t0 = tcg_temp_new_i32();
> +#endif
> +t0 = tcg_temp_new_i32();
>  tcg_gen_trunc_tl_i32(t0, t_rs);
>  tcg_gen_rotli_i32(t0, t0, sh);
>  tcg_gen_extu_i32_tl(t1, t0);
>  tcg_temp_free_i32(t0);
> -} else {
>  #if defined(TARGET_PPC64)
> +} else {
>  tcg_gen_deposit_i64(t1, t_rs, t_rs, 32, 32);
>  tcg_gen_rotli_i64(t1, t1, sh);
> -#else
> -g_assert_not_reached();
> -#endif
>  }
> +#endif
>  
>  tcg_gen_andi_tl(t1, t1, mask);
>  tcg_gen_andi_tl(t_ra, t_ra, ~mask);
> @@ -1938,7 +1939,9 @@ static void gen_rlwinm(DisasContext *ctx)
>  me += 32;
>  #endif
>  mask = MASK(mb, me);
> +#if defined(TARGET_PPC64)
>  if (mask <= 0xu) {
> +#endif
>  if (sh == 0) {
>  tcg_gen_andi_tl(t_ra, t_rs, mask);
>  } else {
> @@ -1949,15 +1952,13 @@ static void gen_rlwinm(DisasContext *ctx)
>  tcg_gen_extu_i32_tl(t_ra, t0);
>  tcg_temp_free_i32(t0);
>  }
> -} else {
>  #if defined(TARGET_PPC64)
> +} else {
>  tcg_gen_deposit_i64(t_ra, t_rs, t_rs, 32, 32);
>  tcg_gen_rotli_i64(t_ra, t_ra, sh);
>  tcg_gen_andi_i64(t_ra, t_ra, mask);
> -#else
> -g_assert_not_reached();
> -#endif
>  }
> +#endif
>  }
>  if (unlikely(Rc(ctx->opcode) != 0)) {
>  gen_set_Rc0(ctx, t_ra);
> @@ -1972,6 +1973,9 @@ static void gen_rlwnm(DisasContext *ctx)
>  TCGv t_rb = cpu_gpr[rB(ctx->opcode)];
>  uint32_t mb = MB(ctx->opcode);
>  uint32_t me = ME(ctx->opcode);
> +TCGv_i32 t0;
> +TCGv_i32 t1;
> +
>  target_ulong mask;
>  
>  #if defined(TARGET_PPC64)
> @@ -1980,9 +1984,11 @@ static void gen_rlwnm(DisasContext *ctx)
>  #endif
>  mask = MASK(mb, me);
>  
> +#if defined(TARGET_PPC64)
>  if (mask <= 0xu) {
> -TCGv_i32 t0 = tcg_temp_new_i32();
> -TCGv_i32 t1 = tcg_temp_new_i32();
> +#endif
> +t0 = tcg_temp_new_i32();
> +t1 = tcg_temp_new_i32();
>  tcg_gen_trunc_tl_i32(t0, t_rb);
>  tcg_gen_trunc_tl_i32(t1, t_rs);
>  tcg_gen_andi_i32(t0, t0, 0x1f);
> @@ -1990,17 +1996,15 @@ static void gen_rlwnm(DisasContext *ctx)
>  tcg_gen_extu_i32_tl(t_ra, t1);
>  tcg_temp_free_i32(t0);
>  tcg_temp_free_i32(t1);
> -} else {
>  #if defined(TARGET_PPC64)
> +} else {
>  TCGv_i64 t0 = tcg_temp_new_i64();
>  tcg_gen_andi_i64(t0, t_rb, 0x1f);
>  tcg_gen_deposit_i64(t_ra, t_rs, t_rs, 32, 32);
>  tcg_gen_rotl_i64(t_ra, t_ra, t0);
>  tcg_temp_free_i64(t0);
> -#else
> -g_assert_not_reached();
> -#endif
>  }
> +#endif
>  
>  tcg_gen_andi_tl(t_ra, t_ra, mask);
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 0/2] checkpatch: fix handling of acpi expected files

2020-05-04 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200504115848.34410-1-...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20200504115848.34410-1-...@redhat.com
Subject: [PATCH 0/2] checkpatch: fix handling of acpi expected files
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
0f382d2 checkpatch: ignore allowed diff list
ba77a3e checkpatch: fix acpi check with multiple file name

=== OUTPUT BEGIN ===
1/2 Checking commit ba77a3ef0c54 (checkpatch: fix acpi check with multiple file 
name)
ERROR: line over 90 characters
#74: FILE: scripts/checkpatch.pl:1459:
+   checkfilename($realfile, \$acpi_testexpected, 
\$acpi_nontestexpected);

ERROR: line over 90 characters
#79: FILE: scripts/checkpatch.pl:1463:
+   checkfilename($realfile, \$acpi_testexpected, 
\$acpi_nontestexpected);

total: 2 errors, 0 warnings, 58 lines checked

Patch 1/2 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

2/2 Checking commit 0f382d20ef00 (checkpatch: ignore allowed diff list)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200504115848.34410-1-...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v16 QEMU 04/16] vfio: Add save and load functions for VFIO PCI devices

2020-05-04 Thread Alex Williamson
On Tue, 5 May 2020 04:48:37 +0530
Kirti Wankhede  wrote:

> On 3/26/2020 1:26 AM, Alex Williamson wrote:
> > On Wed, 25 Mar 2020 02:39:02 +0530
> > Kirti Wankhede  wrote:
> >   
> >> These functions save and restore PCI device specific data - config
> >> space of PCI device.
> >> Tested save and restore with MSI and MSIX type.
> >>
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>   hw/vfio/pci.c | 163 
> >> ++
> >>   include/hw/vfio/vfio-common.h |   2 +
> >>   2 files changed, 165 insertions(+)
> >>
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index 6c77c12e44b9..8deb11e87ef7 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -41,6 +41,7 @@
> >>   #include "trace.h"
> >>   #include "qapi/error.h"
> >>   #include "migration/blocker.h"
> >> +#include "migration/qemu-file.h"
> >>   
> >>   #define TYPE_VFIO_PCI "vfio-pci"
> >>   #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
> >> @@ -1632,6 +1633,50 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
> >>   }
> >>   }
> >>   
> >> +static int vfio_bar_validate(VFIOPCIDevice *vdev, int nr)
> >> +{
> >> +PCIDevice *pdev = &vdev->pdev;
> >> +VFIOBAR *bar = &vdev->bars[nr];
> >> +uint64_t addr;
> >> +uint32_t addr_lo, addr_hi = 0;
> >> +
> >> +/* Skip unimplemented BARs and the upper half of 64bit BARS. */
> >> +if (!bar->size) {
> >> +return 0;
> >> +}
> >> +
> >> +addr_lo = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + nr * 4, 
> >> 4);
> >> +
> >> +addr_lo = addr_lo & (bar->ioport ? PCI_BASE_ADDRESS_IO_MASK :
> >> +   PCI_BASE_ADDRESS_MEM_MASK);  
> > 
> > Nit, &= or combine with previous set.
> >   
> >> +if (bar->type == PCI_BASE_ADDRESS_MEM_TYPE_64) {
> >> +addr_hi = pci_default_read_config(pdev,
> >> + PCI_BASE_ADDRESS_0 + (nr + 1) * 
> >> 4, 4);
> >> +}
> >> +
> >> +addr = ((uint64_t)addr_hi << 32) | addr_lo;  
> > 
> > Could we use a union?
> >   
> >> +
> >> +if (!QEMU_IS_ALIGNED(addr, bar->size)) {
> >> +return -EINVAL;
> >> +}  
> > 
> > What specifically are we validating here?  This should be true no
> > matter what we wrote to the BAR or else BAR emulation is broken.  The
> > bits that could make this unaligned are not implemented in the BAR.
> >   
> >> +
> >> +return 0;
> >> +}
> >> +
> >> +static int vfio_bars_validate(VFIOPCIDevice *vdev)
> >> +{
> >> +int i, ret;
> >> +
> >> +for (i = 0; i < PCI_ROM_SLOT; i++) {
> >> +ret = vfio_bar_validate(vdev, i);
> >> +if (ret) {
> >> +error_report("vfio: BAR address %d validation failed", i);
> >> +return ret;
> >> +}
> >> +}
> >> +return 0;
> >> +}
> >> +
> >>   static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
> >>   {
> >>   VFIOBAR *bar = &vdev->bars[nr];
> >> @@ -2414,11 +2459,129 @@ static Object *vfio_pci_get_object(VFIODevice 
> >> *vbasedev)
> >>   return OBJECT(vdev);
> >>   }
> >>   
> >> +static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
> >> +{
> >> +VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> >> +PCIDevice *pdev = &vdev->pdev;
> >> +uint16_t pci_cmd;
> >> +int i;
> >> +
> >> +for (i = 0; i < PCI_ROM_SLOT; i++) {
> >> +uint32_t bar;
> >> +
> >> +bar = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 
> >> 4);
> >> +qemu_put_be32(f, bar);
> >> +}
> >> +
> >> +qemu_put_be32(f, vdev->interrupt);
> >> +if (vdev->interrupt == VFIO_INT_MSI) {
> >> +uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
> >> +bool msi_64bit;
> >> +
> >> +msi_flags = pci_default_read_config(pdev, pdev->msi_cap + 
> >> PCI_MSI_FLAGS,
> >> +2);
> >> +msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
> >> +
> >> +msi_addr_lo = pci_default_read_config(pdev,
> >> + pdev->msi_cap + 
> >> PCI_MSI_ADDRESS_LO, 4);
> >> +qemu_put_be32(f, msi_addr_lo);
> >> +
> >> +if (msi_64bit) {
> >> +msi_addr_hi = pci_default_read_config(pdev,
> >> + pdev->msi_cap + 
> >> PCI_MSI_ADDRESS_HI,
> >> + 4);
> >> +}
> >> +qemu_put_be32(f, msi_addr_hi);
> >> +
> >> +msi_data = pci_default_read_config(pdev,
> >> +pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : 
> >> PCI_MSI_DATA_32),
> >> +2);
> >> +qemu_put_be32(f, msi_data);  
> > 
> > Isn't the data field only a u16?
> >   
> 
> Yes, fixing it.
> 
> >> +} else if (vdev->interrupt == VFIO_INT_MSIX) {
> >> +uint16_t offset;
> >> +
> >> +/* save enable bit and maskall bit */
> >> +of

Re: [PATCH v16 QEMU 08/16] vfio: Register SaveVMHandlers for VFIO device

2020-05-04 Thread Alex Williamson
On Tue, 5 May 2020 04:49:10 +0530
Kirti Wankhede  wrote:

> On 3/26/2020 2:32 AM, Alex Williamson wrote:
> > On Wed, 25 Mar 2020 02:39:06 +0530
> > Kirti Wankhede  wrote:
> >   
> >> Define flags to be used as delimeter in migration file stream.
> >> Added .save_setup and .save_cleanup functions. Mapped & unmapped migration
> >> region from these functions at source during saving or pre-copy phase.
> >> Set VFIO device state depending on VM's state. During live migration, VM is
> >> running when .save_setup is called, _SAVING | _RUNNING state is set for 
> >> VFIO
> >> device. During save-restore, VM is paused, _SAVING state is set for VFIO 
> >> device.
> >>
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>   hw/vfio/migration.c  | 76 
> >> 
> >>   hw/vfio/trace-events |  2 ++
> >>   2 files changed, 78 insertions(+)
> >>
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index 22ded9d28cf3..033f76526e49 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -8,6 +8,7 @@
> >>*/
> >>   
> >>   #include "qemu/osdep.h"
> >> +#include "qemu/main-loop.h"
> >>   #include 
> >>   
> >>   #include "sysemu/runstate.h"
> >> @@ -24,6 +25,17 @@
> >>   #include "pci.h"
> >>   #include "trace.h"
> >>   
> >> +/*
> >> + * Flags used as delimiter:
> >> + * 0x => MSB 32-bit all 1s
> >> + * 0xef10 => emulated (virtual) function IO
> >> + * 0x => 16-bits reserved for flags
> >> + */
> >> +#define VFIO_MIG_FLAG_END_OF_STATE  (0xef11ULL)
> >> +#define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xef12ULL)
> >> +#define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xef13ULL)
> >> +#define VFIO_MIG_FLAG_DEV_DATA_STATE(0xef14ULL)
> >> +
> >>   static void vfio_migration_region_exit(VFIODevice *vbasedev)
> >>   {
> >>   VFIOMigration *migration = vbasedev->migration;
> >> @@ -126,6 +138,69 @@ static int vfio_migration_set_state(VFIODevice 
> >> *vbasedev, uint32_t mask,
> >>   return 0;
> >>   }
> >>   
> >> +/* -- 
> >> */
> >> +
> >> +static int vfio_save_setup(QEMUFile *f, void *opaque)
> >> +{
> >> +VFIODevice *vbasedev = opaque;
> >> +VFIOMigration *migration = vbasedev->migration;
> >> +int ret;
> >> +
> >> +qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
> >> +
> >> +if (migration->region.mmaps) {
> >> +qemu_mutex_lock_iothread();
> >> +ret = vfio_region_mmap(&migration->region);
> >> +qemu_mutex_unlock_iothread();
> >> +if (ret) {
> >> +error_report("%s: Failed to mmap VFIO migration region %d: 
> >> %s",
> >> + vbasedev->name, migration->region.index,
> >> + strerror(-ret));
> >> +return ret;
> >> +}
> >> +}
> >> +
> >> +ret = vfio_migration_set_state(vbasedev, ~0, 
> >> VFIO_DEVICE_STATE_SAVING);
> >> +if (ret) {
> >> +error_report("%s: Failed to set state SAVING", vbasedev->name);
> >> +return ret;
> >> +}
> >> +
> >> +/*
> >> + * Save migration region size. This is used to verify migration 
> >> region size
> >> + * is greater than or equal to migration region size at destination
> >> + */
> >> +qemu_put_be64(f, migration->region.size);  
> > 
> > Is this requirement supported by the uapi?
> 
> Yes, on UAPI thread we discussed this:
> 
>   * For the user application, data is opaque. The user application 
> should write
>   * data in the same order as the data is received and the data should be of
>   * same transaction size at the source.
> 
> data should be same transaction size, so migration region size should be 
> greater than or equal to the size at source when verifying at destination.

We are that user application for which the data is opaque, therefore we
should make no assumptions about how the vendor driver makes use of
their region.  If we get a transaction that exceeds the end of the
region, I agree, that would be an error.  But we have no business
predicting that such a transaction might occur if the vendor driver
indicates it can support the migration.
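
Concretely (a sketch only, reusing the field names from the quoted patch),
the destination can simply reject a transaction that does not fit:

    if (data_offset + data_size > migration->region.size) {
        error_report("%s: migration data exceeds migration region",
                     vbasedev->name);
        return -EINVAL;
    }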

> > The vendor driver operates
> > within the migration region, but it has no requirement to use the full
> > extent of the region.  Shouldn't we instead insert the version string
> > from versioning API Yan proposed?  Is this were we might choose to use
> > an interface via the vfio API rather than sysfs if we had one?
> >  
> 
> VFIO API cannot be used by libvirt or management tool stack. We need 
> sysfs as Yan proposed to be used by libvirt or management tool stack.

It's been a long time, but that doesn't seem like what I was asking.
The sysfs version checking is used to select a target that is likely to
succeed, but the migration stream is still generated by a user and the
vendor driver is still ultimately responsible fo

Re: [PATCH v16 QEMU 09/16] vfio: Add save state functions to SaveVMHandlers

2020-05-04 Thread Alex Williamson
On Tue, 5 May 2020 04:48:14 +0530
Kirti Wankhede  wrote:

> On 3/26/2020 3:33 AM, Alex Williamson wrote:
> > On Wed, 25 Mar 2020 02:39:07 +0530
> > Kirti Wankhede  wrote:
> >   
> >> Added .save_live_pending, .save_live_iterate and 
> >> .save_live_complete_precopy
> >> functions. These functions handles pre-copy and stop-and-copy phase.
> >>
> >> In _SAVING|_RUNNING device state or pre-copy phase:
> >> - read pending_bytes. If pending_bytes > 0, go through below steps.
> >> - read data_offset - indicates kernel driver to write data to staging
> >>buffer.
> >> - read data_size - amount of data in bytes written by vendor driver in
> >>migration region.
> >> - read data_size bytes of data from data_offset in the migration region.
> >> - Write data packet to file stream as below:
> >> {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
> >> VFIO_MIG_FLAG_END_OF_STATE }
> >>
> >> In _SAVING device state or stop-and-copy phase
> >> a. read config space of device and save to migration file stream. This
> >> doesn't need to be from vendor driver. Any other special config state
> >> from driver can be saved as data in following iteration.
> >> b. read pending_bytes. If pending_bytes > 0, go through below steps.
> >> c. read data_offset - indicates kernel driver to write data to staging
> >> buffer.
> >> d. read data_size - amount of data in bytes written by vendor driver in
> >> migration region.
> >> e. read data_size bytes of data from data_offset in the migration region.
> >> f. Write data packet as below:
> >> {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
> >> g. iterate through steps b to f while (pending_bytes > 0)
> >> h. Write {VFIO_MIG_FLAG_END_OF_STATE}
> >>
> >> When data region is mapped, its user's responsibility to read data from
> >> data_offset of data_size before moving to next steps.
> >>
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>   hw/vfio/migration.c   | 245 
> >> +-
> >>   hw/vfio/trace-events  |   6 ++
> >>   include/hw/vfio/vfio-common.h |   1 +
> >>   3 files changed, 251 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index 033f76526e49..ecbeed5182c2 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -138,6 +138,137 @@ static int vfio_migration_set_state(VFIODevice 
> >> *vbasedev, uint32_t mask,
> >>   return 0;
> >>   }
> >>   
> >> +static void *find_data_region(VFIORegion *region,
> >> +  uint64_t data_offset,
> >> +  uint64_t data_size)
> >> +{
> >> +void *ptr = NULL;
> >> +int i;
> >> +
> >> +for (i = 0; i < region->nr_mmaps; i++) {
> >> +if ((data_offset >= region->mmaps[i].offset) &&
> >> +(data_offset < region->mmaps[i].offset + 
> >> region->mmaps[i].size) &&
> >> +(data_size <= region->mmaps[i].size)) {  
> > 
> > (data_offset - region->mmaps[i].offset) can be non-zero, so this test
> > is invalid.  Additionally the uapi does not require that a give data
> > chunk fits exclusively within an mmap'd area, it may overlap one or
> > more mmap'd sections of the region, possibly with non-mmap'd areas
> > included.
> >   
> 
> What's the advantage of having mmap and non-mmap overlapped regions?
> Isn't it better to have data section either mapped or trapped?

The spec allows for it, therefore we need to support it.  A vendor
driver might choose to include a header with sequence and checksum
information for each transaction, they might accomplish this by setting
data_offset to a trapped area backed by kernel memory followed by an
area supporting direct mmap to the device.  The target end could then
fault on writing the header if the sequence information is incorrect.
A trapped area at the end of the transaction could allow the vendor
driver to validate a checksum.
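
So a single chunk may straddle mmap'd and trapped ranges; a sketch of
per-mapping overlap handling (illustrative only, reusing names from the
quoted patch):

    uint64_t mstart = region->mmaps[i].offset;
    uint64_t mend   = mstart + region->mmaps[i].size;
    uint64_t start  = MAX(data_offset, mstart);
    uint64_t end    = MIN(data_offset + data_size, mend);

    if (start < end) {
        /* [start, end) lies inside mmaps[i]: copy via the mapping */
        qemu_put_buffer(f, region->mmaps[i].mmap + (start - mstart),
                        end - start);
    }
    /* any remainder of the chunk outside the mappings would have to be
     * read back through the region's trapped file offsets instead */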

> >> +ptr = region->mmaps[i].mmap + (data_offset -
> >> +   region->mmaps[i].offset);
> >> +break;
> >> +}
> >> +}
> >> +return ptr;
> >> +}
> >> +
> >> +static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
> >> +{
> >> +VFIOMigration *migration = vbasedev->migration;
> >> +VFIORegion *region = &migration->region;
> >> +uint64_t data_offset = 0, data_size = 0;
> >> +int ret;
> >> +
> >> +ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
> >> +region->fd_offset + offsetof(struct 
> >> vfio_device_migration_info,
> >> + data_offset));
> >> +if (ret != sizeof(data_offset)) {
> >> +error_report("%s: Failed to get migration buffer data offset %d",
> >> + vbasedev->name, ret);
> >> +return -EINVAL;
> >> +}
> >> +
> >> +ret = pread(vbasedev->fd, &data_size, sizeof(data_size),
> >> +regio

Re: [PATCH v4 00/18] nvme: factor out cmb/pmr setup

2020-05-04 Thread Klaus Jensen
On Apr 29 07:40, Klaus Jensen wrote:
> On Apr 22 13:01, Klaus Jensen wrote:
> > From: Klaus Jensen 
> > 
> > Changes since v3
> > 
> > * Remove the addition of a new PROPERTIES macro in "nvme: move device
> >   parameters to separate struct" (Philippe)
> > 
> > * Add NVME_PMR_BIR constant and use it in PMR setup.
> > 
> > * Split "nvme: factor out cmb/pmr setup" into
> >   - "nvme: factor out cmb setup",
> >   - "nvme: factor out pmr setup" and
> >   - "nvme: do cmb/pmr init as part of pci init"
> >   (Philippe)
> > 
> > 
> > Klaus Jensen (18):
> >   nvme: fix pci doorbell size calculation
> >   nvme: rename trace events to pci_nvme
> >   nvme: remove superfluous breaks
> >   nvme: move device parameters to separate struct
> >   nvme: use constants in identify
> >   nvme: refactor nvme_addr_read
> >   nvme: add max_ioqpairs device parameter
> >   nvme: remove redundant cmbloc/cmbsz members
> >   nvme: factor out property/constraint checks
> >   nvme: factor out device state setup
> >   nvme: factor out block backend setup
> >   nvme: add namespace helpers
> >   nvme: factor out namespace setup
> >   nvme: factor out pci setup
> >   nvme: factor out cmb setup
> >   nvme: factor out pmr setup
> >   nvme: do cmb/pmr init as part of pci init
> >   nvme: factor out controller identify setup
> > 
> >  hw/block/nvme.c   | 543 --
> >  hw/block/nvme.h   |  31 ++-
> >  hw/block/trace-events | 180 +++---
> >  include/block/nvme.h  |   8 +
> >  4 files changed, 429 insertions(+), 333 deletions(-)
> > 
> > -- 
> > 2.26.2
> > 
> > 
> 
> Gentle bump on this.
> 
> I apparently managed to screw up the git send-email this time, losing a
> bunch of CCs in the process. Sorry about that.
> 

Bumping again. I have not received any new comments on this.

I'm missing a couple of Reviewed-by's (they all carry Maxim's) on

  nvme: move device parameters to separate struct
  I think this can also carry Philippe's Reviewed-by, since the only
  change is the removal of the PROPERTIES macro.

  nvme: factor out cmb setup
  nvme: factor out pmr setup
  nvme: do cmb/pmr init as part of pci init
  I think these could also carry Reviewed-by from Keith as well,
  since the only change is also factoring out the PMR setup (which
  was not there when Keith reviewed it) and the splitting into two
  trivial patches per request from Philippe.


Thanks,
Klaus



[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2020-05-04 Thread Ying Fang
Hi, Ike.

I think this tricky bug was fixed by Paolo last month. 
Please try patch 
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=5710a3e09f9b85801e5ce70797a4a511e5fc9e2c.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Incomplete
Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  Incomplete
Status in qemu source package in Bionic:
  Incomplete
Status in qemu source package in Disco:
  Incomplete
Status in qemu source package in Eoan:
  Incomplete
Status in qemu source package in Focal:
  Incomplete

Bug description:
  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  

  [ Original Description ]

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760,
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=,
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=,
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions



[Bug 1875762] Re: Poor disk performance on sparse VMDKs

2020-05-04 Thread Alan Murtagh
Thanks Stefan.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1875762

Title:
  Poor disk performance on sparse VMDKs

Status in QEMU:
  New

Bug description:
  Found in QEMU 4.1, and reproduced on master.

  QEMU appears to suffer from remarkably poor disk performance when
  writing to sparse-extent VMDKs. Of course it's to be expected that
  allocation takes time and sparse VMDKs peform worse than allocated
  VMDKs, but surely not on the orders of magnitude I'm observing. On my
  system, the fully allocated write speeds are approximately 1.5GB/s,
  while the fully sparse write speeds can be as low as 10MB/s. I've
  noticed that adding "cache unsafe" reduces the issue dramatically,
  bringing speeds up to around 750MB/s. I don't know if this is still
  slow or if this perhaps reveals a problem with the default caching
  method.

  To reproduce the issue I've attached two 4GiB VMDKs. Both are
  completely empty and both are technically sparse-extent VMDKs, but one
  is 100% pre-allocated and the other is 100% unallocated. If you attach
  these VMDKs as second and third disks to an Ubuntu VM running on QEMU
  (with KVM) and measure their write performance (using dd to write to
  /dev/sdb and /dev/sdc for example) the difference in write speeds is
  clear.

  For what it's worth, the flags I'm using that relate to the VMDK are
  as follows:

  `-drive if=none,file=sparse.vmdk,id=hd0,format=vmdk -device virtio-
  scsi-pci,id=scsi -device scsi-hd,drive=hd0`

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1875762/+subscriptions



[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2020-05-04 Thread Ike Panhc
I took several CPUs offline and re-tested. Even with only 32 threads left,
I can still reproduce this issue easily.

ubuntu@kreiken:~$ lscpu | grep list;for i in `seq 1 10`;do echo ;rm -f 
out.img;timeout 30 qemu-img convert -f qcow2 -O qcow2 
./bionic-server-cloudimg-arm64.img out.img -p; done
On-line CPU(s) list:  0-31
Off-line CPU(s) list: 32-127

(100.00/100%)

(43.20/100%)
(0.00/100%)
(1.00/100%)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256


To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions



Re: [PATCH qemu] spapr: Add PVR setting capability

2020-05-04 Thread Alexey Kardashevskiy



On 04/05/2020 21:30, Greg Kurz wrote:
> On Fri, 17 Apr 2020 14:11:05 +1000
> Alexey Kardashevskiy  wrote:
> 
>> At the moment the VCPU init sequence includes setting PVR which in case of
>> KVM-HV only checks if it matches the hardware PVR mask as PVR cannot be
>> virtualized by the hardware. In order to cope with various CPU revisions
>> only top 16bit of PVR are checked which works for minor revision updates.
>>
>> However in every CPU generation starting POWER7 (at least) there were CPUs
>> supporting the (almost) same POWER ISA level but having different top
>> 16bits of PVR - POWER7+, POWER8E, POWER8NVL; this time we got POWER9+
>> with a new PVR family. We would normally add the PVR mask for the new one
>> too, the problem with it is that although the physical machines exist,
>> P9+ is not going to be released as a product, and this situation is likely
>> to repeat in the future.
>>
>> Instead of adding every new CPU family in QEMU, this adds a new sPAPR
>> machine capability to force PVR setting/checking. It is "on" by default
>> to preserve the existing behavior. When "off", it is the user's
>> responsibility to specify the correct CPU.
>>
> 
> I don't quite understand the motivation for this... what does this
> buy us ?

I answered that part in another mail in this thread; in short, this is to
make QEMU work with HV KVM on a CPU family unknown to QEMU (0x004f).


> 
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>  include/hw/ppc/spapr.h |  5 -
>>  hw/ppc/spapr.c |  1 +
>>  hw/ppc/spapr_caps.c| 18 ++
>>  target/ppc/kvm.c   | 16 ++--
>>  4 files changed, 37 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index e579eaf28c05..5ccac4d56871 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -81,8 +81,10 @@ typedef enum {
>>  #define SPAPR_CAP_CCF_ASSIST0x09
>>  /* Implements PAPR FWNMI option */
>>  #define SPAPR_CAP_FWNMI 0x0A
>> +/* Implements PAPR PVR option */
>> +#define SPAPR_CAP_PVR   0x0B
>>  /* Num Caps */
>> -#define SPAPR_CAP_NUM   (SPAPR_CAP_FWNMI + 1)
>> +#define SPAPR_CAP_NUM   (SPAPR_CAP_PVR + 1)
>>  
>>  /*
>>   * Capability Values
>> @@ -912,6 +914,7 @@ extern const VMStateDescription 
>> vmstate_spapr_cap_nested_kvm_hv;
>>  extern const VMStateDescription vmstate_spapr_cap_large_decr;
>>  extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
>>  extern const VMStateDescription vmstate_spapr_cap_fwnmi;
>> +extern const VMStateDescription vmstate_spapr_cap_pvr;
>>  
>>  static inline uint8_t spapr_get_cap(SpaprMachineState *spapr, int cap)
>>  {
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 841b5ec59b12..ecc74c182b9f 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -4535,6 +4535,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
>> void *data)
>>  smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>  smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
>>  smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
>> +smc->default_caps.caps[SPAPR_CAP_PVR] = SPAPR_CAP_ON;
>>  spapr_caps_add_properties(smc, &error_abort);
>>  smc->irq = &spapr_irq_dual;
>>  smc->dr_phb_enabled = true;
>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>> index eb54f9422722..398b72b77f9f 100644
>> --- a/hw/ppc/spapr_caps.c
>> +++ b/hw/ppc/spapr_caps.c
>> @@ -525,6 +525,14 @@ static void cap_fwnmi_apply(SpaprMachineState *spapr, 
>> uint8_t val,
>>  }
>>  }
>>  
>> +static void cap_pvr_apply(SpaprMachineState *spapr, uint8_t val, Error 
>> **errp)
>> +{
>> +if (val) {
>> +return;
>> +}
>> +warn_report("If you're using kvm-hv.ko, only \"-cpu host\" is 
>> supported");
>> +}
>> +
>>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>  [SPAPR_CAP_HTM] = {
>>  .name = "htm",
>> @@ -633,6 +641,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>  .type = "bool",
>>  .apply = cap_fwnmi_apply,
>>  },
>> +[SPAPR_CAP_PVR] = {
>> +.name = "pvr",
>> +.description = "Enforce PVR in KVM",
>> +.index = SPAPR_CAP_PVR,
>> +.get = spapr_cap_get_bool,
>> +.set = spapr_cap_set_bool,
>> +.type = "bool",
>> +.apply = cap_pvr_apply,
>> +},
>>  };
>>  
>>  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
>> @@ -773,6 +790,7 @@ SPAPR_CAP_MIG_STATE(nested_kvm_hv, 
>> SPAPR_CAP_NESTED_KVM_HV);
>>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>>  SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI);
>> +SPAPR_CAP_MIG_STATE(pvr, SPAPR_CAP_PVR);
>>  
>>  void spapr_caps_init(SpaprMachineState *spapr)
>>  {
>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>> index 03d0667e8f94..a4adc29b6522 100644
>> --- a/tar

Re: [PATCH v1 4/4] .travis.yml: reduce the load on [ppc64] GCC check-tcg

2020-05-04 Thread David Gibson
On Mon, May 04, 2020 at 08:48:46PM +0100, Alex Bennée wrote:
> 
> Richard Henderson  writes:
> 
> > On 5/3/20 7:10 PM, David Gibson wrote:
> >   - TEST_CMD="make check check-tcg V=1"
> > -- CONFIG="--disable-containers 
> > --target-list=${MAIN_SOFTMMU_TARGETS},ppc64le-linux-user"
> > +- CONFIG="--disable-containers 
> > --target-list=ppc64-softmmu,ppc64le-linux-user"
> 
>  Cc'ing David, since I'm not sure about this one... Maybe split as we
>  did with other jobs?
> > ...
> >> Hrm.  I'd prefer not to drop this coverage if we can avoid it.  What
> >> we're not testing with the proposed patch is TCG generation for a ppc
> >> host but a non-ppc target.  e.g. if the x86 or ARM target side generates
> >> some pattern of TCG ops that's very rare for the ppc target, and is
> >> buggy in the ppc host side.
> >
> > Are we actually testing those here?  As far as I can see, we're not 
> > installing
> > any cross-compilers here, so we're not building any non-ppc binaries.  Nor 
> > are
> > we running check-acceptance which would download pre-built foreign
> > binaries.
> 
> We are testing the very minimal boot stubs that each -system binary has
> in qtest but they are hardly going to be exercising the majority of the
> TCG. Basically the $SELF-linux-user is going to be exercising more of
> the TCG than anything else.

Oh, good points.

Go ahead then.  In fact we should probably do that for all the
check-tcg builds that don't install cross compilers.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 5/6] block/nvme: Align block pages queue to host page size

2020-05-04 Thread David Gibson
On Mon, May 04, 2020 at 11:46:40AM +0200, Philippe Mathieu-Daudé wrote:
> In nvme_create_queue_pair() we create a page list using
> qemu_blockalign(), then map it with qemu_vfio_dma_map():
> 
>   q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE);
>   r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
> s->page_size * NVME_QUEUE_SIZE, ...);
> 
> With:
> 
>   s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
> 
> The qemu_vfio_dma_map() documentation says "The caller need
> to make sure the area is aligned to page size". While we use
> multiple s->page_size as alignment, it might be not sufficient
> on some hosts. Use the qemu_real_host_page_size value to be
> sure the host alignment is respected.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: David Gibson 
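
As a concrete example (mine, not from the patch): with a 4 KiB NVMe page
size on a ppc64 host using 64 KiB pages, the MAX() keeps buffers aligned
to the stricter host requirement:

    bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size,  /* 65536 */
                                   s->page_size);             /*  4096 */
    /* -> 65536, so the qemu_blockalign() buffers satisfy the host page
     * alignment that qemu_vfio_dma_map() expects */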

> ---
> Cc: Cédric Le Goater 
> Cc: David Gibson 
> Cc: Laurent Vivier 
> ---
>  block/nvme.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 7b7c0cc5d6..bde0d28b39 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -627,7 +627,7 @@ static int nvme_init(BlockDriverState *bs, const char 
> *device, int namespace,
>  
>  s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
>  s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t);
> -bs->bl.opt_mem_alignment = s->page_size;
> +bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size);
>  timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 3);
>  
>  /* Reset device to get a clean state. */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH qemu] spapr: Add PVR setting capability

2020-05-04 Thread Alexey Kardashevskiy



On 04/05/2020 21:38, Cédric Le Goater wrote:
> 
> 
> On 5/4/20 1:30 PM, Greg Kurz wrote:
>> On Fri, 17 Apr 2020 14:11:05 +1000
>> Alexey Kardashevskiy  wrote:
>>
>>> At the moment the VCPU init sequence includes setting PVR which in case of
>>> KVM-HV only checks if it matches the hardware PVR mask as PVR cannot be
>>> virtualized by the hardware. In order to cope with various CPU revisions
>>> only top 16bit of PVR are checked which works for minor revision updates.
>>>
>>> However in every CPU generation starting POWER7 (at least) there were CPUs
>>> supporting the (almost) same POWER ISA level but having different top
>>> 16bits of PVR - POWER7+, POWER8E, POWER8NVL; this time we got POWER9+
>>> with a new PVR family. We would normally add the PVR mask for the new one
>>> too, the problem with it is that although the physical machines exist,
>>> P9+ is not going to be released as a product, and this situation is likely
>>> to repeat in the future.
>>>
>>> Instead of adding every new CPU family in QEMU, this adds a new sPAPR
>>> machine capability to force PVR setting/checking. It is "on" by default
>>> to preserve the existing behavior. When "off", it is the user's
>>> responsibility to specify the correct CPU.
>>>
>>
>> I don't quite understand the motivation for this... what does this
>> buy us ?
> 
> So we could use the command line options : 
> 
>  -cpu POWER8 -machine pseries,pvr=off
> 
> instead of
> 
>  -cpu host -machine pseries,max-cpu-compat=POWER8  
> 
> is that it ? 

This does not work for my case.

> 
> I am not sure I get it either.


QEMU has to know the host CPU family to work with HV KVM: QEMU reads
the PVR, picks a CPU class and goes from there; max-cpu-compat applies a
lot later and does not affect the CPU class which QEMU chooses.

Now I have a machine ("swift") with PVR 0x004F1100 and QEMU has no idea
what 0x004F is (but I know it is POWER9), and there is currently no way
to bypass the PVR check which QEMU requests from KVM: KVM PR passes it,
but KVM HV fails it as the PVR has to match the hardware. Adding a new
CPU family to upstream QEMU makes no sense as there will only ever be a
handful of those machines, and forcing the PVR does not buy us that much
anyway.
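
For illustration, a minimal standalone sketch (my own; the helper name
and table entries are made up, not QEMU's actual list or API) of the
masked-PVR family lookup described above:

  #include <stdint.h>
  #include <stdio.h>

  struct pvr_entry { uint32_t pvr_base; const char *family; };

  /* Hypothetical table: only families already known are listed. */
  static const struct pvr_entry table[] = {
      { 0x004E0000, "POWER9" },
      { 0x004D0000, "POWER8" },
  };

  /* Only the top 16 bits (the CPU family) are compared. */
  static const char *find_family(uint32_t host_pvr)
  {
      for (unsigned i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
          if ((host_pvr & 0xFFFF0000) == table[i].pvr_base) {
              return table[i].family;
          }
      }
      return NULL;
  }

  int main(void)
  {
      const char *f = find_family(0x004F1100);   /* the "swift" machine */
      printf("%s\n", f ? f : "unknown family");  /* prints "unknown family" */
      return 0;
  }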




> 
> 
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>>  include/hw/ppc/spapr.h |  5 -
>>>  hw/ppc/spapr.c |  1 +
>>>  hw/ppc/spapr_caps.c| 18 ++
>>>  target/ppc/kvm.c   | 16 ++--
>>>  4 files changed, 37 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>> index e579eaf28c05..5ccac4d56871 100644
>>> --- a/include/hw/ppc/spapr.h
>>> +++ b/include/hw/ppc/spapr.h
>>> @@ -81,8 +81,10 @@ typedef enum {
>>>  #define SPAPR_CAP_CCF_ASSIST0x09
>>>  /* Implements PAPR FWNMI option */
>>>  #define SPAPR_CAP_FWNMI 0x0A
>>> +/* Implements PAPR PVR option */
>>> +#define SPAPR_CAP_PVR   0x0B
>>>  /* Num Caps */
>>> -#define SPAPR_CAP_NUM   (SPAPR_CAP_FWNMI + 1)
>>> +#define SPAPR_CAP_NUM   (SPAPR_CAP_PVR + 1)
>>>  
>>>  /*
>>>   * Capability Values
>>> @@ -912,6 +914,7 @@ extern const VMStateDescription 
>>> vmstate_spapr_cap_nested_kvm_hv;
>>>  extern const VMStateDescription vmstate_spapr_cap_large_decr;
>>>  extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
>>>  extern const VMStateDescription vmstate_spapr_cap_fwnmi;
>>> +extern const VMStateDescription vmstate_spapr_cap_pvr;
>>>  
>>>  static inline uint8_t spapr_get_cap(SpaprMachineState *spapr, int cap)
>>>  {
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 841b5ec59b12..ecc74c182b9f 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -4535,6 +4535,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
>>> void *data)
>>>  smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>  smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
>>>  smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
>>> +smc->default_caps.caps[SPAPR_CAP_PVR] = SPAPR_CAP_ON;
>>>  spapr_caps_add_properties(smc, &error_abort);
>>>  smc->irq = &spapr_irq_dual;
>>>  smc->dr_phb_enabled = true;
>>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>>> index eb54f9422722..398b72b77f9f 100644
>>> --- a/hw/ppc/spapr_caps.c
>>> +++ b/hw/ppc/spapr_caps.c
>>> @@ -525,6 +525,14 @@ static void cap_fwnmi_apply(SpaprMachineState *spapr, 
>>> uint8_t val,
>>>  }
>>>  }
>>>  
>>> +static void cap_pvr_apply(SpaprMachineState *spapr, uint8_t val, Error 
>>> **errp)
>>> +{
>>> +if (val) {
>>> +return;
>>> +}
>>> +warn_report("If you're using kvm-hv.ko, only \"-cpu host\" is 
>>> supported");
>>> +}
>>> +
>>>  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>>  [SPAPR_CAP_HTM] = {
>>>  .name = "htm",
>>> @@ -633,6 +641,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>>  .type = "bool",
>>>  .apply = cap_fwnmi_appl

[PATCH v2 4/4] softfloat: fix floatx80 pseudo-denormal round to integer

2020-05-04 Thread Joseph Myers
The softfloat function floatx80_round_to_int incorrectly handles the
case of a pseudo-denormal where only the high bit of the significand
is set, ignoring that bit (treating the number as an exact zero)
rather than treating the number as an alternative representation of
+/- 2^-16382 (which may round to +/- 1 depending on the rounding mode)
as hardware does.  Fix this check (simplifying the code in the
process).
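
As a minimal standalone illustration (not part of the patch) of why the
old check misfired: the pseudo-denormal's only set significand bit is
the explicit integer bit, and the left shift in the old test discards
exactly that bit.

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint64_t frac = UINT64_C(1) << 63;   /* pseudo-denormal significand */
      /* old check: the shift drops the integer bit, so it looks like zero */
      printf("old check says zero: %d\n", (uint64_t)(frac << 1) == 0);
      /* new check: the significand is correctly seen as non-zero */
      printf("new check says zero: %d\n", frac == 0);
      return 0;
  }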

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c|  2 +-
 tests/tcg/i386/test-i386-pseudo-denormal.c | 10 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 8e9c714e6f..e29b07542a 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5741,7 +5741,7 @@ floatx80 floatx80_round_to_int(floatx80 a, float_status 
*status)
 }
 if ( aExp < 0x3FFF ) {
 if (( aExp == 0 )
- && ( (uint64_t) ( extractFloatx80Frac( a )<<1 ) == 0 ) ) {
+ && ( (uint64_t) ( extractFloatx80Frac( a ) ) == 0 ) ) {
 return a;
 }
 status->float_exception_flags |= float_flag_inexact;
diff --git a/tests/tcg/i386/test-i386-pseudo-denormal.c 
b/tests/tcg/i386/test-i386-pseudo-denormal.c
index acf2b9cf03..00d510cf4a 100644
--- a/tests/tcg/i386/test-i386-pseudo-denormal.c
+++ b/tests/tcg/i386/test-i386-pseudo-denormal.c
@@ -14,6 +14,7 @@ volatile long double ld_res;
 
 int main(void)
 {
+short cw;
 int ret = 0;
 ld_res = ld_pseudo_m16382.ld + ld_pseudo_m16382.ld;
 if (ld_res != 0x1p-16381L) {
@@ -24,5 +25,14 @@ int main(void)
 printf("FAIL: pseudo-denormal compare\n");
 ret = 1;
 }
+/* Set round-upward.  */
+__asm__ volatile ("fnstcw %0" : "=m" (cw));
+cw = (cw & ~0xc00) | 0x800;
+__asm__ volatile ("fldcw %0" : : "m" (cw));
+__asm__ ("frndint" : "=t" (ld_res) : "0" (ld_pseudo_m16382.ld));
+if (ld_res != 1.0L) {
+printf("FAIL: pseudo-denormal round-to-integer\n");
+ret = 1;
+}
 return ret;
 }
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com



[PATCH v2 3/4] softfloat: fix floatx80 pseudo-denormal comparisons

2020-05-04 Thread Joseph Myers
The softfloat floatx80 comparisons fail to allow for pseudo-denormals,
which should compare equal to corresponding values with biased
exponent 1 rather than 0.  Add an adjustment for that case when
comparing numbers with the same sign.

Note that this fix only changes floatx80_compare_internal, not the
other more specific comparison operations.  That is the only
comparison function for floatx80 used in the i386 port, which is the
only supported port with these pseudo-denormal semantics.
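
For illustration only (this mirrors the union used by the existing
test-i386-pseudo-denormal.c test and assumes an x86 host, where the
hardware semantics being matched apply): the pseudo-denormal encoding
with biased exponent 0 and the normal encoding with biased exponent 1
denote the same value, 2^-16382, which is why the comparison bumps the
exponent before the bit-level compare.

  #include <stdint.h>
  #include <stdio.h>

  union x80 { struct { uint64_t sig; uint16_t sign_exp; } s; long double ld; };

  int main(void)
  {
      union x80 pseudo = { .s = { UINT64_C(1) << 63, 0 } };  /* biased exp 0 */
      union x80 normal = { .s = { UINT64_C(1) << 63, 1 } };  /* biased exp 1 */
      printf("equal: %d\n", pseudo.ld == normal.ld);         /* 1 on x86 hardware */
      return 0;
  }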

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c| 5 +
 tests/tcg/i386/test-i386-pseudo-denormal.c | 4 
 2 files changed, 9 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6094d267b5..8e9c714e6f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -7966,6 +7966,11 @@ static inline int floatx80_compare_internal(floatx80 a, 
floatx80 b,
 return 1 - (2 * aSign);
 }
 } else {
+/* Normalize pseudo-denormals before comparison.  */
+if ((a.high & 0x7fff) == 0 && a.low & UINT64_C(0x8000000000000000))
+++a.high;
+if ((b.high & 0x7fff) == 0 && b.low & UINT64_C(0x8000000000000000))
+++b.high;
 if (a.low == b.low && a.high == b.high) {
 return float_relation_equal;
 } else {
diff --git a/tests/tcg/i386/test-i386-pseudo-denormal.c 
b/tests/tcg/i386/test-i386-pseudo-denormal.c
index cfa2a500b0..acf2b9cf03 100644
--- a/tests/tcg/i386/test-i386-pseudo-denormal.c
+++ b/tests/tcg/i386/test-i386-pseudo-denormal.c
@@ -20,5 +20,9 @@ int main(void)
 printf("FAIL: pseudo-denormal add\n");
 ret = 1;
 }
+if (ld_pseudo_m16382.ld != 0x1p-16382L) {
+printf("FAIL: pseudo-denormal compare\n");
+ret = 1;
+}
 return ret;
 }
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com



[PATCH v2 1/4] softfloat: silence sNaN for conversions to/from floatx80

2020-05-04 Thread Joseph Myers
Conversions between IEEE floating-point formats should convert
signaling NaNs to quiet NaNs.  Most of those in QEMU's softfloat code
do so, but those for floatx80 fail to.  Fix those conversions to
silence signaling NaNs as well.
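
As a small aside (not part of the patch): "silencing" follows the usual
IEEE 754 convention of setting the quiet bit in the NaN payload; for
float32 that is the most-significant fraction bit, bit 22.

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint32_t snan = 0x7f800001;          /* signaling NaN: quiet bit clear */
      uint32_t qnan = snan | (1u << 22);   /* quiet NaN: 0x7fc00001 */
      printf("0x%08" PRIx32 " -> 0x%08" PRIx32 "\n", snan, qnan);
      return 0;
  }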

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c | 24 +++---
 tests/tcg/i386/test-i386-snan-convert.c | 63 +
 2 files changed, 81 insertions(+), 6 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-snan-convert.c

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ae6ba71854..ac116c70b8 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -4498,7 +4498,9 @@ floatx80 float32_to_floatx80(float32 a, float_status 
*status)
 aSign = extractFloat32Sign( a );
 if ( aExp == 0xFF ) {
 if (aSig) {
-return commonNaNToFloatx80(float32ToCommonNaN(a, status), status);
+floatx80 res = commonNaNToFloatx80(float32ToCommonNaN(a, status),
+   status);
+return floatx80_silence_nan(res, status);
 }
 return packFloatx80(aSign,
 floatx80_infinity_high,
@@ -5016,7 +5018,9 @@ floatx80 float64_to_floatx80(float64 a, float_status 
*status)
 aSign = extractFloat64Sign( a );
 if ( aExp == 0x7FF ) {
 if (aSig) {
-return commonNaNToFloatx80(float64ToCommonNaN(a, status), status);
+floatx80 res = commonNaNToFloatx80(float64ToCommonNaN(a, status),
+   status);
+return floatx80_silence_nan(res, status);
 }
 return packFloatx80(aSign,
 floatx80_infinity_high,
@@ -5618,7 +5622,9 @@ float32 floatx80_to_float32(floatx80 a, float_status 
*status)
 aSign = extractFloatx80Sign( a );
 if ( aExp == 0x7FFF ) {
 if ( (uint64_t) ( aSig<<1 ) ) {
-return commonNaNToFloat32(floatx80ToCommonNaN(a, status), status);
+float32 res = commonNaNToFloat32(floatx80ToCommonNaN(a, status),
+ status);
+return float32_silence_nan(res, status);
 }
 return packFloat32( aSign, 0xFF, 0 );
 }
@@ -5650,7 +5656,9 @@ float64 floatx80_to_float64(floatx80 a, float_status 
*status)
 aSign = extractFloatx80Sign( a );
 if ( aExp == 0x7FFF ) {
 if ( (uint64_t) ( aSig<<1 ) ) {
-return commonNaNToFloat64(floatx80ToCommonNaN(a, status), status);
+float64 res = commonNaNToFloat64(floatx80ToCommonNaN(a, status),
+ status);
+return float64_silence_nan(res, status);
 }
 return packFloat64( aSign, 0x7FF, 0 );
 }
@@ -5681,7 +5689,9 @@ float128 floatx80_to_float128(floatx80 a, float_status 
*status)
 aExp = extractFloatx80Exp( a );
 aSign = extractFloatx80Sign( a );
 if ( ( aExp == 0x7FFF ) && (uint64_t) ( aSig<<1 ) ) {
-return commonNaNToFloat128(floatx80ToCommonNaN(a, status), status);
+float128 res = commonNaNToFloat128(floatx80ToCommonNaN(a, status),
+   status);
+return float128_silence_nan(res, status);
 }
 shift128Right( aSig<<1, 0, 16, &zSig0, &zSig1 );
 return packFloat128( aSign, aExp, zSig0, zSig1 );
@@ -6959,7 +6969,9 @@ floatx80 float128_to_floatx80(float128 a, float_status 
*status)
 aSign = extractFloat128Sign( a );
 if ( aExp == 0x7FFF ) {
 if ( aSig0 | aSig1 ) {
-return commonNaNToFloatx80(float128ToCommonNaN(a, status), status);
+floatx80 res = commonNaNToFloatx80(float128ToCommonNaN(a, status),
+   status);
+return floatx80_silence_nan(res, status);
 }
 return packFloatx80(aSign, floatx80_infinity_high,
floatx80_infinity_low);
diff --git a/tests/tcg/i386/test-i386-snan-convert.c 
b/tests/tcg/i386/test-i386-snan-convert.c
new file mode 100644
index 00..ed6d535ce2
--- /dev/null
+++ b/tests/tcg/i386/test-i386-snan-convert.c
@@ -0,0 +1,63 @@
+/* Test conversions of signaling NaNs to and from long double.  */
+
+#include 
+#include 
+
+volatile float f_res;
+volatile double d_res;
+volatile long double ld_res;
+
+volatile float f_snan = __builtin_nansf("");
+volatile double d_snan = __builtin_nans("");
+volatile long double ld_snan = __builtin_nansl("");
+
+int issignaling_f(float x)
+{
+union { float f; uint32_t u; } u = { .f = x };
+return (u.u & 0x7fffffff) > 0x7f800000 && (u.u & 0x400000) == 0;
+}
+
+int issignaling_d(double x)
+{
+union { double d; uint64_t u; } u = { .d = x };
+return (((u.u & UINT64_C(0x7fffffffffffffff)) >
+UINT64_C(0x7ff0000000000000)) &&
+(u.u & UINT64_C(0x8000000000000)) == 0);
+}
+
+int issignaling_ld(long double x)
+{
+union {
+long double ld;
+struct { uint6

Re: [PATCH v16 QEMU 10/16] vfio: Add load state functions to SaveVMHandlers

2020-05-04 Thread Kirti Wankhede




On 4/2/2020 12:28 AM, Dr. David Alan Gilbert wrote:

* Kirti Wankhede (kwankh...@nvidia.com) wrote:

Sequence  during _RESUMING device state:
While data for this device is available, repeat below steps:
a. read data_offset from where user application should write data.
b. write data of data_size to migration region from data_offset.
c. write data_size, which indicates to the vendor driver that the data has
been written to the staging buffer.

For user, data is opaque. User should write data in the same order as
received.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  hw/vfio/migration.c  | 179 +++
  hw/vfio/trace-events |   3 +
  2 files changed, 182 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index ecbeed5182c2..ab295d25620e 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -269,6 +269,33 @@ static int vfio_save_device_config_state(QEMUFile *f, void 
*opaque)
  return qemu_file_get_error(f);
  }
  
+static int vfio_load_device_config_state(QEMUFile *f, void *opaque)

+{
+VFIODevice *vbasedev = opaque;
+uint64_t data;
+
+if (vbasedev->ops && vbasedev->ops->vfio_load_config) {
+int ret;
+
+ret = vbasedev->ops->vfio_load_config(vbasedev, f);
+if (ret) {
+error_report("%s: Failed to load device config space",
+ vbasedev->name);
+return ret;
+}
+}
+
+data = qemu_get_be64(f);
+if (data != VFIO_MIG_FLAG_END_OF_STATE) {
+error_report("%s: Failed loading device config space, "
+ "end flag incorrect 0x%"PRIx64, vbasedev->name, data);
+return -EINVAL;
+}
+
+trace_vfio_load_device_config_state(vbasedev->name);
+return qemu_file_get_error(f);
+}
+
  /* -- */
  
  static int vfio_save_setup(QEMUFile *f, void *opaque)

@@ -434,12 +461,164 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)
  return ret;
  }
  
+static int vfio_load_setup(QEMUFile *f, void *opaque)

+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret = 0;
+
+if (migration->region.mmaps) {
+ret = vfio_region_mmap(&migration->region);
+if (ret) {
+error_report("%s: Failed to mmap VFIO migration region %d: %s",
+ vbasedev->name, migration->region.nr,
+ strerror(-ret));
+return ret;
+}
+}
+
+ret = vfio_migration_set_state(vbasedev, ~0, VFIO_DEVICE_STATE_RESUMING);
+if (ret) {
+error_report("%s: Failed to set state RESUMING", vbasedev->name);
+}
+return ret;
+}
+
+static int vfio_load_cleanup(void *opaque)
+{
+vfio_save_cleanup(opaque);
+return 0;
+}
+
+static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret = 0;
+uint64_t data, data_size;
+
+data = qemu_get_be64(f);
+while (data != VFIO_MIG_FLAG_END_OF_STATE) {
+
+trace_vfio_load_state(vbasedev->name, data);
+
+switch (data) {
+case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
+{
+ret = vfio_load_device_config_state(f, opaque);
+if (ret) {
+return ret;
+}
+break;
+}
+case VFIO_MIG_FLAG_DEV_SETUP_STATE:
+{
+uint64_t region_size = qemu_get_be64(f);
+
+if (migration->region.size < region_size) {
+error_report("%s: SETUP STATE: migration region too small, "
+ "0x%"PRIx64 " < 0x%"PRIx64, vbasedev->name,
+ migration->region.size, region_size);
+return -EINVAL;
+}
+
+data = qemu_get_be64(f);
+if (data == VFIO_MIG_FLAG_END_OF_STATE) {


Can you explain why you're reading this here rather than letting it drop
through to the read at the end of the loop?



To make sure the sequence is followed; otherwise an error is reported.


+return ret;
+} else {
+error_report("%s: SETUP STATE: EOS not found 0x%"PRIx64,
+ vbasedev->name, data);
+return -EINVAL;
+}
+break;
+}
+case VFIO_MIG_FLAG_DEV_DATA_STATE:
+{
+VFIORegion *region = &migration->region;
+void *buf = NULL;
+bool buffer_mmaped = false;
+uint64_t data_offset = 0;
+
+data_size = qemu_get_be64(f);
+if (data_size == 0) {
+break;
+}
+
+ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
+region->fd_offset +
+offsetof(struct vfio_device_migration_info,
+data_offset));
+

Re: [PATCH v16 QEMU 04/16] vfio: Add save and load functions for VFIO PCI devices

2020-05-04 Thread Kirti Wankhede




On 4/7/2020 9:40 AM, Longpeng (Mike, Cloud Infrastructure Service 
Product Dept.) wrote:



On 2020/3/25 5:09, Kirti Wankhede wrote:

These functions save and restore PCI device specific data - config
space of PCI device.
Tested save and restore with MSI and MSIX type.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  hw/vfio/pci.c | 163 ++
  include/hw/vfio/vfio-common.h |   2 +
  2 files changed, 165 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6c77c12e44b9..8deb11e87ef7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -41,6 +41,7 @@
  #include "trace.h"
  #include "qapi/error.h"
  #include "migration/blocker.h"
+#include "migration/qemu-file.h"
  
  #define TYPE_VFIO_PCI "vfio-pci"

  #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
@@ -1632,6 +1633,50 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
  }
  }
  
+static int vfio_bar_validate(VFIOPCIDevice *vdev, int nr)

+{
+PCIDevice *pdev = &vdev->pdev;
+VFIOBAR *bar = &vdev->bars[nr];
+uint64_t addr;
+uint32_t addr_lo, addr_hi = 0;
+
+/* Skip unimplemented BARs and the upper half of 64bit BARS. */
+if (!bar->size) {
+return 0;
+}
+
+addr_lo = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + nr * 4, 4);
+
+addr_lo = addr_lo & (bar->ioport ? PCI_BASE_ADDRESS_IO_MASK :
+   PCI_BASE_ADDRESS_MEM_MASK);
+if (bar->type == PCI_BASE_ADDRESS_MEM_TYPE_64) {
+addr_hi = pci_default_read_config(pdev,
+ PCI_BASE_ADDRESS_0 + (nr + 1) * 4, 4);
+}
+
+addr = ((uint64_t)addr_hi << 32) | addr_lo;
+
+if (!QEMU_IS_ALIGNED(addr, bar->size)) {
+return -EINVAL;
+}
+
+return 0;
+}
+
+static int vfio_bars_validate(VFIOPCIDevice *vdev)
+{
+int i, ret;
+
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+ret = vfio_bar_validate(vdev, i);
+if (ret) {
+error_report("vfio: BAR address %d validation failed", i);
+return ret;
+}
+}
+return 0;
+}
+
  static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
  {
  VFIOBAR *bar = &vdev->bars[nr];
@@ -2414,11 +2459,129 @@ static Object *vfio_pci_get_object(VFIODevice 
*vbasedev)
  return OBJECT(vdev);
  }
  
+static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)

+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+PCIDevice *pdev = &vdev->pdev;
+uint16_t pci_cmd;
+int i;
+
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+uint32_t bar;
+
+bar = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
+qemu_put_be32(f, bar);
+}
+
+qemu_put_be32(f, vdev->interrupt);
+if (vdev->interrupt == VFIO_INT_MSI) {
+uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
+bool msi_64bit;
+
+msi_flags = pci_default_read_config(pdev, pdev->msi_cap + 
PCI_MSI_FLAGS,
+2);
+msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
+
+msi_addr_lo = pci_default_read_config(pdev,
+ pdev->msi_cap + PCI_MSI_ADDRESS_LO, 
4);
+qemu_put_be32(f, msi_addr_lo);
+
+if (msi_64bit) {
+msi_addr_hi = pci_default_read_config(pdev,
+ pdev->msi_cap + 
PCI_MSI_ADDRESS_HI,
+ 4);
+}
+qemu_put_be32(f, msi_addr_hi);
+
+msi_data = pci_default_read_config(pdev,
+pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : 
PCI_MSI_DATA_32),
+2);
+qemu_put_be32(f, msi_data);
+} else if (vdev->interrupt == VFIO_INT_MSIX) {
+uint16_t offset;
+
+/* save enable bit and maskall bit */
+offset = pci_default_read_config(pdev,
+   pdev->msix_cap + PCI_MSIX_FLAGS + 1, 2);
+qemu_put_be16(f, offset);
+msix_save(pdev, f);
+}
+pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
+qemu_put_be16(f, pci_cmd);
+}
+
+static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+PCIDevice *pdev = &vdev->pdev;
+uint32_t interrupt_type;
+uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
+uint16_t pci_cmd;
+bool msi_64bit;
+int i, ret;
+
+/* restore pci bar configuration */
+pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
+vfio_pci_write_config(pdev, PCI_COMMAND,
+pci_cmd & (!(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)), 2);
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+uint32_t bar = qemu_get_be32(f);
+
+vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar, 4);
+}
+
+ret = vfio_bars_validate(vdev);
+if (ret) {
+r

[PATCH v2 2/4] softfloat: fix floatx80 pseudo-denormal addition / subtraction

2020-05-04 Thread Joseph Myers
The softfloat function addFloatx80Sigs, used for addition of values
with the same sign and subtraction of values with opposite sign, fails
to handle the case where the two values both have biased exponent zero
and there is a carry resulting from adding the significands, which can
occur if one or both values are pseudo-denormals (biased exponent
zero, explicit integer bit 1).  Add a check for that case, so making
the results match those seen on x86 hardware for pseudo-denormals.
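
A minimal standalone sketch (not the patch itself) of the missed case:
when both significands have only the explicit integer bit set, the
64-bit addition wraps around, and the zSig0 < aSig test added by the
patch is what detects the lost carry.

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint64_t aSig = UINT64_C(1) << 63;   /* pseudo-denormal: integer bit set */
      uint64_t bSig = UINT64_C(1) << 63;
      uint64_t zSig0 = aSig + bSig;        /* wraps around to 0 */
      printf("zSig0 = 0x%016" PRIx64 ", carry detected: %d\n",
             zSig0, zSig0 < aSig);
      return 0;
  }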

Signed-off-by: Joseph Myers 
---
 fpu/softfloat.c|  6 ++
 tests/tcg/i386/test-i386-pseudo-denormal.c | 24 ++
 2 files changed, 30 insertions(+)
 create mode 100644 tests/tcg/i386/test-i386-pseudo-denormal.c

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ac116c70b8..6094d267b5 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5866,6 +5866,12 @@ static floatx80 addFloatx80Sigs(floatx80 a, floatx80 b, 
flag zSign,
 zSig1 = 0;
 zSig0 = aSig + bSig;
 if ( aExp == 0 ) {
+if ((aSig | bSig) & UINT64_C(0x8000000000000000) && zSig0 < aSig) {
+/* At least one of the values is a pseudo-denormal,
+ * and there is a carry out of the result.  */
+zExp = 1;
+goto shiftRight1;
+}
 if (zSig0 == 0) {
 return packFloatx80(zSign, 0, 0);
 }
diff --git a/tests/tcg/i386/test-i386-pseudo-denormal.c 
b/tests/tcg/i386/test-i386-pseudo-denormal.c
new file mode 100644
index 00..cfa2a500b0
--- /dev/null
+++ b/tests/tcg/i386/test-i386-pseudo-denormal.c
@@ -0,0 +1,24 @@
+/* Test pseudo-denormal operations.  */
+
+#include 
+#include 
+
+union u {
+struct { uint64_t sig; uint16_t sign_exp; } s;
+long double ld;
+};
+
+volatile union u ld_pseudo_m16382 = { .s = { UINT64_C(1) << 63, 0 } };
+
+volatile long double ld_res;
+
+int main(void)
+{
+int ret = 0;
+ld_res = ld_pseudo_m16382.ld + ld_pseudo_m16382.ld;
+if (ld_res != 0x1p-16381L) {
+printf("FAIL: pseudo-denormal add\n");
+ret = 1;
+}
+return ret;
+}
-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com



Re: [PATCH v16 QEMU 07/16] vfio: Add migration state change notifier

2020-05-04 Thread Kirti Wankhede




On 4/1/2020 4:57 PM, Dr. David Alan Gilbert wrote:

* Kirti Wankhede (kwankh...@nvidia.com) wrote:

Added migration state change notifier to get notification on migration state
change. These states are translated to VFIO device state and conveyed to vendor
driver.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  hw/vfio/migration.c   | 29 +
  hw/vfio/trace-events  |  1 +
  include/hw/vfio/vfio-common.h |  1 +
  3 files changed, 31 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index af9443c275fb..22ded9d28cf3 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -154,6 +154,27 @@ static void vfio_vmstate_change(void *opaque, int running, 
RunState state)
  }
  }
  
+static void vfio_migration_state_notifier(Notifier *notifier, void *data)

+{
+MigrationState *s = data;
+VFIODevice *vbasedev = container_of(notifier, VFIODevice, migration_state);
+int ret;
+
+trace_vfio_migration_state_notifier(vbasedev->name, s->state);


You might want to use MigrationStatus_str(s->status) to make that
readable.



Yes.


+switch (s->state) {
+case MIGRATION_STATUS_CANCELLING:
+case MIGRATION_STATUS_CANCELLED:
+case MIGRATION_STATUS_FAILED:
+ret = vfio_migration_set_state(vbasedev,
+  ~(VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING),
+  VFIO_DEVICE_STATE_RUNNING);
+if (ret) {
+error_report("%s: Failed to set state RUNNING", vbasedev->name);
+}


In the migration code we check to see if the VM was running prior to the
start of the migration before we start the CPUs going again (see
migration_iteration_finish):
 case MIGRATION_STATUS_FAILED:
 case MIGRATION_STATUS_CANCELLED:
 case MIGRATION_STATUS_CANCELLING:
 if (s->vm_was_running) {
 vm_start();
 } else {
 if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
 runstate_set(RUN_STATE_POSTMIGRATE);
 }

so if the guest was paused before a migration we don't falsely restart
it.  Maybe you need something similar?



Guest paused means the vCPUs are paused, but that doesn't pause the
device. The initial state of a VFIO device is also RUNNING, and the
device will not get any instructions until the vCPUs are running. So I
think putting the device in RUNNING is still fine.


Thanks,
Kirti


Dave


+}
+}
+
  static int vfio_migration_init(VFIODevice *vbasedev,
 struct vfio_region_info *info)
  {
@@ -173,6 +194,9 @@ static int vfio_migration_init(VFIODevice *vbasedev,
  vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
vbasedev);
  
+vbasedev->migration_state.notify = vfio_migration_state_notifier;

+add_migration_state_change_notifier(&vbasedev->migration_state);
+
  return 0;
  }
  
@@ -211,6 +235,11 @@ add_blocker:
  
  void vfio_migration_finalize(VFIODevice *vbasedev)

  {
+
+if (vbasedev->migration_state.notify) {
+remove_migration_state_change_notifier(&vbasedev->migration_state);
+}
+
  if (vbasedev->vm_state) {
  qemu_del_vm_change_state_handler(vbasedev->vm_state);
  }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 3d15bacd031a..69503228f20e 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -148,3 +148,4 @@ vfio_display_edid_write_error(void) ""
  vfio_migration_probe(char *name, uint32_t index) " (%s) Region %d"
  vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d"
  vfio_vmstate_change(char *name, int running, const char *reason, uint32_t dev_state) 
" (%s) running %d reason %s device state %d"
+vfio_migration_state_notifier(char *name, int state) " (%s) state %d"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 3d18eb146b33..28f55f66d019 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -123,6 +123,7 @@ typedef struct VFIODevice {
  VMChangeStateEntry *vm_state;
  uint32_t device_state;
  int vm_running;
+Notifier migration_state;
  } VFIODevice;
  
  struct VFIODeviceOps {

--
2.7.0


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK





Re: [PATCH v16 QEMU 13/16] vfio: Add function to start and stop dirty pages tracking

2020-05-04 Thread Kirti Wankhede




On 4/2/2020 12:33 AM, Dr. David Alan Gilbert wrote:

* Kirti Wankhede (kwankh...@nvidia.com) wrote:

Call VFIO_IOMMU_DIRTY_PAGES ioctl to start and stop dirty pages tracking
for VFIO devices.

Signed-off-by: Kirti Wankhede 
---
  hw/vfio/migration.c | 36 
  1 file changed, 36 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index ab295d25620e..1827b7cfb316 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -9,6 +9,7 @@
  
  #include "qemu/osdep.h"

  #include "qemu/main-loop.h"
+#include 
  #include 
  
  #include "sysemu/runstate.h"

@@ -296,6 +297,32 @@ static int vfio_load_device_config_state(QEMUFile *f, void 
*opaque)
  return qemu_file_get_error(f);
  }
  
+static int vfio_start_dirty_page_tracking(VFIODevice *vbasedev, bool start)

+{
+int ret;
+VFIOContainer *container = vbasedev->group->container;
+struct vfio_iommu_type1_dirty_bitmap dirty = {
+.argsz = sizeof(dirty),
+};
+
+if (start) {
+if (vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) {
+dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
+} else {
+return 0;
+}
+} else {
+dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
+}
+
+ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
+if (ret) {
+error_report("Failed to set dirty tracking flag 0x%x errno: %d",
+ dirty.flags, errno);
+}
+return ret;
+}
+
  /* -- */
  
  static int vfio_save_setup(QEMUFile *f, void *opaque)

@@ -330,6 +357,11 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
   */
  qemu_put_be64(f, migration->region.size);
  
+ret = vfio_start_dirty_page_tracking(vbasedev, true);

+if (ret) {
+return ret;
+}
+
  qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
  
  ret = qemu_file_get_error(f);

@@ -346,6 +378,8 @@ static void vfio_save_cleanup(void *opaque)
  VFIODevice *vbasedev = opaque;
  VFIOMigration *migration = vbasedev->migration;
  
+vfio_start_dirty_page_tracking(vbasedev, false);


Shouldn't you check the return value?



Even if the return value were checked here, it would have to be ignored,
since this function returns void.


Thanks,
Kirti


+
  if (migration->region.mmaps) {
  vfio_region_unmap(&migration->region);
  }
@@ -669,6 +703,8 @@ static void vfio_migration_state_notifier(Notifier 
*notifier, void *data)
  if (ret) {
  error_report("%s: Failed to set state RUNNING", vbasedev->name);
  }
+
+vfio_start_dirty_page_tracking(vbasedev, false);
  }
  }
  
--

2.7.0


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK





[PATCH v18 QEMU 13/18] vfio: Add function to start and stop dirty pages tracking

2020-05-04 Thread Kirti Wankhede
Call VFIO_IOMMU_DIRTY_PAGES ioctl to start and stop dirty pages tracking
for VFIO devices.

Signed-off-by: Kirti Wankhede 
---
 hw/vfio/migration.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 3d11d212b1ce..7d1f64a96676 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -9,6 +9,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
+#include 
 #include 
 
 #include "sysemu/runstate.h"
@@ -297,6 +298,32 @@ static int vfio_load_device_config_state(QEMUFile *f, void 
*opaque)
 return qemu_file_get_error(f);
 }
 
+static int vfio_start_dirty_page_tracking(VFIODevice *vbasedev, bool start)
+{
+int ret;
+VFIOContainer *container = vbasedev->group->container;
+struct vfio_iommu_type1_dirty_bitmap dirty = {
+.argsz = sizeof(dirty),
+};
+
+if (start) {
+if (vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) {
+dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
+} else {
+return -EINVAL;
+}
+} else {
+dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
+}
+
+ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
+if (ret) {
+error_report("Failed to set dirty tracking flag 0x%x errno: %d",
+ dirty.flags, errno);
+}
+return ret;
+}
+
 /* -- */
 
 static int vfio_save_setup(QEMUFile *f, void *opaque)
@@ -333,6 +360,11 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
  */
 qemu_put_be64(f, migration->region.size);
 
+ret = vfio_start_dirty_page_tracking(vbasedev, true);
+if (ret) {
+return ret;
+}
+
 qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
 
 ret = qemu_file_get_error(f);
@@ -348,6 +380,8 @@ static void vfio_save_cleanup(void *opaque)
 VFIODevice *vbasedev = opaque;
 VFIOMigration *migration = vbasedev->migration;
 
+vfio_start_dirty_page_tracking(vbasedev, false);
+
 if (migration->region.mmaps) {
 vfio_region_unmap(&migration->region);
 }
@@ -682,6 +716,8 @@ static void vfio_migration_state_notifier(Notifier 
*notifier, void *data)
 if (ret) {
 error_report("%s: Failed to set state RUNNING", vbasedev->name);
 }
+
+vfio_start_dirty_page_tracking(vbasedev, false);
 }
 }
 
-- 
2.7.0




[PATCH v2 0/4] softfloat: fix floatx80 emulation bugs

2020-05-04 Thread Joseph Myers
Attempting to run the GCC and glibc testsuites for i686 under QEMU
shows up a range of bugs in the x87 floating-point emulation.  This
series fixes some bugs (found both through those testsuites and
through subsequent code inspection) that appear to be in the softfloat
code itself rather than in the target/i386 code; I intend to address
such bugs in target/i386 separately.

Note that the floatx80 code is used for both i386 and m68k emulation,
but the two variants of the floatx80 format are not entirely
compatible.  Where the code should do different things for i386 and
m68k, it consistently only does the thing that is right for i386, not
the thing that is right for m68k, and my patches (specifically, the
second and third patches) continue this, doing the things that are
right for i386 but not for m68k.

Specifically, the formats have the following differences (based on
documentation; I don't have m68k hardware to test):

* For m68k, the explicit integer bit of the significand may be either
  0 or 1 for infinities and NaNs, but for i386 it must be 1 and having
  0 there makes it an invalid encoding.

* For i386, when the biased exponent is 0, this is interpreted the
  same way as a biased exponent of 0 in an IEEE format; an explicit
  integer bit 0 means a subnormal value while an explicit integer bit
  1 means a pseudo-denormal; the integer bit has value 2^-16382, as
  for a biased exponent of 1.  For m68k, a biased exponent of 0
  results in the integer bit having value 2^-16383, so values with
  integer bit 1 are normal and those with integer bit 0 are
  subnormal.  So the least subnormal value is 2^-16445 for i386 and
  2^-16446 for m68k.  (This means that the i386 floatx80 format meets
  the IEEE definition of an extended format, which requires a certain
  relation between the largest and smallest exponents, but the m68k
  floatx80 format does not meet that definition.)

  Patches 2 and 3 in this series deal with pseudo-denormals in a way
  that is correct for i386 but not for m68k; to support the m68k
  format properly, the new code in patch 3 could simply be disabled
  for m68k, but addition / subtraction would need more complicated
  changes to be correct for m68k and just disabling the new code would
  not make it correct (likewise, various changes elsewhere in the
  softfloat code would be needed to handle the m68k semantics for
  biased exponent 0).
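
  In other words (my own summary in LaTeX notation, with sig the 64-bit
  significand including the explicit integer bit and E the biased
  exponent):

    % both formats, E >= 1:
    v = (-1)^s \cdot \frac{\mathrm{sig}}{2^{63}} \cdot 2^{E-16383}
    % i386, E = 0 (treated like E = 1):
    v = (-1)^s \cdot \frac{\mathrm{sig}}{2^{63}} \cdot 2^{-16382}
    % m68k, E = 0:
    v = (-1)^s \cdot \frac{\mathrm{sig}}{2^{63}} \cdot 2^{-16383}

  so the least subnormal (sig = 1) is 2^-16445 for i386 and 2^-16446 for
  m68k, matching the figures above.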

This second version of the patch series includes i386-specific tests
for the bugs being fixed (written to be reasonably self-contained
rather than depending on libm functionality).  Given the previous
discussion of how some existing tests for floating-point operations
that are present but not enabled fail for unrelated reasons if enabled
for floatx80, this does not do anything regarding enabling such tests.

Joseph Myers (4):
  softfloat: silence sNaN for conversions to/from floatx80
  softfloat: fix floatx80 pseudo-denormal addition / subtraction
  softfloat: fix floatx80 pseudo-denormal comparisons
  softfloat: fix floatx80 pseudo-denormal round to integer

 fpu/softfloat.c| 37 ++---
 tests/tcg/i386/test-i386-pseudo-denormal.c | 38 +
 tests/tcg/i386/test-i386-snan-convert.c| 63 ++
 3 files changed, 131 insertions(+), 7 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-pseudo-denormal.c
 create mode 100644 tests/tcg/i386/test-i386-snan-convert.c

-- 
2.17.1


-- 
Joseph S. Myers
jos...@codesourcery.com



Re: [PATCH v16 QEMU 05/16] vfio: Add migration region initialization and finalize function

2020-05-04 Thread Kirti Wankhede




On 3/26/2020 11:22 PM, Dr. David Alan Gilbert wrote:

* Kirti Wankhede (kwankh...@nvidia.com) wrote:

- Migration functions are implemented for VFIO_DEVICE_TYPE_PCI devices in this
   patch series.
- Whether a VFIO device supports migration or not is decided based on the
   migration region query. If the migration region query and the migration
   region initialization are successful, then migration is supported; else
   migration is blocked.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  hw/vfio/Makefile.objs |   2 +-
  hw/vfio/migration.c   | 138 ++
  hw/vfio/trace-events  |   3 +
  include/hw/vfio/vfio-common.h |   9 +++
  4 files changed, 151 insertions(+), 1 deletion(-)
  create mode 100644 hw/vfio/migration.c

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index 9bb1c09e8477..8b296c889ed9 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += common.o spapr.o
+obj-y += common.o spapr.o migration.o
  obj-$(CONFIG_VFIO_PCI) += pci.o pci-quirks.o display.o
  obj-$(CONFIG_VFIO_CCW) += ccw.o
  obj-$(CONFIG_VFIO_PLATFORM) += platform.o
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
new file mode 100644
index ..a078dcf1dd8f
--- /dev/null
+++ b/hw/vfio/migration.c
@@ -0,0 +1,138 @@
+/*
+ * Migration support for VFIO devices
+ *
+ * Copyright NVIDIA, Inc. 2019


Time flies by...


+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.


Are you sure you want this to be V2 only? Most code added to qemu now is
v2 or later.



I kept it the same as in the vfio-pci and hw/vfio/common.c files.

Should it be different? Can you give some reference for what it should be?

Thanks,
Kirti


+ */
+
+#include "qemu/osdep.h"
+#include 
+
+#include "hw/vfio/vfio-common.h"
+#include "cpu.h"
+#include "migration/migration.h"
+#include "migration/qemu-file.h"
+#include "migration/register.h"
+#include "migration/blocker.h"
+#include "migration/misc.h"
+#include "qapi/error.h"
+#include "exec/ramlist.h"
+#include "exec/ram_addr.h"
+#include "pci.h"
+#include "trace.h"
+
+static void vfio_migration_region_exit(VFIODevice *vbasedev)
+{
+VFIOMigration *migration = vbasedev->migration;
+
+if (!migration) {
+return;
+}
+
+if (migration->region.size) {
+vfio_region_exit(&migration->region);
+vfio_region_finalize(&migration->region);
+}
+}
+
+static int vfio_migration_region_init(VFIODevice *vbasedev, int index)
+{
+VFIOMigration *migration = vbasedev->migration;
+Object *obj = NULL;
+int ret = -EINVAL;
+
+if (!vbasedev->ops->vfio_get_object) {
+return ret;
+}
+
+obj = vbasedev->ops->vfio_get_object(vbasedev);
+if (!obj) {
+return ret;
+}
+
+ret = vfio_region_setup(obj, vbasedev, &migration->region, index,
+"migration");
+if (ret) {
+error_report("%s: Failed to setup VFIO migration region %d: %s",
+ vbasedev->name, index, strerror(-ret));
+goto err;
+}
+
+if (!migration->region.size) {
+ret = -EINVAL;
+error_report("%s: Invalid region size of VFIO migration region %d: %s",
+ vbasedev->name, index, strerror(-ret));
+goto err;
+}
+
+return 0;
+
+err:
+vfio_migration_region_exit(vbasedev);
+return ret;
+}
+
+static int vfio_migration_init(VFIODevice *vbasedev,
+   struct vfio_region_info *info)
+{
+int ret;
+
+vbasedev->migration = g_new0(VFIOMigration, 1);
+
+ret = vfio_migration_region_init(vbasedev, info->index);
+if (ret) {
+error_report("%s: Failed to initialise migration region",
+ vbasedev->name);
+g_free(vbasedev->migration);
+vbasedev->migration = NULL;
+return ret;
+}
+
+return 0;
+}
+
+/* -- */
+
+int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
+{
+struct vfio_region_info *info;
+Error *local_err = NULL;
+int ret;
+
+ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_MIGRATION,
+   VFIO_REGION_SUBTYPE_MIGRATION, &info);
+if (ret) {
+goto add_blocker;
+}
+
+ret = vfio_migration_init(vbasedev, info);
+if (ret) {
+goto add_blocker;
+}
+
+trace_vfio_migration_probe(vbasedev->name, info->index);
+return 0;
+
+add_blocker:
+error_setg(&vbasedev->migration_blocker,
+   "VFIO device doesn't support migration");
+ret = migrate_add_blocker(vbasedev->migration_blocker, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+error_free(vbasedev->migration_blocker);
+}
+return ret;
+}
+
+void vfio_migration_finalize(VFIODevice *vbasedev)
+{
+if (vbasedev->migration_blocker) {
+ 

Re: [PATCH v16 QEMU 04/16] vfio: Add save and load functions for VFIO PCI devices

2020-05-04 Thread Kirti Wankhede




On 3/26/2020 11:16 PM, Dr. David Alan Gilbert wrote:

* Kirti Wankhede (kwankh...@nvidia.com) wrote:

These functions save and restore PCI device specific data - config
space of PCI device.
Tested save and restore with MSI and MSIX type.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  hw/vfio/pci.c | 163 ++
  include/hw/vfio/vfio-common.h |   2 +
  2 files changed, 165 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6c77c12e44b9..8deb11e87ef7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -41,6 +41,7 @@
  #include "trace.h"
  #include "qapi/error.h"
  #include "migration/blocker.h"
+#include "migration/qemu-file.h"
  
  #define TYPE_VFIO_PCI "vfio-pci"

  #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
@@ -1632,6 +1633,50 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
  }
  }
  
+static int vfio_bar_validate(VFIOPCIDevice *vdev, int nr)

+{
+PCIDevice *pdev = &vdev->pdev;
+VFIOBAR *bar = &vdev->bars[nr];
+uint64_t addr;
+uint32_t addr_lo, addr_hi = 0;
+
+/* Skip unimplemented BARs and the upper half of 64bit BARS. */
+if (!bar->size) {
+return 0;
+}
+
+addr_lo = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + nr * 4, 4);
+
+addr_lo = addr_lo & (bar->ioport ? PCI_BASE_ADDRESS_IO_MASK :
+   PCI_BASE_ADDRESS_MEM_MASK);
+if (bar->type == PCI_BASE_ADDRESS_MEM_TYPE_64) {
+addr_hi = pci_default_read_config(pdev,
+ PCI_BASE_ADDRESS_0 + (nr + 1) * 4, 4);
+}
+
+addr = ((uint64_t)addr_hi << 32) | addr_lo;
+
+if (!QEMU_IS_ALIGNED(addr, bar->size)) {
+return -EINVAL;
+}
+
+return 0;
+}
+
+static int vfio_bars_validate(VFIOPCIDevice *vdev)
+{
+int i, ret;
+
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+ret = vfio_bar_validate(vdev, i);
+if (ret) {
+error_report("vfio: BAR address %d validation failed", i);
+return ret;
+}
+}
+return 0;
+}
+
  static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
  {
  VFIOBAR *bar = &vdev->bars[nr];
@@ -2414,11 +2459,129 @@ static Object *vfio_pci_get_object(VFIODevice 
*vbasedev)
  return OBJECT(vdev);
  }
  
+static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)

+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+PCIDevice *pdev = &vdev->pdev;
+uint16_t pci_cmd;
+int i;
+
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+uint32_t bar;
+
+bar = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
+qemu_put_be32(f, bar);
+}
+
+qemu_put_be32(f, vdev->interrupt);
+if (vdev->interrupt == VFIO_INT_MSI) {
+uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
+bool msi_64bit;
+
+msi_flags = pci_default_read_config(pdev, pdev->msi_cap + 
PCI_MSI_FLAGS,
+2);
+msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
+
+msi_addr_lo = pci_default_read_config(pdev,
+ pdev->msi_cap + PCI_MSI_ADDRESS_LO, 
4);
+qemu_put_be32(f, msi_addr_lo);
+
+if (msi_64bit) {
+msi_addr_hi = pci_default_read_config(pdev,
+ pdev->msi_cap + 
PCI_MSI_ADDRESS_HI,
+ 4);
+}
+qemu_put_be32(f, msi_addr_hi);
+
+msi_data = pci_default_read_config(pdev,
+pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : 
PCI_MSI_DATA_32),
+2);
+qemu_put_be32(f, msi_data);
+} else if (vdev->interrupt == VFIO_INT_MSIX) {
+uint16_t offset;
+
+/* save enable bit and maskall bit */
+offset = pci_default_read_config(pdev,
+   pdev->msix_cap + PCI_MSIX_FLAGS + 1, 2);
+qemu_put_be16(f, offset);
+msix_save(pdev, f);
+}
+pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
+qemu_put_be16(f, pci_cmd);
+}
+
+static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+PCIDevice *pdev = &vdev->pdev;
+uint32_t interrupt_type;
+uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
+uint16_t pci_cmd;
+bool msi_64bit;
+int i, ret;
+
+/* restore pci bar configuration */
+pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
+vfio_pci_write_config(pdev, PCI_COMMAND,
+pci_cmd & (!(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)), 2);
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+uint32_t bar = qemu_get_be32(f);
+
+vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar, 4);
+}
+
+ret = vfio_bars_validate(vdev);


This isn't quite what I'd expected, since that validate 

Re: [PATCH v16 QEMU 08/16] vfio: Register SaveVMHandlers for VFIO device

2020-05-04 Thread Kirti Wankhede




On 4/1/2020 11:06 PM, Dr. David Alan Gilbert wrote:

* Kirti Wankhede (kwankh...@nvidia.com) wrote:

Define flags to be used as delimiters in the migration file stream.
Added .save_setup and .save_cleanup functions. The migration region is
mapped and unmapped in these functions at the source during the saving or
pre-copy phase.
Set the VFIO device state depending on the VM's state. During live
migration, the VM is running when .save_setup is called, so the
_SAVING | _RUNNING state is set for the VFIO device. During save-restore,
the VM is paused, so only the _SAVING state is set for the VFIO device.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  hw/vfio/migration.c  | 76 
  hw/vfio/trace-events |  2 ++
  2 files changed, 78 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 22ded9d28cf3..033f76526e49 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -8,6 +8,7 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu/main-loop.h"
  #include 
  
  #include "sysemu/runstate.h"

@@ -24,6 +25,17 @@
  #include "pci.h"
  #include "trace.h"
  
+/*

+ * Flags used as delimiter:
+ * 0xffffffff => MSB 32-bit all 1s
+ * 0xef10     => emulated (virtual) function IO
+ * 0x0000     => 16-bits reserved for flags
+ */
+#define VFIO_MIG_FLAG_END_OF_STATE      (0xffffffffef100001ULL)
+#define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
+#define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
+#define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
+
  static void vfio_migration_region_exit(VFIODevice *vbasedev)
  {
  VFIOMigration *migration = vbasedev->migration;
@@ -126,6 +138,69 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, 
uint32_t mask,
  return 0;
  }
  
+/* -- */

+
+static int vfio_save_setup(QEMUFile *f, void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret;
+
+qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
+
+if (migration->region.mmaps) {
+qemu_mutex_lock_iothread();
+ret = vfio_region_mmap(&migration->region);
+qemu_mutex_unlock_iothread();
+if (ret) {
+error_report("%s: Failed to mmap VFIO migration region %d: %s",
+ vbasedev->name, migration->region.index,
+ strerror(-ret));
+return ret;
+}
+}
+
+ret = vfio_migration_set_state(vbasedev, ~0, VFIO_DEVICE_STATE_SAVING);
+if (ret) {
+error_report("%s: Failed to set state SAVING", vbasedev->name);
+return ret;
+}
+
+/*
+ * Save migration region size. This is used to verify migration region size
+ * is greater than or equal to migration region size at destination
+ */
+qemu_put_be64(f, migration->region.size);
+
+qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);


OK, good, so now we can change that to something else if you want to
migrate something extra in the future.


+ret = qemu_file_get_error(f);
+if (ret) {
+return ret;
+}
+
+trace_vfio_save_setup(vbasedev->name);


I'd put that trace at the start of the function.


+return 0;
+}
+
+static void vfio_save_cleanup(void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+
+if (migration->region.mmaps) {
+vfio_region_unmap(&migration->region);
+}
+trace_vfio_save_cleanup(vbasedev->name);
+}
+
+static SaveVMHandlers savevm_vfio_handlers = {
+.save_setup = vfio_save_setup,
+.save_cleanup = vfio_save_cleanup,
+};
+
+/* -- */
+
  static void vfio_vmstate_change(void *opaque, int running, RunState state)
  {
  VFIODevice *vbasedev = opaque;
@@ -191,6 +266,7 @@ static int vfio_migration_init(VFIODevice *vbasedev,
  return ret;
  }
  
+register_savevm_live("vfio", -1, 1, &savevm_vfio_handlers, vbasedev);


That doesn't look right to me;  firstly the -1 should now be
VMSTATE_INSTANCE_ID_ANY - after the recent change in commit 1df2c9a

Have you tried this with two vfio devices?


Yes. And it works with multiple vfio devices.

Thanks,
Kirti


This is quite rare - it's an iterative device that can have
multiple instances;  if you look at 'ram' for example, all the RAM
instances are handled inside the save_setup/save for the one instance of
'ram'.  I think here you're trying to register an individual vfio
device, so if you had multiple devices you'd see this called twice.

So either you need to make vfio_save_* do all of the devices in a loop -
which feels like a bad idea;  or replace "vfio" in that call by a unique
device name;  as long as your device has a bus path then you should be
able to use the same trick vmstate_register_with_alias_id does, and use
I think, vmstate_if_get_id(VMSTATE_IF(vbasedev)).

but it might take some experimentati

[PATCH v18 QEMU 16/18] vfio: Add ioctl to get dirty pages bitmap during dma unmap.

2020-05-04 Thread Kirti Wankhede
With vIOMMU, an IO virtual address range can get unmapped while in the
pre-copy phase of migration. In that case, the unmap ioctl should return
the pages pinned in that range, and QEMU should find their corresponding
guest physical addresses and report those dirty.

Note: This patch is not yet tested. I'm trying to see how I can test this
code path.

Suggested-by: Alex Williamson 
Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/common.c | 79 +---
 1 file changed, 75 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4277b275ca21..b94e2bcb1178 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -311,11 +311,77 @@ static bool vfio_devices_are_stopped_and_saving(void)
 return true;
 }
 
+static bool vfio_devices_are_running_and_saving(void)
+{
+VFIOGroup *group;
+VFIODevice *vbasedev;
+
+QLIST_FOREACH(group, &vfio_group_list, next) {
+QLIST_FOREACH(vbasedev, &group->device_list, next) {
+if ((vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) &&
+(vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING)) {
+continue;
+} else {
+return false;
+}
+}
+}
+return true;
+}
+
+static int vfio_dma_unmap_bitmap(VFIOContainer *container,
+ hwaddr iova, ram_addr_t size,
+ IOMMUTLBEntry *iotlb)
+{
+struct vfio_iommu_type1_dma_unmap *unmap;
+struct vfio_bitmap *bitmap;
+uint64_t pages = TARGET_PAGE_ALIGN(size) >> TARGET_PAGE_BITS;
+int ret;
+
+unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap));
+if (!unmap) {
+return -ENOMEM;
+}
+
+unmap->argsz = sizeof(*unmap) + sizeof(*bitmap);
+unmap->flags |= VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP;
+bitmap = (struct vfio_bitmap *)&unmap->data;
+
+/*
+ * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of
+ * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap_pgsize to
+ * TARGET_PAGE_SIZE.
+ */
+
+bitmap->pgsize = TARGET_PAGE_SIZE;
+bitmap->size = ROUND_UP(pages / 8, sizeof(uint64_t));
+bitmap->data = g_malloc0(bitmap->size);
+if (!bitmap->data) {
+error_report("UNMAP: Error allocating bitmap of size 0x%llx",
+ bitmap->size);
+g_free(unmap);
+return -ENOMEM;
+}
+
+ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap);
+if (!ret) {
+cpu_physical_memory_set_dirty_lebitmap((uint64_t *)bitmap->data,
+iotlb->translated_addr, pages);
+} else {
+error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %d", -errno);
+}
+
+g_free(bitmap->data);
+g_free(unmap);
+return ret;
+}
+
 /*
  * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
  */
 static int vfio_dma_unmap(VFIOContainer *container,
-  hwaddr iova, ram_addr_t size)
+  hwaddr iova, ram_addr_t size,
+  IOMMUTLBEntry *iotlb)
 {
 struct vfio_iommu_type1_dma_unmap unmap = {
 .argsz = sizeof(unmap),
@@ -324,6 +390,11 @@ static int vfio_dma_unmap(VFIOContainer *container,
 .size = size,
 };
 
+if (iotlb && container->dirty_pages_supported &&
+vfio_devices_are_running_and_saving()) {
+return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
+}
+
 while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
 /*
  * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
@@ -371,7 +442,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr 
iova,
  * the VGA ROM space.
  */
 if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
-(errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
+(errno == EBUSY && vfio_dma_unmap(container, iova, size, NULL) == 0 &&
  ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
 return 0;
 }
@@ -519,7 +590,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
  iotlb->addr_mask + 1, vaddr, ret);
 }
 } else {
-ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1);
+ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, iotlb);
 if (ret) {
 error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
  "0x%"HWADDR_PRIx") = %d (%m)",
@@ -822,7 +893,7 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 }
 
 if (try_unmap) {
-ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
+ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
 if (ret) {
 error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
  "0x%"HWADDR_PRIx") = %d (%m)",
-- 
2.7.0




Re: [PATCH v16 QEMU 13/16] vfio: Add function to start and stop dirty pages tracking

2020-05-04 Thread Kirti Wankhede




On 3/27/2020 12:40 AM, Alex Williamson wrote:

On Wed, 25 Mar 2020 02:39:11 +0530
Kirti Wankhede  wrote:


Call VFIO_IOMMU_DIRTY_PAGES ioctl to start and stop dirty pages tracking
for VFIO devices.

Signed-off-by: Kirti Wankhede 
---
  hw/vfio/migration.c | 36 
  1 file changed, 36 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index ab295d25620e..1827b7cfb316 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -9,6 +9,7 @@
  
  #include "qemu/osdep.h"

  #include "qemu/main-loop.h"
+#include 
  #include 
  
  #include "sysemu/runstate.h"

@@ -296,6 +297,32 @@ static int vfio_load_device_config_state(QEMUFile *f, void 
*opaque)
  return qemu_file_get_error(f);
  }
  
+static int vfio_start_dirty_page_tracking(VFIODevice *vbasedev, bool start)

+{
+int ret;
+VFIOContainer *container = vbasedev->group->container;
+struct vfio_iommu_type1_dirty_bitmap dirty = {
+.argsz = sizeof(dirty),
+};
+
+if (start) {
+if (vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) {
+dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
+} else {
+return 0;
+}
+} else {
+dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
+}


Dirty logging and device saving are logically separate, why do we link
them here?



Dirty logging is associated with the migration state, and in the VFIO
case we learn that migration state per device. We don't know which
device is first or last, so dirty page logging is started in
.save_setup. But this function can be called from other places as well,
so as a sanity check, dirty pages tracking is started only when the
VFIO_DEVICE_STATE_SAVING flag is set.



Why do we return success when we want to start logging if we haven't
started logging?



It should be -EINVAL, since dirty page tracking shouldn't start if the
VFIO_DEVICE_STATE_SAVING flag is not set, i.e. the devices are not in
the SAVING state.



+
+ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
+if (ret) {
+error_report("Failed to set dirty tracking flag 0x%x errno: %d",
+ dirty.flags, errno);
+}
+return ret;
+}
+
  /* -- */
  
  static int vfio_save_setup(QEMUFile *f, void *opaque)

@@ -330,6 +357,11 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
   */
  qemu_put_be64(f, migration->region.size);
  
+ret = vfio_start_dirty_page_tracking(vbasedev, true);

+if (ret) {
+return ret;
+}
+


Haven't we corrupted the migration stream by exiting here?  Maybe this
implies the entire migration fails, therefore we don't need to add the
end marker?  Thanks,



If an error is returned here, it means the migration fails.

Thanks,
Kirti


Alex


  qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
  
  ret = qemu_file_get_error(f);

@@ -346,6 +378,8 @@ static void vfio_save_cleanup(void *opaque)
  VFIODevice *vbasedev = opaque;
  VFIOMigration *migration = vbasedev->migration;
  
+vfio_start_dirty_page_tracking(vbasedev, false);

+
  if (migration->region.mmaps) {
  vfio_region_unmap(&migration->region);
  }
@@ -669,6 +703,8 @@ static void vfio_migration_state_notifier(Notifier 
*notifier, void *data)
  if (ret) {
  error_report("%s: Failed to set state RUNNING", vbasedev->name);
  }
+
+vfio_start_dirty_page_tracking(vbasedev, false);
  }
  }
  






[PATCH v18 QEMU 10/18] vfio: Add load state functions to SaveVMHandlers

2020-05-04 Thread Kirti Wankhede
Sequence  during _RESUMING device state:
While data for this device is available, repeat below steps:
a. read data_offset from where user application should write data.
b. write data of data_size to migration region from data_offset.
c. write data_size which indicates vendor driver that data is written in
   staging buffer.

For user, data is opaque. User should write data in the same order as
received.
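
To make the sequence above concrete, here is a minimal user-side sketch, not
part of this patch, of writing one received chunk. It assumes the updated
linux/vfio.h from patch 01, a migration data area that is not mmap'd (so
plain pread()/pwrite() on the device fd is used), and a region_fd_offset
giving the file offset of the migration region; the function name and error
handling are illustrative only:

#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Illustrative only: write one opaque chunk received from the source into
 * the device's migration region, following steps a-c above.
 */
static int vfio_resume_write_chunk(int device_fd, off_t region_fd_offset,
                                   const void *buf, uint64_t data_size)
{
    uint64_t data_offset = 0;

    /* a. read data_offset: where the vendor driver wants the data */
    if (pread(device_fd, &data_offset, sizeof(data_offset),
              region_fd_offset +
              offsetof(struct vfio_device_migration_info, data_offset))
        != (ssize_t)sizeof(data_offset)) {
        return -errno;
    }

    /* b. write data_size bytes of opaque data starting at data_offset */
    if (pwrite(device_fd, buf, data_size,
               region_fd_offset + data_offset) != (ssize_t)data_size) {
        return -errno;
    }

    /* c. write data_size so the vendor driver knows the chunk is staged */
    if (pwrite(device_fd, &data_size, sizeof(data_size),
               region_fd_offset +
               offsetof(struct vfio_device_migration_info, data_size))
        != (ssize_t)sizeof(data_size)) {
        return -errno;
    }

    return 0;
}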

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c  | 189 +++
 hw/vfio/trace-events |   3 +
 2 files changed, 192 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 97fbb0c2b301..3d11d212b1ce 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -270,6 +270,33 @@ static int vfio_save_device_config_state(QEMUFile *f, void 
*opaque)
 return qemu_file_get_error(f);
 }
 
+static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+uint64_t data;
+
+if (vbasedev->ops && vbasedev->ops->vfio_load_config) {
+int ret;
+
+ret = vbasedev->ops->vfio_load_config(vbasedev, f);
+if (ret) {
+error_report("%s: Failed to load device config space",
+ vbasedev->name);
+return ret;
+}
+}
+
+data = qemu_get_be64(f);
+if (data != VFIO_MIG_FLAG_END_OF_STATE) {
+error_report("%s: Failed loading device config space, "
+ "end flag incorrect 0x%"PRIx64, vbasedev->name, data);
+return -EINVAL;
+}
+
+trace_vfio_load_device_config_state(vbasedev->name);
+return qemu_file_get_error(f);
+}
+
 /* -- */
 
 static int vfio_save_setup(QEMUFile *f, void *opaque)
@@ -436,12 +463,174 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)
 return ret;
 }
 
+static int vfio_load_setup(QEMUFile *f, void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret = 0;
+
+if (migration->region.mmaps) {
+ret = vfio_region_mmap(&migration->region);
+if (ret) {
+error_report("%s: Failed to mmap VFIO migration region %d: %s",
+ vbasedev->name, migration->region.nr,
+ strerror(-ret));
+return ret;
+}
+}
+
+ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_MASK,
+   VFIO_DEVICE_STATE_RESUMING);
+if (ret) {
+error_report("%s: Failed to set state RESUMING", vbasedev->name);
+}
+return ret;
+}
+
+static int vfio_load_cleanup(void *opaque)
+{
+vfio_save_cleanup(opaque);
+return 0;
+}
+
+static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret = 0;
+uint64_t data, data_size;
+
+data = qemu_get_be64(f);
+while (data != VFIO_MIG_FLAG_END_OF_STATE) {
+
+trace_vfio_load_state(vbasedev->name, data);
+
+switch (data) {
+case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
+{
+ret = vfio_load_device_config_state(f, opaque);
+if (ret) {
+return ret;
+}
+break;
+}
+case VFIO_MIG_FLAG_DEV_SETUP_STATE:
+{
+uint64_t region_size = qemu_get_be64(f);
+
+if (migration->region.size < region_size) {
+error_report("%s: SETUP STATE: migration region too small, "
+ "0x%"PRIx64 " < 0x%"PRIx64, vbasedev->name,
+ migration->region.size, region_size);
+return -EINVAL;
+}
+
+data = qemu_get_be64(f);
+if (data == VFIO_MIG_FLAG_END_OF_STATE) {
+return ret;
+} else {
+error_report("%s: SETUP STATE: EOS not found 0x%"PRIx64,
+ vbasedev->name, data);
+return -EINVAL;
+}
+break;
+}
+case VFIO_MIG_FLAG_DEV_DATA_STATE:
+{
+VFIORegion *region = &migration->region;
+void *buf = NULL;
+bool buffer_mmaped = false;
+uint64_t data_offset = 0;
+
+data_size = qemu_get_be64(f);
+if (data_size == 0) {
+break;
+}
+
+ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
+region->fd_offset +
+offsetof(struct vfio_device_migration_info,
+data_offset));
+if (ret != sizeof(data_offset)) {
+error_report("%s:Failed to get migration buffer data offset 
%d",
+ vbasedev->name, ret);
+return -EINVAL;
+}
+
+if (re

[PATCH v18 QEMU 07/18] vfio: Add migration state change notifier

2020-05-04 Thread Kirti Wankhede
Added a migration state change notifier to get notifications on migration
state changes. These states are translated to the VFIO device state and
conveyed to the vendor driver.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c   | 30 ++
 hw/vfio/trace-events  |  5 +++--
 include/hw/vfio/vfio-common.h |  1 +
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index e79b34003079..c2f5564b51c3 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -154,6 +154,28 @@ static void vfio_vmstate_change(void *opaque, int running, 
RunState state)
 }
 }
 
+static void vfio_migration_state_notifier(Notifier *notifier, void *data)
+{
+MigrationState *s = data;
+VFIODevice *vbasedev = container_of(notifier, VFIODevice, migration_state);
+int ret;
+
+trace_vfio_migration_state_notifier(vbasedev->name,
+MigrationStatus_str(s->state));
+
+switch (s->state) {
+case MIGRATION_STATUS_CANCELLING:
+case MIGRATION_STATUS_CANCELLED:
+case MIGRATION_STATUS_FAILED:
+ret = vfio_migration_set_state(vbasedev,
+  ~(VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING),
+  VFIO_DEVICE_STATE_RUNNING);
+if (ret) {
+error_report("%s: Failed to set state RUNNING", vbasedev->name);
+}
+}
+}
+
 static int vfio_migration_init(VFIODevice *vbasedev,
struct vfio_region_info *info)
 {
@@ -173,6 +195,9 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
   vbasedev);
 
+vbasedev->migration_state.notify = vfio_migration_state_notifier;
+add_migration_state_change_notifier(&vbasedev->migration_state);
+
 return 0;
 }
 
@@ -211,6 +236,11 @@ add_blocker:
 
 void vfio_migration_finalize(VFIODevice *vbasedev)
 {
+
+if (vbasedev->migration_state.notify) {
+remove_migration_state_change_notifier(&vbasedev->migration_state);
+}
+
 if (vbasedev->vm_state) {
 qemu_del_vm_change_state_handler(vbasedev->vm_state);
 }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 14b0a86c0035..bd3d47b005cb 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -146,5 +146,6 @@ vfio_display_edid_write_error(void) ""
 
 # migration.c
 vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d"
-vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d"
-vfio_vmstate_change(char *name, int running, const char *reason, uint32_t 
dev_state) " (%s) running %d reason %s device state %d"
+vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d"
+vfio_vmstate_change(const char *name, int running, const char *reason, 
uint32_t dev_state) " (%s) running %d reason %s device state %d"
+vfio_migration_state_notifier(const char *name, const char *state) " (%s) 
state %s"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 3d18eb146b33..28f55f66d019 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -123,6 +123,7 @@ typedef struct VFIODevice {
 VMChangeStateEntry *vm_state;
 uint32_t device_state;
 int vm_running;
+Notifier migration_state;
 } VFIODevice;
 
 struct VFIODeviceOps {
-- 
2.7.0




[PATCH v18 QEMU 15/18] vfio: Get migration capability flags for container

2020-05-04 Thread Kirti Wankhede
Added helper functions to get the IOMMU info capability chain.
Added a function to get the migration capability flags from that capability
chain for the IOMMU container.

A similar change was proposed earlier:
https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg03759.html

Signed-off-by: Kirti Wankhede 
Cc: Shameer Kolothum 
Cc: Eric Auger 
---
 hw/vfio/common.c  | 85 +++
 include/hw/vfio/vfio-common.h |  1 +
 2 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4bf864695a8e..4277b275ca21 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1398,6 +1398,69 @@ static int vfio_init_container(VFIOContainer *container, 
int group_fd,
 return 0;
 }
 
+static int vfio_get_iommu_info(VFIOContainer *container,
+   struct vfio_iommu_type1_info **info)
+{
+
+size_t argsz = sizeof(struct vfio_iommu_type1_info);
+
+*info = g_new0(struct vfio_iommu_type1_info, 1);
+again:
+(*info)->argsz = argsz;
+
+if (ioctl(container->fd, VFIO_IOMMU_GET_INFO, *info)) {
+g_free(*info);
+*info = NULL;
+return -errno;
+}
+
+if (((*info)->argsz > argsz)) {
+argsz = (*info)->argsz;
+*info = g_realloc(*info, argsz);
+goto again;
+}
+
+return 0;
+}
+
+static struct vfio_info_cap_header *
+vfio_get_iommu_info_cap(struct vfio_iommu_type1_info *info, uint16_t id)
+{
+struct vfio_info_cap_header *hdr;
+void *ptr = info;
+
+if (!(info->flags & VFIO_IOMMU_INFO_CAPS)) {
+return NULL;
+}
+
+for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
+if (hdr->id == id) {
+return hdr;
+}
+}
+
+return NULL;
+}
+
+static void vfio_get_iommu_info_migration(VFIOContainer *container,
+ struct vfio_iommu_type1_info *info)
+{
+struct vfio_info_cap_header *hdr;
+struct vfio_iommu_type1_info_cap_migration *cap_mig;
+
+hdr = vfio_get_iommu_info_cap(info, VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION);
+if (!hdr) {
+return;
+}
+
+cap_mig = container_of(hdr, struct vfio_iommu_type1_info_cap_migration,
+header);
+
+if (cap_mig->flags & VFIO_IOMMU_INFO_CAPS_MIGRATION_DIRTY_PAGE_TRACK) {
+container->dirty_pages_supported = true;
+}
+}
+
 static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
   Error **errp)
 {
@@ -1462,6 +1525,7 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 container->space = space;
 container->fd = fd;
 container->error = NULL;
+container->dirty_pages_supported = false;
 QLIST_INIT(&container->giommu_list);
 QLIST_INIT(&container->hostwin_list);
 
@@ -1474,7 +1538,7 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 case VFIO_TYPE1v2_IOMMU:
 case VFIO_TYPE1_IOMMU:
 {
-struct vfio_iommu_type1_info info;
+struct vfio_iommu_type1_info *info;
 
 /*
  * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
@@ -1483,15 +1547,20 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
  * existing Type1 IOMMUs generally support any IOVA we're
  * going to actually try in practice.
  */
-info.argsz = sizeof(info);
-ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
-/* Ignore errors */
-if (ret || !(info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
+ret = vfio_get_iommu_info(container, &info);
+if (ret) {
+goto free_container_exit;
+}
+
+if (!(info->flags & VFIO_IOMMU_INFO_PGSIZES)) {
 /* Assume 4k IOVA page size */
-info.iova_pgsizes = 4096;
+info->iova_pgsizes = 4096;
 }
-vfio_host_win_add(container, 0, (hwaddr)-1, info.iova_pgsizes);
-container->pgsizes = info.iova_pgsizes;
+vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes);
+container->pgsizes = info->iova_pgsizes;
+
+vfio_get_iommu_info_migration(container, info);
+g_free(info);
 break;
 }
 case VFIO_SPAPR_TCE_v2_IOMMU:
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index c78033e4149d..8ab741463d50 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -79,6 +79,7 @@ typedef struct VFIOContainer {
 unsigned iommu_type;
 Error *error;
 bool initialized;
+bool dirty_pages_supported;
 unsigned long pgsizes;
 QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
 QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
-- 
2.7.0




[PATCH v18 QEMU 11/18] iommu: add callback to get address limit IOMMU supports

2020-05-04 Thread Kirti Wankhede
Add an optional method to get the address limit the IOMMU supports.

Signed-off-by: Kirti Wankhede 
---
 hw/i386/intel_iommu.c |  9 +
 include/exec/memory.h | 18 ++
 memory.c  | 11 +++
 3 files changed, 38 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index df7ad254ac15..d0b88c20c31e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3577,6 +3577,14 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
*iommu_mr, IOMMUNotifier *n)
 return;
 }
 
+static hwaddr vtd_iommu_get_address_limit(IOMMUMemoryRegion *iommu_mr)
+{
+VTDAddressSpace *vtd_as = container_of(iommu_mr, VTDAddressSpace, iommu);
+IntelIOMMUState *s = vtd_as->iommu_state;
+
+return VTD_ADDRESS_SIZE(s->aw_bits) - 1;
+}
+
 /* Do the initialization. It will also be called when reset, so pay
  * attention when adding new initialization stuff.
  */
@@ -3878,6 +3886,7 @@ static void 
vtd_iommu_memory_region_class_init(ObjectClass *klass,
 imrc->translate = vtd_iommu_translate;
 imrc->notify_flag_changed = vtd_iommu_notify_flag_changed;
 imrc->replay = vtd_iommu_replay;
+imrc->get_address_limit = vtd_iommu_get_address_limit;
 }
 
 static const TypeInfo vtd_iommu_memory_region_info = {
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e000bd2f97b2..2d0cbd46d2a6 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -355,6 +355,16 @@ typedef struct IOMMUMemoryRegionClass {
  * @iommu: the IOMMUMemoryRegion
  */
 int (*num_indexes)(IOMMUMemoryRegion *iommu);
+
+/*
+ * Return address limit this IOMMU supports.
+ *
+ * Optional method: if this method is not provided, then
+ * memory_region_iommu_get_address_limit() will return 0.
+ *
+ * @iommu: the IOMMUMemoryRegion
+ */
+hwaddr (*get_address_limit)(IOMMUMemoryRegion *iommu);
 } IOMMUMemoryRegionClass;
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -1364,6 +1374,14 @@ int memory_region_iommu_attrs_to_index(IOMMUMemoryRegion 
*iommu_mr,
 int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr);
 
 /**
+ * memory_region_iommu_get_address_limit : return the maximum address limit
+ * that this IOMMU supports.
+ *
+ * @iommu_mr: the memory region
+ */
+hwaddr memory_region_iommu_get_address_limit(IOMMUMemoryRegion *iommu_mr);
+
+/**
  * memory_region_name: get a memory region's name
  *
  * Returns the string that was used to initialize the memory region.
diff --git a/memory.c b/memory.c
index 601b74990620..52f1a4cd37f0 100644
--- a/memory.c
+++ b/memory.c
@@ -1887,6 +1887,17 @@ void memory_region_iommu_replay(IOMMUMemoryRegion 
*iommu_mr, IOMMUNotifier *n)
 }
 }
 
+hwaddr memory_region_iommu_get_address_limit(IOMMUMemoryRegion *iommu_mr)
+{
+IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
+
+if (imrc->get_address_limit) {
+return imrc->get_address_limit(iommu_mr);
+}
+
+return 0;
+}
+
 void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
  IOMMUNotifier *n)
 {
-- 
2.7.0




[PATCH v18 QEMU 17/18] vfio: Make vfio-pci device migration capable

2020-05-04 Thread Kirti Wankhede
If the device is not a failover primary device, call the vfio_migration_probe()
and vfio_migration_finalize() functions for the vfio-pci device to enable
migration for VFIO PCI devices that support it.
Removed the vfio_pci_vmstate structure.
Removed the migration blocker from the VFIO PCI device specific structure and
use the migration blocker from the generic VFIO device structure.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/pci.c | 32 +++-
 hw/vfio/pci.h |  1 -
 2 files changed, 11 insertions(+), 22 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index b1239ba5b283..a104e0def94f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2916,22 +2916,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 return;
 }
 
-if (!pdev->failover_pair_id) {
-error_setg(&vdev->migration_blocker,
-"VFIO device doesn't support migration");
-ret = migrate_add_blocker(vdev->migration_blocker, &err);
-if (ret) {
-error_propagate(errp, err);
-error_free(vdev->migration_blocker);
-vdev->migration_blocker = NULL;
-return;
-}
-}
-
 vdev->vbasedev.name = g_path_get_basename(vdev->vbasedev.sysfsdev);
 vdev->vbasedev.ops = &vfio_pci_ops;
 vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
 vdev->vbasedev.dev = DEVICE(vdev);
+vdev->vbasedev.device_state = 0;
 
 tmp = g_strdup_printf("%s/iommu_group", vdev->vbasedev.sysfsdev);
 len = readlink(tmp, group_path, sizeof(group_path));
@@ -3195,6 +3184,14 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 }
 }
 
+if (!pdev->failover_pair_id) {
+ret = vfio_migration_probe(&vdev->vbasedev, errp);
+if (ret) {
+error_report("%s: Failed to setup for migration",
+ vdev->vbasedev.name);
+}
+}
+
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
 vfio_setup_resetfn_quirk(vdev);
@@ -3209,11 +3206,6 @@ out_teardown:
 vfio_bars_exit(vdev);
 error:
 error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
-if (vdev->migration_blocker) {
-migrate_del_blocker(vdev->migration_blocker);
-error_free(vdev->migration_blocker);
-vdev->migration_blocker = NULL;
-}
 }
 
 static void vfio_instance_finalize(Object *obj)
@@ -3225,10 +3217,7 @@ static void vfio_instance_finalize(Object *obj)
 vfio_bars_finalize(vdev);
 g_free(vdev->emulated_config_bits);
 g_free(vdev->rom);
-if (vdev->migration_blocker) {
-migrate_del_blocker(vdev->migration_blocker);
-error_free(vdev->migration_blocker);
-}
+
 /*
  * XXX Leaking igd_opregion is not an oversight, we can't remove the
  * fw_cfg entry therefore leaking this allocation seems like the safest
@@ -3256,6 +3245,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 }
 vfio_teardown_msi(vdev);
 vfio_bars_exit(vdev);
+vfio_migration_finalize(&vdev->vbasedev);
 }
 
 static void vfio_pci_reset(DeviceState *dev)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 0da7a20a7ec2..b148c937ef72 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -168,7 +168,6 @@ typedef struct VFIOPCIDevice {
 bool no_vfio_ioeventfd;
 bool enable_ramfb;
 VFIODisplay *dpy;
-Error *migration_blocker;
 Notifier irqchip_change_notifier;
 } VFIOPCIDevice;
 
-- 
2.7.0




[PATCH v18 QEMU 09/18] vfio: Add save state functions to SaveVMHandlers

2020-05-04 Thread Kirti Wankhede
Added .save_live_pending, .save_live_iterate and .save_live_complete_precopy
functions. These functions handle the pre-copy and stop-and-copy phases.

In the _SAVING|_RUNNING device state, i.e. the pre-copy phase:
- read pending_bytes. If pending_bytes > 0, go through the steps below.
- read data_offset - this directs the vendor driver to write data to the
  staging buffer.
- read data_size - the amount of data in bytes written by the vendor driver
  in the migration region.
- read data_size bytes of data from data_offset in the migration region.
- Write the data packet to the file stream as below:
  {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
   VFIO_MIG_FLAG_END_OF_STATE}

In the _SAVING device state, i.e. the stop-and-copy phase:
a. read the config space of the device and save it to the migration file
   stream. This doesn't need to come from the vendor driver. Any other
   special config state from the driver can be saved as data in a following
   iteration.
b. read pending_bytes. If pending_bytes > 0, go through the steps below.
c. read data_offset - this directs the vendor driver to write data to the
   staging buffer.
d. read data_size - the amount of data in bytes written by the vendor driver
   in the migration region.
e. read data_size bytes of data from data_offset in the migration region.
f. Write the data packet as below:
   {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
g. iterate through steps b to f while (pending_bytes > 0)
h. Write {VFIO_MIG_FLAG_END_OF_STATE}

When the data region is mapped, it is the user's responsibility to read
data_size bytes of data from data_offset before moving to the next step.
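
To make the stop-and-copy loop (steps b to h) concrete, here is an
illustrative condensation, not part of the patch, of roughly what the
.save_live_complete_precopy handler added below does. It assumes the includes
of hw/vfio/migration.c, the vfio_save_buffer() helper shown in the hunk below,
and a vfio_update_pending() helper (its hunk is truncated here) that is
assumed to take the device and refresh migration->pending_bytes; the wrapper
name is made up:

/* Illustrative only: not part of the patch. */
static int vfio_save_all_device_data(QEMUFile *f, VFIODevice *vbasedev)
{
    VFIOMigration *migration = vbasedev->migration;
    int ret;

    ret = vfio_update_pending(vbasedev);               /* step b */
    if (ret) {
        return ret;
    }

    while (migration->pending_bytes > 0) {             /* step g */
        qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
        ret = vfio_save_buffer(f, vbasedev);           /* steps c to f */
        if (ret < 0) {
            return ret;
        }

        ret = vfio_update_pending(vbasedev);           /* step b again */
        if (ret) {
            return ret;
        }
    }

    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);      /* step h */

    return qemu_file_get_error(f);
}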

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c   | 245 +-
 hw/vfio/trace-events  |   6 ++
 include/hw/vfio/vfio-common.h |   1 +
 3 files changed, 251 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index efadc04c9fe7..97fbb0c2b301 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -139,6 +139,137 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, 
uint32_t mask,
 return 0;
 }
 
+static void *find_data_region(VFIORegion *region,
+  uint64_t data_offset,
+  uint64_t data_size)
+{
+void *ptr = NULL;
+int i;
+
+for (i = 0; i < region->nr_mmaps; i++) {
+if ((data_offset >= region->mmaps[i].offset) &&
+(data_offset < region->mmaps[i].offset + region->mmaps[i].size) &&
+(data_size <= region->mmaps[i].size)) {
+ptr = region->mmaps[i].mmap + (data_offset -
+   region->mmaps[i].offset);
+break;
+}
+}
+return ptr;
+}
+
+static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
+{
+VFIOMigration *migration = vbasedev->migration;
+VFIORegion *region = &migration->region;
+uint64_t data_offset = 0, data_size = 0;
+int ret;
+
+ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+ data_offset));
+if (ret != sizeof(data_offset)) {
+error_report("%s: Failed to get migration buffer data offset %d",
+ vbasedev->name, ret);
+return -EINVAL;
+}
+
+ret = pread(vbasedev->fd, &data_size, sizeof(data_size),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+ data_size));
+if (ret != sizeof(data_size)) {
+error_report("%s: Failed to get migration buffer data size %d",
+ vbasedev->name, ret);
+return -EINVAL;
+}
+
+if (data_size > 0) {
+void *buf = NULL;
+bool buffer_mmaped;
+
+if (region->mmaps) {
+buf = find_data_region(region, data_offset, data_size);
+}
+
+buffer_mmaped = (buf != NULL);
+
+if (!buffer_mmaped) {
+buf = g_try_malloc(data_size);
+if (!buf) {
+error_report("%s: Error allocating buffer ", __func__);
+return -ENOMEM;
+}
+
+ret = pread(vbasedev->fd, buf, data_size,
+region->fd_offset + data_offset);
+if (ret != data_size) {
+error_report("%s: Failed to get migration data %d",
+ vbasedev->name, ret);
+g_free(buf);
+return -EINVAL;
+}
+}
+
+qemu_put_be64(f, data_size);
+qemu_put_buffer(f, buf, data_size);
+
+if (!buffer_mmaped) {
+g_free(buf);
+}
+} else {
+qemu_put_be64(f, data_size);
+}
+
+trace_vfio_save_buffer(vbasedev->name, data_offset, data_size,
+   migration->pending_bytes);
+
+ret = qemu_file_get_error(f);
+if (ret) {
+return ret;
+}
+
+return data_size;
+}
+
+static int vfio_update_pending(VFIO

Re: [PATCH v16 QEMU 08/16] vfio: Register SaveVMHandlers for VFIO device

2020-05-04 Thread Kirti Wankhede




On 3/26/2020 2:32 AM, Alex Williamson wrote:

On Wed, 25 Mar 2020 02:39:06 +0530
Kirti Wankhede  wrote:


Define flags to be used as delimiters in the migration file stream.
Added .save_setup and .save_cleanup functions. The migration region is mapped
and unmapped from these functions at the source during the saving or pre-copy
phase.
Set the VFIO device state depending on the VM's state. During live migration,
the VM is running when .save_setup is called, so the _SAVING | _RUNNING state
is set for the VFIO device. During save-restore, the VM is paused, so the
_SAVING state is set for the VFIO device.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  hw/vfio/migration.c  | 76 
  hw/vfio/trace-events |  2 ++
  2 files changed, 78 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 22ded9d28cf3..033f76526e49 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -8,6 +8,7 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu/main-loop.h"
  #include 
  
  #include "sysemu/runstate.h"

@@ -24,6 +25,17 @@
  #include "pci.h"
  #include "trace.h"
  
+/*

+ * Flags used as delimiter:
+ * 0xffffffff => MSB 32-bit all 1s
+ * 0xef10 => emulated (virtual) function IO
+ * 0x0000 => 16-bits reserved for flags
+ */
+#define VFIO_MIG_FLAG_END_OF_STATE      (0xffffffffef100001ULL)
+#define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
+#define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
+#define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
+
  static void vfio_migration_region_exit(VFIODevice *vbasedev)
  {
  VFIOMigration *migration = vbasedev->migration;
@@ -126,6 +138,69 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, 
uint32_t mask,
  return 0;
  }
  
+/* -- */

+
+static int vfio_save_setup(QEMUFile *f, void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret;
+
+qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
+
+if (migration->region.mmaps) {
+qemu_mutex_lock_iothread();
+ret = vfio_region_mmap(&migration->region);
+qemu_mutex_unlock_iothread();
+if (ret) {
+error_report("%s: Failed to mmap VFIO migration region %d: %s",
+ vbasedev->name, migration->region.index,
+ strerror(-ret));
+return ret;
+}
+}
+
+ret = vfio_migration_set_state(vbasedev, ~0, VFIO_DEVICE_STATE_SAVING);
+if (ret) {
+error_report("%s: Failed to set state SAVING", vbasedev->name);
+return ret;
+}
+
+/*
+ * Save migration region size. This is used to verify migration region size
+ * is greater than or equal to migration region size at destination
+ */
+qemu_put_be64(f, migration->region.size);


Is this requirement supported by the uapi?  


Yes, on the UAPI thread we discussed this:

 * For the user application, data is opaque. The user application should write
 * data in the same order as the data is received and the data should be of
 * same transaction size at the source.

The data should be of the same transaction size, so the migration region size
at the destination should be greater than or equal to the region size at the
source when verifying at the destination.



The vendor driver operates
within the migration region, but it has no requirement to use the full
extent of the region.  Shouldn't we instead insert the version string
from the versioning API Yan proposed?  Is this where we might choose to use
an interface via the vfio API rather than sysfs if we had one?



The VFIO API cannot be used by libvirt or the management tool stack. We need
the sysfs interface Yan proposed so that libvirt or the management tool stack
can use it.


Thanks,
Kirti


+
+qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+
+ret = qemu_file_get_error(f);
+if (ret) {
+return ret;
+}
+
+trace_vfio_save_setup(vbasedev->name);
+return 0;
+}
+
+static void vfio_save_cleanup(void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+
+if (migration->region.mmaps) {
+vfio_region_unmap(&migration->region);
+}
+trace_vfio_save_cleanup(vbasedev->name);
+}
+
+static SaveVMHandlers savevm_vfio_handlers = {
+.save_setup = vfio_save_setup,
+.save_cleanup = vfio_save_cleanup,
+};
+
+/* -- */
+
  static void vfio_vmstate_change(void *opaque, int running, RunState state)
  {
  VFIODevice *vbasedev = opaque;
@@ -191,6 +266,7 @@ static int vfio_migration_init(VFIODevice *vbasedev,
  return ret;
  }
  
+register_savevm_live("vfio", -1, 1, &savevm_vfio_handlers, vbasedev);

  vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
vbasedev);
  
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-eve

Re: [PATCH v16 QEMU 04/16] vfio: Add save and load functions for VFIO PCI devices

2020-05-04 Thread Kirti Wankhede




On 3/26/2020 1:26 AM, Alex Williamson wrote:

On Wed, 25 Mar 2020 02:39:02 +0530
Kirti Wankhede  wrote:


These functions save and restore PCI device specific data - config
space of PCI device.
Tested save and restore with MSI and MSIX type.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  hw/vfio/pci.c | 163 ++
  include/hw/vfio/vfio-common.h |   2 +
  2 files changed, 165 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6c77c12e44b9..8deb11e87ef7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -41,6 +41,7 @@
  #include "trace.h"
  #include "qapi/error.h"
  #include "migration/blocker.h"
+#include "migration/qemu-file.h"
  
  #define TYPE_VFIO_PCI "vfio-pci"

  #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
@@ -1632,6 +1633,50 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
  }
  }
  
+static int vfio_bar_validate(VFIOPCIDevice *vdev, int nr)

+{
+PCIDevice *pdev = &vdev->pdev;
+VFIOBAR *bar = &vdev->bars[nr];
+uint64_t addr;
+uint32_t addr_lo, addr_hi = 0;
+
+/* Skip unimplemented BARs and the upper half of 64bit BARS. */
+if (!bar->size) {
+return 0;
+}
+
+addr_lo = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + nr * 4, 4);
+
+addr_lo = addr_lo & (bar->ioport ? PCI_BASE_ADDRESS_IO_MASK :
+   PCI_BASE_ADDRESS_MEM_MASK);


Nit, &= or combine with previous set.


+if (bar->type == PCI_BASE_ADDRESS_MEM_TYPE_64) {
+addr_hi = pci_default_read_config(pdev,
+ PCI_BASE_ADDRESS_0 + (nr + 1) * 4, 4);
+}
+
+addr = ((uint64_t)addr_hi << 32) | addr_lo;


Could we use a union?


+
+if (!QEMU_IS_ALIGNED(addr, bar->size)) {
+return -EINVAL;
+}


What specifically are we validating here?  This should be true no
matter what we wrote to the BAR or else BAR emulation is broken.  The
bits that could make this unaligned are not implemented in the BAR.


+
+return 0;
+}
+
+static int vfio_bars_validate(VFIOPCIDevice *vdev)
+{
+int i, ret;
+
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+ret = vfio_bar_validate(vdev, i);
+if (ret) {
+error_report("vfio: BAR address %d validation failed", i);
+return ret;
+}
+}
+return 0;
+}
+
  static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
  {
  VFIOBAR *bar = &vdev->bars[nr];
@@ -2414,11 +2459,129 @@ static Object *vfio_pci_get_object(VFIODevice 
*vbasedev)
  return OBJECT(vdev);
  }
  
+static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)

+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+PCIDevice *pdev = &vdev->pdev;
+uint16_t pci_cmd;
+int i;
+
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+uint32_t bar;
+
+bar = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
+qemu_put_be32(f, bar);
+}
+
+qemu_put_be32(f, vdev->interrupt);
+if (vdev->interrupt == VFIO_INT_MSI) {
+uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
+bool msi_64bit;
+
+msi_flags = pci_default_read_config(pdev, pdev->msi_cap + 
PCI_MSI_FLAGS,
+2);
+msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
+
+msi_addr_lo = pci_default_read_config(pdev,
+ pdev->msi_cap + PCI_MSI_ADDRESS_LO, 
4);
+qemu_put_be32(f, msi_addr_lo);
+
+if (msi_64bit) {
+msi_addr_hi = pci_default_read_config(pdev,
+ pdev->msi_cap + 
PCI_MSI_ADDRESS_HI,
+ 4);
+}
+qemu_put_be32(f, msi_addr_hi);
+
+msi_data = pci_default_read_config(pdev,
+pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : 
PCI_MSI_DATA_32),
+2);
+qemu_put_be32(f, msi_data);


Isn't the data field only a u16?



Yes, fixing it.


+} else if (vdev->interrupt == VFIO_INT_MSIX) {
+uint16_t offset;
+
+/* save enable bit and maskall bit */
+offset = pci_default_read_config(pdev,
+   pdev->msix_cap + PCI_MSIX_FLAGS + 1, 2);
+qemu_put_be16(f, offset);
+msix_save(pdev, f);
+}
+pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
+qemu_put_be16(f, pci_cmd);
+}
+
+static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+PCIDevice *pdev = &vdev->pdev;
+uint32_t interrupt_type;
+uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
+uint16_t pci_cmd;
+bool msi_64bit;
+int i, ret;
+
+/* restore pci bar configuration */
+pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
+vfio_pci_write_config(pdev, PCI_COMMAND,
+

[PATCH v18 QEMU 18/18] qapi: Add VFIO devices migration stats in Migration stats

2020-05-04 Thread Kirti Wankhede
Added the number of bytes transferred to the target VM by all VFIO devices
to the migration statistics.
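
For illustration, with this change a query-migrate reply during an active
migration gains a "vfio" member alongside the existing migration statistics.
Based on populate_vfio_info() and the HMP hunk below (the exact member names
are defined by the qapi/migration.json hunk, which is truncated here), it
would look roughly like this, with made-up values and other members omitted:

{ "return": {
    "status": "active",
    ...
    "vfio": { "bytes": 10485760 }
} }

The corresponding 'info migrate' output prints the same counter as
"vfio device bytes: 10240 kbytes".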

Signed-off-by: Kirti Wankhede 
---
 hw/vfio/common.c| 20 
 hw/vfio/migration.c | 10 +-
 include/qemu/vfio-helpers.h |  3 +++
 migration/migration.c   | 12 
 monitor/hmp-cmds.c  |  6 ++
 qapi/migration.json | 19 ++-
 6 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b94e2bcb1178..53455946b31b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -40,6 +40,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/migration.h"
+#include "qemu/vfio-helpers.h"
 
 VFIOGroupList vfio_group_list =
 QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -293,6 +294,25 @@ const MemoryRegionOps vfio_region_ops = {
  * Device state interfaces
  */
 
+bool vfio_mig_active(void)
+{
+VFIOGroup *group;
+VFIODevice *vbasedev;
+
+if (QLIST_EMPTY(&vfio_group_list)) {
+return false;
+}
+
+QLIST_FOREACH(group, &vfio_group_list, next) {
+QLIST_FOREACH(vbasedev, &group->device_list, next) {
+if (vbasedev->migration_blocker) {
+return false;
+}
+}
+}
+return true;
+}
+
 static bool vfio_devices_are_stopped_and_saving(void)
 {
 VFIOGroup *group;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 7d1f64a96676..250a24d4b9ff 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -26,7 +26,7 @@
 #include "exec/ram_addr.h"
 #include "pci.h"
 #include "trace.h"
-
+#include "qemu/vfio-helpers.h"
 /*
  * Flags used as delimiter:
  * 0xffffffff => MSB 32-bit all 1s
@@ -38,6 +38,8 @@
 #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
 #define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
 
+static int64_t bytes_transferred;
+
 static void vfio_migration_region_exit(VFIODevice *vbasedev)
 {
 VFIOMigration *migration = vbasedev->migration;
@@ -229,6 +231,7 @@ static int vfio_save_buffer(QEMUFile *f, VFIODevice 
*vbasedev)
 return ret;
 }
 
+bytes_transferred += data_size;
 return data_size;
 }
 
@@ -750,6 +753,11 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 
 /* -- */
 
+int64_t vfio_mig_bytes_transferred(void)
+{
+return bytes_transferred;
+}
+
 int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 {
 struct vfio_region_info *info;
diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
index 1f057c2b9e40..26a7df0767b1 100644
--- a/include/qemu/vfio-helpers.h
+++ b/include/qemu/vfio-helpers.h
@@ -29,4 +29,7 @@ void qemu_vfio_pci_unmap_bar(QEMUVFIOState *s, int index, 
void *bar,
 int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
int irq_type, Error **errp);
 
+bool vfio_mig_active(void);
+int64_t vfio_mig_bytes_transferred(void);
+
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 187ac0410c2d..9d763447261c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -54,6 +54,7 @@
 #include "net/announce.h"
 #include "qemu/queue.h"
 #include "multifd.h"
+#include "qemu/vfio-helpers.h"
 
 #define MAX_THROTTLE  (32 << 20)  /* Migration transfer speed throttling */
 
@@ -967,6 +968,15 @@ static void populate_disk_info(MigrationInfo *info)
 }
 }
 
+static void populate_vfio_info(MigrationInfo *info)
+{
+if (vfio_mig_active()) {
+info->has_vfio = true;
+info->vfio = g_malloc0(sizeof(*info->vfio));
+info->vfio->bytes = vfio_mig_bytes_transferred();
+}
+}
+
 static void fill_source_migration_info(MigrationInfo *info)
 {
 MigrationState *s = migrate_get_current();
@@ -992,6 +1002,7 @@ static void fill_source_migration_info(MigrationInfo *info)
 populate_time_info(info, s);
 populate_ram_info(info, s);
 populate_disk_info(info);
+populate_vfio_info(info);
 break;
 case MIGRATION_STATUS_COLO:
 info->has_status = true;
@@ -1000,6 +1011,7 @@ static void fill_source_migration_info(MigrationInfo 
*info)
 case MIGRATION_STATUS_COMPLETED:
 populate_time_info(info, s);
 populate_ram_info(info, s);
+populate_vfio_info(info);
 break;
 case MIGRATION_STATUS_FAILED:
 info->has_status = true;
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 7f6e982dc834..d04bc042f2fe 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -353,6 +353,12 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 }
 monitor_printf(mon, "]\n");
 }
+
+if (info->has_vfio) {
+monitor_printf(mon, "vfio device bytes: %" PRIu64 " kbytes\n",
+   info->vfio->bytes >> 10);
+}
+
 qapi_free_MigrationInfo(info);
 }
 
diff --git a/qapi/migration.json b/qapi/migration.json
index eca2981d0a33..a06ecae89b3d 100644
--

[PATCH v18 QEMU 08/18] vfio: Register SaveVMHandlers for VFIO device

2020-05-04 Thread Kirti Wankhede
Define flags to be used as delimiters in the migration file stream.
Added .save_setup and .save_cleanup functions. The migration region is mapped
and unmapped from these functions at the source during the saving or pre-copy
phase.
Set the VFIO device state depending on the VM's state. During live migration,
the VM is running when .save_setup is called, so the _SAVING | _RUNNING state
is set for the VFIO device. During save-restore, the VM is paused, so the
_SAVING state is set for the VFIO device.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c  | 79 
 hw/vfio/trace-events |  2 ++
 2 files changed, 81 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index c2f5564b51c3..efadc04c9fe7 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -8,12 +8,14 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include 
 
 #include "sysemu/runstate.h"
 #include "hw/vfio/vfio-common.h"
 #include "cpu.h"
 #include "migration/migration.h"
+#include "migration/vmstate.h"
 #include "migration/qemu-file.h"
 #include "migration/register.h"
 #include "migration/blocker.h"
@@ -24,6 +26,17 @@
 #include "pci.h"
 #include "trace.h"
 
+/*
+ * Flags used as delimiter:
+ * 0xffffffff => MSB 32-bit all 1s
+ * 0xef10 => emulated (virtual) function IO
+ * 0x0000 => 16-bits reserved for flags
+ */
+#define VFIO_MIG_FLAG_END_OF_STATE      (0xffffffffef100001ULL)
+#define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
+#define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
+#define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
+
 static void vfio_migration_region_exit(VFIODevice *vbasedev)
 {
 VFIOMigration *migration = vbasedev->migration;
@@ -126,6 +139,70 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, 
uint32_t mask,
 return 0;
 }
 
+/* -- */
+
+static int vfio_save_setup(QEMUFile *f, void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret;
+
+trace_vfio_save_setup(vbasedev->name);
+
+qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
+
+if (migration->region.mmaps) {
+qemu_mutex_lock_iothread();
+ret = vfio_region_mmap(&migration->region);
+qemu_mutex_unlock_iothread();
+if (ret) {
+error_report("%s: Failed to mmap VFIO migration region %d: %s",
+ vbasedev->name, migration->region.index,
+ strerror(-ret));
+return ret;
+}
+}
+
+ret = vfio_migration_set_state(vbasedev, ~0, VFIO_DEVICE_STATE_SAVING);
+if (ret) {
+error_report("%s: Failed to set state SAVING", vbasedev->name);
+return ret;
+}
+
+/*
+ * Save migration region size. This is used to verify migration region size
+ * is greater than or equal to migration region size at destination
+ */
+qemu_put_be64(f, migration->region.size);
+
+qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+
+ret = qemu_file_get_error(f);
+if (ret) {
+return ret;
+}
+
+return 0;
+}
+
+static void vfio_save_cleanup(void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+
+if (migration->region.mmaps) {
+vfio_region_unmap(&migration->region);
+}
+trace_vfio_save_cleanup(vbasedev->name);
+}
+
+static SaveVMHandlers savevm_vfio_handlers = {
+.save_setup = vfio_save_setup,
+.save_cleanup = vfio_save_cleanup,
+};
+
+/* -- */
+
 static void vfio_vmstate_change(void *opaque, int running, RunState state)
 {
 VFIODevice *vbasedev = opaque;
@@ -192,6 +269,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 return ret;
 }
 
+register_savevm_live("vfio", VMSTATE_INSTANCE_ID_ANY, 1,
+ &savevm_vfio_handlers, vbasedev);
 vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
   vbasedev);
 
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index bd3d47b005cb..86c18def016e 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -149,3 +149,5 @@ vfio_migration_probe(const char *name, uint32_t index) " 
(%s) Region %d"
 vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d"
 vfio_vmstate_change(const char *name, int running, const char *reason, 
uint32_t dev_state) " (%s) running %d reason %s device state %d"
 vfio_migration_state_notifier(const char *name, const char *state) " (%s) 
state %s"
+vfio_save_setup(const char *name) " (%s)"
+vfio_save_cleanup(const char *name) " (%s)"
-- 
2.7.0




[PATCH v18 QEMU 06/18] vfio: Add VM state change handler to know state of VM

2020-05-04 Thread Kirti Wankhede
The VM state change handler gets called on a change in the VM's state. This is
used to set the VFIO device state to _RUNNING.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c   | 87 +++
 hw/vfio/trace-events  |  2 +
 include/hw/vfio/vfio-common.h |  4 ++
 3 files changed, 93 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index bf9384907ec0..e79b34003079 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -10,6 +10,7 @@
 #include "qemu/osdep.h"
 #include 
 
+#include "sysemu/runstate.h"
 #include "hw/vfio/vfio-common.h"
 #include "cpu.h"
 #include "migration/migration.h"
@@ -74,6 +75,85 @@ err:
 return ret;
 }
 
+static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
+uint32_t value)
+{
+VFIOMigration *migration = vbasedev->migration;
+VFIORegion *region = &migration->region;
+uint32_t device_state;
+int ret;
+
+ret = pread(vbasedev->fd, &device_state, sizeof(device_state),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+  device_state));
+if (ret < 0) {
+error_report("%s: Failed to read device state %d %s",
+ vbasedev->name, ret, strerror(errno));
+return ret;
+}
+
+device_state = (device_state & mask) | value;
+
+if (!VFIO_DEVICE_STATE_VALID(device_state)) {
+return -EINVAL;
+}
+
+ret = pwrite(vbasedev->fd, &device_state, sizeof(device_state),
+ region->fd_offset + offsetof(struct vfio_device_migration_info,
+  device_state));
+if (ret < 0) {
+error_report("%s: Failed to set device state %d %s",
+ vbasedev->name, ret, strerror(errno));
+
+ret = pread(vbasedev->fd, &device_state, sizeof(device_state),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+device_state));
+if (ret < 0) {
+error_report("%s: On failure, failed to read device state %d %s",
+vbasedev->name, ret, strerror(errno));
+return ret;
+}
+
+if (VFIO_DEVICE_STATE_IS_ERROR(device_state)) {
+error_report("%s: Device is in error state 0x%x",
+ vbasedev->name, device_state);
+return -EFAULT;
+}
+}
+
+vbasedev->device_state = device_state;
+trace_vfio_migration_set_state(vbasedev->name, device_state);
+return 0;
+}
+
+static void vfio_vmstate_change(void *opaque, int running, RunState state)
+{
+VFIODevice *vbasedev = opaque;
+
+if ((vbasedev->vm_running != running)) {
+int ret;
+uint32_t value = 0, mask = 0;
+
+if (running) {
+value = VFIO_DEVICE_STATE_RUNNING;
+if (vbasedev->device_state & VFIO_DEVICE_STATE_RESUMING) {
+mask = ~VFIO_DEVICE_STATE_RESUMING;
+}
+} else {
+mask = ~VFIO_DEVICE_STATE_RUNNING;
+}
+
+ret = vfio_migration_set_state(vbasedev, mask, value);
+if (ret) {
+error_report("%s: Failed to set device state 0x%x",
+ vbasedev->name, value & mask);
+}
+vbasedev->vm_running = running;
+trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
+  value & mask);
+}
+}
+
 static int vfio_migration_init(VFIODevice *vbasedev,
struct vfio_region_info *info)
 {
@@ -90,6 +170,9 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 return ret;
 }
 
+vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
+  vbasedev);
+
 return 0;
 }
 
@@ -128,6 +211,10 @@ add_blocker:
 
 void vfio_migration_finalize(VFIODevice *vbasedev)
 {
+if (vbasedev->vm_state) {
+qemu_del_vm_change_state_handler(vbasedev->vm_state);
+}
+
 if (vbasedev->migration_blocker) {
 migrate_del_blocker(vbasedev->migration_blocker);
 error_free(vbasedev->migration_blocker);
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index fd034ac53684..14b0a86c0035 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -146,3 +146,5 @@ vfio_display_edid_write_error(void) ""
 
 # migration.c
 vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d"
+vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d"
+vfio_vmstate_change(char *name, int running, const char *reason, uint32_t 
dev_state) " (%s) running %d reason %s device state %d"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index d4b268641173..3d18eb146b33 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -29,6 +29,7 @@
 #ifdef CO

[PATCH v18 QEMU 02/18] vfio: Add function to unmap VFIO region

2020-05-04 Thread Kirti Wankhede
This function will be used for the migration region.
The migration region is mmap'd when migration starts and will be unmapped when
migration is complete.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Reviewed-by: Cornelia Huck 
---
 hw/vfio/common.c  | 20 
 hw/vfio/trace-events  |  1 +
 include/hw/vfio/vfio-common.h |  1 +
 3 files changed, 22 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 0b3593b3c0c4..4a2f0d6a2233 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -983,6 +983,26 @@ int vfio_region_mmap(VFIORegion *region)
 return 0;
 }
 
+void vfio_region_unmap(VFIORegion *region)
+{
+int i;
+
+if (!region->mem) {
+return;
+}
+
+for (i = 0; i < region->nr_mmaps; i++) {
+trace_vfio_region_unmap(memory_region_name(&region->mmaps[i].mem),
+region->mmaps[i].offset,
+region->mmaps[i].offset +
+region->mmaps[i].size - 1);
+memory_region_del_subregion(region->mem, &region->mmaps[i].mem);
+munmap(region->mmaps[i].mmap, region->mmaps[i].size);
+object_unparent(OBJECT(&region->mmaps[i].mem));
+region->mmaps[i].mmap = NULL;
+}
+}
+
 void vfio_region_exit(VFIORegion *region)
 {
 int i;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index b1ef55a33ffd..8cdc27946cb8 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -111,6 +111,7 @@ vfio_region_mmap(const char *name, unsigned long offset, 
unsigned long end) "Reg
 vfio_region_exit(const char *name, int index) "Device %s, region %d"
 vfio_region_finalize(const char *name, int index) "Device %s, region %d"
 vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps 
enabled: %d"
+vfio_region_unmap(const char *name, unsigned long offset, unsigned long end) 
"Region %s unmap [0x%lx - 0x%lx]"
 vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) 
"Device %s region %d: %d sparse mmap entries"
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) 
"sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t 
subtype) "%s index %d, %08x/%0x8"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index fd564209ac71..8d7a0fbb1046 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -171,6 +171,7 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, 
VFIORegion *region,
   int index, const char *name);
 int vfio_region_mmap(VFIORegion *region);
 void vfio_region_mmaps_set_enabled(VFIORegion *region, bool enabled);
+void vfio_region_unmap(VFIORegion *region);
 void vfio_region_exit(VFIORegion *region);
 void vfio_region_finalize(VFIORegion *region);
 void vfio_reset_handler(void *opaque);
-- 
2.7.0




[PATCH v18 QEMU 14/18] vfio: Add vfio_listener_log_sync to mark dirty pages

2020-05-04 Thread Kirti Wankhede
vfio_listener_log_sync gets the list of dirty pages from the container using
the VFIO_IOMMU_GET_DIRTY_BITMAP ioctl and marks those pages dirty when all
devices are stopped and saving state.
Return early for the RAM block section of a mapped MMIO region.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/common.c | 183 +--
 hw/vfio/trace-events |   1 +
 2 files changed, 179 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4a2f0d6a2233..4bf864695a8e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -29,6 +29,7 @@
 #include "hw/vfio/vfio.h"
 #include "exec/address-spaces.h"
 #include "exec/memory.h"
+#include "exec/ram_addr.h"
 #include "hw/hw.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
@@ -38,6 +39,7 @@
 #include "sysemu/reset.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "migration/migration.h"
 
 VFIOGroupList vfio_group_list =
 QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -288,6 +290,28 @@ const MemoryRegionOps vfio_region_ops = {
 };
 
 /*
+ * Device state interfaces
+ */
+
+static bool vfio_devices_are_stopped_and_saving(void)
+{
+VFIOGroup *group;
+VFIODevice *vbasedev;
+
+QLIST_FOREACH(group, &vfio_group_list, next) {
+QLIST_FOREACH(vbasedev, &group->device_list, next) {
+if ((vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) &&
+!(vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING)) {
+continue;
+} else {
+return false;
+}
+}
+}
+return true;
+}
+
+/*
  * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
  */
 static int vfio_dma_unmap(VFIOContainer *container,
@@ -408,8 +432,8 @@ static bool 
vfio_listener_skipped_section(MemoryRegionSection *section)
 }
 
 /* Called with rcu_read_lock held.  */
-static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
-   bool *read_only)
+static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
+   ram_addr_t *ram_addr, bool *read_only)
 {
 MemoryRegion *mr;
 hwaddr xlat;
@@ -440,9 +464,17 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void 
**vaddr,
 return false;
 }
 
-*vaddr = memory_region_get_ram_ptr(mr) + xlat;
-*read_only = !writable || mr->readonly;
+if (vaddr) {
+*vaddr = memory_region_get_ram_ptr(mr) + xlat;
+}
 
+if (ram_addr) {
+*ram_addr = memory_region_get_ram_addr(mr) + xlat;
+}
+
+if (read_only) {
+*read_only = !writable || mr->readonly;
+}
 return true;
 }
 
@@ -467,7 +499,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
 rcu_read_lock();
 
 if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
-if (!vfio_get_vaddr(iotlb, &vaddr, &read_only)) {
+if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
 goto out;
 }
 /*
@@ -813,9 +845,150 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 }
 }
 
+static int vfio_get_dirty_bitmap(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+VFIOGuestIOMMU *giommu;
+IOMMUTLBEntry iotlb;
+hwaddr granularity, iova, iova_end;
+int ret;
+
+if (memory_region_is_iommu(section->mr)) {
+QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+if (MEMORY_REGION(giommu->iommu) == section->mr &&
+giommu->n.start == section->offset_within_region) {
+break;
+}
+}
+
+if (!giommu) {
+return -EINVAL;
+}
+}
+
+iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+
+if (memory_region_is_iommu(section->mr)) {
+hwaddr iova_size;
+
+granularity = memory_region_iommu_get_min_page_size(giommu->iommu);
+iova_end = memory_region_iommu_get_address_limit(giommu->iommu);
+
+if (iova_end) {
+iova_size = MIN(int128_get64(section->size), iova_end - iova + 1);
+} else {
+iova_size = int128_get64(section->size);
+}
+
+iova_end = iova + iova_size - 1;
+} else {
+granularity = memory_region_size(section->mr);
+iova_end = iova + int128_get64(section->size) - 1;
+}
+
+RCU_READ_LOCK_GUARD();
+
+while (iova < iova_end) {
+struct vfio_iommu_type1_dirty_bitmap *dbitmap;
+struct vfio_iommu_type1_dirty_bitmap_get *range;
+ram_addr_t start, pages;
+uint64_t iova_xlat, size;
+
+if (memory_region_is_iommu(section->mr)) {
+iotlb = address_space_get_iotlb_entry(container->space->as, iova,
+ true, MEMTXATTRS_UNSPECIFIED);
+if ((iotlb.target_as == NULL) || 

[PATCH v18 QEMU 05/18] vfio: Add migration region initialization and finalize function

2020-05-04 Thread Kirti Wankhede
- Migration functions are implemented for VFIO_DEVICE_TYPE_PCI devices in this
  patch series.
- Whether a VFIO device supports migration is decided based on the migration
  region query. If the migration region query and the migration region
  initialization are successful, then migration is supported; otherwise
  migration is blocked.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/Makefile.objs |   2 +-
 hw/vfio/migration.c   | 138 ++
 hw/vfio/trace-events  |   3 +
 include/hw/vfio/vfio-common.h |   9 +++
 4 files changed, 151 insertions(+), 1 deletion(-)
 create mode 100644 hw/vfio/migration.c

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index 9bb1c09e8477..8b296c889ed9 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += common.o spapr.o
+obj-y += common.o spapr.o migration.o
 obj-$(CONFIG_VFIO_PCI) += pci.o pci-quirks.o display.o
 obj-$(CONFIG_VFIO_CCW) += ccw.o
 obj-$(CONFIG_VFIO_PLATFORM) += platform.o
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
new file mode 100644
index ..bf9384907ec0
--- /dev/null
+++ b/hw/vfio/migration.c
@@ -0,0 +1,138 @@
+/*
+ * Migration support for VFIO devices
+ *
+ * Copyright NVIDIA, Inc. 2020
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include 
+
+#include "hw/vfio/vfio-common.h"
+#include "cpu.h"
+#include "migration/migration.h"
+#include "migration/qemu-file.h"
+#include "migration/register.h"
+#include "migration/blocker.h"
+#include "migration/misc.h"
+#include "qapi/error.h"
+#include "exec/ramlist.h"
+#include "exec/ram_addr.h"
+#include "pci.h"
+#include "trace.h"
+
+static void vfio_migration_region_exit(VFIODevice *vbasedev)
+{
+VFIOMigration *migration = vbasedev->migration;
+
+if (!migration) {
+return;
+}
+
+if (migration->region.size) {
+vfio_region_exit(&migration->region);
+vfio_region_finalize(&migration->region);
+}
+}
+
+static int vfio_migration_region_init(VFIODevice *vbasedev, int index)
+{
+VFIOMigration *migration = vbasedev->migration;
+Object *obj = NULL;
+int ret = -EINVAL;
+
+if (!vbasedev->ops->vfio_get_object) {
+return ret;
+}
+
+obj = vbasedev->ops->vfio_get_object(vbasedev);
+if (!obj) {
+return ret;
+}
+
+ret = vfio_region_setup(obj, vbasedev, &migration->region, index,
+"migration");
+if (ret) {
+error_report("%s: Failed to setup VFIO migration region %d: %s",
+ vbasedev->name, index, strerror(-ret));
+goto err;
+}
+
+if (!migration->region.size) {
+ret = -EINVAL;
+error_report("%s: Invalid region size of VFIO migration region %d: %s",
+ vbasedev->name, index, strerror(-ret));
+goto err;
+}
+
+return 0;
+
+err:
+vfio_migration_region_exit(vbasedev);
+return ret;
+}
+
+static int vfio_migration_init(VFIODevice *vbasedev,
+   struct vfio_region_info *info)
+{
+int ret;
+
+vbasedev->migration = g_new0(VFIOMigration, 1);
+
+ret = vfio_migration_region_init(vbasedev, info->index);
+if (ret) {
+error_report("%s: Failed to initialise migration region",
+ vbasedev->name);
+g_free(vbasedev->migration);
+vbasedev->migration = NULL;
+return ret;
+}
+
+return 0;
+}
+
+/* -- */
+
+int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
+{
+struct vfio_region_info *info;
+Error *local_err = NULL;
+int ret;
+
+ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_MIGRATION,
+   VFIO_REGION_SUBTYPE_MIGRATION, &info);
+if (ret) {
+goto add_blocker;
+}
+
+ret = vfio_migration_init(vbasedev, info);
+if (ret) {
+goto add_blocker;
+}
+
+trace_vfio_migration_probe(vbasedev->name, info->index);
+return 0;
+
+add_blocker:
+error_setg(&vbasedev->migration_blocker,
+   "VFIO device doesn't support migration");
+ret = migrate_add_blocker(vbasedev->migration_blocker, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+error_free(vbasedev->migration_blocker);
+}
+return ret;
+}
+
+void vfio_migration_finalize(VFIODevice *vbasedev)
+{
+if (vbasedev->migration_blocker) {
+migrate_del_blocker(vbasedev->migration_blocker);
+error_free(vbasedev->migration_blocker);
+}
+
+vfio_migration_region_exit(vbasedev);
+g_free(vbasedev->migration);
+}
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 8cdc27946cb8..fd034ac53684 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -143,3 +143,6 @@ vfio_display_e

[PATCH v18 QEMU 01/18] vfio: KABI for migration interface - Kernel header placeholder

2020-05-04 Thread Kirti Wankhede
Kernel header patches are being reviewed along with kernel side changes.
This patch is only a placeholder.
Link to Kernel patch set:
https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg03225.html

This patch includes all changes to vfio.h from the above patch set.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 linux-headers/linux/vfio.h | 311 -
 1 file changed, 309 insertions(+), 2 deletions(-)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index a41c45286511..0f1a6aa559f9 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK   (0xffff)
 #define VFIO_REGION_TYPE_GFX               (1)
 #define VFIO_REGION_TYPE_CCW               (2)
+#define VFIO_REGION_TYPE_MIGRATION  (3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {
 /* sub-types for VFIO_REGION_TYPE_CCW */
 #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD  (1)
 
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
+
+/*
+ * The structure vfio_device_migration_info is placed at the 0th offset of
+ * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
+ * migration information. Field accesses from this structure are only supported
+ * at their native width and alignment. Otherwise, the result is undefined and
+ * vendor drivers should return an error.
+ *
+ * device_state: (read/write)
+ *  - The user application writes to this field to inform the vendor driver
+ *about the device state to be transitioned to.
+ *  - The vendor driver should take the necessary actions to change the
+ *device state. After successful transition to a given state, the
+ *vendor driver should return success on write(device_state, state)
+ *system call. If the device state transition fails, the vendor driver
+ *should return an appropriate -errno for the fault condition.
+ *  - On the user application side, if the device state transition fails,
+ *   that is, if write(device_state, state) returns an error, read
+ *   device_state again to determine the current state of the device from
+ *   the vendor driver.
+ *  - The vendor driver should return previous state of the device unless
+ *the vendor driver has encountered an internal error, in which case
+ *the vendor driver may report the device_state 
VFIO_DEVICE_STATE_ERROR.
+ *  - The user application must use the device reset ioctl to recover the
+ *device from VFIO_DEVICE_STATE_ERROR state. If the device is
+ *indicated to be in a valid device state by reading device_state, the
+ *user application may attempt to transition the device to any valid
+ *state reachable from the current state or terminate itself.
+ *
+ *  device_state consists of 3 bits:
+ *  - If bit 0 is set, it indicates the _RUNNING state. If bit 0 is clear,
+ *it indicates the _STOP state. When the device state is changed to
+ *_STOP, driver should stop the device before write() returns.
+ *  - If bit 1 is set, it indicates the _SAVING state, which means that the
+ *driver should start gathering device state information that will be
+ *provided to the VFIO user application to save the device's state.
+ *  - If bit 2 is set, it indicates the _RESUMING state, which means that
+ *the driver should prepare to resume the device. Data provided through
+ *the migration region should be used to resume the device.
+ *  Bits 3 - 31 are reserved for future use. To preserve them, the user
+ *  application should perform a read-modify-write operation on this
+ *  field when modifying the specified bits.
+ *
+ *  +--- _RESUMING
+ *  |+-- _SAVING
+ *  ||+- _RUNNING
+ *  |||
+ *  000b => Device Stopped, not saving or resuming
+ *  001b => Device running, which is the default state
+ *  010b => Stop the device & save the device state, stop-and-copy state
+ *  011b => Device running and save the device state, pre-copy state
+ *  100b => Device stopped and the device state is resuming
+ *  101b => Invalid state
+ *  110b => Error state
+ *  111b => Invalid state
+ *
+ * State transitions:
+ *
+ *  _RESUMING  _RUNNINGPre-copyStop-and-copy   _STOP
+ *(100b) (001b) (011b)(010b)   (000b)
+ * 0. Running or default state
+ * |
+ *
+ * 1. Normal Shutdown (optional)
+ * |->|
+ *
+ * 2. Save the state or suspend
+ * |->|-->|
+ *
+ * 3. Save the state during live migration
+ * |--->|>|-->

[PATCH v18 QEMU 04/18] vfio: Add save and load functions for VFIO PCI devices

2020-05-04 Thread Kirti Wankhede
These functions save and restore PCI device specific data, i.e. the config
space of the PCI device.
Save and restore were tested with the MSI and MSI-X interrupt types.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/pci.c | 163 ++
 include/hw/vfio/vfio-common.h |   2 +
 2 files changed, 165 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6c77c12e44b9..b1239ba5b283 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -41,6 +41,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/blocker.h"
+#include "migration/qemu-file.h"
 
 #define TYPE_VFIO_PCI "vfio-pci"
 #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
@@ -1632,6 +1633,50 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
 }
 }
 
+static int vfio_bar_validate(VFIOPCIDevice *vdev, int nr)
+{
+PCIDevice *pdev = &vdev->pdev;
+VFIOBAR *bar = &vdev->bars[nr];
+uint64_t addr;
+uint32_t addr_lo, addr_hi = 0;
+
+/* Skip unimplemented BARs and the upper half of 64bit BARS. */
+if (!bar->size) {
+return 0;
+}
+
+addr_lo = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + nr * 4, 4);
+
+addr_lo &= (bar->ioport ? PCI_BASE_ADDRESS_IO_MASK :
+  PCI_BASE_ADDRESS_MEM_MASK);
+if (bar->type == PCI_BASE_ADDRESS_MEM_TYPE_64) {
+addr_hi = pci_default_read_config(pdev,
+ PCI_BASE_ADDRESS_0 + (nr + 1) * 4, 4);
+}
+
+addr = ((uint64_t)addr_hi << 32) | addr_lo;
+
+if (!QEMU_IS_ALIGNED(addr, bar->size)) {
+return -EINVAL;
+}
+
+return 0;
+}
+
+static int vfio_bars_validate(VFIOPCIDevice *vdev)
+{
+int i, ret;
+
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+ret = vfio_bar_validate(vdev, i);
+if (ret) {
+error_report("vfio: BAR address %d validation failed", i);
+return ret;
+}
+}
+return 0;
+}
+
 static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
 {
 VFIOBAR *bar = &vdev->bars[nr];
@@ -2414,11 +2459,129 @@ static Object *vfio_pci_get_object(VFIODevice 
*vbasedev)
 return OBJECT(vdev);
 }
 
+static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+PCIDevice *pdev = &vdev->pdev;
+uint16_t pci_cmd;
+int i;
+
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+uint32_t bar;
+
+bar = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
+qemu_put_be32(f, bar);
+}
+
+qemu_put_be32(f, vdev->interrupt);
+if (vdev->interrupt == VFIO_INT_MSI) {
+uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
+bool msi_64bit;
+
+msi_flags = pci_default_read_config(pdev, pdev->msi_cap + 
PCI_MSI_FLAGS,
+2);
+msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
+
+msi_addr_lo = pci_default_read_config(pdev,
+ pdev->msi_cap + PCI_MSI_ADDRESS_LO, 
4);
+qemu_put_be32(f, msi_addr_lo);
+
+if (msi_64bit) {
+msi_addr_hi = pci_default_read_config(pdev,
+ pdev->msi_cap + 
PCI_MSI_ADDRESS_HI,
+ 4);
+}
+qemu_put_be32(f, msi_addr_hi);
+
+msi_data = pci_default_read_config(pdev,
+pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : 
PCI_MSI_DATA_32),
+2);
+qemu_put_be16(f, msi_data);
+} else if (vdev->interrupt == VFIO_INT_MSIX) {
+uint16_t offset;
+
+/* save enable bit and maskall bit */
+offset = pci_default_read_config(pdev,
+   pdev->msix_cap + PCI_MSIX_FLAGS + 1, 2);
+qemu_put_be16(f, offset);
+msix_save(pdev, f);
+}
+pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
+qemu_put_be16(f, pci_cmd);
+}
+
+static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+PCIDevice *pdev = &vdev->pdev;
+uint32_t interrupt_type;
+uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
+uint16_t pci_cmd;
+bool msi_64bit;
+int i, ret;
+
+/* restore pci bar configuration */
+pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
+vfio_pci_write_config(pdev, PCI_COMMAND,
+pci_cmd & (!(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)), 2);
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+uint32_t bar = qemu_get_be32(f);
+
+vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar, 4);
+}
+
+ret = vfio_bars_validate(vdev);
+if (ret) {
+return ret;
+}
+
+interrupt_type = qemu_get_be32(f);
+
+if (interrupt_type == VFIO_INT_MSI) {
+/* restore msi configuration */
+msi_flags = pci_defaul

[PATCH v18 QEMU 12/18] memory: Set DIRTY_MEMORY_MIGRATION when IOMMU is enabled

2020-05-04 Thread Kirti Wankhede
Signed-off-by: Kirti Wankhede 
---
 memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/memory.c b/memory.c
index 52f1a4cd37f0..5b868fe5eab3 100644
--- a/memory.c
+++ b/memory.c
@@ -1788,7 +1788,7 @@ bool memory_region_is_ram_device(MemoryRegion *mr)
 uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
 {
 uint8_t mask = mr->dirty_log_mask;
-if (global_dirty_log && mr->ram_block) {
+if (global_dirty_log && (mr->ram_block || memory_region_is_iommu(mr))) {
 mask |= (1 << DIRTY_MEMORY_MIGRATION);
 }
 return mask;
-- 
2.7.0




Re: [PATCH v16 QEMU 09/16] vfio: Add save state functions to SaveVMHandlers

2020-05-04 Thread Kirti Wankhede




On 3/26/2020 3:33 AM, Alex Williamson wrote:

On Wed, 25 Mar 2020 02:39:07 +0530
Kirti Wankhede  wrote:


Added .save_live_pending, .save_live_iterate and .save_live_complete_precopy
functions. These functions handle the pre-copy and stop-and-copy phases.

In _SAVING|_RUNNING device state or pre-copy phase:
- read pending_bytes. If pending_bytes > 0, go through below steps.
- read data_offset - indicates kernel driver to write data to staging
   buffer.
- read data_size - amount of data in bytes written by vendor driver in
   migration region.
- read data_size bytes of data from data_offset in the migration region.
- Write data packet to file stream as below:
{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
VFIO_MIG_FLAG_END_OF_STATE }

In _SAVING device state or stop-and-copy phase
a. read config space of device and save to migration file stream. This
doesn't need to be from vendor driver. Any other special config state
from driver can be saved as data in following iteration.
b. read pending_bytes. If pending_bytes > 0, go through below steps.
c. read data_offset - indicates kernel driver to write data to staging
buffer.
d. read data_size - amount of data in bytes written by vendor driver in
migration region.
e. read data_size bytes of data from data_offset in the migration region.
f. Write data packet as below:
{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
g. iterate through steps b to f while (pending_bytes > 0)
h. Write {VFIO_MIG_FLAG_END_OF_STATE}

When the data region is mapped, it is the user's responsibility to read
data_size bytes of data from data_offset before moving to the next steps.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  hw/vfio/migration.c   | 245 +-
  hw/vfio/trace-events  |   6 ++
  include/hw/vfio/vfio-common.h |   1 +
  3 files changed, 251 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 033f76526e49..ecbeed5182c2 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -138,6 +138,137 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, 
uint32_t mask,
  return 0;
  }
  
+static void *find_data_region(VFIORegion *region,

+  uint64_t data_offset,
+  uint64_t data_size)
+{
+void *ptr = NULL;
+int i;
+
+for (i = 0; i < region->nr_mmaps; i++) {
+if ((data_offset >= region->mmaps[i].offset) &&
+(data_offset < region->mmaps[i].offset + region->mmaps[i].size) &&
+(data_size <= region->mmaps[i].size)) {


(data_offset - region->mmaps[i].offset) can be non-zero, so this test
is invalid.  Additionally the uapi does not require that a given data
chunk fits exclusively within an mmap'd area, it may overlap one or
more mmap'd sections of the region, possibly with non-mmap'd areas
included.



What's the advantage of having mmap and non-mmap overlapped regions?
Isn't it better to have the data section either mapped or trapped?


+ptr = region->mmaps[i].mmap + (data_offset -
+   region->mmaps[i].offset);
+break;
+}
+}
+return ptr;
+}
+
+static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
+{
+VFIOMigration *migration = vbasedev->migration;
+VFIORegion *region = &migration->region;
+uint64_t data_offset = 0, data_size = 0;
+int ret;
+
+ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+ data_offset));
+if (ret != sizeof(data_offset)) {
+error_report("%s: Failed to get migration buffer data offset %d",
+ vbasedev->name, ret);
+return -EINVAL;
+}
+
+ret = pread(vbasedev->fd, &data_size, sizeof(data_size),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+ data_size));
+if (ret != sizeof(data_size)) {
+error_report("%s: Failed to get migration buffer data size %d",
+ vbasedev->name, ret);
+return -EINVAL;
+}
+
+if (data_size > 0) {
+void *buf = NULL;
+bool buffer_mmaped;
+
+if (region->mmaps) {
+buf = find_data_region(region, data_offset, data_size);
+}
+
+buffer_mmaped = (buf != NULL) ? true : false;


The ternary is unnecessary, "? true : false" is redundant.



Removing it.


+
+if (!buffer_mmaped) {
+buf = g_try_malloc0(data_size);


Why do we need zero'd memory?



Zeroed memory is not required; removing the 0.


+if (!buf) {
+error_report("%s: Error allocating buffer ", __func__);
+return -ENOMEM;
+}
+
+ret = pread(vbasedev->fd, buf, data_size,
+region->fd_offset + data_offset);
+i

[PATCH v18 QEMU 03/18] vfio: Add vfio_get_object callback to VFIODeviceOps

2020-05-04 Thread Kirti Wankhede
Hook vfio_get_object callback for PCI devices.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Suggested-by: Cornelia Huck 
Reviewed-by: Cornelia Huck 
---
 hw/vfio/pci.c | 8 
 include/hw/vfio/vfio-common.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5e75a95129ac..6c77c12e44b9 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2407,10 +2407,18 @@ static void vfio_pci_compute_needs_reset(VFIODevice 
*vbasedev)
 }
 }
 
+static Object *vfio_pci_get_object(VFIODevice *vbasedev)
+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+
+return OBJECT(vdev);
+}
+
 static VFIODeviceOps vfio_pci_ops = {
 .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
 .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
 .vfio_eoi = vfio_intx_eoi,
+.vfio_get_object = vfio_pci_get_object,
 };
 
 int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 8d7a0fbb1046..74261feaeac9 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -119,6 +119,7 @@ struct VFIODeviceOps {
 void (*vfio_compute_needs_reset)(VFIODevice *vdev);
 int (*vfio_hot_reset_multi)(VFIODevice *vdev);
 void (*vfio_eoi)(VFIODevice *vdev);
+Object *(*vfio_get_object)(VFIODevice *vdev);
 };
 
 typedef struct VFIOGroup {
-- 
2.7.0




[PATCH v18 QEMU 00/18] Add migration support for VFIO devices

2020-05-04 Thread Kirti Wankhede
Hi,

This Patch set adds migration support for VFIO devices in QEMU.

This patch set includes the following patches:
Patch 1:
- Define KABI for VFIO device for migration support for device state and newly
  added ioctl definitions to get the dirty pages bitmap. This is a placeholder
  patch.

Patch 2-4:
- A few code refactorings
- Added save and restore functions for PCI configuration space

Patch 5-10:
- Generic migration functionality for VFIO device.
  * This patch set adds functionality only for PCI devices, but can be
extended to other VFIO devices.
  * Added all the basic functions required for pre-copy, stop-and-copy and
resume phases of migration.
  * Added a state change notifier, and from that notifier function, the VFIO
device's state change is conveyed to the VFIO device driver.
  * During save setup phase and resume/load setup phase, migration region
is queried and is used to read/write VFIO device data.
  * .save_live_pending and .save_live_iterate are implemented to use QEMU's
functionality of iteration during pre-copy phase.
  * In .save_live_complete_precopy, that is, in the stop-and-copy phase,
iteration to read data from the VFIO device driver is implemented until the
pending bytes returned by the driver reach zero.

Patch 11-12
- Add a helper function for migration with vIOMMU enabled to get the address
  limit the IOMMU supports.
- Set DIRTY_MEMORY_MIGRATION flag in dirty log mask for migration with vIOMMU
  enabled.

Patch 13-14:
- Add function to start and stop dirty pages tracking.
- Add vfio_listerner_log_sync to mark dirty pages. Dirty pages bitmap is queried
  per container. All pages pinned by the vendor driver through the vfio_pin_pages
  external API have to be marked as dirty during migration.
  When there are CPU writes, CPU dirty page tracking can identify dirtied
  pages, but any page pinned by vendor driver can also be written by
  device. As of now there is no device which has hardware support for
  dirty page tracking. So all pages which are pinned by vendor driver
  should be considered as dirty.
  In QEMU, marking pages dirty is only done when the device is in the stop-and-copy
  phase, because if pages are marked dirty during the pre-copy phase and their content
  is transferred from source to destination, there is no way to know which pages were
  dirtied again after they were copied, until the device stops. To avoid
  repeated copy of same content, pinned pages are marked dirty only during
  stop-and-copy phase.

Patch 15:
- Get migration capability flags from kernel module.

Patch 16:
- With vIOMMU, IO virtual address range can get unmapped while in pre-copy
  phase of migration. In that case, unmap ioctl should return pages pinned
  in that range and QEMU should report corresponding guest physical pages
  dirty.

Patch 17:
- Make VFIO PCI device migration capable. If migration region is not provided by
  driver, migration is blocked.

Patch 18:
- Added VFIO device stats to MigrationInfo

Yet TODO:
Since there is no device which has hardware support for system memory
dirty bitmap tracking, right now there is no other API from the vendor driver
to the VFIO IOMMU module to report dirty pages. In the future, when such hardware
support is implemented, an API will be required in the kernel so that the
vendor driver can report dirty pages to the VFIO module during migration phases.

Below is the flow of state change for live migration where states in brackets
represent VM state, migration state and VFIO device state as:
(VM state, MIGRATION_STATUS, VFIO_DEVICE_STATE)

Live migration save path:
QEMU normal running state
(RUNNING, _NONE, _RUNNING)
|
migrate_init spawns migration_thread.
(RUNNING, _SETUP, _RUNNING|_SAVING)
Migration thread then calls each device's .save_setup()
|
(RUNNING, _ACTIVE, _RUNNING|_SAVING)
If device is active, get pending bytes by .save_live_pending()
if pending bytes >= threshold_size,  call save_live_iterate()
Data of VFIO device for pre-copy phase is copied.
Iterate till pending bytes converge and are less than threshold
|
On migration completion, vCPUs stops and calls .save_live_complete_precopy
for each active device. VFIO device is then transitioned in
 _SAVING state.
(FINISH_MIGRATE, _DEVICE, _SAVING)
For VFIO device, iterate in  .save_live_complete_precopy  until
pending data is 0.
(FINISH_MIGRATE, _DEVICE, _STOPPED)
|
(FINISH_MIGRATE, _COMPLETED, STOPPED)
Migration thread schedules the cleanup bottom half and exits

Live migration resume path:
Incoming migration calls .load_setup for each device
(RESTORE_VM, _ACTIVE, STOPPED)
|
For each device, .load_state is called for that device section data
|
At the end, called .load_cleanup for each device and vCPUs are started.
|
(RUNNING, _NONE, _RUNNING)

Note that:
- Migration post copy is 
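
As a rough illustration, the composite device_state values that the save and
resume paths above move through can be written out as below. This is a
descriptive sketch only, with the device_state bit names assumed from the KABI
placeholder patch (patch 1).

```
/* device_state bits as documented in the proposed vfio.h (patch 1). */
#define VFIO_DEVICE_STATE_RUNNING  (1 << 0)
#define VFIO_DEVICE_STATE_SAVING   (1 << 1)
#define VFIO_DEVICE_STATE_RESUMING (1 << 2)

/* Composite device_state value per migration phase in the flows above. */
enum vfio_migration_phase_state {
    PHASE_STOPPED       = 0,                                                    /* 000b */
    PHASE_RUNNING       = VFIO_DEVICE_STATE_RUNNING,                            /* 001b */
    PHASE_STOP_AND_COPY = VFIO_DEVICE_STATE_SAVING,                             /* 010b */
    PHASE_PRE_COPY      = VFIO_DEVICE_STATE_RUNNING | VFIO_DEVICE_STATE_SAVING, /* 011b */
    PHASE_RESUMING      = VFIO_DEVICE_STATE_RESUMING,                           /* 100b */
};
```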

Re: [PATCH] ppc: Use hard-float in ppc fp_hlper as early as possible. This would increase the performance better than enable hard-float it in soft-float.c; Just using fadd fsub fmul fdiv as a simple b

2020-05-04 Thread Aleksandar Markovic
On Mon, May 4, 2020 at 21:31  wrote:
>
> From: Yonggang Luo 
>
> Just post as an idea to improve PPC fp performance.
> With this idea, we have no need to adjust the helper orders.
>
> Signed-off-by: Yonggang Luo 
> ---
>  target/ppc/fpu_helper.c | 44 +
>  1 file changed, 44 insertions(+)
>
> diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
> index 2bd49a2cdf..79051e4540 100644
> --- a/target/ppc/fpu_helper.c
> +++ b/target/ppc/fpu_helper.c
> @@ -926,6 +926,17 @@ static void float_invalid_op_addsub(CPUPPCState *env, 
> bool set_fpcc,
>  /* fadd - fadd. */
>  float64 helper_fadd(CPUPPCState *env, float64 arg1, float64 arg2)
>  {
> +CPU_DoubleU u1, u2;
> +
> +u1.d = arg1;
> +u2.d = arg2;
> +CPU_DoubleU retDouble;
> +retDouble.nd = u1.nd + u2.nd;

Besides what Richard mentioned, you are neglecting the "flush-denormals-to-zero"
property of FPUs here. You implicitly assume that the host has the same behavior
as the target (PPC), but that simply may not be the case, leading to wrong
results.

Yours,
Aleksandar

> +if (likely(float64_is_zero_or_normal(retDouble.d)))
> +{
> +/* TODO: Handling inexact */
> +return retDouble.d;
> +}
>  float64 ret = float64_add(arg1, arg2, &env->fp_status);
>  int status = get_float_exception_flags(&env->fp_status);
>
> @@ -941,6 +952,17 @@ float64 helper_fadd(CPUPPCState *env, float64 arg1, 
> float64 arg2)
>  /* fsub - fsub. */
>  float64 helper_fsub(CPUPPCState *env, float64 arg1, float64 arg2)
>  {
> +CPU_DoubleU u1, u2;
> +
> +u1.d = arg1;
> +u2.d = arg2;
> +CPU_DoubleU retDouble;
> +retDouble.nd = u1.nd - u2.nd;
> +if (likely(float64_is_zero_or_normal(retDouble.d)))
> +{
> +/* TODO: Handling inexact */
> +return retDouble.d;
> +}
>  float64 ret = float64_sub(arg1, arg2, &env->fp_status);
>  int status = get_float_exception_flags(&env->fp_status);
>
> @@ -967,6 +989,17 @@ static void float_invalid_op_mul(CPUPPCState *env, bool 
> set_fprc,
>  /* fmul - fmul. */
>  float64 helper_fmul(CPUPPCState *env, float64 arg1, float64 arg2)
>  {
> +CPU_DoubleU u1, u2;
> +
> +u1.d = arg1;
> +u2.d = arg2;
> +CPU_DoubleU retDouble;
> +retDouble.nd = u1.nd * u2.nd;
> +if (likely(float64_is_zero_or_normal(retDouble.d)))
> +{
> +/* TODO: Handling inexact */
> +return retDouble.d;
> +}
>  float64 ret = float64_mul(arg1, arg2, &env->fp_status);
>  int status = get_float_exception_flags(&env->fp_status);
>
> @@ -997,6 +1030,17 @@ static void float_invalid_op_div(CPUPPCState *env, bool 
> set_fprc,
>  /* fdiv - fdiv. */
>  float64 helper_fdiv(CPUPPCState *env, float64 arg1, float64 arg2)
>  {
> +CPU_DoubleU u1, u2;
> +
> +u1.d = arg1;
> +u2.d = arg2;
> +CPU_DoubleU retDouble;
> +retDouble.nd = u1.nd / u2.nd;
> +if (likely(float64_is_zero_or_normal(retDouble.d)))
> +{
> +/* TODO: Handling inexact */
> +return retDouble.d;
> +}
>  float64 ret = float64_div(arg1, arg2, &env->fp_status);
>  int status = get_float_exception_flags(&env->fp_status);
>
> --
> 2.23.0.windows.1
>
>



Re: [PATCH] ppc: Use hard-float in ppc fp_hlper as early as possible...

2020-05-04 Thread Richard Henderson
On 5/4/20 12:29 PM, luoyongg...@gmail.com wrote:

> Re: [PATCH] ppc: Use hard-float in ppc fp_hlper as early as possible. This 
> would increase the performance better than enable hard-float it in 
> soft-float.c; Just using fadd fsub fmul fdiv as a simple bench demo. With 
> this patch, performance are increased 2x. and 1.3x than the one enable 
> hard-float in soft-float.c Both version are not considerate inexact fp 
> exception yet.

Use a return after the one-sentence title to separate it from the body of the
description.


>  float64 helper_fadd(CPUPPCState *env, float64 arg1, float64 arg2)
>  {
> +CPU_DoubleU u1, u2;
> +
> +u1.d = arg1;
> +u2.d = arg2;
> +CPU_DoubleU retDouble;
> +retDouble.nd = u1.nd + u2.nd;
> +if (likely(float64_is_zero_or_normal(retDouble.d)))
> +{
> +/* TODO: Handling inexact */
> +return retDouble.d;
> +}

First, you need to verify that the current rounding mode is
float_round_nearest_even.  Otherwise you are actively computing wrong results
for other rounding modes.

Second, including zero result in your acceptance test misses out on underflow
exceptions.

Third, what is your plan for inexact?  There's no point in continuing this
thread unless you fill in the TODO a bit more.

https://cafehayek.com/wp-content/uploads/2014/03/miracle_cartoon.jpg


r~
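
For illustration only, a fast path guarded along the lines of the three points
above might look roughly like the sketch below. It is not the posted patch; it
assumes the usual softfloat accessors (get_float_rounding_mode(),
float64_is_normal(), float_raise()) plus the host's <fenv.h>, and it reuses the
CPU_DoubleU fields from the patch under discussion.

```
#include <fenv.h>

/* Sketch of a guarded hard-float fast path for helper_fadd().  Returns
 * true and fills *ret only when the host result can be used directly. */
static inline bool fadd_fast_path(CPUPPCState *env, float64 arg1,
                                  float64 arg2, float64 *ret)
{
    CPU_DoubleU u1, u2, r;

    /* 1. Only valid when the target rounds to nearest-even like the host. */
    if (get_float_rounding_mode(&env->fp_status) != float_round_nearest_even) {
        return false;
    }

    u1.d = arg1;
    u2.d = arg2;

    feclearexcept(FE_ALL_EXCEPT);
    r.nd = u1.nd + u2.nd;

    /* 2. A zero result can hide underflow, so accept only strictly normal
     *    results and no host exceptions other than inexact. */
    if (!float64_is_normal(r.d) ||
        fetestexcept(FE_ALL_EXCEPT & ~FE_INEXACT)) {
        return false;
    }

    /* 3. Forward inexact into fp_status so the FPSCR update still sees it. */
    if (fetestexcept(FE_INEXACT)) {
        float_raise(float_flag_inexact, &env->fp_status);
    }

    *ret = r.d;
    return true;
}
```

A real version would also have to reject denormal inputs and honour the
target's flush-to-zero settings, as pointed out elsewhere in the thread, and
the cost of feclearexcept()/fetestexcept() on the host would need to be
measured against the softfloat path it replaces.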



[PULL 4/4] block/nbd-client: drop max_block restriction from discard

2020-05-04 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

The NBD spec was updated (see nbd.git commit 9f30fedb) so that
max_block doesn't relate to NBD_CMD_TRIM. So, drop the restriction.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20200401150112.9557-3-vsement...@virtuozzo.com>
Reviewed-by: Eric Blake 
[eblake: tweak commit message to call out NBD commit]
Signed-off-by: Eric Blake 
---
 block/nbd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nbd.c b/block/nbd.c
index d4d518a780c9..4ac23c8f6299 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1955,7 +1955,7 @@ static void nbd_refresh_limits(BlockDriverState *bs, 
Error **errp)
 }

 bs->bl.request_alignment = min;
-bs->bl.max_pdiscard = max;
+bs->bl.max_pdiscard = QEMU_ALIGN_DOWN(INT_MAX, min);
 bs->bl.max_pwrite_zeroes = max;
 bs->bl.max_transfer = max;

-- 
2.26.2




[PULL 0/4] NBD patches through 2020-05-04

2020-05-04 Thread Eric Blake
Happy Star Wars Day! May the Fourth be with you as you apply this...

The following changes since commit 5375af3cd7b8adcc10c18d8083b7be63976c9645:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging 
(2020-05-04 15:51:09 +0100)

are available in the Git repository at:

  https://repo.or.cz/qemu/ericb.git tags/pull-nbd-2020-05-04

for you to fetch changes up to 714eb0dbc5480c8a9d9f39eb931cb5d2acc1b6c6:

  block/nbd-client: drop max_block restriction from discard (2020-05-04 
15:16:46 -0500)


nbd patches for 2020-05-04

- reduce client-side fragmentation of NBD trim and status requests
- fix iotest 41 when run in deep tree
- fix socket activation in qemu-nbd


Eric Blake (1):
  tools: Fix use of fcntl(F_SETFD) during socket activation

Max Reitz (1):
  iotests/041: Fix NBD socket path

Vladimir Sementsov-Ogievskiy (2):
  block/nbd-client: drop max_block restriction from block_status
  block/nbd-client: drop max_block restriction from discard

 block/nbd.c| 6 ++
 util/systemd.c | 4 +++-
 tests/qemu-iotests/041 | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

-- 
2.26.2




[PULL 1/4] tools: Fix use of fcntl(F_SETFD) during socket activation

2020-05-04 Thread Eric Blake
Blindly setting FD_CLOEXEC without a read-modify-write will
inadvertently clear any other intentionally-set bits, such as a
proposed new bit for designating a fd that must behave in 32-bit mode.
However, we cannot use our wrapper qemu_set_cloexec(), because that
wrapper intentionally abort()s on failure, whereas the probe here
intentionally tolerates failure to deal with incorrect socket
activation gracefully.  Instead, fix the code to do the proper
read-modify-write.

Signed-off-by: Eric Blake 
Message-Id: <20200420175309.75894-3-ebl...@redhat.com>
Reviewed-by: Peter Maydell 
---
 util/systemd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/util/systemd.c b/util/systemd.c
index 1dd0367d9a84..5bcac9b40169 100644
--- a/util/systemd.c
+++ b/util/systemd.c
@@ -23,6 +23,7 @@ unsigned int check_socket_activation(void)
 unsigned long nr_fds;
 unsigned int i;
 int fd;
+int f;
 int err;

 s = getenv("LISTEN_PID");
@@ -54,7 +55,8 @@ unsigned int check_socket_activation(void)
 /* So the file descriptors don't leak into child processes. */
 for (i = 0; i < nr_fds; ++i) {
 fd = FIRST_SOCKET_ACTIVATION_FD + i;
-if (fcntl(fd, F_SETFD, FD_CLOEXEC) == -1) {
+f = fcntl(fd, F_GETFD);
+if (f == -1 || fcntl(fd, F_SETFD, f | FD_CLOEXEC) == -1) {
 /* If we cannot set FD_CLOEXEC then it probably means the file
  * descriptor is invalid, so socket activation has gone wrong
  * and we should exit.
-- 
2.26.2




[PULL 3/4] block/nbd-client: drop max_block restriction from block_status

2020-05-04 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

The NBD spec was updated (see nbd.git commit 9f30fedb) so that
max_block doesn't relate to NBD_CMD_BLOCK_STATUS. So, drop the
restriction.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Message-Id: <20200401150112.9557-2-vsement...@virtuozzo.com>
[eblake: tweak commit message to call out NBD commit]
Signed-off-by: Eric Blake 
---
 block/nbd.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 2160859f6499..d4d518a780c9 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1320,9 +1320,7 @@ static int coroutine_fn nbd_client_co_block_status(
 NBDRequest request = {
 .type = NBD_CMD_BLOCK_STATUS,
 .from = offset,
-.len = MIN(MIN_NON_ZERO(QEMU_ALIGN_DOWN(INT_MAX,
-bs->bl.request_alignment),
-s->info.max_block),
+.len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
MIN(bytes, s->info.size - offset)),
 .flags = NBD_CMD_FLAG_REQ_ONE,
 };
-- 
2.26.2




[PULL 2/4] iotests/041: Fix NBD socket path

2020-05-04 Thread Eric Blake
From: Max Reitz 

We should put all UNIX socket files into the sock_dir, not test_dir.

Reported-by: Elena Ufimtseva 
Signed-off-by: Max Reitz 
Message-Id: <20200424134626.78945-1-mre...@redhat.com>
Reviewed-by: Eric Blake 
Fixes: a1da1878607a
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/041 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 5d67bf14bfe8..46bf1f6c8164 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -35,7 +35,7 @@ quorum_img3 = os.path.join(iotests.test_dir, 'quorum3.img')
 quorum_repair_img = os.path.join(iotests.test_dir, 'quorum_repair.img')
 quorum_snapshot_file = os.path.join(iotests.test_dir, 'quorum_snapshot.img')

-nbd_sock_path = os.path.join(iotests.test_dir, 'nbd.sock')
+nbd_sock_path = os.path.join(iotests.sock_dir, 'nbd.sock')

 class TestSingleDrive(iotests.QMPTestCase):
 image_len = 1 * 1024 * 1024 # MB
-- 
2.26.2




Publishing Python Packages

2020-05-04 Thread John Snow
Hi!

It keeps coming up in review or in bugs that it would be nice to ship
certain python scripts or modules outside of QEMU for easy consumption
as dev tooling, light debugging SDKs, or other various tasks. We keep
avoiding the question as a diversion.

Let's investigate this seriously, but let's keep the scope small. Let's
look at shipping what's in python/qemu/ for starters, as a beta package
-- to explore the space and see what changes are necessary.

Let me start by saying that I have reserved the "qemu" package on
PyPI.org -- I have done so in good faith in order to have a public
discussion about the right way to factor this package -- and can
abdicate my ownership of this package at any point to Peter Maydell,
Eduardo Habkost, etc.

(There is also a conflict resolution process outlined by PEP 541, which
should ensure that I won't be able to maliciously withhold this package
space.)

Here's the package: https://pypi.org/project/qemu/

The only way to 'reserve' a package on pypi is to actually just create
one, so this is a blank package with nothing in it, versioned as low as
you can.

(This blank release can be deleted later, but we can never re-release a
v0.0.0a1 package.)

I'm working on a patchset to "demo" an installable version of what
exists in python/qemu/ right now, but a lot of project structure,
versioning, and layout will have to be debated with a careful list of
pros/cons.

So, for the moment, I am not committing to anything, but am looking
forward to some discussion on the forthcoming patches.

--js




Re: [PATCH v5 for-5.0? 0/7] Tighten qemu-img rules on missing backing format

2020-05-04 Thread Eric Blake

On 4/3/20 12:58 PM, Eric Blake wrote:

v4 was here:
https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg03775.html
In v5:
- fix 'qemu-img convert -B' to actually warn [Kashyap]
- squash in followups
- a couple more iotest improvements

If we decide this is not 5.0 material, then patches 4 and 7 need a
tweak to s/5.0/5.1/ as the start of the deprecation clock.


Ping.  I've already made the 5.1 change in my local tree, does anyone 
want to review the rest of this series before I post a v6?




Eric Blake (7):
   sheepdog: Add trivial backing_fmt support
   vmdk: Add trivial backing_fmt support
   qcow: Tolerate backing_fmt=, but warn on backing_fmt=raw
   qcow2: Deprecate use of qemu-img amend to change backing file
   iotests: Specify explicit backing format where sensible
   block: Add support to warn on backing file change without format
   qemu-img: Deprecate use of -b without -F



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH] ppc: Use hard-float in ppc fp_hlper as early as possible. This would increase the performance better than enable hard-float it in soft-float.c; Just using fadd fsub fmul fdiv as a simple b

2020-05-04 Thread Yonggang Luo
Bench results:
original:
-> FLOPS 3.00
GCC version: 4.3.3
Ops count: 1073217024
Time spent: 27.768 sec
MFLOPS: 38.65
FLOPS 3.00
GCC version: 4.3.3
Ops count: 1073217024
Time spent: 28.359 sec
MFLOPS: 37.84


soft-hard-float:

GCC version: 4.3.3
Ops count: 1073217024
Time spent: 14.874 sec
MFLOPS: 72.15
FLOPS 3.00
GCC version: 4.3.3
Ops count: 1073217024
Time spent: 14.249 sec
MFLOPS: 75.32

direct-hard-float:

-> FLOPS 3.00
GCC version: 4.3.3
Ops count: 1073217024
Time spent: 13.021 sec
MFLOPS: 82.42
FLOPS 3.00
GCC version: 4.3.3
Ops count: 1073217024
Time spent: 12.472 sec
MFLOPS: 86.05
FLOPS 3.00
GCC version: 4.3.3
Ops count: 1073217024
Time spent: 11.803 sec
MFLOPS: 90.93
FLOPS 3.00
GCC version: 4.3.3
Ops count: 1073217024
Time spent: 11.945 sec
MFLOPS: 89.85

bench program:

```
#include 
#include 
#ifdef __vxworks
#include 
#include 
#include 
#include 
#elif defined(_MSC_VER)
#include 
#include 
#else
#include 
#endif
/*
cl -O2 test_flops.c
gcc -O2 test_flops.c -o test_flops

*/
#ifndef DIM
#define DIM 1024
const long long int nop = 1073217024;
#else
#define COUNT
long long int nop = 0;
#endif

void printm(double A[DIM][DIM])
{
int i,j;
for (i=0; i 1) {
sscanf(argv[1], "%d", &count);
}
#endif
for (i = 0; i < count; i += 1) {
dge(X);
}
t = get_seconds() - t;
printf("Ops count: %llu\n", nop * count);
printf("Time spent: %.3lf sec\n", t);
printf("MFLOPS: %.2f\n", 1e-6 * nop * count / t );
#ifdef PRINTM
printm(X);
#endif
return 0;
}
```

On Tue, May 5, 2020 at 3:30 AM  wrote:

> From: Yonggang Luo 
>
> Just post as an idea to improve PPC fp performance.
> With this idea, we have no need to adjust the helper orders.
>
> Signed-off-by: Yonggang Luo 
> ---
>  target/ppc/fpu_helper.c | 44 +
>  1 file changed, 44 insertions(+)
>
> diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
> index 2bd49a2cdf..79051e4540 100644
> --- a/target/ppc/fpu_helper.c
> +++ b/target/ppc/fpu_helper.c
> @@ -926,6 +926,17 @@ static void float_invalid_op_addsub(CPUPPCState *env,
> bool set_fpcc,
>  /* fadd - fadd. */
>  float64 helper_fadd(CPUPPCState *env, float64 arg1, float64 arg2)
>  {
> +CPU_DoubleU u1, u2;
> +
> +u1.d = arg1;
> +u2.d = arg2;
> +CPU_DoubleU retDouble;
> +retDouble.nd = u1.nd + u2.nd;
> +if (likely(float64_is_zero_or_normal(retDouble.d)))
> +{
> +/* TODO: Handling inexact */
> +return retDouble.d;
> +}
>  float64 ret = float64_add(arg1, arg2, &env->fp_status);
>  int status = get_float_exception_flags(&env->fp_status);
>
> @@ -941,6 +952,17 @@ float64 helper_fadd(CPUPPCState *env, float64 arg1,
> float64 arg2)
>  /* fsub - fsub. */
>  float64 helper_fsub(CPUPPCState *env, float64 arg1, float64 arg2)
>  {
> +CPU_DoubleU u1, u2;
> +
> +u1.d = arg1;
> +u2.d = arg2;
> +CPU_DoubleU retDouble;
> +retDouble.nd = u1.nd - u2.nd;
> +if (likely(float64_is_zero_or_normal(retDouble.d)))
> +{
> +/* TODO: Handling inexact */
> +return retDouble.d;
> +}
>  float64 ret = float64_sub(arg1, arg2, &env->fp_status);
>  int status = get_float_exception_flags(&env->fp_status);
>
> @@ -967,6 +989,17 @@ static void float_invalid_op_mul(CPUPPCState *env,
> bool set_fprc,
>  /* fmul - fmul. */
>  float64 helper_fmul(CPUPPCState *env, float64 arg1, float64 arg2)
>  {
> +CPU_DoubleU u1, u2;
> +
> +u1.d = arg1;
> +u2.d = arg2;
> +CPU_DoubleU retDouble;
> +retDouble.nd = u1.nd * u2.nd;
> +if (likely(float64_is_zero_or_normal(retDouble.d)))
> +{
> +/* TODO: Handling inexact */
> +return retDouble.d;
> +}
>  float64 ret = float64_mul(arg1, arg2, &env->fp_status);
>  int status = get_float_exception_flags(&env->fp_status);
>
> @@ -997,6 +1030,17 @@ static void float_invalid_op_div(CPUPPCState *env,
> bool set_fprc,
>  /* fdiv - fdiv. */
>  float64 helper_fdiv(CPUPPCState *env, float64 arg1, float64 arg2)
>  {
> +CPU_DoubleU u1, u2;
> +
> +u1.d = arg1;
> +u2.d = arg2;
> +CPU_DoubleU retDouble;
> +retDouble.nd = u1.nd / u2.nd;
> +if (likely(float64_is_zero_or_normal(retDouble.d)))
> +{
> +/* TODO: Handling inexact */
> +return retDouble.d;
> +}
>  float64 ret = float64_div(arg1, arg2, &env->fp_status);
>  int status = get_float_exception_flags(&env->fp_status);
>
> --
> 2.23.0.windows.1
>
>

-- 
 此致
礼
罗勇刚
Yours
sincerely,
Yonggang Luo


Re: [PATCH v2 2/6] block/nbd-client: drop max_block restriction from discard

2020-05-04 Thread Eric Blake

On 4/21/20 4:56 PM, Eric Blake wrote:

On 4/1/20 10:01 AM, Vladimir Sementsov-Ogievskiy wrote:

NBD spec is updated, so that max_block doesn't relate to
NBD_CMD_TRIM. So, drop the restriction.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/nbd.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Eric Blake 

I might tweak the commit message of 1/6 and here to call out the NBD 
spec commit id (nbd.git 9f30fedb), but that doesn't change the patch 
proper.


I'm queuing 1 and 2 through my NBD tree now; the rest involve more of 
the block layer and go in tandem with your other work on cleaning up 
64-bit operations throughout, and I still need to give that a better review.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v1 2/4] .travis.yml: drop MacOSX

2020-05-04 Thread Alex Bennée


Daniel P. Berrangé  writes:

> On Fri, May 01, 2020 at 12:15:03PM +0100, Alex Bennée wrote:
>> This keeps breaking on Travis so let's just fall back to the Cirrus CI
>> builds, which seem to be better maintained. Fix up the comments while
>> we are doing this, as we never had a Windows build.
>
> FYI the current problem with macOS builds is not a Travis problem,
> it is a Homebrew problem, fixed by this patch:
>
> https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg04234.html

I had another go and it still went red with a timeout so I think I'll
stick with the original plan of dropping it and leaving it to Cirrus for
the coverage.

>
>
>> 
>> Signed-off-by: Alex Bennée 
>> ---
>>  .travis.yml | 28 +---
>>  1 file changed, 1 insertion(+), 27 deletions(-)
>> 
>> diff --git a/.travis.yml b/.travis.yml
>> index a4c3c6c805..49267b73b3 100644
>> --- a/.travis.yml
>> +++ b/.travis.yml
>> @@ -9,9 +9,8 @@ compiler:
>>  cache:
>># There is one cache per branch and compiler version.
>># characteristics of each job are used to identify the cache:
>> -  # - OS name (currently, linux, osx, or windows)
>> +  # - OS name (currently only linux)
>># - OS distribution (for Linux, xenial, trusty, or precise)
>> -  # - macOS image name (e.g., xcode7.2)
>># - Names and values of visible environment variables set in .travis.yml 
>> or Settings panel
>>timeout: 1200
>>ccache: true
>> @@ -271,31 +270,6 @@ jobs:
>>  - TEST_CMD=""
>>  
>>  
>> -# MacOSX builds - cirrus.yml also tests some MacOS builds including 
>> latest Xcode
>> -
>> -- name: "OSX Xcode 10.3"
>> -  env:
>> -- BASE_CONFIG="--disable-docs --enable-tools"
>> -- 
>> CONFIG="--target-list=i386-softmmu,ppc-softmmu,ppc64-softmmu,m68k-softmmu,x86_64-softmmu"
>> -  os: osx
>> -  osx_image: xcode10.3
>> -  compiler: clang
>> -  addons:
>> -homebrew:
>> -  packages:
>> -- ccache
>> -- glib
>> -- pixman
>> -- gnu-sed
>> -- python
>> -  update: true
>> -  before_script:
>> -- brew link --overwrite python
>> -- export PATH="/usr/local/opt/ccache/libexec:$PATH"
>> -- mkdir -p ${BUILD_DIR} && cd ${BUILD_DIR}
>> -- ${SRC_DIR}/configure ${BASE_CONFIG} ${CONFIG} || { cat config.log 
>> && exit 1; }
>> -
>> -
>>  # Python builds
>>  - name: "GCC Python 3.5 (x86_64-softmmu)"
>>env:
>> -- 
>> 2.20.1
>> 
>> 
>
> Regards,
> Daniel


-- 
Alex Bennée



Re: [PATCH v1 4/4] .travis.yml: reduce the load on [ppc64] GCC check-tcg

2020-05-04 Thread Alex Bennée


Richard Henderson  writes:

> On 5/3/20 7:10 PM, David Gibson wrote:
>   - TEST_CMD="make check check-tcg V=1"
> -- CONFIG="--disable-containers 
> --target-list=${MAIN_SOFTMMU_TARGETS},ppc64le-linux-user"
> +- CONFIG="--disable-containers 
> --target-list=ppc64-softmmu,ppc64le-linux-user"

 Cc'ing David, since I'm not sure about this one... Maybe split as we
 did with other jobs?
> ...
>> Hrm.  I'd prefer not to drop this coverage if we can avoid it.  What
>> we're not testing with the proposed patch is TCG generation for a ppc
>> host but a non-ppc target.  e.g. if the x86 or ARM target side generates
>> some pattern of TCG ops that's very rare for the ppc target, and is
>> buggy in the ppc host side.
>
> Are we actually testing those here?  As far as I can see, we're not installing
> any cross-compilers here, so we're not building any non-ppc binaries.  Nor are
> we running check-acceptance which would download pre-built foreign
> binaries.

We are testing the very minimal boot stubs that each -system binary has
in qtest but they are hardly going to be exercising the majority of the
TCG. Basically the $SELF-linux-user is going to be exercising more of
the TCG than anything else.

>
>
> r~


-- 
Alex Bennée



Re: [PATCH 2/3] io/task: Move 'qom/object.h' header to source

2020-05-04 Thread Philippe Mathieu-Daudé

On 5/4/20 7:42 PM, Richard Henderson wrote:

On 5/4/20 1:46 AM, Philippe Mathieu-Daudé wrote:

We need "qom/object.h" to call object_ref()/object_unref().


This description doesn't seem to match


+++ b/include/io/task.h
@@ -21,8 +21,6 @@
  #ifndef QIO_TASK_H
  #define QIO_TASK_H
  
-#include "qom/object.h"

-
  typedef struct QIOTask QIOTask;
  
  typedef void (*QIOTaskFunc)(QIOTask *task,

diff --git a/io/task.c b/io/task.c
index 1ae7b86488..53c0bed686 100644
--- a/io/task.c
+++ b/io/task.c
@@ -22,6 +22,7 @@
  #include "io/task.h"
  #include "qapi/error.h"
  #include "qemu/thread.h"
+#include "qom/object.h"


the change.  Since io/task.c includes io/task.h, what are you actually doing?


Sorry for not documenting this clearly in the cover letter.

The original goal was to stop using $SRC_PATH as an include directory, but
as it is huge I believe it will never get fully accepted, so I simply
kept the few patches that seemed worthwhile...


The final patch is:

-- >8 --
--- a/configure
+++ b/configure
@@ -601,7 +601,7 @@ QEMU_CFLAGS="-fno-strict-aliasing -fno-common 
-fwrapv -std=gnu99 $QEMU_CFLAGS"
 QEMU_CFLAGS="-Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
$QEMU_CFLAGS"

 QEMU_CFLAGS="-Wstrict-prototypes -Wredundant-decls $QEMU_CFLAGS"
 QEMU_CFLAGS="-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
$QEMU_CFLAGS"
-QEMU_INCLUDES="-iquote . -iquote \$(SRC_PATH) -iquote 
\$(SRC_PATH)/accel/tcg -iquote \$(SRC_PATH)/include"
+QEMU_INCLUDES="-iquote . -iquote \$(SRC_PATH)/accel/tcg -iquote 
\$(SRC_PATH)/include"

 QEMU_INCLUDES="$QEMU_INCLUDES -iquote \$(SRC_PATH)/disas/libvixl"
 if test "$debug_info" = "yes"; then
 CFLAGS="-g $CFLAGS"
---



Re: [PULL 00/29] virtio,acpi,pci,pc: backlog from pre-5.0

2020-05-04 Thread Peter Maydell
On Mon, 4 May 2020 at 15:29, Michael S. Tsirkin  wrote:
>
> The following changes since commit 9af638cc1f665712522608c5d6b8c03d8fa67666:
>
>   Merge remote-tracking branch 
> 'remotes/pmaydell/tags/pull-target-arm-20200504' into staging (2020-05-04 
> 13:37:17 +0100)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
>
> for you to fetch changes up to d8a05995bd64117bf5219d3ba7956277e608e3ca:
>
>   hw/i386: Make vmmouse helpers static (2020-05-04 10:25:03 -0400)
>
> 
> virtio,acpi,pci,pc: backlog from pre-5.0
>
> Mostly fixes, cleanups, but also new features for arm/virt and pc acpi.
>
> Signed-off-by: Michael S. Tsirkin 
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.1
for any user-visible changes.

-- PMM



[PATCH] ppc: Use hard-float in ppc fp_hlper as early as possible. This would increase the performance better than enable hard-float it in soft-float.c; Just using fadd fsub fmul fdiv as a simple bench

2020-05-04 Thread luoyonggang
From: Yonggang Luo 

Just posted as an idea to improve PPC FP performance.
With this idea, there is no need to adjust the helper order.

Signed-off-by: Yonggang Luo 
---
 target/ppc/fpu_helper.c | 44 +
 1 file changed, 44 insertions(+)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 2bd49a2cdf..79051e4540 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -926,6 +926,17 @@ static void float_invalid_op_addsub(CPUPPCState *env, bool 
set_fpcc,
 /* fadd - fadd. */
 float64 helper_fadd(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+CPU_DoubleU u1, u2;
+
+u1.d = arg1;
+u2.d = arg2;
+CPU_DoubleU retDouble;
+retDouble.nd = u1.nd + u2.nd;
+if (likely(float64_is_zero_or_normal(retDouble.d)))
+{
+/* TODO: Handling inexact */
+return retDouble.d;
+}
 float64 ret = float64_add(arg1, arg2, &env->fp_status);
 int status = get_float_exception_flags(&env->fp_status);
 
@@ -941,6 +952,17 @@ float64 helper_fadd(CPUPPCState *env, float64 arg1, 
float64 arg2)
 /* fsub - fsub. */
 float64 helper_fsub(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+CPU_DoubleU u1, u2;
+
+u1.d = arg1;
+u2.d = arg2;
+CPU_DoubleU retDouble;
+retDouble.nd = u1.nd - u2.nd;
+if (likely(float64_is_zero_or_normal(retDouble.d)))
+{
+/* TODO: Handling inexact */
+return retDouble.d;
+}
 float64 ret = float64_sub(arg1, arg2, &env->fp_status);
 int status = get_float_exception_flags(&env->fp_status);
 
@@ -967,6 +989,17 @@ static void float_invalid_op_mul(CPUPPCState *env, bool 
set_fprc,
 /* fmul - fmul. */
 float64 helper_fmul(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+CPU_DoubleU u1, u2;
+
+u1.d = arg1;
+u2.d = arg2;
+CPU_DoubleU retDouble;
+retDouble.nd = u1.nd * u2.nd;
+if (likely(float64_is_zero_or_normal(retDouble.d)))
+{
+/* TODO: Handling inexact */
+return retDouble.d;
+}
 float64 ret = float64_mul(arg1, arg2, &env->fp_status);
 int status = get_float_exception_flags(&env->fp_status);
 
@@ -997,6 +1030,17 @@ static void float_invalid_op_div(CPUPPCState *env, bool 
set_fprc,
 /* fdiv - fdiv. */
 float64 helper_fdiv(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+CPU_DoubleU u1, u2;
+
+u1.d = arg1;
+u2.d = arg2;
+CPU_DoubleU retDouble;
+retDouble.nd = u1.nd / u2.nd;
+if (likely(float64_is_zero_or_normal(retDouble.d)))
+{
+/* TODO: Handling inexact */
+return retDouble.d;
+}
 float64 ret = float64_div(arg1, arg2, &env->fp_status);
 int status = get_float_exception_flags(&env->fp_status);
 
-- 
2.23.0.windows.1




[PATCH v3 1/3] target/arm: Use tcg_gen_gvec_5_ptr for sve FMLA/FCMLA

2020-05-04 Thread Richard Henderson
Now that we can pass 7 parameters, do not encode register
operands within simd_data.

Reviewed-by: Alex Bennée 
Reviewed-by: Taylor Simpson 
Signed-off-by: Richard Henderson 
---
v2: Remove gen_helper_sve_fmla typedef (phil).
---
 target/arm/helper-sve.h|  45 +++
 target/arm/sve_helper.c| 157 ++---
 target/arm/translate-sve.c |  70 ++---
 3 files changed, 114 insertions(+), 158 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 2f47279155..7a200755ac 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -1099,25 +1099,40 @@ DEF_HELPER_FLAGS_6(sve_fcadd_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fcadd_d, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, i32)
 
-DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
-DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
-DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fmla_zpzzz_h, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fmla_zpzzz_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fmla_zpzzz_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
 
-DEF_HELPER_FLAGS_3(sve_fmls_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
-DEF_HELPER_FLAGS_3(sve_fmls_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
-DEF_HELPER_FLAGS_3(sve_fmls_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fmls_zpzzz_h, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fmls_zpzzz_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fmls_zpzzz_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
 
-DEF_HELPER_FLAGS_3(sve_fnmla_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
-DEF_HELPER_FLAGS_3(sve_fnmla_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
-DEF_HELPER_FLAGS_3(sve_fnmla_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fnmla_zpzzz_h, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fnmla_zpzzz_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fnmla_zpzzz_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
 
-DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
-DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
-DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_h, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
 
-DEF_HELPER_FLAGS_3(sve_fcmla_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
-DEF_HELPER_FLAGS_3(sve_fcmla_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
-DEF_HELPER_FLAGS_3(sve_fcmla_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_h, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_7(sve_fcmla_zpzzz_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(sve_ftmad_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_ftmad_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index fdfa652094..33b5a54a47 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3372,23 +3372,11 @@ DO_ZPZ_FP(sve_ucvt_dd, uint64_t, , 
uint64_to_float64)
 
 #undef DO_ZPZ_FP
 
-/* 4-operand predicated multiply-add.  This requires 7 operands to pass
- * "properly", so we need to encode some of the registers into DESC.
- */
-QEMU_BUILD_BUG_ON(SIMD_DATA_SHIFT + 20 > 32);
-
-static void do_fmla_zpzzz_h(CPUARMState *env, void *vg, uint32_t desc,
+static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
+float_status *status, uint32_t desc,
 uint16_t neg1, uint16_t neg3)
 {
 intptr_t i = simd_oprsz(desc);
-unsigned rd = extract32(desc, SIMD_DATA_SHIFT, 5);
-unsigned rn = extract32(desc, SIMD_DATA_SHIFT + 5, 5);
-unsigned rm = extract32(desc, SIMD_DATA_SHIFT + 10, 5);
-unsigned ra = extract32(desc, SIMD_DATA_SHIFT + 15, 5);
-void *vd = &env->vfp.zregs[rd];
-void *vn = &env->vfp.zregs[rn];
-void *vm = &env->vfp.zregs[rm];
-void

[PATCH v3 2/3] target/arm: Use tcg_gen_gvec_mov for clear_vec_high

2020-05-04 Thread Richard Henderson
The 8-byte store for the end of a !is_q operation can be
merged with the other stores.  Use a no-op vector move
to trigger the expand_clr portion of tcg_gen_gvec_mov.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index a896f9c4b8..729e746e25 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -496,14 +496,8 @@ static void clear_vec_high(DisasContext *s, bool is_q, int 
rd)
 unsigned ofs = fp_reg_offset(s, rd, MO_64);
 unsigned vsz = vec_full_reg_size(s);
 
-if (!is_q) {
-TCGv_i64 tcg_zero = tcg_const_i64(0);
-tcg_gen_st_i64(tcg_zero, cpu_env, ofs + 8);
-tcg_temp_free_i64(tcg_zero);
-}
-if (vsz > 16) {
-tcg_gen_gvec_dup8i(ofs + 16, vsz - 16, vsz - 16, 0);
-}
+/* Nop move, with side effect of clearing the tail. */
+tcg_gen_gvec_mov(MO_64, ofs, ofs, is_q ? 16 : 8, vsz);
 }
 
 void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v)
-- 
2.20.1




[PATCH v3 0/3] target/arm: misc cleanups

2020-05-04 Thread Richard Henderson
Richard Henderson (3):
  target/arm: Use tcg_gen_gvec_5_ptr for sve FMLA/FCMLA
  target/arm: Use tcg_gen_gvec_mov for clear_vec_high
  target/arm: Use clear_vec_high more effectively

 target/arm/helper-sve.h|  45 +++
 target/arm/sve_helper.c| 157 ++---
 target/arm/translate-a64.c |  69 
 target/arm/translate-sve.c |  70 ++---
 4 files changed, 152 insertions(+), 189 deletions(-)

-- 
2.20.1




[PATCH v3 3/3] target/arm: Use clear_vec_high more effectively

2020-05-04 Thread Richard Henderson
Do not explicitly store zero to the NEON high part
when we can pass !is_q to clear_vec_high.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 59 +++---
 1 file changed, 36 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 729e746e25..d1c9150c4f 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -939,11 +939,10 @@ static void do_fp_ld(DisasContext *s, int destidx, 
TCGv_i64 tcg_addr, int size)
 {
 /* This always zero-extends and writes to a full 128 bit wide vector */
 TCGv_i64 tmplo = tcg_temp_new_i64();
-TCGv_i64 tmphi;
+TCGv_i64 tmphi = NULL;
 
 if (size < 4) {
 MemOp memop = s->be_data + size;
-tmphi = tcg_const_i64(0);
 tcg_gen_qemu_ld_i64(tmplo, tcg_addr, get_mem_index(s), memop);
 } else {
 bool be = s->be_data == MO_BE;
@@ -961,12 +960,13 @@ static void do_fp_ld(DisasContext *s, int destidx, 
TCGv_i64 tcg_addr, int size)
 }
 
 tcg_gen_st_i64(tmplo, cpu_env, fp_reg_offset(s, destidx, MO_64));
-tcg_gen_st_i64(tmphi, cpu_env, fp_reg_hi_offset(s, destidx));
-
 tcg_temp_free_i64(tmplo);
-tcg_temp_free_i64(tmphi);
 
-clear_vec_high(s, true, destidx);
+if (tmphi) {
+tcg_gen_st_i64(tmphi, cpu_env, fp_reg_hi_offset(s, destidx));
+tcg_temp_free_i64(tmphi);
+}
+clear_vec_high(s, tmphi != NULL, destidx);
 }
 
 /*
@@ -6960,8 +6960,8 @@ static void disas_simd_ext(DisasContext *s, uint32_t insn)
 return;
 }
 
-tcg_resh = tcg_temp_new_i64();
 tcg_resl = tcg_temp_new_i64();
+tcg_resh = NULL;
 
 /* Vd gets bits starting at pos bits into Vm:Vn. This is
  * either extracting 128 bits from a 128:128 concatenation, or
@@ -6973,7 +6973,6 @@ static void disas_simd_ext(DisasContext *s, uint32_t insn)
 read_vec_element(s, tcg_resh, rm, 0, MO_64);
 do_ext64(s, tcg_resh, tcg_resl, pos);
 }
-tcg_gen_movi_i64(tcg_resh, 0);
 } else {
 TCGv_i64 tcg_hh;
 typedef struct {
@@ -6988,6 +6987,7 @@ static void disas_simd_ext(DisasContext *s, uint32_t insn)
 pos -= 64;
 }
 
+tcg_resh = tcg_temp_new_i64();
 read_vec_element(s, tcg_resl, elt->reg, elt->elt, MO_64);
 elt++;
 read_vec_element(s, tcg_resh, elt->reg, elt->elt, MO_64);
@@ -7003,9 +7003,12 @@ static void disas_simd_ext(DisasContext *s, uint32_t 
insn)
 
 write_vec_element(s, tcg_resl, rd, 0, MO_64);
 tcg_temp_free_i64(tcg_resl);
-write_vec_element(s, tcg_resh, rd, 1, MO_64);
-tcg_temp_free_i64(tcg_resh);
-clear_vec_high(s, true, rd);
+
+if (is_q) {
+write_vec_element(s, tcg_resh, rd, 1, MO_64);
+tcg_temp_free_i64(tcg_resh);
+}
+clear_vec_high(s, is_q, rd);
 }
 
 /* TBL/TBX
@@ -7042,17 +7045,21 @@ static void disas_simd_tb(DisasContext *s, uint32_t 
insn)
  * the input.
  */
 tcg_resl = tcg_temp_new_i64();
-tcg_resh = tcg_temp_new_i64();
+tcg_resh = NULL;
 
 if (is_tblx) {
 read_vec_element(s, tcg_resl, rd, 0, MO_64);
 } else {
 tcg_gen_movi_i64(tcg_resl, 0);
 }
-if (is_tblx && is_q) {
-read_vec_element(s, tcg_resh, rd, 1, MO_64);
-} else {
-tcg_gen_movi_i64(tcg_resh, 0);
+
+if (is_q) {
+tcg_resh = tcg_temp_new_i64();
+if (is_tblx) {
+read_vec_element(s, tcg_resh, rd, 1, MO_64);
+} else {
+tcg_gen_movi_i64(tcg_resh, 0);
+}
 }
 
 tcg_idx = tcg_temp_new_i64();
@@ -7072,9 +7079,12 @@ static void disas_simd_tb(DisasContext *s, uint32_t insn)
 
 write_vec_element(s, tcg_resl, rd, 0, MO_64);
 tcg_temp_free_i64(tcg_resl);
-write_vec_element(s, tcg_resh, rd, 1, MO_64);
-tcg_temp_free_i64(tcg_resh);
-clear_vec_high(s, true, rd);
+
+if (is_q) {
+write_vec_element(s, tcg_resh, rd, 1, MO_64);
+tcg_temp_free_i64(tcg_resh);
+}
+clear_vec_high(s, is_q, rd);
 }
 
 /* ZIP/UZP/TRN
@@ -7111,7 +7121,7 @@ static void disas_simd_zip_trn(DisasContext *s, uint32_t 
insn)
 }
 
 tcg_resl = tcg_const_i64(0);
-tcg_resh = tcg_const_i64(0);
+tcg_resh = is_q ? tcg_const_i64(0) : NULL;
 tcg_res = tcg_temp_new_i64();
 
 for (i = 0; i < elements; i++) {
@@ -7162,9 +7172,12 @@ static void disas_simd_zip_trn(DisasContext *s, uint32_t 
insn)
 
 write_vec_element(s, tcg_resl, rd, 0, MO_64);
 tcg_temp_free_i64(tcg_resl);
-write_vec_element(s, tcg_resh, rd, 1, MO_64);
-tcg_temp_free_i64(tcg_resh);
-clear_vec_high(s, true, rd);
+
+if (is_q) {
+write_vec_element(s, tcg_resh, rd, 1, MO_64);
+tcg_temp_free_i64(tcg_resh);
+}
+clear_vec_high(s, is_q, rd);
 }
 
 /*
-- 
2.20.1




[PATCH] [ppc] Use hard-float as early as possible for PPC. And this would increase the performance better than enable it in soft-float.c; Just using fadd fsub fmul fdiv as a demo. With this patch. Per

2020-05-04 Thread luoyonggang
From: Yonggang Luo 

Just posted as an idea to improve PPC FP performance.
With this patch, there is no need to adjust the helper order.

Signed-off-by: Yonggang Luo 
---
 target/ppc/fpu_helper.c | 44 +
 1 file changed, 44 insertions(+)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 2bd49a2cdf..79051e4540 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -926,6 +926,17 @@ static void float_invalid_op_addsub(CPUPPCState *env, bool set_fpcc,
 /* fadd - fadd. */
 float64 helper_fadd(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+CPU_DoubleU u1, u2;
+
+u1.d = arg1;
+u2.d = arg2;
+CPU_DoubleU retDouble;
+retDouble.nd = u1.nd + u2.nd;
+if (likely(float64_is_zero_or_normal(retDouble.d)))
+{
+/* TODO: Handling inexact */
+return retDouble.d;
+}
 float64 ret = float64_add(arg1, arg2, &env->fp_status);
 int status = get_float_exception_flags(&env->fp_status);
 
@@ -941,6 +952,17 @@ float64 helper_fadd(CPUPPCState *env, float64 arg1, float64 arg2)
 /* fsub - fsub. */
 float64 helper_fsub(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+CPU_DoubleU u1, u2;
+
+u1.d = arg1;
+u2.d = arg2;
+CPU_DoubleU retDouble;
+retDouble.nd = u1.nd - u2.nd;
+if (likely(float64_is_zero_or_normal(retDouble.d)))
+{
+/* TODO: Handling inexact */
+return retDouble.d;
+}
 float64 ret = float64_sub(arg1, arg2, &env->fp_status);
 int status = get_float_exception_flags(&env->fp_status);
 
@@ -967,6 +989,17 @@ static void float_invalid_op_mul(CPUPPCState *env, bool set_fprc,
 /* fmul - fmul. */
 float64 helper_fmul(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+CPU_DoubleU u1, u2;
+
+u1.d = arg1;
+u2.d = arg2;
+CPU_DoubleU retDouble;
+retDouble.nd = u1.nd * u2.nd;
+if (likely(float64_is_zero_or_normal(retDouble.d)))
+{
+/* TODO: Handling inexact */
+return retDouble.d;
+}
 float64 ret = float64_mul(arg1, arg2, &env->fp_status);
 int status = get_float_exception_flags(&env->fp_status);
 
@@ -997,6 +1030,17 @@ static void float_invalid_op_div(CPUPPCState *env, bool set_fprc,
 /* fdiv - fdiv. */
 float64 helper_fdiv(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+CPU_DoubleU u1, u2;
+
+u1.d = arg1;
+u2.d = arg2;
+CPU_DoubleU retDouble;
+retDouble.nd = u1.nd / u2.nd;
+if (likely(float64_is_zero_or_normal(retDouble.d)))
+{
+/* TODO: Handling inexact */
+return retDouble.d;
+}
 float64 ret = float64_div(arg1, arg2, &env->fp_status);
 int status = get_float_exception_flags(&env->fp_status);
 
-- 
2.23.0.windows.1
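
The fast path in the diff above is duplicated verbatim in all four helpers. If the idea is pursued, it could be factored into a single macro. A rough sketch under the same assumptions the patch already makes: CPU_DoubleU gains a host-native 'double' member named nd (upstream QEMU does not have this field), and the inexact flag is still left as a TODO.

/*
 * Sketch only, not a posted patch: shared hard-float fast path mirroring
 * the checks in the diff above.  Assumes CPU_DoubleU has a host-native
 * 'double' member 'nd' (not in upstream QEMU); like the patch, it does
 * not raise the inexact flag on this path.
 */
#define PPC_FP_FAST_PATH(arg1, arg2, OP)                        \
    do {                                                        \
        CPU_DoubleU u1_, u2_, res_;                             \
        u1_.d = (arg1);                                         \
        u2_.d = (arg2);                                         \
        res_.nd = u1_.nd OP u2_.nd;                             \
        if (likely(float64_is_zero_or_normal(res_.d))) {        \
            /* TODO: inexact is not handled on this path */     \
            return res_.d;                                      \
        }                                                       \
    } while (0)

float64 helper_fadd(CPUPPCState *env, float64 arg1, float64 arg2)
{
    PPC_FP_FAST_PATH(arg1, arg2, +);

    /* Slow path: the existing softfloat implementation, unchanged. */
    float64 ret = float64_add(arg1, arg2, &env->fp_status);
    int status = get_float_exception_flags(&env->fp_status);

    if (unlikely(status & float_flag_invalid)) {
        /* existing invalid-op handling, unchanged */
    }

    return ret;
}

As with the posted diff, this path silently assumes the host rounding mode matches the guest's and ignores the underflow and inexact flags; the generic hardfloat code in fpu/softfloat.c guards against this by only using the host FPU when the status flags and rounding mode make it safe.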




Re: A first try to improve PPC float simulation, not even compiled. Just asking a question.

2020-05-04 Thread BALATON Zoltan

On Mon, 4 May 2020, Richard Henderson wrote:

> On 5/4/20 11:30 AM, BALATON Zoltan wrote:
>> On Mon, 4 May 2020, Richard Henderson wrote:
>>> On 5/3/20 5:41 PM, 罗勇刚(Yonggang Luo) wrote:
>>>> On Mon, May 4, 2020 at 7:40 AM BALATON Zoltan mailto:bala...@eik.bme.hu>> wrote:
>>>>
>>>>     Hello,
>>>>
>>>>     On Mon, 4 May 2020, 罗勇刚(Yonggang Luo) wrote:
>>>>    > Hello Richard, Can you have a look at the following patch, and was that
>>>> are
>>>>    > the right direction?
>>>>
>>>>     Formatting of the patch is broken by your mailer, try sending it with
>>>>     something that does not change it otherwise it's a bit hard to read.
>>>>
>>>>     Richard suggested to add an assert to check the fp_status is correctly
>>>>     cleared in place of helper_reset_fpstatus first for debugging so you could
>>>>     change the helper accordingly before deleting it and run a few tests to
>>>>     verify it still works. You'll need get some tests and benchmarks working
>>>>     to be able to verify your changes that's why I've said that would be step
>>>>     0. If you checked that it still produces the same results and the assert
>>>>     does not trigger then you can remove the helper.
>>>>
>>>> That's what I need help,
>>>> 1. How to write a assert to replace helper_reset_fpstatus .
>>>>   just directly assert? or something else
>>>
>>> You can't place the assert where helper_reset_fpstatus was.  You need to place
>>> it in each of the helpers, like helper_fadd, that previously has a call to
>>> helper_reset_fpstatus preceeding it.
>>
>> Why? If we want to verify that clearing fp_status after flags are processed is
>> equivalent to clearing flags before fp ops then verifying that the fp_status is
>> already cleared when the current helper_reset_fpstatus is called should be
>> enough to check that nothing has set the flags in between so the current reset
>> helper would be no op. Therefore I thought you could put the assert there for
>> checking this. This assert is for debugging and checking the change only and
>> not meant to be left there otherwise we lose all the performance gain so it's
>> easier to put in the current helper before removing it for this than in every
>> fp op helper. What am I missing?
>
> I'm not sure what you are suggesting.
>
> If you are suggesting
>
> void helper_reset_fpstatus(CPUPPCState *env)
> {
> -set_float_exception_flags(0, &env->fp_status);
> +assert(get_float_exception_flags(&env->fp_status) == 0);
> }
>
> then, sure, that works.  But we also want to remove that call, so in order to
> retain the check for debugging, we need to move the assert into the other
> helpers.


Yes, I meant to change helper_reset_fpstatus as above, add clearing of
fp_status after the flags have been processed, then run some tests to
verify that we can remove this call, and finally remove it together with
the assert, which should not be needed after this check.
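
A minimal sketch of what the end result of that could look like (an illustration based on this thread only, not a posted patch; the debug assert would be dropped again once the conversion is verified, since it costs performance on the hot path):

/* Each FP helper checks that no stale flags survived the previous op. */
float64 helper_fadd(CPUPPCState *env, float64 arg1, float64 arg2)
{
    assert(get_float_exception_flags(&env->fp_status) == 0);

    float64 ret = float64_add(arg1, arg2, &env->fp_status);
    int status = get_float_exception_flags(&env->fp_status);

    if (unlikely(status & float_flag_invalid)) {
        /* existing invalid-op handling, unchanged */
    }

    return ret;
}

/*
 * The flags are cleared once they have been processed into FPSCR, so the
 * separate helper_reset_fpstatus() call before every FP op can go away.
 */
void helper_float_check_status(CPUPPCState *env)
{
    /* existing processing of fp_status into FPSCR, unchanged */

    set_float_exception_flags(0, &env->fp_status);
}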


Regards,
BALATON Zoltan

Re: A first try to improve PPC float simulation, not even compiled. Just asking a question.

2020-05-04 Thread Richard Henderson
On 5/4/20 11:30 AM, BALATON Zoltan wrote:
> On Mon, 4 May 2020, Richard Henderson wrote:
>> On 5/3/20 5:41 PM, 罗勇刚(Yonggang Luo) wrote:
>>> On Mon, May 4, 2020 at 7:40 AM BALATON Zoltan >> > wrote:
>>>
>>>     Hello,
>>>
>>>     On Mon, 4 May 2020, 罗勇刚(Yonggang Luo) wrote:
>>>    > Hello Richard, Can you have a look at the following patch, and was that
>>> are
>>>    > the right direction?
>>>
>>>     Formatting of the patch is broken by your mailer, try sending it with
>>>     something that does not change it otherwise it's a bit hard to read.
>>>
>>>     Richard suggested to add an assert to check the fp_status is correctly
>>>     cleared in place of helper_reset_fpstatus first for debugging so you 
>>> could
>>>     change the helper accordingly before deleting it and run a few tests to
>>>     verify it still works. You'll need get some tests and benchmarks working
>>>     to be able to verify your changes that's why I've said that would be 
>>> step
>>>     0. If you checked that it still produces the same results and the assert
>>>     does not trigger then you can remove the helper.
>>>
>>> That's what I need help,
>>> 1. How to write a assert to replace helper_reset_fpstatus .
>>>   just directly assert? or something else
>>
>> You can't place the assert where helper_reset_fpstatus was.  You need to 
>> place
>> it in each of the helpers, like helper_fadd, that previously has a call to
>> helper_reset_fpstatus preceeding it.
> 
> Why? If we want to verify that clearing fp_status after flags are processed is
> equivalent to clearing flags before fp ops then verifying that the fp_status 
> is
> already cleared when the current helper_reset_fpstatus is called should be
> enough to check that nothing has set the flags in between so the current reset
> helper would be no op. Therefore I thought you could put the assert there for
> checking this. This assert is for debugging and checking the change only and
> not meant to be left there otherwise we lose all the performance gain so it's
> easier to put in the current helper before removing it for this than in every
> fp op helper. What am I missing?

I'm not sure what you are suggesting.

If you are suggesting

 void helper_reset_fpstatus(CPUPPCState *env)
 {
-set_float_exception_flags(0, &env->fp_status);
+assert(get_float_exception_flags(&env->fp_status) == 0);
 }

then, sure, that works.  But we also want to remove that call, so in order to
retain the check for debugging, we need to move the assert into the other 
helpers.


r~



Re: [PATCH 2/2] hw/display/edid: Add missing 'qdev-properties.h' header

2020-05-04 Thread Richard Henderson
On 5/4/20 1:20 AM, Philippe Mathieu-Daudé wrote:
> To use the DEFINE_EDID_PROPERTIES() macro we need the
> definitions from "hw/qdev-properties.h".
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/hw/display/edid.h | 1 +
>  1 file changed, 1 insertion(+)

Does this not currently build?  I'm not sure what you're fixing.


r~


