date:20220524

Re: [PATCH v3 0/2] modules: Improve modinfo.c support

2022-05-24 Thread Gerd Hoffmann

On Tue, May 24, 2022 at 01:49:41PM +0200, Dario Faggioli wrote:
> Hello! Sorry for bringing up an old thread, but I'd have a question
> about this series.
> 
> As far as I can see, the patches were fine, and they were Acked, but
> then the series was never committed... Is this correct?
> 
> If yes, can it be committed (I'm up for rebasing and resending, if it's
> necessary)? If not, would it be possible to know what's missing, so
> that we can continue working on it?

Rebasing, running it through CI and resending is probably the best way
forward. I don't remember any problems and I'm not sure why it wasn't
picked up. Maybe Paolo (who does the meson + build system stuff) was
just busy, so it fell through the cracks.

take care,
  Gerd

Re: Problem running qos-test when building with gcc12 and LTO

2022-05-24 Thread Alex Bennée



Dario Faggioli  writes:

> On Mon, 2022-05-23 at 19:19 +, Dario Faggioli wrote:
>> As soon as I get rid of _both_ "-flto=auto" _and_ "--enable-lto", the
>> above tests seem to work fine.
>> 
>> When they fail, they fail immediately, while creating the graph, like
>> this:
>> 
>> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_IMG=./qemu-img G_TEST_DBUS_DAEMON=../tests/dbus-vmstate-daemon.sh QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/qos-test --tap -k
>> # random seed: R02S90d4b61102dd94459f986c2367d6d375
>> # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-28822.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-28822.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine none -accel qtest
>> QOSStack: full stack, cannot push
>> Aborted
>> 
> OK, apparently v6.2.0 works (with GCC 12 and LTO), while, as I said,
> v7.0.0 doesn't.
>
> Therefore, I ran a bisect, and it pointed at:
>
> 8dcb404bff6d9147765d7dd3e9c8493372186420
> tests/qtest: enable more vhost-user tests by default
>
> I've also confirmed that on v7.0.0 with 8dcb404bff6d914 reverted, the
> test actually works.
>
> As far as downstream packaging is concerned, I'll revert it locally.
> But I'd be happy to help figure out what is actually going wrong.
>
> I'll try to dig further. Any ideas or suggestions are welcome.
> :-)

Sounds like there are still memory corruption or uninitialised-data
issues that get exposed when things are moved around.

Does it still trigger errors with my latest virtio cleanup series (which
adds more tests to qos-test)?

  Subject: [PATCH  v2 00/15] virtio-gpio and various virtio cleanups
  Date: Tue, 24 May 2022 16:40:41 +0100
  Message-Id: <20220524154056.2896913-1-alex.ben...@linaro.org>


>
> Thanks and Regards


-- 
Alex Bennée

Re: [PATCH v2 4/4] hw/gpio: replace HWADDR_PRIx with PRIx64

2022-05-24 Thread Cédric Le Goater


On 5/25/22 07:34, Jamin Lin wrote:

> 1. Replace HWADDR_PRIx with PRIx64.
> 2. Fix an indentation issue.
>
> Signed-off-by: Jamin Lin 


Reviewed-by: Cédric Le Goater 

Thanks,

C.






Re: [PATCH v2 2/4] hw/gpio: Add ASPEED GPIO model for AST1030

2022-05-24 Thread Cédric Le Goater


On 5/25/22 07:34, Jamin Lin wrote:

> AST1030 integrates one parallel GPIO controller with a maximum of
> 151 control pins in 21 groups (A to U, excluding pins M6 M7 Q5 Q6
> Q7 R0 R1 R4 R5 R6 R7 S0 S3 S4 S5 S6 S7). Groups T and U are input
> only.
>
> Signed-off-by: Jamin Lin 


Reviewed-by: Cédric Le Goater 

Thanks,

C.





Re: [PATCH v1 1/1] hw/gpio: Add ASPEED GPIO model for AST1030

2022-05-24 Thread Jamin Lin

On 05/11/2022 06:14, Cédric Le Goater wrote:
Hi Cédric,
> Hello Jamin,
> 
> (Adding a few people that could help with the review)
> 
> On 3/21/22 10:14, Jamin Lin wrote:
> 
> > 1. Add GPIO read/write trace events.
> 
> Do we really need the "DEVICE(s)->canonical_path" parameter?
> That would be patch 1.
> 
Fixed in the v2 series.
> > 2. Support GPIO index mode for write operations.
> >    Index mode is not supported for read operations.
> 
> these changes would be in patch 2.
> 
Fixed in the v2 series.
> > 3. AST1030 integrates one parallel GPIO controller with a maximum of
> >    151 control pins in 21 groups (A to U, excluding pins M6 M7 Q5 Q6
> >    Q7 R0 R1 R4 R5 R6 R7 S0 S3 S4 S5 S6 S7). Groups T and U are input
> >    only.
> 
> and a last patch 3.
> 
Fixed in the v2 series.
> > Signed-off-by: Jamin Lin 
> 
> 
> Some minor comments below,
> 
> Thanks,
> 
> C.
> 
I created v2 patches to fix the above issues; please help review them:
http://patchwork.ozlabs.org/project/qemu-devel/list/?series=301873
Thanks,
Jamin
> 
> > ---
> >   hw/gpio/aspeed_gpio.c | 250 --
> >   hw/gpio/trace-events  |   5 +
> >   include/hw/gpio/aspeed_gpio.h |  16 ++-
> >   3 files changed, 255 insertions(+), 16 deletions(-)
> > 
> > diff --git a/hw/gpio/aspeed_gpio.c b/hw/gpio/aspeed_gpio.c
> > index c63634d3d3..3f0bd036b7 100644
> > --- a/hw/gpio/aspeed_gpio.c
> > +++ b/hw/gpio/aspeed_gpio.c
> > @@ -15,6 +15,8 @@
> >   #include "qapi/visitor.h"
> >   #include "hw/irq.h"
> >   #include "migration/vmstate.h"
> > +#include "trace.h"
> > +#include "hw/registerfields.h"
> >   
> >   #define GPIOS_PER_GROUP 8
> >   
> > @@ -203,6 +205,28 @@
> >   #define GPIO_1_8V_MEM_SIZE0x1D8
> >   #define GPIO_1_8V_REG_ARRAY_SIZE  (GPIO_1_8V_MEM_SIZE >> 2)
> >   
> > +/*
> > + * GPIO index mode support
> > + * It only supports write operation
> > + */
> > +REG32(GPIO_INDEX_REG, 0x2AC)
> > +FIELD(GPIO_INDEX_REG, NUMBER, 0, 8)
> > +FIELD(GPIO_INDEX_REG, COMMAND, 12, 1)
> > +FIELD(GPIO_INDEX_REG, TYPE, 16, 4)
> > +FIELD(GPIO_INDEX_REG, DATA_VALUE, 20, 1)
> > +FIELD(GPIO_INDEX_REG, DIRECTION, 20, 1)
> > +FIELD(GPIO_INDEX_REG, INT_ENABLE, 20, 1)
> > +FIELD(GPIO_INDEX_REG, INT_SENS_0, 21, 1)
> > +FIELD(GPIO_INDEX_REG, INT_SENS_1, 22, 1)
> > +FIELD(GPIO_INDEX_REG, INT_SENS_2, 23, 1)
> > +FIELD(GPIO_INDEX_REG, INT_STATUS, 24, 1)
> > +FIELD(GPIO_INDEX_REG, DEBOUNCE_1, 20, 1)
> > +FIELD(GPIO_INDEX_REG, DEBOUNCE_2, 21, 1)
> > +FIELD(GPIO_INDEX_REG, RESET_TOLERANT, 20, 1)
> > +FIELD(GPIO_INDEX_REG, COMMAND_SRC_0, 20, 1)
> > +FIELD(GPIO_INDEX_REG, COMMAND_SRC_1, 21, 1)
> > +FIELD(GPIO_INDEX_REG, INPUT_MASK, 20, 1)
> 
> That's a good idea. We should start switching the models to the registerfields
> interface.
> 
> >   static int aspeed_evaluate_irq(GPIOSets *regs, int gpio_prev_high, int gpio)
> >   {
> >   uint32_t falling_edge = 0, rising_edge = 0;
> > @@ -523,11 +547,16 @@ static uint64_t aspeed_gpio_read(void *opaque, hwaddr offset, uint32_t size)
> >   uint64_t idx = -1;
> >   const AspeedGPIOReg *reg;
> >   GPIOSets *set;
> > +uint32_t value = 0;
> > +uint64_t debounce_value;
> >   
> >   idx = offset >> 2;
> >   if (idx >= GPIO_DEBOUNCE_TIME_1 && idx <= GPIO_DEBOUNCE_TIME_3) {
> >   idx -= GPIO_DEBOUNCE_TIME_1;
> > -return (uint64_t) s->debounce_regs[idx];
> > +debounce_value = (uint64_t) s->debounce_regs[idx];
> > +trace_aspeed_gpio_read(DEVICE(s)->canonical_path,
> > +   offset, debounce_value);
> > +return debounce_value;
> >   }
> >   
> >   reg = &agc->reg_table[idx];
> > @@ -540,38 +569,193 @@ static uint64_t aspeed_gpio_read(void *opaque, hwaddr offset, uint32_t size)
> >   set = &s->sets[reg->set_idx];
> >   switch (reg->type) {
> >   case gpio_reg_data_value:
> > -return set->data_value;
> > + value = set->data_value;
> > + break;
> >   case gpio_reg_direction:
> > -return set->direction;
> > +value = set->direction;
> > +break;
> >   case gpio_reg_int_enable:
> > -return set->int_enable;
> > +value = set->int_enable;
> > +break;
> >   case gpio_reg_int_sens_0:
> > -return set->int_sens_0;
> > +value = set->int_sens_0;
> > +break;
> >   case gpio_reg_int_sens_1:
> > -return set->int_sens_1;
> > +value = set->int_sens_1;
> > +break;
> >   case gpio_reg_int_sens_2:
> > -return set->int_sens_2;
> > +value = set->int_sens_2;
> > +break;
> >   case gpio_reg_int_status:
> > -return set->int_status;
> > +value = set->int_status;
> > +break;
> >   case gpio_reg_reset_tolerant:
> > -return set->reset_tol;
> > +value = set->reset_tol;
> > +break;
> >   case gpio_reg_debounce_1:
> > -retu

[PATCH v2 2/4] hw/gpio: Add ASPEED GPIO model for AST1030

2022-05-24 Thread Jamin Lin

AST1030 integrates one parallel GPIO controller with a maximum of
151 control pins in 21 groups (A to U, excluding pins M6 M7 Q5 Q6
Q7 R0 R1 R4 R5 R6 R7 S0 S3 S4 S5 S6 S7). Groups T and U are input
only.

Signed-off-by: Jamin Lin 
---
 hw/arm/aspeed_ast10x0.c | 11 +++
 hw/gpio/aspeed_gpio.c   | 27 +++
 2 files changed, 38 insertions(+)

diff --git a/hw/arm/aspeed_ast10x0.c b/hw/arm/aspeed_ast10x0.c
index 4271549282..3a6b8122b6 100644
--- a/hw/arm/aspeed_ast10x0.c
+++ b/hw/arm/aspeed_ast10x0.c
@@ -113,6 +113,9 @@ static void aspeed_soc_ast1030_init(Object *obj)
         snprintf(typename, sizeof(typename), "aspeed.wdt-%s", socname);
         object_initialize_child(obj, "wdt[*]", &s->wdt[i], typename);
     }
+
+    snprintf(typename, sizeof(typename), "aspeed.gpio-%s", socname);
+    object_initialize_child(obj, "gpio", &s->gpio, typename);
 }
 
 static void aspeed_soc_ast1030_realize(DeviceState *dev_soc, Error **errp)
@@ -260,6 +263,14 @@ static void aspeed_soc_ast1030_realize(DeviceState *dev_soc, Error **errp)
         sysbus_mmio_map(SYS_BUS_DEVICE(&s->wdt[i]), 0,
                         sc->memmap[ASPEED_DEV_WDT] + i * awc->offset);
     }
+
+    /* GPIO */
+    if (!sysbus_realize(SYS_BUS_DEVICE(&s->gpio), errp)) {
+        return;
+    }
+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpio), 0, sc->memmap[ASPEED_DEV_GPIO]);
+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->gpio), 0,
+                       aspeed_soc_get_irq(s, ASPEED_DEV_GPIO));
 }
 
 static void aspeed_soc_ast1030_class_init(ObjectClass *klass, void *data)
diff --git a/hw/gpio/aspeed_gpio.c b/hw/gpio/aspeed_gpio.c
index 4620ea8e8b..5138fe812b 100644
--- a/hw/gpio/aspeed_gpio.c
+++ b/hw/gpio/aspeed_gpio.c
@@ -819,6 +819,15 @@ static GPIOSetProperties ast2600_1_8v_set_props[ASPEED_GPIO_MAX_NR_SETS] = {
     [1] = {0x000f,  0x000f,  {"18E"} },
 };
 
+static GPIOSetProperties ast1030_set_props[ASPEED_GPIO_MAX_NR_SETS] = {
+    [0] = {0x,  0x,  {"A", "B", "C", "D"} },
+    [1] = {0x,  0x,  {"E", "F", "G", "H"} },
+    [2] = {0x,  0x,  {"I", "J", "K", "L"} },
+    [3] = {0xff3f,  0xff3f,  {"M", "N", "O", "P"} },
+    [4] = {0xff060c1f,  0x00060c1f,  {"Q", "R", "S", "T"} },
+    [5] = {0x00ff,  0x,  {"U"} },
+};
+
 static const MemoryRegionOps aspeed_gpio_ops = {
     .read       = aspeed_gpio_read,
     .write      = aspeed_gpio_write,
@@ -971,6 +980,16 @@ static void aspeed_gpio_ast2600_1_8v_class_init(ObjectClass *klass, void *data)
     agc->reg_table = aspeed_1_8v_gpios;
 }
 
+static void aspeed_gpio_1030_class_init(ObjectClass *klass, void *data)
+{
+    AspeedGPIOClass *agc = ASPEED_GPIO_CLASS(klass);
+
+    agc->props = ast1030_set_props;
+    agc->nr_gpio_pins = 151;
+    agc->nr_gpio_sets = 6;
+    agc->reg_table = aspeed_3_3v_gpios;
+}
+
 static const TypeInfo aspeed_gpio_info = {
     .name           = TYPE_ASPEED_GPIO,
     .parent         = TYPE_SYS_BUS_DEVICE,
@@ -1008,6 +1027,13 @@ static const TypeInfo aspeed_gpio_ast2600_1_8v_info = {
     .instance_init  = aspeed_gpio_init,
 };
 
+static const TypeInfo aspeed_gpio_ast1030_info = {
+    .name           = TYPE_ASPEED_GPIO "-ast1030",
+    .parent         = TYPE_ASPEED_GPIO,
+    .class_init     = aspeed_gpio_1030_class_init,
+    .instance_init  = aspeed_gpio_init,
+};
+
 static void aspeed_gpio_register_types(void)
 {
     type_register_static(&aspeed_gpio_info);
@@ -1015,6 +1041,7 @@ static void aspeed_gpio_register_types(void)
     type_register_static(&aspeed_gpio_ast2500_info);
     type_register_static(&aspeed_gpio_ast2600_3_3v_info);
     type_register_static(&aspeed_gpio_ast2600_1_8v_info);
+    type_register_static(&aspeed_gpio_ast1030_info);
 }
 
 type_init(aspeed_gpio_register_types);
-- 
2.17.1

[PATCH v2 4/4] hw/gpio: replace HWADDR_PRIx with PRIx64

2022-05-24 Thread Jamin Lin

1. Replace HWADDR_PRIx with PRIx64.
2. Fix an indentation issue.

Signed-off-by: Jamin Lin 
---
 hw/gpio/aspeed_gpio.c | 8 
 include/hw/gpio/aspeed_gpio.h | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/gpio/aspeed_gpio.c b/hw/gpio/aspeed_gpio.c
index c834bf19f5..a62a673857 100644
--- a/hw/gpio/aspeed_gpio.c
+++ b/hw/gpio/aspeed_gpio.c
@@ -561,7 +561,7 @@ static uint64_t aspeed_gpio_read(void *opaque, hwaddr offset, uint32_t size)
     reg = &agc->reg_table[idx];
     if (reg->set_idx >= agc->nr_gpio_sets) {
         qemu_log_mask(LOG_GUEST_ERROR, "%s: no getter for offset 0x%"
-                      HWADDR_PRIx"\n", __func__, offset);
+                      PRIx64"\n", __func__, offset);
         return 0;
     }
 
@@ -611,7 +611,7 @@ static uint64_t aspeed_gpio_read(void *opaque, hwaddr offset, uint32_t size)
         break;
     default:
         qemu_log_mask(LOG_GUEST_ERROR, "%s: no getter for offset 0x%"
-                      HWADDR_PRIx"\n", __func__, offset);
+                      PRIx64"\n", __func__, offset);
         return 0;
     }
 
@@ -787,7 +787,7 @@ static void aspeed_gpio_write(void *opaque, hwaddr offset, uint64_t data,
     reg = &agc->reg_table[idx];
     if (reg->set_idx >= agc->nr_gpio_sets) {
         qemu_log_mask(LOG_GUEST_ERROR, "%s: no setter for offset 0x%"
-                      HWADDR_PRIx"\n", __func__, offset);
+                      PRIx64"\n", __func__, offset);
         return;
     }
 
@@ -872,7 +872,7 @@ static void aspeed_gpio_write(void *opaque, hwaddr offset, uint64_t data,
         break;
     default:
         qemu_log_mask(LOG_GUEST_ERROR, "%s: no setter for offset 0x%"
-                      HWADDR_PRIx"\n", __func__, offset);
+                      PRIx64"\n", __func__, offset);
         return;
     }
     aspeed_gpio_update(s, set, set->data_value);
diff --git a/include/hw/gpio/aspeed_gpio.h b/include/hw/gpio/aspeed_gpio.h
index 41b36524d0..904eecf62c 100644
--- a/include/hw/gpio/aspeed_gpio.h
+++ b/include/hw/gpio/aspeed_gpio.h
@@ -67,7 +67,7 @@ enum GPIORegIndexType {
 typedef struct AspeedGPIOReg {
     uint16_t set_idx;
     enum GPIORegType type;
- } AspeedGPIOReg;
+} AspeedGPIOReg;
 
 struct AspeedGPIOClass {
     SysBusDevice parent_obj;
-- 
2.17.1
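Swapping HWADDR_PRIx for PRIx64 only works if hwaddr is a 64-bit unsigned type. The sketch below assumes that (hwaddr is declared locally as a uint64_t stand-in) and uses a hypothetical format_no_getter() helper to show how the "0x%" PRIx64 "\n" pieces in the qemu_log_mask() calls concatenate into one format string:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in: this sketch assumes hwaddr is a 64-bit unsigned integer. */
typedef uint64_t hwaddr;

/*
 * Hypothetical helper that builds the same message the patch logs.
 * Adjacent string literals ("0x%" PRIx64 "\n") are merged at compile
 * time, e.g. into "0x%lx\n" on 64-bit Linux.
 */
static int format_no_getter(char *buf, size_t len, const char *func,
                            hwaddr offset)
{
    return snprintf(buf, len, "%s: no getter for offset 0x%" PRIx64 "\n",
                    func, offset);
}
```

Because the conversion comes from <inttypes.h>, the same line is correct on both 32-bit and 64-bit hosts.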

[PATCH v2 0/4] hw/gpio Add ASPEED GPIO model for AST1030

2022-05-24 Thread Jamin Lin

v2 changes:

Split the series into separate patches for the following features:

1. Add GPIO read/write trace events.
2. Support GPIO index mode for write operations.
   Index mode is not supported for read operations.
3. AST1030 integrates one parallel GPIO controller with a maximum of
   151 control pins in 21 groups (A to U, excluding pins M6 M7 Q5 Q6
   Q7 R0 R1 R4 R5 R6 R7 S0 S3 S4 S5 S6 S7). Groups T and U are input
   only.
4. Replace HWADDR_PRIx with PRIx64.
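The counts in item 3 can be cross-checked: 21 groups of 8 pins minus the 17 excluded pins gives 151, and packing 4 groups into each 32-bit register set gives 6 sets, matching nr_gpio_pins = 151 and nr_gpio_sets = 6 in aspeed_gpio_1030_class_init(). A minimal standalone sketch, assuming 8 pins per group and 4 groups per set (the demo_* names are illustrative, not QEMU's):

```c
/* Pins listed as absent on AST1030 in item 3. */
static const char *const demo_ast1030_excluded[] = {
    "M6", "M7",
    "Q5", "Q6", "Q7",
    "R0", "R1", "R4", "R5", "R6", "R7",
    "S0", "S3", "S4", "S5", "S6", "S7",
};

#define DEMO_PINS_PER_GROUP 8 /* assumed to mirror GPIOS_PER_GROUP */
#define DEMO_GROUPS_PER_SET 4 /* one 32-bit register set covers 4 groups */

/* Groups A..U inclusive; assumes contiguous ASCII letters. */
static int demo_group_count(void)
{
    return 'U' - 'A' + 1;
}

/* Total pins = all group slots minus the excluded pins. */
static int demo_pin_count(void)
{
    int excluded = (int)(sizeof(demo_ast1030_excluded) /
                         sizeof(demo_ast1030_excluded[0]));

    return demo_group_count() * DEMO_PINS_PER_GROUP - excluded;
}

/* Round up: the last set holds only group U. */
static int demo_set_count(void)
{
    return (demo_group_count() + DEMO_GROUPS_PER_SET - 1) /
           DEMO_GROUPS_PER_SET;
}
```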

Jamin Lin (4):
  hw/gpio Add GPIO read/write trace event.
  hw/gpio: Add ASPEED GPIO model for AST1030
  hw/gpio support GPIO index mode for write operation.
  hw/gpio: replace HWADDR_PRIx with PRIx64

 hw/arm/aspeed_ast10x0.c   |  11 ++
 hw/gpio/aspeed_gpio.c | 257 +++---
 hw/gpio/trace-events  |   5 +
 include/hw/gpio/aspeed_gpio.h |  16 ++-
 4 files changed, 269 insertions(+), 20 deletions(-)

-- 
2.17.1

[PATCH v2 3/4] hw/gpio support GPIO index mode for write operation.

2022-05-24 Thread Jamin Lin

GPIO index mode is not supported for read operations.

Signed-off-by: Jamin Lin 
---
 hw/gpio/aspeed_gpio.c | 168 ++
 include/hw/gpio/aspeed_gpio.h |  14 +++
 2 files changed, 182 insertions(+)

diff --git a/hw/gpio/aspeed_gpio.c b/hw/gpio/aspeed_gpio.c
index 5138fe812b..c834bf19f5 100644
--- a/hw/gpio/aspeed_gpio.c
+++ b/hw/gpio/aspeed_gpio.c
@@ -16,6 +16,7 @@
 #include "hw/irq.h"
 #include "migration/vmstate.h"
 #include "trace.h"
+#include "hw/registerfields.h"
 
 #define GPIOS_PER_GROUP 8
 
@@ -204,6 +205,28 @@
 #define GPIO_1_8V_MEM_SIZE        0x1D8
 #define GPIO_1_8V_REG_ARRAY_SIZE  (GPIO_1_8V_MEM_SIZE >> 2)
 
+/*
+ * GPIO index mode support
+ * It only supports write operation
+ */
+REG32(GPIO_INDEX_REG, 0x2AC)
+FIELD(GPIO_INDEX_REG, NUMBER, 0, 8)
+FIELD(GPIO_INDEX_REG, COMMAND, 12, 1)
+FIELD(GPIO_INDEX_REG, TYPE, 16, 4)
+FIELD(GPIO_INDEX_REG, DATA_VALUE, 20, 1)
+FIELD(GPIO_INDEX_REG, DIRECTION, 20, 1)
+FIELD(GPIO_INDEX_REG, INT_ENABLE, 20, 1)
+FIELD(GPIO_INDEX_REG, INT_SENS_0, 21, 1)
+FIELD(GPIO_INDEX_REG, INT_SENS_1, 22, 1)
+FIELD(GPIO_INDEX_REG, INT_SENS_2, 23, 1)
+FIELD(GPIO_INDEX_REG, INT_STATUS, 24, 1)
+FIELD(GPIO_INDEX_REG, DEBOUNCE_1, 20, 1)
+FIELD(GPIO_INDEX_REG, DEBOUNCE_2, 21, 1)
+FIELD(GPIO_INDEX_REG, RESET_TOLERANT, 20, 1)
+FIELD(GPIO_INDEX_REG, COMMAND_SRC_0, 20, 1)
+FIELD(GPIO_INDEX_REG, COMMAND_SRC_1, 21, 1)
+FIELD(GPIO_INDEX_REG, INPUT_MASK, 20, 1)
+
 static int aspeed_evaluate_irq(GPIOSets *regs, int gpio_prev_high, int gpio)
 {
     uint32_t falling_edge = 0, rising_edge = 0;
@@ -596,6 +619,144 @@ static uint64_t aspeed_gpio_read(void *opaque, hwaddr offset, uint32_t size)
     return value;
 }
 
+static void aspeed_gpio_write_index_mode(void *opaque, hwaddr offset,
+                                         uint64_t data, uint32_t size)
+{
+
+    AspeedGPIOState *s = ASPEED_GPIO(opaque);
+    AspeedGPIOClass *agc = ASPEED_GPIO_GET_CLASS(s);
+    const GPIOSetProperties *props;
+    GPIOSets *set;
+    uint32_t reg_idx_number = FIELD_EX32(data, GPIO_INDEX_REG, NUMBER);
+    uint32_t reg_idx_type = FIELD_EX32(data, GPIO_INDEX_REG, TYPE);
+    uint32_t reg_idx_command = FIELD_EX32(data, GPIO_INDEX_REG, COMMAND);
+    uint32_t set_idx = reg_idx_number / ASPEED_GPIOS_PER_SET;
+    uint32_t pin_idx = reg_idx_number % ASPEED_GPIOS_PER_SET;
+    uint32_t group_idx = pin_idx / GPIOS_PER_GROUP;
+    uint32_t reg_value = 0;
+    uint32_t cleared;
+
+    set = &s->sets[set_idx];
+    props = &agc->props[set_idx];
+
+    if (reg_idx_command)
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: offset 0x%" PRIx64 " data 0x%"
+                      PRIx64 " index mode wrong command 0x%x\n",
+                      __func__, offset, data, reg_idx_command);
+
+    switch (reg_idx_type) {
+    case gpio_reg_idx_data:
+        reg_value = set->data_read;
+        reg_value = deposit32(reg_value, pin_idx, 1,
+                              FIELD_EX32(data, GPIO_INDEX_REG, DATA_VALUE));
+        reg_value &= props->output;
+        reg_value = update_value_control_source(set, set->data_value,
+                                                reg_value);
+        set->data_read = reg_value;
+        aspeed_gpio_update(s, set, reg_value);
+        return;
+    case gpio_reg_idx_direction:
+        reg_value = set->direction;
+        reg_value = deposit32(reg_value, pin_idx, 1,
+                              FIELD_EX32(data, GPIO_INDEX_REG, DIRECTION));
+        /*
+         *   where data is the value attempted to be written to the pin:
+         *    pin type      | input mask | output mask | expected value
+         *    ------------------------------------------------------------
+         *   bidirectional  |   1        |   1         |  data
+         *   input only     |   1        |   0         |   0
+         *   output only    |   0        |   1         |   1
+         *   no pin         |   0        |   0         |   0
+         *
+         *  which is captured by:
+         *  data = ( data | ~input) & output;
+         */
+        reg_value = (reg_value | ~props->input) & props->output;
+        set->direction = update_value_control_source(set, set->direction,
+                                                     reg_value);
+        break;
+    case gpio_reg_idx_interrupt:
+        reg_value = set->int_enable;
+        reg_value = deposit32(reg_value, pin_idx, 1,
+                              FIELD_EX32(data, GPIO_INDEX_REG, INT_ENABLE));
+        set->int_enable = update_value_control_source(set, set->int_enable,
+                                                      reg_value);
+        reg_value = set->int_sens_0;
+        reg_value = deposit32(reg_value, pin_idx, 1,
+                              FIELD_EX32(data, GPIO_INDEX_REG, INT_SENS_0));
+        set->int_sens_0 = update_value_control_source(set, set->int_sens_0,
+                                                      reg_value);
+        reg_value = s
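The index-mode write path and the direction rule in the comment can be exercised outside QEMU. This is a sketch under stated assumptions: it uses plain shifts in place of QEMU's FIELD_EX32()/deposit32() helpers, assumes 32 GPIOs per set (standing in for ASPEED_GPIOS_PER_SET), leaves out update_value_control_source(), and every demo_* name is hypothetical. The bit positions come from the REG32/FIELD declarations for GPIO_INDEX_REG in the patch.

```c
#include <stdint.h>

/* GPIO_INDEX_REG layout from the patch: NUMBER[7:0], COMMAND[12],
 * TYPE[19:16], and the per-type data bit at 20. */
#define IDX_NUMBER_SHIFT  0
#define IDX_NUMBER_LEN    8
#define IDX_COMMAND_SHIFT 12
#define IDX_TYPE_SHIFT    16
#define IDX_TYPE_LEN      4
#define IDX_DATA_SHIFT    20

/* Assumption: mirrors ASPEED_GPIOS_PER_SET (32 pins per set). */
#define DEMO_GPIOS_PER_SET 32u

/* Plain-C stand-ins for extract32() and a one-bit deposit32(). */
static uint32_t demo_extract(uint32_t v, unsigned start, unsigned len)
{
    return (v >> start) & ((1u << len) - 1u);
}

static uint32_t demo_deposit_bit(uint32_t v, unsigned pos, uint32_t bit)
{
    return (v & ~(1u << pos)) | ((bit & 1u) << pos);
}

/* Encode an index-mode write. COMMAND stays 0 because the patch
 * logs non-zero values as a guest error. */
static uint32_t demo_index_word(uint32_t number, uint32_t type, uint32_t bit)
{
    return ((number & 0xffu) << IDX_NUMBER_SHIFT) |
           ((type & 0xfu) << IDX_TYPE_SHIFT) |
           ((bit & 1u) << IDX_DATA_SHIFT);
}

/* Direction rule from the patch comment: (data | ~input) & output. */
static uint32_t demo_effective_direction(uint32_t data, uint32_t input,
                                         uint32_t output)
{
    return (data | ~input) & output;
}

/* Apply one index-mode direction write to a single set's register;
 * the set itself would be selected by number / DEMO_GPIOS_PER_SET. */
static uint32_t demo_apply_direction(uint32_t direction, uint32_t word,
                                     uint32_t input, uint32_t output)
{
    uint32_t number = demo_extract(word, IDX_NUMBER_SHIFT, IDX_NUMBER_LEN);
    uint32_t pin = number % DEMO_GPIOS_PER_SET;
    uint32_t bit = demo_extract(word, IDX_DATA_SHIFT, 1);

    direction = demo_deposit_bit(direction, pin, bit);
    return demo_effective_direction(direction, input, output);
}
```

For example, pin number 35 lands on bit 3 of the second set. If that bit is input-only (output mask bit 0), the written 1 is masked back to 0, as the table requires.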

[PATCH v2 1/4] hw/gpio Add GPIO read/write trace event.

2022-05-24 Thread Jamin Lin

Add GPIO read/write trace events for the ASPEED GPIO model.

Signed-off-by: Jamin Lin 
---
 hw/gpio/aspeed_gpio.c | 54 +++
 hw/gpio/trace-events  |  5 
 2 files changed, 44 insertions(+), 15 deletions(-)

diff --git a/hw/gpio/aspeed_gpio.c b/hw/gpio/aspeed_gpio.c
index 9b736e7a9f..4620ea8e8b 100644
--- a/hw/gpio/aspeed_gpio.c
+++ b/hw/gpio/aspeed_gpio.c
@@ -15,6 +15,7 @@
 #include "qapi/visitor.h"
 #include "hw/irq.h"
 #include "migration/vmstate.h"
+#include "trace.h"
 
 #define GPIOS_PER_GROUP 8
 
@@ -523,11 +524,15 @@ static uint64_t aspeed_gpio_read(void *opaque, hwaddr offset, uint32_t size)
     uint64_t idx = -1;
     const AspeedGPIOReg *reg;
     GPIOSets *set;
+    uint32_t value = 0;
+    uint64_t debounce_value;
 
     idx = offset >> 2;
     if (idx >= GPIO_DEBOUNCE_TIME_1 && idx <= GPIO_DEBOUNCE_TIME_3) {
         idx -= GPIO_DEBOUNCE_TIME_1;
-        return (uint64_t) s->debounce_regs[idx];
+        debounce_value = (uint64_t) s->debounce_regs[idx];
+        trace_aspeed_gpio_read(offset, debounce_value);
+        return debounce_value;
     }
 
     reg = &agc->reg_table[idx];
@@ -540,38 +545,55 @@ static uint64_t aspeed_gpio_read(void *opaque, hwaddr offset, uint32_t size)
     set = &s->sets[reg->set_idx];
     switch (reg->type) {
     case gpio_reg_data_value:
-        return set->data_value;
+        value = set->data_value;
+        break;
     case gpio_reg_direction:
-        return set->direction;
+        value = set->direction;
+        break;
     case gpio_reg_int_enable:
-        return set->int_enable;
+        value = set->int_enable;
+        break;
     case gpio_reg_int_sens_0:
-        return set->int_sens_0;
+        value = set->int_sens_0;
+        break;
     case gpio_reg_int_sens_1:
-        return set->int_sens_1;
+        value = set->int_sens_1;
+        break;
     case gpio_reg_int_sens_2:
-        return set->int_sens_2;
+        value = set->int_sens_2;
+        break;
     case gpio_reg_int_status:
-        return set->int_status;
+        value = set->int_status;
+        break;
     case gpio_reg_reset_tolerant:
-        return set->reset_tol;
+        value = set->reset_tol;
+        break;
     case gpio_reg_debounce_1:
-        return set->debounce_1;
+        value = set->debounce_1;
+        break;
     case gpio_reg_debounce_2:
-        return set->debounce_2;
+        value = set->debounce_2;
+        break;
     case gpio_reg_cmd_source_0:
-        return set->cmd_source_0;
+        value = set->cmd_source_0;
+        break;
     case gpio_reg_cmd_source_1:
-        return set->cmd_source_1;
+        value = set->cmd_source_1;
+        break;
     case gpio_reg_data_read:
-        return set->data_read;
+        value = set->data_read;
+        break;
     case gpio_reg_input_mask:
-        return set->input_mask;
+        value = set->input_mask;
+        break;
     default:
         qemu_log_mask(LOG_GUEST_ERROR, "%s: no getter for offset 0x%"
                       HWADDR_PRIx"\n", __func__, offset);
         return 0;
     }
+
+    trace_aspeed_gpio_read(offset, value);
+    return value;
 }
 
 static void aspeed_gpio_write(void *opaque, hwaddr offset, uint64_t data,
@@ -585,6 +607,8 @@ static void aspeed_gpio_write(void *opaque, hwaddr offset, uint64_t data,
     GPIOSets *set;
     uint32_t cleared;
 
+    trace_aspeed_gpio_write(offset, data);
+
     idx = offset >> 2;
     if (idx >= GPIO_DEBOUNCE_TIME_1 && idx <= GPIO_DEBOUNCE_TIME_3) {
         idx -= GPIO_DEBOUNCE_TIME_1;
diff --git a/hw/gpio/trace-events b/hw/gpio/trace-events
index 1dab99c560..e9cd4a5662 100644
--- a/hw/gpio/trace-events
+++ b/hw/gpio/trace-events
@@ -27,3 +27,8 @@ sifive_gpio_read(uint64_t offset, uint64_t r) "offset 0x%" PRIx64 " value 0x%" P
 sifive_gpio_write(uint64_t offset, uint64_t value) "offset 0x%" PRIx64 " value 0x%" PRIx64
 sifive_gpio_set(int64_t line, int64_t value) "line %" PRIi64 " value %" PRIi64
 sifive_gpio_update_output_irq(int64_t line, int64_t value) "line %" PRIi64 " value %" PRIi64
+
+# aspeed_gpio.c
+aspeed_gpio_read(uint64_t offset, uint64_t value) "offset: 0x%" PRIx64 " value 0x%" PRIx64
+aspeed_gpio_write(uint64_t offset, uint64_t value) "offset: 0x%" PRIx64 " value 0x%" PRIx64
+
-- 
2.17.1
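Replacing each early return with value = ...; break; gives aspeed_gpio_read() a single exit, so one trace call covers every successful read. Below is a trimmed-down sketch of that pattern. It uses a hypothetical two-register set and a printf-style stand-in for the trace_aspeed_gpio_read() helper that QEMU generates from trace-events. None of the demo_* names exist in QEMU.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, trimmed-down register set; the real GPIOSets has more fields. */
typedef struct {
    uint32_t data_value;
    uint32_t direction;
} DemoGPIOSet;

enum { DEMO_REG_DATA_VALUE, DEMO_REG_DIRECTION };

/* Stand-in for the generated trace_aspeed_gpio_read() helper. */
static void demo_trace_read(uint64_t offset, uint64_t value)
{
    fprintf(stderr, "aspeed_gpio_read offset: 0x%" PRIx64 " value 0x%" PRIx64 "\n",
            offset, value);
}

static uint64_t demo_gpio_read(const DemoGPIOSet *set, int reg, uint64_t offset)
{
    uint32_t value;

    switch (reg) {
    case DEMO_REG_DATA_VALUE:
        value = set->data_value;
        break;
    case DEMO_REG_DIRECTION:
        value = set->direction;
        break;
    default:
        /* The real model logs a guest error here and returns 0 untraced. */
        return 0;
    }

    demo_trace_read(offset, value); /* single trace point for every valid read */
    return value;
}
```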

Re: [PATCH v4 3/3] i386: Add notify VM exit support

2022-05-24 Thread Yuan Yao

On Tue, May 24, 2022 at 10:03:02PM +0800, Chenyi Qiang wrote:
> There are cases that malicious virtual machine can cause CPU stuck (due
> to event windows don't open up), e.g., infinite loop in microcode when
> nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
> IRQ) can be delivered. It leads the CPU to be unavailable to host or
> other VMs. Notify VM exit is introduced to mitigate such kind of
> attacks, which will generate a VM exit if no event window occurs in VM
> non-root mode for a specified amount of time (notify window).
>
> A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
> so that the user can query the capability and set the expected notify
> window when creating VMs. The format of the argument when enabling this
> capability is as follows:
>   Bit 63:32 - notify window specified in qemu command
>   Bit 31:0  - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to
>   enable the feature.)
>
> Because there are some concerns, e.g. a notify VM exit may happen with
> VM_CONTEXT_INVALID set in exit qualification (no cases are anticipated
> that would set this bit), which means VM context is corrupted. To avoid
> the false positive and a well-behaved guest gets killed, make this
> feature disabled by default. Users can enable the feature by a new
> machine property:
> qemu -machine notify_vmexit=on,notify_window=0 ...
>
> A new KVM exit reason KVM_EXIT_NOTIFY is defined for notify VM exit. If
> it happens with VM_INVALID_CONTEXT, hypervisor exits to user space to
> inform the fatal case. Then user space can inject a SHUTDOWN event to
> the target vcpu. This is implemented by injecting a sythesized triple
> fault event.
>
> Signed-off-by: Chenyi Qiang 
> ---
>  hw/i386/x86.c | 45 +
>  include/hw/i386/x86.h |  5 
>  target/i386/kvm/kvm.c | 66 ++-
>  3 files changed, 96 insertions(+), 20 deletions(-)
>
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index 4cf107baea..a82f959cb9 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -1296,6 +1296,37 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
>      qapi_free_SgxEPCList(list);
>  }
>
> +static bool x86_machine_get_notify_vmexit(Object *obj, Error **errp)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(obj);
> +
> +    return x86ms->notify_vmexit;
> +}
> +
> +static void x86_machine_set_notify_vmexit(Object *obj, bool value, Error **errp)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(obj);
> +
> +    x86ms->notify_vmexit = value;
> +}
> +
> +static void x86_machine_get_notify_window(Object *obj, Visitor *v,
> +                                const char *name, void *opaque, Error **errp)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(obj);
> +    uint32_t notify_window = x86ms->notify_window;
> +
> +    visit_type_uint32(v, name, &notify_window, errp);
> +}
> +
> +static void x86_machine_set_notify_window(Object *obj, Visitor *v,
> +                                   const char *name, void *opaque, Error **errp)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(obj);
> +
> +    visit_type_uint32(v, name, &x86ms->notify_window, errp);
> +}
> +
>  static void x86_machine_initfn(Object *obj)
>  {
>      X86MachineState *x86ms = X86_MACHINE(obj);
> @@ -1306,6 +1337,8 @@ static void x86_machine_initfn(Object *obj)
>      x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
>      x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
>      x86ms->bus_lock_ratelimit = 0;
> +    x86ms->notify_vmexit = false;
> +    x86ms->notify_window = 0;
>  }
>
>  static void x86_machine_class_init(ObjectClass *oc, void *data)
> @@ -1361,6 +1394,18 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
>          NULL, NULL);
>      object_class_property_set_description(oc, "sgx-epc",
>          "SGX EPC device");
> +
> +    object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "uint32_t",
> +                              x86_machine_get_notify_window,
> +                              x86_machine_set_notify_window, NULL, NULL);
> +    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
> +            "Set the notify window required by notify VM exit");
> +
> +    object_class_property_add_bool(oc, X86_MACHINE_NOTIFY_VMEXIT,
> +                                   x86_machine_get_notify_vmexit,
> +                                   x86_machine_set_notify_vmexit);
> +    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_VMEXIT,
> +            "Enable notify VM exit");
>  }
>
>  static const TypeInfo x86_machine_info = {
> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
> index 916cc325ee..571ee8b667 100644
> --- a/include/hw/i386/x86.h
> +++ b/include/hw/i386/x86.h
> @@ -80,6 +80,9 @@ struct X86MachineState {
>       * which means no limitation on the guest's bus locks.
>       */
>      uint64_t bus_lock_ratelimit;
> +
> +    bool notify_vmex

Re: [PATCH v5 00/43] Add LoongArch softmmu support

2022-05-24 Thread Richard Henderson


On 5/24/22 17:44, yangxiaojuan wrote:


On 2022/5/25 6:41, Richard Henderson wrote:

On 5/24/22 15:32, Richard Henderson wrote:

When the syntax errors are fixed, it does not pass "make check".


When I configure with --enable-debug --enable-sanitizers I get


I got the same error.

The 'make check '  result:

Summary of Failures:

  95/117 qemu:qtest+qtest-loongarch64 / qtest-loongarch64/device-introspect-test ERROR   1.20s killed by signal 6 SIGABRT

Ok: 114
Expected Fail:  0
Fail:   1
Unexpected Pass:    0
Skipped:    2
Timeout:    0


We will fix this error as soon as possible. Which tests do we need to run?
So far we know of 'make check-tcg', 'make check' and 'make docker-test-build'.

I also saw the wiki [1]; do we need to run all of those tests? Could you give
us some advice?
[1] : https://wiki.qemu.org/Testing#Tests_included_in_the_QEMU_source


That's pretty good.  Eventually it would be good to add some tests to tests/avocado, to 
test linux kernel boot.  That can wait for a bit, as it also requires hosting a kernel 
image somewhere.


In this instance I used --enable-sanitizers because without, I was getting SIGFPE for a 
rather unlikely divide-by-zero, and I suspected memory corruption.



r~

Re: [PULL 00/23] riscv-to-apply queue

2022-05-24 Thread Richard Henderson


On 5/24/22 15:44, Alistair Francis wrote:

From: Alistair Francis 

The following changes since commit 3757b0d08b399c609954cf57f273b1167e5d7a8d:

   Merge tag 'pull-request-2022-05-18' of https://gitlab.com/thuth/qemu into staging (2022-05-20 08:04:30 -0700)

are available in the Git repository at:

   g...@github.com:alistair23/qemu.git tags/pull-riscv-to-apply-20220525

for you to fetch changes up to 8fe63fe8e512d77583d6798acd2164f1fa1e40ab:

   hw/core: loader: Set is_linux to true for VxWorks uImage (2022-05-24 10:38:50 +1000)


Third RISC-V PR for QEMU 7.1

  * Fixes for accessing VS hypervisor CSRs
  * Improvements for RISC-V Vector extension
  * Fixes for accessing mtimecmp
  * Add new short-isa-string CPU option
  * Improvements to RISC-V machine error handling
  * Disable the "G" extension by default internally, no functional change
  * Enforce floating point extension requirements
  * Cleanup ISA extension checks
  * Resolve redundant property accessors
  * Fix typo of mimpid cpu option
  * Improvements for virtualisation
  * Add zicsr/zifencei to isa_string
  * Support for VxWorks uImage


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as appropriate.


r~





Anup Patel (4):
   target/riscv: Fix csr number based privilege checking
   target/riscv: Fix hstatus.GVA bit setting for traps taken from HS-mode
   target/riscv: Set [m|s]tval for both illegal and virtual instruction traps
   hw/riscv: virt: Fix interrupt parent for dynamic platform devices

Atish Patra (1):
   hw/intc: Pass correct hartid while updating mtimecmp

Bernhard Beschow (2):
   hw/vfio/pci-quirks: Resolve redundant property getters
   hw/riscv/sifive_u: Resolve redundant property accessors

Bin Meng (2):
   hw/core: Sync uboot_image.h from U-Boot v2022.01
   hw/core: loader: Set is_linux to true for VxWorks uImage

Dylan Reid (1):
   target/riscv: Fix VS mode hypervisor CSR access

Frank Chang (1):
   target/riscv: Fix typo of mimpid cpu option

Hongren (Zenithal) Zheng (1):
   target/riscv: add zicsr/zifencei to isa_string

Tsukasa OI (9):
   target/riscv: Move Zhinx* extensions on ISA string
   target/riscv: Add short-isa-string option
   hw/riscv: Make CPU config error handling generous (virt/spike)
   hw/riscv: Make CPU config error handling generous (sifive_e/u/opentitan)
   target/riscv: Fix coding style on "G" expansion
   target/riscv: Disable "G" by default
   target/riscv: Change "G" expansion
   target/riscv: FP extension requirements
   target/riscv: Move/refactor ISA extension checks

Weiwei Li (1):
   target/riscv: check 'I' and 'E' after checking 'G' in riscv_cpu_realize

eopXD (1):
   target/riscv: rvv: Fix early exit condition for whole register load/store

  hw/core/uboot_image.h                   | 213 +---
  target/riscv/cpu.h                      |  12 +-
  hw/core/loader.c                        |  15 +++
  hw/intc/riscv_aclint.c                  |   3 +-
  hw/riscv/opentitan.c                    |   2 +-
  hw/riscv/sifive_e.c                     |   2 +-
  hw/riscv/sifive_u.c                     |  28 +
  hw/riscv/spike.c                        |   2 +-
  hw/riscv/virt.c                         |  27 ++--
  hw/vfio/pci-quirks.c                    |  34 ++---
  target/riscv/cpu.c                      |  91 ++
  target/riscv/cpu_helper.c               |   4 +-
  target/riscv/csr.c                      |  26 ++--
  target/riscv/translate.c                |  17 ++-
  target/riscv/insn_trans/trans_rvv.c.inc |  58 +
  15 files changed, 325 insertions(+), 209 deletions(-)

Re: [PATCH v5 00/43] Add LoongArch softmmu support

2022-05-24 Thread yangxiaojuan



On 2022/5/25 6:41, Richard Henderson wrote:

On 5/24/22 15:32, Richard Henderson wrote:

When the syntax errors are fixed, it does not pass "make check".


When I configure with --enable-debug --enable-sanitizers I get


I got the same error.

The 'make check '  result:

Summary of Failures:

  95/117 qemu:qtest+qtest-loongarch64 / qtest-loongarch64/device-introspect-test ERROR   1.20s killed by signal 6 SIGABRT

Ok: 114
Expected Fail:  0
Fail:   1
Unexpected Pass:    0
Skipped:    2
Timeout:    0


We will fix this error as soon as possible. Which tests do we need to run?
So far we know of 'make check-tcg', 'make check' and 'make docker-test-build'.


I also saw the wiki [1]; do we need to run all of those tests? Could you give
us some advice?

[1] : https://wiki.qemu.org/Testing#Tests_included_in_the_QEMU_source

Thanks.
Xiaojuan


$ QTEST_QEMU_BINARY='./qemu-system-loongarch64' ./tests/qtest/device-introspect-test -v

...
# Testing device 'loongarch_ipi'
=================================================================
==911066==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61393550 at pc 0x7f97cb425c23 bp 0x7ffe6583f4f0 sp 0x7ffe6583ec98
WRITE of size 8 at 0x61393550 thread T0
    #0 0x7f97cb425c22 in __interceptor_memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:799
    #1 0x562b21b23916 in qdev_init_gpio_out_named ../qemu/hw/core/gpio.c:85
    #2 0x562b21b23b89 in qdev_init_gpio_out ../qemu/hw/core/gpio.c:101
    #3 0x562b22562d77 in loongarch_ipi_init ../qemu/hw/intc/loongarch_ipi.c:187
    #4 0x562b22992ef0 in object_init_with_type ../qemu/qom/object.c:377
    #5 0x562b2299445f in object_initialize_with_type ../qemu/qom/object.c:519
    #6 0x562b22995b54 in object_new_with_type ../qemu/qom/object.c:734
    #7 0x562b22995c6d in object_new ../qemu/qom/object.c:749
    #8 0x562b22ddc1d3 in qmp_device_list_properties ../qemu/qom/qom-qmp-cmds.c:146
    #9 0x562b22f4ad2c in qmp_marshal_device_list_properties qapi/qapi-commands-qdev.c:66
    #10 0x562b22fa7ab6 in do_qmp_dispatch_bh ../qemu/qapi/qmp-dispatch.c:128
    #11 0x562b230354b1 in aio_bh_call ../qemu/util/async.c:142
    #12 0x562b23035c09 in aio_bh_poll ../qemu/util/async.c:170
    #13 0x562b22fd6531 in aio_dispatch ../qemu/util/aio-posix.c:421
    #14 0x562b2303714c in aio_ctx_dispatch ../qemu/util/async.c:312
    #15 0x7f97caafdd1a in g_main_dispatch ../../../glib/gmain.c:3417
    #16 0x7f97caafdd1a in g_main_context_dispatch ../../../glib/gmain.c:4135
    #17 0x562b23089479 in glib_pollfds_poll ../qemu/util/main-loop.c:297
    #18 0x562b23089663 in os_host_main_loop_wait ../qemu/util/main-loop.c:320
    #19 0x562b23089968 in main_loop_wait ../qemu/util/main-loop.c:596
    #20 0x562b2223edf5 in qemu_main_loop ../qemu/softmmu/runstate.c:726
    #21 0x562b21965c69 in qemu_main ../qemu/softmmu/main.c:36
    #22 0x562b21965c9e in main ../qemu/softmmu/main.c:45
    #23 0x7f97c9354d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #24 0x7f97c9354e3f in __libc_start_main_impl ../csu/libc-start.c:392
    #25 0x562b21965b74 in _start (/home/rth/chroot-home/bld-x/qemu-system-loongarch64+0x21b0b74)

0x61393550 is located 48 bytes to the left of 376-byte region [0x61393580,0x613936f8)
allocated by thread T0 here:
    #0 0x7f97cb4a0a37 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x7f97cab06c40 in g_malloc0 ../../../glib/gmem.c:155
    #2 0x562b2298fef0 in type_register_internal ../qemu/qom/object.c:143
    #3 0x562b2298ffcd in type_register ../qemu/qom/object.c:152
    #4 0x562b2199c281 in qemu_console_early_init ../qemu/ui/console.c:2719
    #5 0x562b2224d16e in qemu_create_early_backends ../qemu/softmmu/vl.c:1975
    #6 0x562b222565ef in qemu_init ../qemu/softmmu/vl.c:3674
    #7 0x562b21965c64 in qemu_main ../qemu/softmmu/main.c:35
    #8 0x562b21965c9e in main ../qemu/softmmu/main.c:45
    #9 0x7f97c9354d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

SUMMARY: AddressSanitizer: heap-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:799 in __interceptor_memset
Shadow bytes around the buggy address:
  0x0c268000a650: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c268000a660: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
  0x0c268000a670: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c268000a680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c268000a690: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c268000a6a0: 00 00 00 00 fa fa fa fa fa fa[fa]fa fa fa fa fa
  0x0c268000a6b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c268000a6c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c268000a6d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Re: [PATCH v5 00/43] Add LoongArch softmmu support

2022-05-24 Thread yangxiaojuan


Hi, Richard

On 2022/5/25 6:32, Richard Henderson wrote:

On 5/24/22 01:17, Xiaojuan Yang wrote:

Hi All,

As this series only supports running binary files in ELF format and does
not depend on BIOS or kernel files, it has been changed from RFC to
patch vX.



The manual:
   - https://github.com/loongson/LoongArch-Documentation/releases/tag/2022.03.17


Old series:
   - https://patchew.org/QEMU/20220328125749.2918087-1-yangxiaoj...@loongson.cn/
   - https://patchew.org/QEMU/20220106094200.1801206-1-gaos...@loongson.cn/


Patches that still need review:
   - 0034-hw-intc-Add-LoongArch-extioi-interrupt-controller-EI.patch
   - 0038-hw-loongarch-Add-LoongArch-ls7a-rtc-device-support.patch

This patch needs review from the ACPI maintainers:
   - 0040-hw-loongarch-Add-LoongArch-ls7a-acpi-device-support.patch

Thanks.
Xiaojuan

-
v5:
   - Fixed loongarch extioi device emulation.
   - Fixed loongarch rtc device emulation.
   - Fixed 'make docker-test-build' error.


I had been tempted to accept the patch set as is, and let subsequent 
development happen on mainline, but this patch set does not compile, 
with obvious syntax errors.


When the syntax errors are fixed, it does not pass "make check".

How can you have tested this?

It's my mistake. I only tested `IMAGES='fedora-i386-cross' make
docker-test-build`; I will correct it in v6.


Thanks.
Xiaojuan

Re: Re: [PATCH 3/3] virtio_balloon: Introduce memory recover

2022-05-24 Thread zhenwei pi





On 5/25/22 03:35, Sean Christopherson wrote:

On Fri, May 20, 2022, zhenwei pi wrote:

@@ -59,6 +60,12 @@ enum virtio_balloon_config_read {
VIRTIO_BALLOON_CONFIG_READ_CMD_ID = 0,
  };
  
+/* the request body to communicate with host side */

+struct __virtio_balloon_recover {
+   struct virtio_balloon_recover vbr;
+   __virtio32 pfns[VIRTIO_BALLOON_PAGES_PER_PAGE];


I assume this is copied from virtio_balloon.pfns, which also uses __virtio32, but
isn't that horribly broken?  PFNs are 'unsigned long', i.e. 64 bits on 64-bit kernels.
x86-64 at least most definitely generates 64-bit PFNs.  Unless there's magic I'm
missing, page_to_balloon_pfn() will truncate PFNs and feed the host bad info.



Yes, I noticed this too. I suppose the balloon device cannot work on a
virtual machine whose physical addresses extend beyond 16 TiB.


I still keep the recover VQ aligned with the inflate and deflate VQs: I
prefer the recover VQ to work, or not work, together with the
inflate/deflate VQs. So I leave this for the virtio balloon
maintainer to decide ...

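The 16T figure follows from the 32-bit PFN slots. The arithmetic can be checked with the small sketch below; it assumes 4 KiB balloon pages (a PFN shift of 12) and models a `__virtio32` slot with a plain `uint32_t`. The helper names are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: balloon pages are 4 KiB, i.e. a PFN shift of 12. */
#define BALLOON_PFN_SHIFT 12

/* Hypothetical helper: what survives when a 64-bit PFN is stored in a
 * 32-bit field such as one __virtio32 slot of the pfns[] array. */
static uint32_t pfn_to_u32_slot(uint64_t pfn)
{
    return (uint32_t)pfn; /* the upper 32 bits are silently dropped */
}

/* Size of the guest-physical range whose PFNs still fit in 32 bits. */
static uint64_t max_addressable_bytes(void)
{
    return (UINT64_C(1) << 32) << BALLOON_PFN_SHIFT; /* 2^44 bytes */
}
```

For example, a page at exactly 16 TiB has PFN 2^32, which `pfn_to_u32_slot()` turns into 0, so the host would be told about the wrong page.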


@@ -494,6 +511,198 @@ static void update_balloon_size_func(struct work_struct *work)
queue_work(system_freezable_wq, work);
  }
  
+/*

+ * virtballoon_memory_failure - notified by memory failure, try to fix the
+ *  corrupted page.
+ * The memory failure notifier is designed to call back when the kernel handled
+ * successfully only, WARN_ON_ONCE on the unlikely condition to find out any
+ * error (memory error handling is a best effort, not 100% covered).
+ */
+static int virtballoon_memory_failure(struct notifier_block *notifier,
+ unsigned long pfn, void *parm)
+{
+   struct virtio_balloon *vb = container_of(notifier, struct virtio_balloon,
+memory_failure_nb);
+   struct page *page;
+   struct __virtio_balloon_recover *out_vbr;
+   struct scatterlist sg;
+   unsigned long flags;
+   int err;
+
+   page = pfn_to_online_page(pfn);
+   if (WARN_ON_ONCE(!page))
+   return NOTIFY_DONE;
+
+   if (PageHuge(page))
+   return NOTIFY_DONE;
+
+   if (WARN_ON_ONCE(!PageHWPoison(page)))
+   return NOTIFY_DONE;
+
+   if (WARN_ON_ONCE(page_count(page) != 1))
+   return NOTIFY_DONE;
+
+   get_page(page); /* balloon reference */
+
+   out_vbr = kzalloc(sizeof(*out_vbr), GFP_KERNEL);
+   if (WARN_ON_ONCE(!out_vbr))
+   return NOTIFY_BAD;


Not that it truly matters, but won't failure at this point leak the poisoned page?


I'll fix this, thanks!

--
zhenwei pi

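One way to close the leak Sean points out is to have the error path drop the reference taken just before it (allocating before calling get_page() would work equally well). The sketch below models only that ordering; `stub_page`, `stub_get_page()`, `stub_put_page()` and the `STUB_NOTIFY_*` codes are hypothetical stand-ins, not the kernel's `struct page`, `get_page()`, `put_page()` or notifier constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the notifier return codes. */
enum { STUB_NOTIFY_DONE = 0, STUB_NOTIFY_BAD = 1 };

/* Hypothetical model of a page with a reference count. */
struct stub_page {
    int refcount;
};

static void stub_get_page(struct stub_page *p) { p->refcount++; }
static void stub_put_page(struct stub_page *p) { p->refcount--; }

/* Takes the balloon reference, then allocates the request. On
 * allocation failure the reference is dropped again so the page is
 * not leaked. 'simulate_oom' forces the allocation to fail. */
static int stub_memory_failure(struct stub_page *page, bool simulate_oom)
{
    void *req;

    stub_get_page(page); /* balloon reference */

    req = simulate_oom ? NULL : calloc(1, 64);
    if (!req) {
        stub_put_page(page); /* undo the reference taken above */
        return STUB_NOTIFY_BAD;
    }

    /* ... the request would be queued to the host here ... */
    free(req);
    return STUB_NOTIFY_DONE; /* the balloon keeps its reference */
}
```

On the success path the balloon keeps its extra reference; on the failure path the count returns to its original value.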
Re: [PATCH 0/2] i386: fixup number of logical CPUs when host-cache-info=on

2022-05-24 Thread Alejandro Jimenez


On 5/24/2022 3:48 PM, Moger, Babu wrote:


On 5/24/22 10:19, Igor Mammedov wrote:

On Tue, 24 May 2022 11:10:18 -0400
Igor Mammedov  wrote:

CCing AMD folks as that might be of interest to them


I am trying to recreate the bug on my AMD system here, and I am seeing this message:

qemu-system-x86_64: -numa node,nodeid=0,memdev=ram-node0: memdev=ram-node0 is ambiguous

Here is my command line..

# qemu-system-x86_64 -name rhel8 -m 4096 -hda vdisk.qcow2 -enable-kvm -net nic \
  -nographic -machine q35,accel=kvm \
  -cpu host,host-cache-info=on,l3-cache=off \
  -smp 20,sockets=2,dies=1,cores=10,threads=1 \
  -numa node,nodeid=0,memdev=ram-node0 -numa node,nodeid=1,memdev=ram-node1 \
  -numa cpu,socket-id=0,node-id=0 -numa cpu,socket-id=1,node-id=1

Am I missing something?

Hi Babu,

Hopefully this will help you reproduce the issue if you are testing on 
Milan/Genoa. Joao (CC'd) pointed out this warning to me late last year, 
while I was working on patches for encoding the topology CPUID leaf on 
different Zen platforms.


What I found from my experiments on Milan is that the warning appears 
whenever the NUMA topology requested on the QEMU command line assigns 
each node fewer CPUs than the default number of CPUs sharing an LLC on 
the host platform. In short, on a Milan host where 16 CPUs share a CCX:


# cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
0-7,128-135

If a guest is launched with the following arguments:

-cpu host,+topoext \
-smp cpus=64,cores=32,threads=2,sockets=1 \
-numa node,nodeid=0,cpus=0-7 -numa node,nodeid=1,cpus=8-15 \
-numa node,nodeid=2,cpus=16-23 -numa node,nodeid=3,cpus=24-31 \
-numa node,nodeid=4,cpus=32-39 -numa node,nodeid=5,cpus=40-47 \
-numa node,nodeid=6,cpus=48-55 -numa node,nodeid=7,cpus=56-63 \

it assigns 8 CPUs to each NUMA node, causing the warning above to be 
displayed.


Note that ultimately the guest topology is built based on the NUMA 
information, so the LLC domains on the guest only end up spanning a 
single NUMA node. e.g.:


# cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
0-7

Hope that helps,
Alejandro






Igor Mammedov (2):
   x86: cpu: make sure number of addressable IDs for processor cores
 meets the spec
   x86: cpu: fixup number of addressable IDs for logical processors
 sharing cache

  target/i386/cpu.c | 20 
  1 file changed, 16 insertions(+), 4 deletions(-)

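Alejandro's rule of thumb can be written down as a small predicate. This is only a model of the condition he describes (CPUs per guest NUMA node smaller than the number of host CPUs sharing an LLC), not QEMU's actual check; the function names are hypothetical and CPUs are assumed to be split evenly across nodes.

```c
#include <assert.h>
#include <stdbool.h>

/* CPUs assigned to each guest NUMA node, assuming an even split. */
static unsigned cpus_per_node(unsigned guest_cpus, unsigned numa_nodes)
{
    return guest_cpus / numa_nodes;
}

/* Hypothetical predicate: the warning is expected when a node holds
 * fewer CPUs than the host shares per LLC (e.g. per CCX on Milan). */
static bool expect_llc_warning(unsigned guest_cpus, unsigned numa_nodes,
                               unsigned host_llc_cpus)
{
    return cpus_per_node(guest_cpus, numa_nodes) < host_llc_cpus;
}

/* The guest's LLC domain ends up clipped to a single NUMA node. */
static unsigned guest_llc_span(unsigned guest_cpus, unsigned numa_nodes,
                               unsigned host_llc_cpus)
{
    unsigned per_node = cpus_per_node(guest_cpus, numa_nodes);

    return per_node < host_llc_cpus ? per_node : host_llc_cpus;
}
```

With the example command line (64 CPUs over 8 nodes) and 16 host CPUs per CCX, each node holds 8 CPUs, so the sketch predicts the warning and an 8-CPU guest LLC, consistent with the `0-7` shared_cpu_list shown in the reply.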
[PULL 23/23] hw/core: loader: Set is_linux to true for VxWorks uImage

2022-05-24 Thread Alistair Francis

From: Bin Meng 

VxWorks 7 uses the same boot interface as the Linux kernel on Arm
(64-bit only), PowerPC and RISC-V architectures. Add logic to set
is_linux to true for VxWorks uImage for these architectures in
load_uboot_image().

Signed-off-by: Bin Meng 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alistair Francis 
Message-Id: <20220324134812.541274-2-bmeng...@gmail.com>
Signed-off-by: Alistair Francis 
---
 hw/core/loader.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/hw/core/loader.c b/hw/core/loader.c
index 8167301f04..edde657ac3 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -696,6 +696,21 @@ static int load_uboot_image(const char *filename, hwaddr *ep, hwaddr *loadaddr,
 if (is_linux) {
 if (hdr->ih_os == IH_OS_LINUX) {
 *is_linux = 1;
+} else if (hdr->ih_os == IH_OS_VXWORKS) {
+/*
+ * VxWorks 7 uses the same boot interface as the Linux kernel
+ * on Arm (64-bit only), PowerPC and RISC-V architectures.
+ */
+switch (hdr->ih_arch) {
+case IH_ARCH_ARM64:
+case IH_ARCH_PPC:
+case IH_ARCH_RISCV:
+*is_linux = 1;
+break;
+default:
+*is_linux = 0;
+break;
+}
 } else {
 *is_linux = 0;
 }
-- 
2.35.3

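The decision that load_uboot_image() makes after this patch can be isolated into a pure function, sketched below. The OS codes 5 and 14 are the IH_OS_LINUX and IH_OS_VXWORKS values from uboot_image.h; the `sketch_arch` enum is hypothetical and its values are not U-Boot's IH_ARCH_* numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* OS codes as listed in uboot_image.h (IH_OS_LINUX, IH_OS_VXWORKS). */
enum { SKETCH_OS_LINUX = 5, SKETCH_OS_VXWORKS = 14 };

/* Hypothetical architecture tags; the numeric values are NOT U-Boot's
 * IH_ARCH_* values, only labels for this sketch. */
enum sketch_arch {
    SKETCH_ARCH_ARM, SKETCH_ARCH_ARM64, SKETCH_ARCH_PPC,
    SKETCH_ARCH_RISCV, SKETCH_ARCH_X86_64,
};

/* Linux images always use the Linux boot interface; VxWorks images
 * only on 64-bit Arm, PowerPC and RISC-V. */
static bool uses_linux_boot_interface(int os, enum sketch_arch arch)
{
    if (os == SKETCH_OS_LINUX) {
        return true;
    }
    if (os == SKETCH_OS_VXWORKS) {
        switch (arch) {
        case SKETCH_ARCH_ARM64:
        case SKETCH_ARCH_PPC:
        case SKETCH_ARCH_RISCV:
            return true;
        default:
            return false;
        }
    }
    return false;
}
```

Keeping the architecture list in a switch, as the patch does, makes it easy to add further architectures later without touching the Linux case.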
[PULL 22/23] hw/core: Sync uboot_image.h from U-Boot v2022.01

2022-05-24 Thread Alistair Francis

From: Bin Meng 

Sync uboot_image.h from upstream U-Boot v2022.01 release [1].

[1] https://source.denx.de/u-boot/u-boot/-/blob/v2022.01/include/image.h

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
Message-Id: <20220324134812.541274-1-bmeng...@gmail.com>
Signed-off-by: Alistair Francis 
---
 hw/core/uboot_image.h | 213 --
 1 file changed, 142 insertions(+), 71 deletions(-)

diff --git a/hw/core/uboot_image.h b/hw/core/uboot_image.h
index 608022de6e..18ac293359 100644
--- a/hw/core/uboot_image.h
+++ b/hw/core/uboot_image.h
@@ -1,23 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
 /*
+ * (C) Copyright 2008 Semihalf
+ *
  * (C) Copyright 2000-2005
  * Wolfgang Denk, DENX Software Engineering, w...@denx.de.
- *
- * See file CREDITS for list of people who contributed to this
- * project.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2 of
- * the License, or (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, see .
- *
  
  * NOTE: This header file defines an interface to U-Boot. Including
  * this (unmodified) header file in another file is considered normal
@@ -31,50 +17,83 @@
 
 /*
  * Operating System Codes
+ *
+ * The following are exposed to uImage header.
+ * New IDs *MUST* be appended at the end of the list and *NEVER*
+ * inserted for backward compatibility.
  */
-#define IH_OS_INVALID  0   /* Invalid OS   */
-#define IH_OS_OPENBSD  1   /* OpenBSD  */
-#define IH_OS_NETBSD   2   /* NetBSD   */
-#define IH_OS_FREEBSD  3   /* FreeBSD  */
-#define IH_OS_4_4BSD   4   /* 4.4BSD   */
-#define IH_OS_LINUX    5   /* Linux    */
-#define IH_OS_SVR4 6   /* SVR4 */
-#define IH_OS_ESIX 7   /* Esix */
-#define IH_OS_SOLARIS  8   /* Solaris  */
-#define IH_OS_IRIX 9   /* Irix */
-#define IH_OS_SCO  10  /* SCO  */
-#define IH_OS_DELL 11  /* Dell */
-#define IH_OS_NCR  12  /* NCR  */
-#define IH_OS_LYNXOS   13  /* LynxOS   */
-#define IH_OS_VXWORKS  14  /* VxWorks  */
-#define IH_OS_PSOS 15  /* pSOS */
-#define IH_OS_QNX  16  /* QNX  */
-#define IH_OS_U_BOOT   17  /* Firmware */
-#define IH_OS_RTEMS    18  /* RTEMS    */
-#define IH_OS_ARTOS    19  /* ARTOS    */
-#define IH_OS_UNITY    20  /* Unity OS */
+enum {
+   IH_OS_INVALID   = 0,/* Invalid OS   */
+   IH_OS_OPENBSD,  /* OpenBSD  */
+   IH_OS_NETBSD,   /* NetBSD   */
+   IH_OS_FREEBSD,  /* FreeBSD  */
+   IH_OS_4_4BSD,   /* 4.4BSD   */
+   IH_OS_LINUX,/* Linux*/
+   IH_OS_SVR4, /* SVR4 */
+   IH_OS_ESIX, /* Esix */
+   IH_OS_SOLARIS,  /* Solaris  */
+   IH_OS_IRIX, /* Irix */
+   IH_OS_SCO,  /* SCO  */
+   IH_OS_DELL, /* Dell */
+   IH_OS_NCR,  /* NCR  */
+   IH_OS_LYNXOS,   /* LynxOS   */
+   IH_OS_VXWORKS,  /* VxWorks  */
+   IH_OS_PSOS, /* pSOS */
+   IH_OS_QNX,  /* QNX  */
+   IH_OS_U_BOOT,   /* Firmware */
+   IH_OS_RTEMS,/* RTEMS*/
+   IH_OS_ARTOS,/* ARTOS*/
+   IH_OS_UNITY,/* Unity OS */
+   IH_OS_INTEGRITY,/* INTEGRITY*/
+   IH_OS_OSE,  /* OSE  */
+   IH_OS_PLAN9,/* Plan 9   */
+   IH_OS_OPENRTOS, /* OpenRTOS */
+   IH_OS_ARM_TRUSTED_FIRMWARE, /* ARM Trusted Firmware */
+   IH_OS_TEE,  /* Trusted Execution Environment */
+   IH_OS_OPENSBI,  /* RISC-V OpenSBI */
+   IH_OS_EFI,  /* EFI Firmware (e.g. GRUB2) */
+
+   IH_OS_COUNT,
+};
 
 /*
  * CPU Architecture Codes (s

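Because these codes are stored in uImage headers, turning the #define list into an enum must not renumber any existing entry. A compile-time check along the following lines would catch an accidental insertion in the middle; it is a sketch, not part of the patch, and the enum is a local copy of the one in the hunk above.

```c
#include <assert.h>

/* Local copy of the IH_OS_* enum from the patch, for checking only. */
enum {
    IH_OS_INVALID = 0, IH_OS_OPENBSD, IH_OS_NETBSD, IH_OS_FREEBSD,
    IH_OS_4_4BSD, IH_OS_LINUX, IH_OS_SVR4, IH_OS_ESIX, IH_OS_SOLARIS,
    IH_OS_IRIX, IH_OS_SCO, IH_OS_DELL, IH_OS_NCR, IH_OS_LYNXOS,
    IH_OS_VXWORKS, IH_OS_PSOS, IH_OS_QNX, IH_OS_U_BOOT, IH_OS_RTEMS,
    IH_OS_ARTOS, IH_OS_UNITY, IH_OS_INTEGRITY, IH_OS_OSE, IH_OS_PLAN9,
    IH_OS_OPENRTOS, IH_OS_ARM_TRUSTED_FIRMWARE, IH_OS_TEE, IH_OS_OPENSBI,
    IH_OS_EFI,

    IH_OS_COUNT,
};

/* Values of the removed #defines must be preserved exactly. */
static_assert(IH_OS_LINUX == 5, "IH_OS_LINUX renumbered");
static_assert(IH_OS_VXWORKS == 14, "IH_OS_VXWORKS renumbered");
static_assert(IH_OS_U_BOOT == 17, "IH_OS_U_BOOT renumbered");
static_assert(IH_OS_UNITY == 20, "IH_OS_UNITY renumbered");

/* New IDs are appended after the old last entry (Unity OS = 20). */
static_assert(IH_OS_INTEGRITY == 21, "new IDs must follow IH_OS_UNITY");
```

Checking a handful of anchor values is usually enough, since any insertion before an anchor shifts it.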
[PULL 11/23] target/riscv: FP extension requirements

2022-05-24 Thread Alistair Francis

From: Tsukasa OI 

QEMU allowed inconsistent configurations that made floating point
arithmetic effectively unusable.

This commit adds certain checks for consistent FP arithmetic:

-   F requires Zicsr
-   Zfinx requires Zicsr
-   Zfh/Zfhmin require F
-   D requires F
-   V requires D

Because F/D/Zicsr are enabled by default (an error will not occur unless
one or more of the prerequisites are disabled manually), this commit simply
requires the user to give consistent combinations.

Signed-off-by: Tsukasa OI 
Reviewed-by: Alistair Francis 
Message-Id: 
<00e7b1c6060dab32ac7d49813b1ca84d3eb63298.1652583332.git.research_tra...@irq.a4lg.com>
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 4ca6a8623f..b960473f7d 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -610,11 +610,36 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 cpu->cfg.ext_ifencei = true;
 }
 
+if (cpu->cfg.ext_f && !cpu->cfg.ext_icsr) {
+error_setg(errp, "F extension requires Zicsr");
+return;
+}
+
+if ((cpu->cfg.ext_zfh || cpu->cfg.ext_zfhmin) && !cpu->cfg.ext_f) {
+error_setg(errp, "Zfh/Zfhmin extensions require F extension");
+return;
+}
+
+if (cpu->cfg.ext_d && !cpu->cfg.ext_f) {
+error_setg(errp, "D extension requires F extension");
+return;
+}
+
+if (cpu->cfg.ext_v && !cpu->cfg.ext_d) {
+error_setg(errp, "V extension requires D extension");
+return;
+}
+
 if (cpu->cfg.ext_zdinx || cpu->cfg.ext_zhinx ||
 cpu->cfg.ext_zhinxmin) {
 cpu->cfg.ext_zfinx = true;
 }
 
+if (cpu->cfg.ext_zfinx && !cpu->cfg.ext_icsr) {
+error_setg(errp, "Zfinx extension requires Zicsr");
+return;
+}
+
 if (cpu->cfg.ext_zk) {
 cpu->cfg.ext_zkn = true;
 cpu->cfg.ext_zkr = true;
-- 
2.35.3

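The dependency rules listed in the commit message can be modelled as a standalone validator. The sketch below checks the same conditions, in the same order and with the same messages, as the hunk added to riscv_cpu_realize(); `struct fp_cfg` and `check_fp_requirements()` are hypothetical, and the sketch leaves out the step where Zdinx/Zhinx/Zhinxmin imply Zfinx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the RISC-V CPU config flags used by the checks. */
struct fp_cfg {
    bool ext_icsr, ext_f, ext_d, ext_v, ext_zfh, ext_zfhmin, ext_zfinx;
};

/* Returns NULL if the configuration is consistent, otherwise the
 * message for the first rule that is violated. */
static const char *check_fp_requirements(const struct fp_cfg *c)
{
    if (c->ext_f && !c->ext_icsr) {
        return "F extension requires Zicsr";
    }
    if ((c->ext_zfh || c->ext_zfhmin) && !c->ext_f) {
        return "Zfh/Zfhmin extensions require F extension";
    }
    if (c->ext_d && !c->ext_f) {
        return "D extension requires F extension";
    }
    if (c->ext_v && !c->ext_d) {
        return "V extension requires D extension";
    }
    if (c->ext_zfinx && !c->ext_icsr) {
        return "Zfinx extension requires Zicsr";
    }
    return NULL;
}
```

For instance, disabling F while leaving D enabled yields "D extension requires F extension", which is the kind of inconsistent combination QEMU previously accepted silently.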
[PULL 21/23] target/riscv: add zicsr/zifencei to isa_string

2022-05-24 Thread Alistair Francis

From: "Hongren (Zenithal) Zheng" 

Zicsr/Zifencei have not been part of 'I' since ISA version 20190608,
so to fully express the capabilities of the CPU,
they should be exposed in isa_string.

Signed-off-by: Hongren (Zenithal) Zheng 
Tested-by: Jiatai He 
Reviewed-by: Alistair Francis 
Message-Id: 
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index ce1c257eef..a91253d4bd 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1029,6 +1029,8 @@ static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str, int max_str_len)
  *extensions by an underscore.
  */
 struct isa_ext_data isa_edata_arr[] = {
+ISA_EDATA_ENTRY(zicsr, ext_icsr),
+ISA_EDATA_ENTRY(zifencei, ext_ifencei),
 ISA_EDATA_ENTRY(zfh, ext_zfh),
 ISA_EDATA_ENTRY(zfhmin, ext_zfhmin),
 ISA_EDATA_ENTRY(zfinx, ext_zfinx),
-- 
2.35.3

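The effect on the reported ISA string can be sketched as follows. Enabled multi-letter extensions are appended in table order, each preceded by an underscore, as the comment in riscv_isa_string_ext() describes. The base string `rv64imafdc`, `struct isa_ext` and `build_isa_string()` are assumptions for illustration, not QEMU's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of an isa_edata_arr[] entry. */
struct isa_ext {
    const char *name;
    bool enabled;
};

/* Writes 'base' followed by "_<name>" for every enabled extension into
 * 'out'; output is truncated safely if 'len' is too small. */
static void build_isa_string(char *out, size_t len, const char *base,
                             const struct isa_ext *exts, size_t n)
{
    size_t used = (size_t)snprintf(out, len, "%s", base);

    for (size_t i = 0; i < n && used < len; i++) {
        if (exts[i].enabled) {
            used += (size_t)snprintf(out + used, len - used, "_%s",
                                     exts[i].name);
        }
    }
}
```

Placing zicsr and zifencei first in the table, as the patch does, makes them the first multi-letter entries in the resulting string.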
[PULL 20/23] hw/riscv: virt: Fix interrupt parent for dynamic platform devices

2022-05-24 Thread Alistair Francis

From: Anup Patel 

When both APLIC and IMSIC are present in the virt machine, the APLIC should
be used as the parent interrupt controller for dynamic platform devices.

In the case of multiple sockets, we should prefer the interrupt controller of
socket0 for dynamic platform devices.

Fixes: 3029fab64309 ("hw/riscv: virt: Add support for generating platform FDT entries")
Signed-off-by: Anup Patel 
Reviewed-by: Alistair Francis 
Message-Id: <20220511144528.393530-9-apa...@ventanamicro.com>
Signed-off-by: Alistair Francis 
---
 hw/riscv/virt.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 244d6408b5..293e9c95b7 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -478,10 +478,12 @@ static void create_fdt_socket_plic(RISCVVirtState *s,
 qemu_fdt_setprop_cell(mc->fdt, plic_name, "phandle",
 plic_phandles[socket]);
 
-platform_bus_add_all_fdt_nodes(mc->fdt, plic_name,
-   memmap[VIRT_PLATFORM_BUS].base,
-   memmap[VIRT_PLATFORM_BUS].size,
-   VIRT_PLATFORM_BUS_IRQ);
+if (!socket) {
+platform_bus_add_all_fdt_nodes(mc->fdt, plic_name,
+   memmap[VIRT_PLATFORM_BUS].base,
+   memmap[VIRT_PLATFORM_BUS].size,
+   VIRT_PLATFORM_BUS_IRQ);
+}
 
 g_free(plic_name);
 
@@ -561,11 +563,6 @@ static void create_fdt_imsic(RISCVVirtState *s, const MemMapEntry *memmap,
 }
 qemu_fdt_setprop_cell(mc->fdt, imsic_name, "phandle", *msi_m_phandle);
 
-platform_bus_add_all_fdt_nodes(mc->fdt, imsic_name,
-   memmap[VIRT_PLATFORM_BUS].base,
-   memmap[VIRT_PLATFORM_BUS].size,
-   VIRT_PLATFORM_BUS_IRQ);
-
 g_free(imsic_name);
 
 /* S-level IMSIC node */
@@ -704,10 +701,12 @@ static void create_fdt_socket_aplic(RISCVVirtState *s,
 riscv_socket_fdt_write_id(mc, mc->fdt, aplic_name, socket);
 qemu_fdt_setprop_cell(mc->fdt, aplic_name, "phandle", aplic_s_phandle);
 
-platform_bus_add_all_fdt_nodes(mc->fdt, aplic_name,
-   memmap[VIRT_PLATFORM_BUS].base,
-   memmap[VIRT_PLATFORM_BUS].size,
-   VIRT_PLATFORM_BUS_IRQ);
+if (!socket) {
+platform_bus_add_all_fdt_nodes(mc->fdt, aplic_name,
+   memmap[VIRT_PLATFORM_BUS].base,
+   memmap[VIRT_PLATFORM_BUS].size,
+   VIRT_PLATFORM_BUS_IRQ);
+}
 
 g_free(aplic_name);
 
-- 
2.35.3

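After this patch, platform-bus FDT nodes are emitted once, for socket 0 only, under the APLIC when the machine uses one and under the PLIC otherwise; the IMSIC never parents them. That rule can be summarised as the selection function below. The enum and function are hypothetical; in hw/riscv/virt.c the choice is made implicitly by which create_fdt_*() function calls platform_bus_add_all_fdt_nodes().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the interrupt controllers in the virt machine. */
enum irqchip { IRQCHIP_NONE, IRQCHIP_PLIC, IRQCHIP_APLIC };

/* Which controller (if any) on 'socket' parents the dynamic
 * platform-bus devices. 'has_aplic' models an AIA configuration in
 * which the APLIC replaces the PLIC; the IMSIC is never chosen. */
static enum irqchip platform_bus_parent(int socket, bool has_aplic)
{
    if (socket != 0) {
        return IRQCHIP_NONE; /* only socket 0 gets the nodes */
    }
    return has_aplic ? IRQCHIP_APLIC : IRQCHIP_PLIC;
}
```

Restricting the nodes to socket 0 avoids emitting the same platform-bus devices once per socket.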
[PULL 17/23] target/riscv: Fix csr number based privilege checking

2022-05-24 Thread Alistair Francis

From: Anup Patel 

When hypervisor and VS CSRs are accessed from VS-mode or VU-mode,
the riscv_csrrw_check() function should generate a virtual instruction
trap instead of an illegal instruction trap.

Fixes: 0a42f4c44088 (" target/riscv: Fix CSR perm checking for HS mode")
Signed-off-by: Anup Patel 
Reviewed-by: Alistair Francis 
Reviewed-by: Frank Chang 
Message-Id: <20220511144528.393530-2-apa...@ventanamicro.com>
Signed-off-by: Alistair Francis 
---
 target/riscv/csr.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 0d5bc2f41d..6dbe9b541f 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3139,7 +3139,7 @@ static inline RISCVException riscv_csrrw_check(CPURISCVState *env,
 int read_only = get_field(csrno, 0xC00) == 3;
 int csr_min_priv = csr_ops[csrno].min_priv_ver;
 #if !defined(CONFIG_USER_ONLY)
-int effective_priv = env->priv;
+int csr_priv, effective_priv = env->priv;
 
 if (riscv_has_ext(env, RVH) && env->priv == PRV_S) {
 /*
@@ -3152,7 +3152,11 @@ static inline RISCVException riscv_csrrw_check(CPURISCVState *env,
 effective_priv++;
 }
 
-if (!env->debugger && (effective_priv < get_field(csrno, 0x300))) {
+csr_priv = get_field(csrno, 0x300);
+if (!env->debugger && (effective_priv < csr_priv)) {
+if (csr_priv == (PRV_S + 1) && riscv_cpu_virt_enabled(env)) {
+return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
+}
 return RISCV_EXCP_ILLEGAL_INST;
 }
 #endif
-- 
2.35.3

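The privilege part of the check can be modelled outside QEMU. Bits [9:8] of a CSR number encode the lowest privilege allowed to access it, so hypervisor and VS CSRs (such as hstatus at 0x600) carry level 2, i.e. PRV_S + 1. The sketch below is a simplified model of the patched logic with hypothetical names. It mirrors the shown code in bumping the effective privilege for any S-mode access when H is present, and it does not model the per-CSR predicates that later reject VS-mode accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Privilege levels as used by QEMU's RISC-V target (2 is reserved). */
enum { PRV_U = 0, PRV_S = 1, PRV_M = 3 };

enum csr_excp { EXCP_NONE, EXCP_ILLEGAL_INST, EXCP_VIRT_INSTRUCTION_FAULT };

/* Extracts the field selected by 'mask', in the style of get_field(). */
static uint32_t field(uint32_t reg, uint32_t mask)
{
    return (reg & mask) / (mask & ~(mask << 1));
}

/* Simplified model of the privilege check in riscv_csrrw_check(). */
static enum csr_excp csr_priv_check(uint32_t csrno, int priv,
                                    bool has_h, bool virt)
{
    int effective_priv = priv;
    uint32_t csr_priv = field(csrno, 0x300);

    /* HS or VS mode: allow reaching the hypervisor CSR level. */
    if (has_h && priv == PRV_S) {
        effective_priv++;
    }
    if (effective_priv < (int)csr_priv) {
        /* H/VS CSRs touched while virtualised: virtual instruction. */
        if (csr_priv == PRV_S + 1 && virt) {
            return EXCP_VIRT_INSTRUCTION_FAULT;
        }
        return EXCP_ILLEGAL_INST;
    }
    return EXCP_NONE;
}
```

With this model, a VU-mode access to hstatus now reports a virtual instruction fault, while the same access from non-virtualised U-mode still reports an illegal instruction.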
[PULL 13/23] hw/vfio/pci-quirks: Resolve redundant property getters

2022-05-24 Thread Alistair Francis

From: Bernhard Beschow 

The QOM API already provides getters for uint64 and uint32 values, so reuse
them.

Signed-off-by: Bernhard Beschow 
Reviewed-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20220301225220.239065-2-shen...@gmail.com>
Signed-off-by: Alistair Francis 
---
 hw/vfio/pci-quirks.c | 34 +-
 1 file changed, 9 insertions(+), 25 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 0cf69a8c6d..f0147a050a 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1565,22 +1565,6 @@ static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
 return 0;
 }
 
-static void vfio_pci_nvlink2_get_tgt(Object *obj, Visitor *v,
- const char *name,
- void *opaque, Error **errp)
-{
-uint64_t tgt = (uintptr_t) opaque;
-visit_type_uint64(v, name, &tgt, errp);
-}
-
-static void vfio_pci_nvlink2_get_link_speed(Object *obj, Visitor *v,
- const char *name,
- void *opaque, Error **errp)
-{
-uint32_t link_speed = (uint32_t)(uintptr_t) opaque;
-visit_type_uint32(v, name, &link_speed, errp);
-}
-
 int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev, Error **errp)
 {
 int ret;
@@ -1618,9 +1602,9 @@ int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev, Error **errp)
nv2reg->size, p);
 QLIST_INSERT_HEAD(&vdev->bars[0].quirks, quirk, next);
 
-object_property_add(OBJECT(vdev), "nvlink2-tgt", "uint64",
-vfio_pci_nvlink2_get_tgt, NULL, NULL,
-(void *) (uintptr_t) cap->tgt);
+object_property_add_uint64_ptr(OBJECT(vdev), "nvlink2-tgt",
+   (uint64_t *) &cap->tgt,
+   OBJ_PROP_FLAG_READ);
 trace_vfio_pci_nvidia_gpu_setup_quirk(vdev->vbasedev.name, cap->tgt,
   nv2reg->size);
 free_exit:
@@ -1679,15 +1663,15 @@ int vfio_pci_nvlink2_init(VFIOPCIDevice *vdev, Error **errp)
 QLIST_INSERT_HEAD(&vdev->bars[0].quirks, quirk, next);
 }
 
-object_property_add(OBJECT(vdev), "nvlink2-tgt", "uint64",
-vfio_pci_nvlink2_get_tgt, NULL, NULL,
-(void *) (uintptr_t) captgt->tgt);
+object_property_add_uint64_ptr(OBJECT(vdev), "nvlink2-tgt",
+   (uint64_t *) &captgt->tgt,
+   OBJ_PROP_FLAG_READ);
 trace_vfio_pci_nvlink2_setup_quirk_ssatgt(vdev->vbasedev.name, captgt->tgt,
   atsdreg->size);
 
-object_property_add(OBJECT(vdev), "nvlink2-link-speed", "uint32",
-vfio_pci_nvlink2_get_link_speed, NULL, NULL,
-(void *) (uintptr_t) capspeed->link_speed);
+object_property_add_uint32_ptr(OBJECT(vdev), "nvlink2-link-speed",
+   &capspeed->link_speed,
+   OBJ_PROP_FLAG_READ);
 trace_vfio_pci_nvlink2_setup_quirk_lnkspd(vdev->vbasedev.name,
   capspeed->link_speed);
 free_exit:
-- 
2.35.3

[PULL 08/23] target/riscv: Fix coding style on "G" expansion

2022-05-24 Thread Alistair Francis

From: Tsukasa OI 

Because ext_? members are boolean variables, operator `&&' should be
used instead of `&'.

Signed-off-by: Tsukasa OI 
Reviewed-by: Alistair Francis 
Reviewed-by: Víctor Colombo 
Message-Id: 
<91633f8349253656dd08bc8dc36498a9c7538b10.1652583332.git.research_tra...@irq.a4lg.com>
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index dc93412395..e439716337 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -596,8 +596,8 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-if (cpu->cfg.ext_g && !(cpu->cfg.ext_i & cpu->cfg.ext_m &
-cpu->cfg.ext_a & cpu->cfg.ext_f &
+if (cpu->cfg.ext_g && !(cpu->cfg.ext_i && cpu->cfg.ext_m &&
+cpu->cfg.ext_a && cpu->cfg.ext_f &&
 cpu->cfg.ext_d)) {
 warn_report("Setting G will also set IMAFD");
 cpu->cfg.ext_i = true;
-- 
2.35.3

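Since the ext_? fields are bool, `&` and `&&` give the same truth value here, so this change is about expressing intent rather than changing behaviour. The difference only appears with non-boolean operands (or operands with side effects, which `&&` may skip), as this standalone sketch with hypothetical helper names illustrates.

```c
#include <assert.h>
#include <stdbool.h>

/* With bool operands both operators agree. */
static bool all_set_bitwise(bool i, bool m, bool a)
{
    return i & m & a;
}

static bool all_set_logical(bool i, bool m, bool a)
{
    return i && m && a;
}

/* With plain int "flags", bitwise AND can give the wrong answer:
 * 2 and 1 are both non-zero (true), but 2 & 1 == 0. */
static bool both_nonzero_bitwise(int x, int y)
{
    return (x & y) != 0;
}

static bool both_nonzero_logical(int x, int y)
{
    return x && y;
}
```

Using `&&` keeps the expression correct even if one of the fields were ever changed from bool to a wider type.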
[PULL 16/23] target/riscv: Fix typo of mimpid cpu option

2022-05-24 Thread Alistair Francis

From: Frank Chang 

The "mimpid" CPU option was mistyped as "mipid".

Fixes: 9951ba94 ("target/riscv: Support configuarable marchid, mvendorid, mipid CSR values")
Signed-off-by: Frank Chang 
Reviewed-by: Alistair Francis 
Message-Id: <20220523153147.15371-1-frank.ch...@sifive.com>
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.h | 2 +-
 target/riscv/cpu.c | 4 ++--
 target/riscv/csr.c | 8 
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index f5ff7294c6..44975e3e5a 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -408,7 +408,7 @@ struct RISCVCPUConfig {
 
 uint32_t mvendorid;
 uint64_t marchid;
-uint64_t mipid;
+uint64_t mimpid;
 
 /* Vendor-specific custom extensions */
 bool ext_XVentanaCondOps;
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 87e1eddce6..fe8ceb4133 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -37,7 +37,7 @@
 #define RISCV_CPU_MARCHID   ((QEMU_VERSION_MAJOR << 16) | \
  (QEMU_VERSION_MINOR << 8)  | \
  (QEMU_VERSION_MICRO))
-#define RISCV_CPU_MIPID RISCV_CPU_MARCHID
+#define RISCV_CPU_MIMPIDRISCV_CPU_MARCHID
 
 static const char riscv_single_letter_exts[] = "IEMAFDQCPVH";
 
@@ -869,7 +869,7 @@ static Property riscv_cpu_properties[] = {
 
 DEFINE_PROP_UINT32("mvendorid", RISCVCPU, cfg.mvendorid, 0),
 DEFINE_PROP_UINT64("marchid", RISCVCPU, cfg.marchid, RISCV_CPU_MARCHID),
-DEFINE_PROP_UINT64("mipid", RISCVCPU, cfg.mipid, RISCV_CPU_MIPID),
+DEFINE_PROP_UINT64("mimpid", RISCVCPU, cfg.mimpid, RISCV_CPU_MIMPID),
 
 DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false),
 DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 4ea7df02c9..0d5bc2f41d 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -674,13 +674,13 @@ static RISCVException read_marchid(CPURISCVState *env, int csrno,
 return RISCV_EXCP_NONE;
 }
 
-static RISCVException read_mipid(CPURISCVState *env, int csrno,
- target_ulong *val)
+static RISCVException read_mimpid(CPURISCVState *env, int csrno,
+  target_ulong *val)
 {
 CPUState *cs = env_cpu(env);
 RISCVCPU *cpu = RISCV_CPU(cs);
 
-*val = cpu->cfg.mipid;
+*val = cpu->cfg.mimpid;
 return RISCV_EXCP_NONE;
 }
 
@@ -3372,7 +3372,7 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
 /* Machine Information Registers */
 [CSR_MVENDORID] = { "mvendorid", any,   read_mvendorid },
 [CSR_MARCHID]   = { "marchid",   any,   read_marchid   },
-[CSR_MIMPID]= { "mimpid",any,   read_mipid },
+[CSR_MIMPID]= { "mimpid",any,   read_mimpid},
 [CSR_MHARTID]   = { "mhartid",   any,   read_mhartid   },
 
 [CSR_MCONFIGPTR]  = { "mconfigptr", any,   read_zero,
-- 
2.35.3

[PULL 10/23] target/riscv: Change "G" expansion

2022-05-24 Thread Alistair Francis

From: Tsukasa OI 

On ISA version 20190608 or later, "G" expands to "IMAFD_Zicsr_Zifencei".
Since both "Zicsr" and "Zifencei" are enabled by default, and "G" is
supposed to be (virtually) enabled as well, it should be safe to change its
expansion.
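
A minimal sketch of the new expansion in plain C, assuming a hypothetical trimmed-down `struct isa_cfg` in place of QEMU's `RISCVCPUConfig`. It mirrors the logic of the hunk in this patch, but prints to stderr instead of calling `warn_report()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant RISCVCPUConfig fields. */
struct isa_cfg {
    bool ext_g, ext_i, ext_m, ext_a, ext_f, ext_d;
    bool ext_icsr, ext_ifencei;
};

/* Expand "G" to IMAFD_Zicsr_Zifencei; returns true if anything was set. */
static bool expand_g(struct isa_cfg *cfg)
{
    assert(cfg != NULL);
    if (!cfg->ext_g) {
        return false;
    }
    bool complete = cfg->ext_i && cfg->ext_m && cfg->ext_a &&
                    cfg->ext_f && cfg->ext_d &&
                    cfg->ext_icsr && cfg->ext_ifencei;
    if (complete) {
        return false;   /* nothing to do, no warning */
    }
    fprintf(stderr, "warning: Setting G will also set IMAFD_Zicsr_Zifencei\n");
    cfg->ext_i = cfg->ext_m = cfg->ext_a = true;
    cfg->ext_f = cfg->ext_d = true;
    cfg->ext_icsr = cfg->ext_ifencei = true;
    return true;
}
```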

Signed-off-by: Tsukasa OI 
Reviewed-by: Alistair Francis 
Message-Id: 

Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1fb76b4295..4ca6a8623f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -598,13 +598,16 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 
 if (cpu->cfg.ext_g && !(cpu->cfg.ext_i && cpu->cfg.ext_m &&
 cpu->cfg.ext_a && cpu->cfg.ext_f &&
-cpu->cfg.ext_d)) {
-warn_report("Setting G will also set IMAFD");
+cpu->cfg.ext_d &&
+cpu->cfg.ext_icsr && cpu->cfg.ext_ifencei)) {
+warn_report("Setting G will also set IMAFD_Zicsr_Zifencei");
 cpu->cfg.ext_i = true;
 cpu->cfg.ext_m = true;
 cpu->cfg.ext_a = true;
 cpu->cfg.ext_f = true;
 cpu->cfg.ext_d = true;
+cpu->cfg.ext_icsr = true;
+cpu->cfg.ext_ifencei = true;
 }
 
 if (cpu->cfg.ext_zdinx || cpu->cfg.ext_zhinx ||
-- 
2.35.3

[PULL 15/23] target/riscv: check 'I' and 'E' after checking 'G' in riscv_cpu_realize

2022-05-24 Thread Alistair Francis

From: Weiwei Li 

Setting ext_g implicitly sets ext_i, so the 'I'/'E' consistency checks
must run after the "G" expansion rather than before it.
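
To illustrate why the order matters, here is a hypothetical sketch (the struct and function names are made up). With only "G" requested, running the I/E checks first would fail with "Either I or E extension must be set", whereas expanding "G" first accepts the configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the CPU configuration flags. */
struct ie_cfg {
    bool ext_g, ext_i, ext_e;
};

/*
 * Expands "G" (only the I part is modelled here), then validates I/E.
 * Returns NULL on success or the error message on failure.
 */
static const char *expand_g_then_check(struct ie_cfg *cfg)
{
    assert(cfg != NULL);
    if (cfg->ext_g) {
        cfg->ext_i = true;   /* G implies I (MAFD etc. omitted) */
    }
    if (cfg->ext_i && cfg->ext_e) {
        return "I and E extensions are incompatible";
    }
    if (!cfg->ext_i && !cfg->ext_e) {
        return "Either I or E extension must be set";
    }
    return NULL;
}
```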

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Alistair Francis 
Message-Id: <20220518012611.6772-1-liwei...@iscas.ac.cn>
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 00a068668f..87e1eddce6 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -584,18 +584,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 uint32_t ext = 0;
 
 /* Do some ISA extension error checking */
-if (cpu->cfg.ext_i && cpu->cfg.ext_e) {
-error_setg(errp,
-   "I and E extensions are incompatible");
-return;
-}
-
-if (!cpu->cfg.ext_i && !cpu->cfg.ext_e) {
-error_setg(errp,
-   "Either I or E extension must be set");
-return;
-}
-
 if (cpu->cfg.ext_g && !(cpu->cfg.ext_i && cpu->cfg.ext_m &&
 cpu->cfg.ext_a && cpu->cfg.ext_f &&
 cpu->cfg.ext_d &&
@@ -610,6 +598,18 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 cpu->cfg.ext_ifencei = true;
 }
 
+if (cpu->cfg.ext_i && cpu->cfg.ext_e) {
+error_setg(errp,
+   "I and E extensions are incompatible");
+return;
+}
+
+if (!cpu->cfg.ext_i && !cpu->cfg.ext_e) {
+error_setg(errp,
+   "Either I or E extension must be set");
+return;
+}
+
 if (cpu->cfg.ext_f && !cpu->cfg.ext_icsr) {
 error_setg(errp, "F extension requires Zicsr");
 return;
-- 
2.35.3

[PULL 06/23] hw/riscv: Make CPU config error handling generous (virt/spike)

2022-05-24 Thread Alistair Francis

From: Tsukasa OI 

If the specified CPU configuration is not valid, QEMU not only prints an
error message but also aborts, which may generate a core dump (depending on
the operating system).  This kind of error handling should be used only when
a serious runtime error occurs.

This commit makes error handling of the CPU configuration more generous on
virt/spike machines.  It now just prints an error message and exits (without
aborting or dumping core).

Signed-off-by: Tsukasa OI 
Reviewed-by: Alistair Francis 
Message-Id: 

Signed-off-by: Alistair Francis 
---
 hw/riscv/spike.c | 2 +-
 hw/riscv/virt.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index 068ba3493e..e41b6aa9f0 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -230,7 +230,7 @@ static void spike_board_init(MachineState *machine)
 base_hartid, &error_abort);
 object_property_set_int(OBJECT(&s->soc[i]), "num-harts",
 hart_count, &error_abort);
-sysbus_realize(SYS_BUS_DEVICE(&s->soc[i]), &error_abort);
+sysbus_realize(SYS_BUS_DEVICE(&s->soc[i]), &error_fatal);
 
 /* Core Local Interruptor (timer and IPI) for each socket */
 riscv_aclint_swi_create(
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 3326f4db96..244d6408b5 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -1351,7 +1351,7 @@ static void virt_machine_init(MachineState *machine)
 base_hartid, &error_abort);
 object_property_set_int(OBJECT(&s->soc[i]), "num-harts",
 hart_count, &error_abort);
-sysbus_realize(SYS_BUS_DEVICE(&s->soc[i]), &error_abort);
+sysbus_realize(SYS_BUS_DEVICE(&s->soc[i]), &error_fatal);
 
 if (!kvm_enabled()) {
 if (s->have_aclint) {
-- 
2.35.3

[PULL 07/23] hw/riscv: Make CPU config error handling generous (sifive_e/u/opentitan)

2022-05-24 Thread Alistair Francis

From: Tsukasa OI 

If the specified CPU configuration is not valid, QEMU not only prints an
error message but also aborts, which may generate a core dump (depending on
the operating system).  This kind of error handling should be used only when
a serious runtime error occurs.

This commit makes error handling of the CPU configuration more generous on
sifive_e/u and opentitan machines.  It now just prints an error message and
exits (without aborting or dumping core).

This is separate from spike/virt because it involves a different type
(TYPE_RISCV_HART_ARRAY) on sifive_e/u and opentitan machines.

Signed-off-by: Tsukasa OI 
Reviewed-by: Alistair Francis 
Message-Id: <09e61e58a7543da44bdb0e0f5368afc8903b4aa6.1652509778.git.research_tra...@irq.a4lg.com>
Signed-off-by: Alistair Francis 
---
 hw/riscv/opentitan.c | 2 +-
 hw/riscv/sifive_e.c  | 2 +-
 hw/riscv/sifive_u.c  | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/riscv/opentitan.c b/hw/riscv/opentitan.c
index 2d401dcb23..4495a2c039 100644
--- a/hw/riscv/opentitan.c
+++ b/hw/riscv/opentitan.c
@@ -142,7 +142,7 @@ static void lowrisc_ibex_soc_realize(DeviceState *dev_soc, Error **errp)
 object_property_set_int(OBJECT(&s->cpus), "num-harts", ms->smp.cpus,
 &error_abort);
object_property_set_int(OBJECT(&s->cpus), "resetvec", 0x8080, &error_abort);
-sysbus_realize(SYS_BUS_DEVICE(&s->cpus), &error_abort);
+sysbus_realize(SYS_BUS_DEVICE(&s->cpus), &error_fatal);
 
 /* Boot ROM */
 memory_region_init_rom(&s->rom, OBJECT(dev_soc), "riscv.lowrisc.ibex.rom",
diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c
index dcb87b6cfd..d65d2fd869 100644
--- a/hw/riscv/sifive_e.c
+++ b/hw/riscv/sifive_e.c
@@ -195,7 +195,7 @@ static void sifive_e_soc_realize(DeviceState *dev, Error **errp)
 
 object_property_set_str(OBJECT(&s->cpus), "cpu-type", ms->cpu_type,
 &error_abort);
-sysbus_realize(SYS_BUS_DEVICE(&s->cpus), &error_abort);
+sysbus_realize(SYS_BUS_DEVICE(&s->cpus), &error_fatal);
 
 /* Mask ROM */
 memory_region_init_rom(&s->mask_rom, OBJECT(dev), "riscv.sifive.e.mrom",
diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index cc8c7637cb..a2495b5ae7 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -830,8 +830,8 @@ static void sifive_u_soc_realize(DeviceState *dev, Error **errp)
 qdev_prop_set_string(DEVICE(&s->u_cpus), "cpu-type", s->cpu_type);
 qdev_prop_set_uint64(DEVICE(&s->u_cpus), "resetvec", 0x1004);
 
-sysbus_realize(SYS_BUS_DEVICE(&s->e_cpus), &error_abort);
-sysbus_realize(SYS_BUS_DEVICE(&s->u_cpus), &error_abort);
+sysbus_realize(SYS_BUS_DEVICE(&s->e_cpus), &error_fatal);
+sysbus_realize(SYS_BUS_DEVICE(&s->u_cpus), &error_fatal);
 /*
  * The cluster must be realized after the RISC-V hart array container,
  * as the container's CPU object is only created on realize, and the
-- 
2.35.3

[PULL 14/23] hw/riscv/sifive_u: Resolve redundant property accessors

2022-05-24 Thread Alistair Francis

From: Bernhard Beschow 

The QOM API already provides accessors for uint32 values, so reuse them.

Signed-off-by: Bernhard Beschow 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alistair Francis 
Message-Id: <20220301225220.239065-3-shen...@gmail.com>
Signed-off-by: Alistair Francis 
---
 hw/riscv/sifive_u.c | 24 
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index a2495b5ae7..e4c814a3ea 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -713,36 +713,20 @@ static void sifive_u_machine_set_start_in_flash(Object *obj, bool value, Error *
 s->start_in_flash = value;
 }
 
-static void sifive_u_machine_get_uint32_prop(Object *obj, Visitor *v,
- const char *name, void *opaque,
- Error **errp)
-{
-visit_type_uint32(v, name, (uint32_t *)opaque, errp);
-}
-
-static void sifive_u_machine_set_uint32_prop(Object *obj, Visitor *v,
- const char *name, void *opaque,
- Error **errp)
-{
-visit_type_uint32(v, name, (uint32_t *)opaque, errp);
-}
-
 static void sifive_u_machine_instance_init(Object *obj)
 {
 SiFiveUState *s = RISCV_U_MACHINE(obj);
 
 s->start_in_flash = false;
 s->msel = 0;
-object_property_add(obj, "msel", "uint32",
-sifive_u_machine_get_uint32_prop,
-sifive_u_machine_set_uint32_prop, NULL, &s->msel);
+object_property_add_uint32_ptr(obj, "msel", &s->msel,
+   OBJ_PROP_FLAG_READWRITE);
 object_property_set_description(obj, "msel",
 "Mode Select (MSEL[3:0]) pin state");
 
 s->serial = OTP_SERIAL;
-object_property_add(obj, "serial", "uint32",
-sifive_u_machine_get_uint32_prop,
-sifive_u_machine_set_uint32_prop, NULL, &s->serial);
+object_property_add_uint32_ptr(obj, "serial", &s->serial,
+   OBJ_PROP_FLAG_READWRITE);
 object_property_set_description(obj, "serial", "Board serial number");
 }
 
-- 
2.35.3

[PULL 05/23] target/riscv: Add short-isa-string option

2022-05-24 Thread Alistair Francis

From: Tsukasa OI 

Because some operating systems don't correctly parse long ISA extension
strings, this commit adds a short-isa-string boolean option to disable
generating long ISA extension strings in the Device Tree.

For instance, enabling the Zfinx and Zdinx extensions and booting Linux
(5.17 or earlier) with FPU support caused a kernel panic.

Operating systems for which short-isa-string might be helpful:

1.  Linux (5.17 or earlier)
2.  FreeBSD (at least 14.0-CURRENT)
3.  OpenBSD (at least current development version)
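
A rough sketch of the idea, not QEMU's `riscv_isa_string()`: the single-letter base string is always emitted, and the underscore-separated multi-letter extensions are appended only when the short form is not requested. The function name `build_isa_string()`, its buffer handling and the example extension list are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes "<base>" followed by "_<ext>" for each
 * multi-letter extension, unless short_isa is set.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int build_isa_string(char *buf, size_t len, const char *base,
                            const char *const *exts, size_t n_exts,
                            bool short_isa)
{
    assert(buf != NULL && base != NULL);
    int used = snprintf(buf, len, "%s", base);
    if (used < 0 || (size_t)used >= len) {
        return -1;                      /* truncated */
    }
    if (short_isa) {
        return used;                    /* single-letter part only */
    }
    for (size_t i = 0; i < n_exts; i++) {
        size_t room = len - (size_t)used;
        int w = snprintf(buf + used, room, "_%s", exts[i]);
        if (w < 0 || (size_t)w >= room) {
            return -1;                  /* truncated */
        }
        used += w;
    }
    return used;
}
```

With `short_isa` set, a guest sees only a string like "rv64imac", which older parsers handle.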

Signed-off-by: Tsukasa OI 
Acked-by: Alistair Francis 
Message-Id: <7c1fe5f06b0a7646a47e9bcdddb1042bb60c69c8.1652181972.git.research_tra...@irq.a4lg.com>
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.h | 2 ++
 target/riscv/cpu.c | 6 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index fe6c9a2c92..f5ff7294c6 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -425,6 +425,8 @@ struct RISCVCPUConfig {
 bool aia;
 bool debug;
 uint64_t resetvec;
+
+bool short_isa_string;
 };
 
 typedef struct RISCVCPUConfig RISCVCPUConfig;
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 9f38e56316..dc93412395 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -879,6 +879,8 @@ static Property riscv_cpu_properties[] = {
 DEFINE_PROP_BOOL("x-aia", RISCVCPU, cfg.aia, false),
 
 DEFINE_PROP_UINT64("resetvec", RISCVCPU, cfg.resetvec, DEFAULT_RSTVEC),
+
+DEFINE_PROP_BOOL("short-isa-string", RISCVCPU, cfg.short_isa_string, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1049,7 +1051,9 @@ char *riscv_isa_string(RISCVCPU *cpu)
 }
 }
 *p = '\0';
-riscv_isa_string_ext(cpu, &isa_str, maxlen);
+if (!cpu->cfg.short_isa_string) {
+riscv_isa_string_ext(cpu, &isa_str, maxlen);
+}
 return isa_str;
 }
 
-- 
2.35.3

[PULL 04/23] target/riscv: Move Zhinx* extensions on ISA string

2022-05-24 Thread Alistair Francis

From: Tsukasa OI 

This commit moves the ISA string entries for the Zhinx and Zhinxmin
extensions after the Zve* entries.  Because the extension category "H" is
going to be ordered after "V", this makes their ordering valid (in canonical
order).
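
A simplified checker, assuming each "Z" extension's category is the letter after the "z" and that categories are ranked by the lowercase form of QEMU's `riscv_single_letter_exts` string ("IEMAFDQCPVH"). It ignores the alphabetical ordering within a category that the canonical rules also require. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Rank of a "z" extension's category letter in order[], or -1. */
static int category_rank(const char *ext, const char *order)
{
    assert(ext != NULL && order != NULL);
    if (tolower((unsigned char)ext[0]) != 'z' || ext[1] == '\0') {
        return -1;
    }
    const char *p = strchr(order, tolower((unsigned char)ext[1]));
    return p ? (int)(p - order) : -1;
}

/* True if the categories of exts[] never move backwards in order[]. */
static bool categories_in_order(const char *const *exts, size_t n,
                                const char *order)
{
    int prev = -1;
    for (size_t i = 0; i < n; i++) {
        int r = category_rank(exts[i], order);
        if (r < 0 || r < prev) {
            return false;
        }
        prev = r;
    }
    return true;
}
```

Placing "zhinx" (category H) before "zve32f" (category V) fails the check, while placing it after passes.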

Signed-off-by: Tsukasa OI 
Acked-by: Alistair Francis 
Message-Id: <7a988aedb249b6709f9ce5464ff359b60958ca54.1652181972.git.research_tra...@irq.a4lg.com>
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index ccacdee215..9f38e56316 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -999,8 +999,6 @@ static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str, int max_str_len)
 ISA_EDATA_ENTRY(zfh, ext_zfh),
 ISA_EDATA_ENTRY(zfhmin, ext_zfhmin),
 ISA_EDATA_ENTRY(zfinx, ext_zfinx),
-ISA_EDATA_ENTRY(zhinx, ext_zhinx),
-ISA_EDATA_ENTRY(zhinxmin, ext_zhinxmin),
 ISA_EDATA_ENTRY(zdinx, ext_zdinx),
 ISA_EDATA_ENTRY(zba, ext_zba),
 ISA_EDATA_ENTRY(zbb, ext_zbb),
@@ -1021,6 +1019,8 @@ static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str, int max_str_len)
 ISA_EDATA_ENTRY(zkt, ext_zkt),
 ISA_EDATA_ENTRY(zve32f, ext_zve32f),
 ISA_EDATA_ENTRY(zve64f, ext_zve64f),
+ISA_EDATA_ENTRY(zhinx, ext_zhinx),
+ISA_EDATA_ENTRY(zhinxmin, ext_zhinxmin),
 ISA_EDATA_ENTRY(svinval, ext_svinval),
 ISA_EDATA_ENTRY(svnapot, ext_svnapot),
 ISA_EDATA_ENTRY(svpbmt, ext_svpbmt),
-- 
2.35.3

[PULL 19/23] target/riscv: Set [m|s]tval for both illegal and virtual instruction traps

2022-05-24 Thread Alistair Francis

From: Anup Patel 

Currently, the [m|s]tval CSRs are set to the trapping instruction encoding
only for illegal instruction traps taken at the time of instruction
decoding.

In the RISC-V world, a valid instruction might also trap as an illegal or
virtual instruction based on trapping bits in various CSRs (such as
mstatus.TVM or hstatus.VTVM).

We improve the setting of the [m|s]tval CSRs for all types of illegal and
virtual instruction traps.

Signed-off-by: Anup Patel 
Reviewed-by: Frank Chang 
Reviewed-by: Alistair Francis 
Message-Id: <20220511144528.393530-4-apa...@ventanamicro.com>
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.h|  8 +++-
 target/riscv/cpu.c|  2 ++
 target/riscv/cpu_helper.c |  1 +
 target/riscv/translate.c  | 17 +
 4 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 44975e3e5a..f08c3e8813 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -30,6 +30,12 @@
 
 #define TCG_GUEST_DEFAULT_MO 0
 
+/*
+ * RISC-V-specific extra insn start words:
+ * 1: Original instruction opcode
+ */
+#define TARGET_INSN_START_EXTRA_WORDS 1
+
 #define TYPE_RISCV_CPU "riscv-cpu"
 
 #define RISCV_CPU_TYPE_SUFFIX "-" TYPE_RISCV_CPU
@@ -140,7 +146,7 @@ struct CPUArchState {
 target_ulong frm;
 
 target_ulong badaddr;
-uint32_t bins;
+target_ulong bins;
 
 target_ulong guest_phys_fault_addr;
 
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index fe8ceb4133..ce1c257eef 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -406,6 +406,7 @@ void restore_state_to_opc(CPURISCVState *env, TranslationBlock *tb,
 } else {
 env->pc = data[0];
 }
+env->bins = data[1];
 }
 
 static void riscv_cpu_reset(DeviceState *dev)
@@ -445,6 +446,7 @@ static void riscv_cpu_reset(DeviceState *dev)
 env->mcause = 0;
 env->miclaim = MIP_SGEIP;
 env->pc = env->resetvec;
+env->bins = 0;
 env->two_stage_lookup = false;
 
 /* Initialized default priorities of local interrupts. */
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index b16bfe0182..d99fac9d2d 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -1371,6 +1371,7 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 tval = env->badaddr;
 break;
 case RISCV_EXCP_ILLEGAL_INST:
+case RISCV_EXCP_VIRT_INSTRUCTION_FAULT:
 tval = env->bins;
 break;
 default:
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0cd1d9ee94..55a4713af2 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -107,6 +107,8 @@ typedef struct DisasContext {
 /* PointerMasking extension */
 bool pm_mask_enabled;
 bool pm_base_enabled;
+/* TCG of the current insn_start */
+TCGOp *insn_start;
 } DisasContext;
 
 static inline bool has_ext(DisasContext *ctx, uint32_t ext)
@@ -236,9 +238,6 @@ static void generate_exception_mtval(DisasContext *ctx, int excp)
 
 static void gen_exception_illegal(DisasContext *ctx)
 {
-tcg_gen_st_i32(tcg_constant_i32(ctx->opcode), cpu_env,
-   offsetof(CPURISCVState, bins));
-
 generate_exception(ctx, RISCV_EXCP_ILLEGAL_INST);
 }
 
@@ -1017,6 +1016,13 @@ static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
 /* Include decoders for factored-out extensions */
 #include "decode-XVentanaCondOps.c.inc"
 
+static inline void decode_save_opc(DisasContext *ctx, target_ulong opc)
+{
+assert(ctx->insn_start != NULL);
+tcg_set_insn_start_param(ctx->insn_start, 1, opc);
+ctx->insn_start = NULL;
+}
+
 static void decode_opc(CPURISCVState *env, DisasContext *ctx, uint16_t opcode)
 {
 /*
@@ -1033,6 +1039,7 @@ static void decode_opc(CPURISCVState *env, DisasContext *ctx, uint16_t opcode)
 
 /* Check for compressed insn */
 if (extract16(opcode, 0, 2) != 3) {
+decode_save_opc(ctx, opcode);
 if (!has_ext(ctx, RVC)) {
 gen_exception_illegal(ctx);
 } else {
@@ -1047,6 +1054,7 @@ static void decode_opc(CPURISCVState *env, DisasContext *ctx, uint16_t opcode)
 opcode32 = deposit32(opcode32, 16, 16,
  translator_lduw(env, &ctx->base,
  ctx->base.pc_next + 2));
+decode_save_opc(ctx, opcode32);
 ctx->opcode = opcode32;
 ctx->pc_succ_insn = ctx->base.pc_next + 4;
 
@@ -1113,7 +1121,8 @@ static void riscv_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)
 {
 DisasContext *ctx = container_of(dcbase, DisasContext, base);
 
-tcg_gen_insn_start(ctx->base.pc_next);
+tcg_gen_insn_start(ctx->base.pc_next, 0);
+ctx->insn_start = tcg_last_op();
 }
 
 static void riscv_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
-- 
2.35.3

[PULL 12/23] target/riscv: Move/refactor ISA extension checks

2022-05-24 Thread Alistair Francis

From: Tsukasa OI 

We should separate the "check" and "configure" steps as much as possible.
This commit separates the two steps, except for vector- and Zfinx-related
checks.
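
A sketch of the check-then-configure split, using a hypothetical `struct fp_cfg` and `EXT_BIT()` macro rather than QEMU's types. `check_fp()` only reads the configuration and reports the first violated rule; `configure_misa()` is meant to run only after the checks pass. The error messages are taken from the patch; the implicit enabling of Zfinx by Zdinx/Zhinx is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical misa-style bit for a single-letter extension. */
#define EXT_BIT(c) (UINT32_C(1) << ((c) - 'A'))

/* Hypothetical subset of the CPU configuration. */
struct fp_cfg {
    bool ext_f, ext_icsr, ext_zfinx, ext_zve32f;
};

/* Step 1: pure checks, nothing is modified. NULL means OK. */
static const char *check_fp(const struct fp_cfg *cfg)
{
    assert(cfg != NULL);
    if (cfg->ext_zve32f && !cfg->ext_f) {
        return "Zve32f/Zve64f extensions require F extension";
    }
    if (cfg->ext_zfinx && !cfg->ext_icsr) {
        return "Zfinx extension requires Zicsr";
    }
    if (cfg->ext_zfinx && cfg->ext_f) {
        return "Zfinx cannot be supported together with F extension";
    }
    return NULL;
}

/* Step 2: configure, assuming check_fp() has already returned NULL. */
static uint32_t configure_misa(const struct fp_cfg *cfg)
{
    uint32_t misa = 0;
    if (cfg->ext_f) {
        misa |= EXT_BIT('F');
    }
    return misa;
}
```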

Signed-off-by: Tsukasa OI 
Reviewed-by: Alistair Francis 
Message-Id: 

Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index b960473f7d..00a068668f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -630,14 +630,27 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
+if ((cpu->cfg.ext_zve32f || cpu->cfg.ext_zve64f) && !cpu->cfg.ext_f) {
+error_setg(errp, "Zve32f/Zve64f extensions require F extension");
+return;
+}
+
+/* Set the ISA extensions, checks should have happened above */
 if (cpu->cfg.ext_zdinx || cpu->cfg.ext_zhinx ||
 cpu->cfg.ext_zhinxmin) {
 cpu->cfg.ext_zfinx = true;
 }
 
-if (cpu->cfg.ext_zfinx && !cpu->cfg.ext_icsr) {
-error_setg(errp, "Zfinx extension requires Zicsr");
-return;
+if (cpu->cfg.ext_zfinx) {
+if (!cpu->cfg.ext_icsr) {
+error_setg(errp, "Zfinx extension requires Zicsr");
+return;
+}
+if (cpu->cfg.ext_f) {
+error_setg(errp,
+"Zfinx cannot be supported together with F extension");
+return;
+}
 }
 
 if (cpu->cfg.ext_zk) {
@@ -663,7 +676,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 cpu->cfg.ext_zksh = true;
 }
 
-/* Set the ISA extensions, checks should have happened above */
 if (cpu->cfg.ext_i) {
 ext |= RVI;
 }
@@ -734,20 +746,9 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 }
 set_vext_version(env, vext_version);
 }
-if ((cpu->cfg.ext_zve32f || cpu->cfg.ext_zve64f) && !cpu->cfg.ext_f) {
-error_setg(errp, "Zve32f/Zve64f extension depends upon RVF.");
-return;
-}
 if (cpu->cfg.ext_j) {
 ext |= RVJ;
 }
-if (cpu->cfg.ext_zfinx && ((ext & (RVF | RVD)) || cpu->cfg.ext_zfh ||
-   cpu->cfg.ext_zfhmin)) {
-error_setg(errp,
-"'Zfinx' cannot be supported together with 'F', 'D', 'Zfh',"
-" 'Zfhmin'");
-return;
-}
 
 set_misa(env, env->misa_mxl, ext);
 }
-- 
2.35.3

[PULL 02/23] target/riscv: rvv: Fix early exit condition for whole register load/store

2022-05-24 Thread Alistair Francis

From: eopXD 

Vector whole register load instructions have EEW encoded in the opcode,
so we shouldn't use SEW here.  Vector whole register store instructions
always use EEW=8.
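
The early-exit bound is the number of elements in NF whole registers: evl = (VLEN / 8) * NF / EEW_bytes. Assuming VLEN = 128 for illustration, `vl1re64.v` covers 16 / 8 = 2 elements, but dividing by `1 << s->sew` with SEW = 8 gives 16, so the `vstart >= evl` test compared against the wrong bound. A small C helper (the name `whole_reg_evl()` is hypothetical) computing the corrected value:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Elements moved by a whole-register load/store:
 * (VLEN in bytes) * NF / (element width in bytes).
 */
static uint32_t whole_reg_evl(uint32_t vlen_bits, uint32_t nf,
                              uint32_t eew_bytes)
{
    assert(eew_bytes != 0 && vlen_bits % 8 == 0);
    return (vlen_bits / 8) * nf / eew_bytes;
}
```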

Signed-off-by: eop Chen 
Reviewed-by: Frank Chang 
Acked-by: Alistair Francis 
Message-Id: <165181414065.18540.1482812505333459992...@git.sr.ht>
Signed-off-by: Alistair Francis 
---
 target/riscv/insn_trans/trans_rvv.c.inc | 58 +
 1 file changed, 31 insertions(+), 27 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index 90327509f7..391c61fe93 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -1118,10 +1118,10 @@ GEN_VEXT_TRANS(vle64ff_v, MO_64, r2nfvm, ldff_op, ld_us_check)
 typedef void gen_helper_ldst_whole(TCGv_ptr, TCGv, TCGv_env, TCGv_i32);
 
 static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
- gen_helper_ldst_whole *fn, DisasContext *s,
- bool is_store)
+ uint32_t width, gen_helper_ldst_whole *fn,
+ DisasContext *s, bool is_store)
 {
-uint32_t evl = (s->cfg_ptr->vlen / 8) * nf / (1 << s->sew);
+uint32_t evl = (s->cfg_ptr->vlen / 8) * nf / width;
 TCGLabel *over = gen_new_label();
 tcg_gen_brcondi_tl(TCG_COND_GEU, cpu_vstart, evl, over);
 
@@ -1153,38 +1153,42 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
  * load and store whole register instructions ignore vtype and vl setting.
  * Thus, we don't need to check vill bit. (Section 7.9)
  */
-#define GEN_LDST_WHOLE_TRANS(NAME, ARG_NF, IS_STORE)  \
+#define GEN_LDST_WHOLE_TRANS(NAME, ARG_NF, WIDTH, IS_STORE)   \
 static bool trans_##NAME(DisasContext *s, arg_##NAME * a) \
 { \
 if (require_rvv(s) && \
 QEMU_IS_ALIGNED(a->rd, ARG_NF)) { \
-return ldst_whole_trans(a->rd, a->rs1, ARG_NF, gen_helper_##NAME, \
-s, IS_STORE); \
+return ldst_whole_trans(a->rd, a->rs1, ARG_NF, WIDTH, \
+gen_helper_##NAME, s, IS_STORE);  \
 } \
 return false; \
 }
 
-GEN_LDST_WHOLE_TRANS(vl1re8_v,  1, false)
-GEN_LDST_WHOLE_TRANS(vl1re16_v, 1, false)
-GEN_LDST_WHOLE_TRANS(vl1re32_v, 1, false)
-GEN_LDST_WHOLE_TRANS(vl1re64_v, 1, false)
-GEN_LDST_WHOLE_TRANS(vl2re8_v,  2, false)
-GEN_LDST_WHOLE_TRANS(vl2re16_v, 2, false)
-GEN_LDST_WHOLE_TRANS(vl2re32_v, 2, false)
-GEN_LDST_WHOLE_TRANS(vl2re64_v, 2, false)
-GEN_LDST_WHOLE_TRANS(vl4re8_v,  4, false)
-GEN_LDST_WHOLE_TRANS(vl4re16_v, 4, false)
-GEN_LDST_WHOLE_TRANS(vl4re32_v, 4, false)
-GEN_LDST_WHOLE_TRANS(vl4re64_v, 4, false)
-GEN_LDST_WHOLE_TRANS(vl8re8_v,  8, false)
-GEN_LDST_WHOLE_TRANS(vl8re16_v, 8, false)
-GEN_LDST_WHOLE_TRANS(vl8re32_v, 8, false)
-GEN_LDST_WHOLE_TRANS(vl8re64_v, 8, false)
-
-GEN_LDST_WHOLE_TRANS(vs1r_v, 1, true)
-GEN_LDST_WHOLE_TRANS(vs2r_v, 2, true)
-GEN_LDST_WHOLE_TRANS(vs4r_v, 4, true)
-GEN_LDST_WHOLE_TRANS(vs8r_v, 8, true)
+GEN_LDST_WHOLE_TRANS(vl1re8_v,  1, 1, false)
+GEN_LDST_WHOLE_TRANS(vl1re16_v, 1, 2, false)
+GEN_LDST_WHOLE_TRANS(vl1re32_v, 1, 4, false)
+GEN_LDST_WHOLE_TRANS(vl1re64_v, 1, 8, false)
+GEN_LDST_WHOLE_TRANS(vl2re8_v,  2, 1, false)
+GEN_LDST_WHOLE_TRANS(vl2re16_v, 2, 2, false)
+GEN_LDST_WHOLE_TRANS(vl2re32_v, 2, 4, false)
+GEN_LDST_WHOLE_TRANS(vl2re64_v, 2, 8, false)
+GEN_LDST_WHOLE_TRANS(vl4re8_v,  4, 1, false)
+GEN_LDST_WHOLE_TRANS(vl4re16_v, 4, 2, false)
+GEN_LDST_WHOLE_TRANS(vl4re32_v, 4, 4, false)
+GEN_LDST_WHOLE_TRANS(vl4re64_v, 4, 8, false)
+GEN_LDST_WHOLE_TRANS(vl8re8_v,  8, 1, false)
+GEN_LDST_WHOLE_TRANS(vl8re16_v, 8, 2, false)
+GEN_LDST_WHOLE_TRANS(vl8re32_v, 8, 4, false)
+GEN_LDST_WHOLE_TRANS(vl8re64_v, 8, 8, false)
+
+/*
+ * The vector whole register store instructions are encoded similar to
+ * unmasked unit-stride store of elements with EEW=8.
+ */
+GEN_LDST_WHOLE_TRANS(vs1r_v, 1, 1, true)
+GEN_LDST_WHOLE_TRANS(vs2r_v, 2, 1, true)
+GEN_LDST_WHOLE_TRANS(vs4r_v, 4, 1, true)
+GEN_LDST_WHOLE_TRANS(vs8r_v, 8, 1, true)
 
 /*
  *** Vector Integer Arithmetic Instructions
-- 
2.35.3

[PULL 03/23] hw/intc: Pass correct hartid while updating mtimecmp

2022-05-24 Thread Alistair Francis

From: Atish Patra 

The timecmp update function should be invoked with the hartid for which
timecmp is being updated. The commit referenced below passed an incorrect
hartid (the local index i instead of hartid_base + i) to the update
function.
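
A hypothetical model of the loop, assuming an mtimer whose harts start at `hartid_base` (for example, a second socket starting at hart 4). Passing the local index `i` would target harts 0..3 instead of 4..7. The function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fills out[] with the hartid that the timecmp update should use for
 * each local hart index i of an mtimer starting at hartid_base.
 */
static void timecmp_targets(uint32_t hartid_base, uint32_t num_harts,
                            uint32_t *out)
{
    assert(out != NULL || num_harts == 0);
    for (uint32_t i = 0; i < num_harts; i++) {
        out[i] = hartid_base + i;   /* the fix: previously just i */
    }
}
```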

Fixes: e2f01f3c2e13 ("hw/intc: Make RISC-V ACLINT mtime MMIO register writable")

Signed-off-by: Atish Patra 
Reviewed-by: Frank Chang 
Reviewed-by: Anup Patel 
Reviewed-by: Alistair Francis 
Message-Id: <20220513221458.1192933-1-ati...@rivosinc.com>
Signed-off-by: Alistair Francis 
---
 hw/intc/riscv_aclint.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/intc/riscv_aclint.c b/hw/intc/riscv_aclint.c
index 0412edc982..e6bceceefd 100644
--- a/hw/intc/riscv_aclint.c
+++ b/hw/intc/riscv_aclint.c
@@ -233,7 +233,8 @@ static void riscv_aclint_mtimer_write(void *opaque, hwaddr addr,
 continue;
 }
 riscv_aclint_mtimer_write_timecmp(mtimer, RISCV_CPU(cpu),
-  i, env->timecmp);
+  mtimer->hartid_base + i,
+  env->timecmp);
 }
 return;
 }
-- 
2.35.3

[PULL 18/23] target/riscv: Fix hstatus.GVA bit setting for traps taken from HS-mode

2022-05-24 Thread Alistair Francis

From: Anup Patel 

Currently, QEMU does not set the hstatus.GVA bit for traps taken from
HS-mode into HS-mode, which breaks the Xvisor nested MMU test suite on
QEMU. This used to work.

This patch updates riscv_cpu_do_interrupt() to fix the above issue.

Fixes: 86d0c457396b ("target/riscv: Fixup setting GVA")
Signed-off-by: Anup Patel 
Reviewed-by: Alistair Francis 
Message-Id: <20220511144528.393530-3-apa...@ventanamicro.com>
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_helper.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index e1aa4f2097..b16bfe0182 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -1367,7 +1367,7 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 case RISCV_EXCP_INST_PAGE_FAULT:
 case RISCV_EXCP_LOAD_PAGE_FAULT:
 case RISCV_EXCP_STORE_PAGE_FAULT:
-write_gva = true;
+write_gva = env->two_stage_lookup;
 tval = env->badaddr;
 break;
 case RISCV_EXCP_ILLEGAL_INST:
@@ -1434,7 +1434,6 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 /* Trap into HS mode */
 env->hstatus = set_field(env->hstatus, HSTATUS_SPV, false);
 htval = env->guest_phys_fault_addr;
-write_gva = false;
 }
 env->hstatus = set_field(env->hstatus, HSTATUS_GVA, write_gva);
 }
-- 
2.35.3

[PULL 09/23] target/riscv: Disable "G" by default

2022-05-24 Thread Alistair Francis

From: Tsukasa OI 

Because the "G" virtual extension expands to "IMAFD", we cannot separately
disable extensions like "F" or "D" without disabling "G".  Since all of
"IMAFD" are enabled by default, it's harmless to disable "G" by default.

Signed-off-by: Tsukasa OI 
Reviewed-by: Alistair Francis 
Message-Id: 

Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index e439716337..1fb76b4295 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -812,7 +812,7 @@ static Property riscv_cpu_properties[] = {
 /* Defaults for standard extensions */
 DEFINE_PROP_BOOL("i", RISCVCPU, cfg.ext_i, true),
 DEFINE_PROP_BOOL("e", RISCVCPU, cfg.ext_e, false),
-DEFINE_PROP_BOOL("g", RISCVCPU, cfg.ext_g, true),
+DEFINE_PROP_BOOL("g", RISCVCPU, cfg.ext_g, false),
 DEFINE_PROP_BOOL("m", RISCVCPU, cfg.ext_m, true),
 DEFINE_PROP_BOOL("a", RISCVCPU, cfg.ext_a, true),
 DEFINE_PROP_BOOL("f", RISCVCPU, cfg.ext_f, true),
-- 
2.35.3

[PULL 01/23] target/riscv: Fix VS mode hypervisor CSR access

2022-05-24 Thread Alistair Francis

From: Dylan Reid 

VS mode access to hypervisor CSRs should generate virtual, not illegal,
instruction exceptions.

Don't return early and indicate an illegal instruction exception when
accessing a hypervisor CSR from VS mode. Instead, fall through to the
`hmode` predicate to return the correct virtual instruction exception.
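
A simplified, hypothetical model of the before/after behaviour for an access to a hypervisor CSR (required privilege level 2, taken from CSR address bits [9:8]) with the H extension present; `fixed` selects the new logic. It only models S-mode (HS/VS) and M-mode callers, and the `hmode` predicate is reduced to "VS mode gets a virtual instruction fault". All names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum priv_level { LVL_U = 0, LVL_S = 1, LVL_M = 3 };
enum csr_result { CSR_OK, CSR_ILLEGAL, CSR_VIRT_FAULT };

/* Hypothetical model: result of accessing a level-2 (hypervisor) CSR. */
static enum csr_result access_hyp_csr(enum priv_level priv, bool virt,
                                      bool fixed)
{
    assert(priv == LVL_S || priv == LVL_M);   /* only cases modelled */
    int effective_priv = priv;
    /* Old code bumped the level only in HS mode; new code also in VS. */
    if (priv == LVL_S && (fixed || !virt)) {
        effective_priv++;
    }
    if (effective_priv < 2) {
        return CSR_ILLEGAL;
    }
    /* Reduced hmode-style predicate. */
    return (priv == LVL_S && virt) ? CSR_VIRT_FAULT : CSR_OK;
}
```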

Signed-off-by: Dylan Reid 
Reviewed-by: Alistair Francis 
Message-Id: <20220506165456.297058-1-dgr...@rivosinc.com>
Signed-off-by: Alistair Francis 
---
 target/riscv/csr.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 3500e07f92..4ea7df02c9 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3141,13 +3141,13 @@ static inline RISCVException riscv_csrrw_check(CPURISCVState *env,
 #if !defined(CONFIG_USER_ONLY)
 int effective_priv = env->priv;
 
-if (riscv_has_ext(env, RVH) &&
-env->priv == PRV_S &&
-!riscv_cpu_virt_enabled(env)) {
+if (riscv_has_ext(env, RVH) && env->priv == PRV_S) {
 /*
- * We are in S mode without virtualisation, therefore we are in HS Mode.
+ * We are in either HS or VS mode.
  * Add 1 to the effective privledge level to allow us to access the
- * Hypervisor CSRs.
+ * Hypervisor CSRs. The `hmode` predicate will determine if access
+ * should be allowed(HS) or if a virtual instruction exception should be
+ * raised(VS).
  */
 effective_priv++;
 }
-- 
2.35.3

[PULL 00/23] riscv-to-apply queue

2022-05-24 Thread Alistair Francis

From: Alistair Francis 

The following changes since commit 3757b0d08b399c609954cf57f273b1167e5d7a8d:

  Merge tag 'pull-request-2022-05-18' of https://gitlab.com/thuth/qemu into staging (2022-05-20 08:04:30 -0700)

are available in the Git repository at:

  g...@github.com:alistair23/qemu.git tags/pull-riscv-to-apply-20220525

for you to fetch changes up to 8fe63fe8e512d77583d6798acd2164f1fa1e40ab:

  hw/core: loader: Set is_linux to true for VxWorks uImage (2022-05-24 10:38:50 +1000)


Third RISC-V PR for QEMU 7.1

 * Fixes for accessing VS hypervisor CSRs
 * Improvements for RISC-V Vector extension
 * Fixes for accessing mtimecmp
 * Add new short-isa-string CPU option
 * Improvements to RISC-V machine error handling
 * Disable the "G" extension by default internally, no functional change
 * Enforce floating point extension requirements
 * Cleanup ISA extension checks
 * Resolve redundant property accessors
 * Fix typo of mimpid cpu option
 * Improvements for virtualisation
 * Add zicsr/zifencei to isa_string
 * Support for VxWorks uImage


Anup Patel (4):
  target/riscv: Fix csr number based privilege checking
  target/riscv: Fix hstatus.GVA bit setting for traps taken from HS-mode
  target/riscv: Set [m|s]tval for both illegal and virtual instruction traps
  hw/riscv: virt: Fix interrupt parent for dynamic platform devices

Atish Patra (1):
  hw/intc: Pass correct hartid while updating mtimecmp

Bernhard Beschow (2):
  hw/vfio/pci-quirks: Resolve redundant property getters
  hw/riscv/sifive_u: Resolve redundant property accessors

Bin Meng (2):
  hw/core: Sync uboot_image.h from U-Boot v2022.01
  hw/core: loader: Set is_linux to true for VxWorks uImage

Dylan Reid (1):
  target/riscv: Fix VS mode hypervisor CSR access

Frank Chang (1):
  target/riscv: Fix typo of mimpid cpu option

Hongren (Zenithal) Zheng (1):
  target/riscv: add zicsr/zifencei to isa_string

Tsukasa OI (9):
  target/riscv: Move Zhinx* extensions on ISA string
  target/riscv: Add short-isa-string option
  hw/riscv: Make CPU config error handling generous (virt/spike)
  hw/riscv: Make CPU config error handling generous (sifive_e/u/opentitan)
  target/riscv: Fix coding style on "G" expansion
  target/riscv: Disable "G" by default
  target/riscv: Change "G" expansion
  target/riscv: FP extension requirements
  target/riscv: Move/refactor ISA extension checks

Weiwei Li (1):
  target/riscv: check 'I' and 'E' after checking 'G' in riscv_cpu_realize

eopXD (1):
  target/riscv: rvv: Fix early exit condition for whole register load/store

 hw/core/uboot_image.h   | 213 +---
 target/riscv/cpu.h  |  12 +-
 hw/core/loader.c|  15 +++
 hw/intc/riscv_aclint.c  |   3 +-
 hw/riscv/opentitan.c|   2 +-
 hw/riscv/sifive_e.c |   2 +-
 hw/riscv/sifive_u.c |  28 +
 hw/riscv/spike.c|   2 +-
 hw/riscv/virt.c |  27 ++--
 hw/vfio/pci-quirks.c|  34 ++---
 target/riscv/cpu.c  |  91 ++
 target/riscv/cpu_helper.c   |   4 +-
 target/riscv/csr.c  |  26 ++--
 target/riscv/translate.c|  17 ++-
 target/riscv/insn_trans/trans_rvv.c.inc |  58 +
 15 files changed, 325 insertions(+), 209 deletions(-)

Re: [PATCH v5 00/43] Add LoongArch softmmu support

2022-05-24 Thread Richard Henderson


On 5/24/22 15:32, Richard Henderson wrote:
> When the syntax errors are fixed, it does not pass "make check".

When I configure with --enable-debug --enable-sanitizers I get:

$ QTEST_QEMU_BINARY='./qemu-system-loongarch64' ./tests/qtest/device-introspect-test -v
...
# Testing device 'loongarch_ipi'
=
==911066==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61393550 at pc 0x7f97cb425c23 bp 0x7ffe6583f4f0 sp 0x7ffe6583ec98
WRITE of size 8 at 0x61393550 thread T0
    #0 0x7f97cb425c22 in __interceptor_memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:799
    #1 0x562b21b23916 in qdev_init_gpio_out_named ../qemu/hw/core/gpio.c:85
    #2 0x562b21b23b89 in qdev_init_gpio_out ../qemu/hw/core/gpio.c:101
    #3 0x562b22562d77 in loongarch_ipi_init ../qemu/hw/intc/loongarch_ipi.c:187
    #4 0x562b22992ef0 in object_init_with_type ../qemu/qom/object.c:377
    #5 0x562b2299445f in object_initialize_with_type ../qemu/qom/object.c:519
    #6 0x562b22995b54 in object_new_with_type ../qemu/qom/object.c:734
    #7 0x562b22995c6d in object_new ../qemu/qom/object.c:749
    #8 0x562b22ddc1d3 in qmp_device_list_properties ../qemu/qom/qom-qmp-cmds.c:146
    #9 0x562b22f4ad2c in qmp_marshal_device_list_properties qapi/qapi-commands-qdev.c:66
    #10 0x562b22fa7ab6 in do_qmp_dispatch_bh ../qemu/qapi/qmp-dispatch.c:128
    #11 0x562b230354b1 in aio_bh_call ../qemu/util/async.c:142
    #12 0x562b23035c09 in aio_bh_poll ../qemu/util/async.c:170
    #13 0x562b22fd6531 in aio_dispatch ../qemu/util/aio-posix.c:421
    #14 0x562b2303714c in aio_ctx_dispatch ../qemu/util/async.c:312
    #15 0x7f97caafdd1a in g_main_dispatch ../../../glib/gmain.c:3417
    #16 0x7f97caafdd1a in g_main_context_dispatch ../../../glib/gmain.c:4135
    #17 0x562b23089479 in glib_pollfds_poll ../qemu/util/main-loop.c:297
    #18 0x562b23089663 in os_host_main_loop_wait ../qemu/util/main-loop.c:320
    #19 0x562b23089968 in main_loop_wait ../qemu/util/main-loop.c:596
    #20 0x562b2223edf5 in qemu_main_loop ../qemu/softmmu/runstate.c:726
    #21 0x562b21965c69 in qemu_main ../qemu/softmmu/main.c:36
    #22 0x562b21965c9e in main ../qemu/softmmu/main.c:45
    #23 0x7f97c9354d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #24 0x7f97c9354e3f in __libc_start_main_impl ../csu/libc-start.c:392
    #25 0x562b21965b74 in _start (/home/rth/chroot-home/bld-x/qemu-system-loongarch64+0x21b0b74)

0x61393550 is located 48 bytes to the left of 376-byte region [0x61393580,0x613936f8)
allocated by thread T0 here:
    #0 0x7f97cb4a0a37 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x7f97cab06c40 in g_malloc0 ../../../glib/gmem.c:155
    #2 0x562b2298fef0 in type_register_internal ../qemu/qom/object.c:143
    #3 0x562b2298ffcd in type_register ../qemu/qom/object.c:152
    #4 0x562b2199c281 in qemu_console_early_init ../qemu/ui/console.c:2719
    #5 0x562b2224d16e in qemu_create_early_backends ../qemu/softmmu/vl.c:1975
    #6 0x562b222565ef in qemu_init ../qemu/softmmu/vl.c:3674
    #7 0x562b21965c64 in qemu_main ../qemu/softmmu/main.c:35
    #8 0x562b21965c9e in main ../qemu/softmmu/main.c:45
    #9 0x7f97c9354d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

SUMMARY: AddressSanitizer: heap-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:799 in __interceptor_memset
Shadow bytes around the buggy address:
  0x0c268000a650: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c268000a660: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
  0x0c268000a670: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c268000a680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c268000a690: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c268000a6a0: 00 00 00 00 fa fa fa fa fa fa[fa]fa fa fa fa fa
  0x0c268000a6b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c268000a6c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c268000a6d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
  0x0c268000a6e0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c268000a6f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzo

Re: [PATCH v5 00/43] Add LoongArch softmmu support

2022-05-24 Thread Richard Henderson


On 5/24/22 01:17, Xiaojuan Yang wrote:

> Hi All,
>
> As this series only supports running binary files in ELF format and
> does not depend on BIOS and kernel files, it has been changed from RFC
> to patch vX.
>
> The manual:
>    - https://github.com/loongson/LoongArch-Documentation/releases/tag/2022.03.17
>
> Old series:
>    - https://patchew.org/QEMU/20220328125749.2918087-1-yangxiaoj...@loongson.cn/
>    - https://patchew.org/QEMU/20220106094200.1801206-1-gaos...@loongson.cn/
>
> Patches that need review:
>    - 0034-hw-intc-Add-LoongArch-extioi-interrupt-controller-EI.patch
>    - 0038-hw-loongarch-Add-LoongArch-ls7a-rtc-device-support.patch
>
> This patch needs review from the ACPI maintainers:
>    - 0040-hw-loongarch-Add-LoongArch-ls7a-acpi-device-support.patch
>
> Thanks.
> Xiaojuan
>
> -
> v5:
>    - Fixed loongarch extioi device emulation.
>    - Fixed loongarch rtc device emulation.
>    - Fixed 'make docker-test-build' error.

I had been tempted to accept the patch set as is, and let subsequent development happen on mainline, but this patch set does not compile, with obvious syntax errors.


When the syntax errors are fixed, it does not pass "make check".

How can you have tested this?


r~

Re: [PATCH v2 0/8] QEMU RISC-V nested virtualization fixes

2022-05-24 Thread Alistair Francis

On Thu, May 12, 2022 at 12:47 AM Anup Patel  wrote:
>
> This series contains fixes and improvements needed for nested
> virtualization on QEMU RISC-V.
>
> These patches can also be found in riscv_nested_fixes_v2 branch at:
> https://github.com/avpatel/qemu.git
>
> The RISC-V nested virtualization was tested on QEMU RISC-V using
> Xvisor RISC-V which has required hypervisor support to run another
> hypervisor as Guest/VM.
>
> Changes since v1:
>  - Set write_gva to env->two_stage_lookup, which ensures that for
>    HS-mode to HS-mode traps write_gva is true only for HLV/HSV
>    instructions
>  - Included "[PATCH 0/3] QEMU RISC-V priv spec version fixes"
>    patches in this series for easy review
>  - Re-worked PATCH7 to force disable extensions if the required
>    priv spec version is not satisfied
>  - Added new PATCH8 to fix "aia=aplic-imsic" mode of virt machine
>
> Anup Patel (8):
>   target/riscv: Fix csr number based privilege checking
>   target/riscv: Fix hstatus.GVA bit setting for traps taken from HS-mode
>   target/riscv: Set [m|s]tval for both illegal and virtual instruction
> traps
>   target/riscv: Update [m|h]tinst CSR in riscv_cpu_do_interrupt()
>   target/riscv: Don't force update priv spec version to latest
>   target/riscv: Add dummy mcountinhibit CSR for priv spec v1.11 or
> higher
>   target/riscv: Force disable extensions if priv spec version does not
> match
>   hw/riscv: virt: Fix interrupt parent for dynamic platform devices

Thanks!

I have applied some of these patches to riscv-to-apply.next

Alistair

>
>  hw/riscv/virt.c   |  25 +++---
>  target/riscv/cpu.c|  46 +-
>  target/riscv/cpu.h|   8 +-
>  target/riscv/cpu_bits.h   |   3 +
>  target/riscv/cpu_helper.c | 172 --
>  target/riscv/csr.c|  10 ++-
>  target/riscv/instmap.h|  41 +
>  target/riscv/translate.c  |  17 +++-
>  8 files changed, 292 insertions(+), 30 deletions(-)
>
> --
> 2.34.1
>
>

[PATCH v7 14/14] tests: Add postcopy preempt tests

2022-05-24 Thread Peter Xu

Four tests are added for preempt mode:

  - Postcopy plain
  - Postcopy recovery
  - Postcopy tls
  - Postcopy tls+recovery

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 58 
 1 file changed, 58 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 12f1e3a751..ca2082a7d9 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -558,6 +558,7 @@ typedef struct {
 
 /* Postcopy specific fields */
 void *postcopy_data;
+bool postcopy_preempt;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1063,6 +1064,11 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 migrate_set_capability(to, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-blocktime", true);
 
+if (args->postcopy_preempt) {
+migrate_set_capability(from, "postcopy-preempt", true);
+migrate_set_capability(to, "postcopy-preempt", true);
+}
+
 /* We want to pick a speed slow enough that the test completes
  * quickly, but that it doesn't complete precopy even on a slow
  * machine, so also set the downtime.
@@ -1131,6 +1137,26 @@ static void test_postcopy_tls_psk(void)
 test_postcopy_common(&args);
 }
 
+static void test_postcopy_preempt(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+};
+
+test_postcopy_common(&args);
+}
+
+static void test_postcopy_preempt_tls_psk(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_common(&args);
+}
+
 static void test_postcopy_recovery_common(MigrateCommon *args)
 {
 QTestState *from, *to;
@@ -1210,6 +1236,27 @@ static void test_postcopy_recovery_tls_psk(void)
 test_postcopy_recovery_common(&args);
 }
 
+static void test_postcopy_preempt_recovery(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+};
+
+test_postcopy_recovery_common(&args);
+}
+
+/* This contains preempt+recovery+tls test altogether */
+static void test_postcopy_preempt_all(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_recovery_common(&args);
+}
+
 static void test_baddest(void)
 {
 MigrateStart args = {
@@ -2194,6 +2241,17 @@ int main(int argc, char **argv)
 qtest_add_func("/migration/postcopy/recovery/tls/psk",
test_postcopy_recovery_tls_psk);
 #endif /* CONFIG_GNUTLS */
+
+qtest_add_func("/migration/postcopy/preempt/plain", test_postcopy_preempt);
+qtest_add_func("/migration/postcopy/preempt/recovery/plain",
+   test_postcopy_preempt_recovery);
+#ifdef CONFIG_GNUTLS
+qtest_add_func("/migration/postcopy/preempt/tls/psk",
+   test_postcopy_preempt_tls_psk);
+qtest_add_func("/migration/postcopy/preempt/recovery/tls/psk",
+   test_postcopy_preempt_all);
+#endif /* CONFIG_GNUTLS */
+
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
 qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle);
-- 
2.32.0

[PATCH v7 12/14] tests: Add postcopy tls migration test

2022-05-24 Thread Peter Xu

We just added TLS tests for precopy but not postcopy.  Add the
corresponding test for vanilla postcopy.

Rename the vanilla postcopy test to "postcopy/plain", because all postcopy
tests will only use unix sockets as the channel.

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 61 +---
 1 file changed, 50 insertions(+), 11 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index cb53846114..03f7bb0d96 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -555,6 +555,9 @@ typedef struct {
 
 /* Optional: set number of migration passes to wait for */
 unsigned int iterations;
+
+/* Postcopy specific fields */
+void *postcopy_data;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1043,15 +1046,19 @@ test_migrate_tls_x509_finish(QTestState *from,
 
 static int migrate_postcopy_prepare(QTestState **from_ptr,
 QTestState **to_ptr,
-MigrateStart *args)
+MigrateCommon *args)
 {
 g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
 QTestState *from, *to;
 
-if (test_migrate_start(&from, &to, uri, args)) {
+if (test_migrate_start(&from, &to, uri, &args->start)) {
 return -1;
 }
 
+if (args->start_hook) {
+args->postcopy_data = args->start_hook(from, to);
+}
+
 migrate_set_capability(from, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-blocktime", true);
@@ -1076,7 +1083,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 return 0;
 }
 
-static void migrate_postcopy_complete(QTestState *from, QTestState *to)
+static void migrate_postcopy_complete(QTestState *from, QTestState *to,
+  MigrateCommon *args)
 {
 wait_for_migration_complete(from);
 
@@ -1087,25 +1095,48 @@ static void migrate_postcopy_complete(QTestState *from, QTestState *to)
 read_blocktime(to);
 }
 
+if (args->finish_hook) {
+args->finish_hook(from, to, args->postcopy_data);
+args->postcopy_data = NULL;
+}
+
 test_migrate_end(from, to, true);
 }
 
-static void test_postcopy(void)
+static void test_postcopy_common(MigrateCommon *args)
 {
-MigrateStart args = {};
 QTestState *from, *to;
 
-if (migrate_postcopy_prepare(&from, &to, &args)) {
+if (migrate_postcopy_prepare(&from, &to, args)) {
 return;
 }
 migrate_postcopy_start(from, to);
-migrate_postcopy_complete(from, to);
+migrate_postcopy_complete(from, to, args);
+}
+
+static void test_postcopy(void)
+{
+MigrateCommon args = { };
+
+test_postcopy_common(&args);
+}
+
+static void test_postcopy_tls_psk(void)
+{
+MigrateCommon args = {
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_common(&args);
 }
 
 static void test_postcopy_recovery(void)
 {
-MigrateStart args = {
-.hide_stderr = true,
+MigrateCommon args = {
+.start = {
+.hide_stderr = true,
+},
 };
 QTestState *from, *to;
 g_autofree char *uri = NULL;
@@ -1161,7 +1192,7 @@ static void test_postcopy_recovery(void)
 /* Restore the postcopy bandwidth to unlimited */
 migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
 
-migrate_postcopy_complete(from, to);
+migrate_postcopy_complete(from, to, &args);
 }
 
 static void test_baddest(void)
@@ -2133,7 +2164,15 @@ int main(int argc, char **argv)
 
 module_call_init(MODULE_INIT_QOM);
 
-qtest_add_func("/migration/postcopy/unix", test_postcopy);
+qtest_add_func("/migration/postcopy/plain", test_postcopy);
+#ifdef CONFIG_GNUTLS
+/*
+ * NOTE: psk test is enough for postcopy, as other types of TLS
+ * channels are tested under precopy.  Here what we want to test is the
+ * general postcopy path that has TLS channel enabled.
+ */
+qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
+#endif /* CONFIG_GNUTLS */
 qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
-- 
2.32.0
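
The optional start/finish hook pattern that this patch threads through `MigrateCommon` can be sketched in isolation. This is a minimal sketch under stated assumptions, not the test code itself: `FakeQTest`, `FakeMigrateCommon`, `prepare()`, `complete()` and the demo hooks are hypothetical stand-ins, and the "TLS credentials" are just a heap-allocated int. It shows how whatever `start_hook` returns is parked in `postcopy_data` and handed to `finish_hook`, which takes ownership of it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for QTestState and MigrateCommon. */
typedef struct { const char *name; } FakeQTest;

typedef void *(*StartHook)(FakeQTest *from, FakeQTest *to);
typedef void (*FinishHook)(FakeQTest *from, FakeQTest *to, void *opaque);

typedef struct {
    StartHook start_hook;    /* optional: e.g. set up TLS credentials */
    FinishHook finish_hook;  /* optional: tear down what start_hook made */
    void *postcopy_data;     /* start_hook's result, handed to finish_hook */
} FakeMigrateCommon;

static void prepare(FakeQTest *from, FakeQTest *to, FakeMigrateCommon *args)
{
    if (args->start_hook) {
        args->postcopy_data = args->start_hook(from, to);
    }
    /* ...set postcopy capabilities and start migrating here... */
}

static void complete(FakeQTest *from, FakeQTest *to, FakeMigrateCommon *args)
{
    /* ...wait for the migration to complete here... */
    if (args->finish_hook) {
        args->finish_hook(from, to, args->postcopy_data);
        args->postcopy_data = NULL;   /* ownership went to finish_hook */
    }
}

/* Shared driver: individual tests differ only in the hooks they pass. */
static void postcopy_common(FakeMigrateCommon *args)
{
    FakeQTest from = { "source" }, to = { "target" };

    prepare(&from, &to, args);
    complete(&from, &to, args);
}

/* Demo hooks used to observe the data flow. */
static int finish_calls;
static int last_seen;

static void *demo_start(FakeQTest *from, FakeQTest *to)
{
    int *creds = malloc(sizeof(*creds));

    (void)from;
    (void)to;
    if (creds) {
        *creds = 42;
    }
    return creds;
}

static void demo_finish(FakeQTest *from, FakeQTest *to, void *opaque)
{
    (void)from;
    (void)to;
    assert(opaque != NULL);
    last_seen = *(int *)opaque;
    finish_calls++;
    free(opaque);
}
```

A "plain" test passes an all-zero struct and no hook runs; a TLS variant only fills in the two hook pointers, which mirrors how `test_postcopy()` and `test_postcopy_tls_psk()` differ above.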

[PATCH v7 04/14] migration: Postcopy recover with preempt enabled

2022-05-24 Thread Peter Xu

To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
needs similar fault-tolerance handling.  When ram_load_postcopy() fails,
instead of exiting, the thread halts on a semaphore, ready to be kicked again
when recovery is detected.

A mutex is introduced to make sure there are no concurrent operations on the
socket.  To keep it simple, the fast ram load thread holds the mutex for its
whole procedure, and only releases it when it is paused.  When there are
network failures during postcopy, the main loading thread takes that mutex
and safely releases the fast-path socket.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/migration.c| 27 +++
 migration/migration.h| 19 +++
 migration/postcopy-ram.c | 25 +++--
 migration/qemu-file.c| 27 +++
 migration/qemu-file.h|  1 +
 migration/savevm.c   | 26 --
 migration/trace-events   |  2 ++
 7 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 57cc8bc029..8679fc6407 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -215,9 +215,11 @@ void migration_object_init(void)
 current_incoming->postcopy_remote_fds =
 g_array_new(FALSE, TRUE, sizeof(struct PostCopyFD));
 qemu_mutex_init(&current_incoming->rp_mutex);
+qemu_mutex_init(&current_incoming->postcopy_prio_thread_mutex);
 qemu_event_init(&current_incoming->main_thread_load_event, false);
 qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
 qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
+qemu_sem_init(&current_incoming->postcopy_pause_sem_fast_load, 0);
 qemu_mutex_init(&current_incoming->page_request_mutex);
 current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
@@ -697,9 +699,9 @@ static bool postcopy_try_recover(void)
 
 /*
  * Here, we only wake up the main loading thread (while the
- * fault thread will still be waiting), so that we can receive
+ * rest threads will still be waiting), so that we can receive
  * commands from source now, and answer it if needed. The
- * fault thread will be woken up afterwards until we are sure
+ * rest threads will be woken up afterwards until we are sure
  * that source is ready to reply to page requests.
  */
 qemu_sem_post(&mis->postcopy_pause_sem_dst);
@@ -3513,6 +3515,18 @@ static MigThrError postcopy_pause(MigrationState *s)
 qemu_file_shutdown(file);
 qemu_fclose(file);
 
+/*
+ * Do the same to postcopy fast path socket too if there is.  No
+ * locking needed because no racer as long as we do this before setting
+ * status to paused.
+ */
+if (s->postcopy_qemufile_src) {
+migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+qemu_file_shutdown(s->postcopy_qemufile_src);
+qemu_fclose(s->postcopy_qemufile_src);
+s->postcopy_qemufile_src = NULL;
+}
+
 migrate_set_state(&s->state, s->state,
   MIGRATION_STATUS_POSTCOPY_PAUSED);
 
@@ -3568,8 +3582,13 @@ static MigThrError migration_detect_error(MigrationState *s)
 return MIG_THR_ERR_FATAL;
 }
 
-/* Try to detect any file errors */
-ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
+/*
+ * Try to detect any file errors.  Note that postcopy_qemufile_src will
+ * be NULL when postcopy preempt is not enabled.
+ */
+ret = qemu_file_get_error_obj_any(s->to_dst_file,
+  s->postcopy_qemufile_src,
+  &local_error);
 if (!ret) {
 /* Everything is fine */
 assert(!local_error);
diff --git a/migration/migration.h b/migration/migration.h
index ff714c235f..9220cec6bd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -118,6 +118,18 @@ struct MigrationIncomingState {
 /* Postcopy priority thread is used to receive postcopy requested pages */
 QemuThread postcopy_prio_thread;
 bool postcopy_prio_thread_created;
+/*
+ * Used to sync between the ram load main thread and the fast ram load
+ * thread.  It protects postcopy_qemufile_dst, which is the postcopy
+ * fast channel.
+ *
+ * The ram fast load thread will take it mostly for the whole lifecycle
+ * because it needs to continuously read data from the channel, and
+ * it'll only release this mutex if postcopy is interrupted, so that
+ * the ram load main thread will take this mutex over and properly
+ * release the broken channel.
+ */
+QemuMutex postcopy_prio_thread_mutex;
 /*
  * An array of temp host huge pages to be used, one for each postcopy
  * channel.
@@ -147,6 +159,13 @@ struct MigrationInc
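
The pause/resume protocol described in the commit message (the fast-load thread holds the mutex while reading, and on failure releases it and sleeps on a semaphore until the main thread has replaced the channel) can be sketched with POSIX threads. This is a minimal sketch, not the QEMU implementation: `FastLoad`, `fast_load_thread()`, `recover_channel()` and `run_pause_resume_demo()` are hypothetical names, the "channel" is an int array where -1 stands in for a network failure and 0 ends the stream, and it assumes a platform with unnamed POSIX semaphores (e.g. Linux; build with `-pthread`).

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t chan_mutex;  /* held by the fast-load thread while reading */
    sem_t pause_sem;             /* fast-load thread sleeps here while paused */
    sem_t paused_notify;         /* tells the main thread the mutex is free */
    const int *channel;          /* hypothetical channel: page ids */
    size_t pos;
    int pages_loaded;
} FastLoad;

/* Page id 0 ends the stream; -1 simulates a broken channel. */
static void *fast_load_thread(void *opaque)
{
    FastLoad *fl = opaque;

    pthread_mutex_lock(&fl->chan_mutex);
    for (;;) {
        int page = fl->channel[fl->pos++];

        if (page == 0) {
            break;
        }
        if (page < 0) {
            /* Release the mutex so the main thread can tear the channel
             * down, then halt until recovery kicks us again. */
            pthread_mutex_unlock(&fl->chan_mutex);
            sem_post(&fl->paused_notify);
            sem_wait(&fl->pause_sem);
            pthread_mutex_lock(&fl->chan_mutex);
            continue;
        }
        fl->pages_loaded++;
    }
    pthread_mutex_unlock(&fl->chan_mutex);
    return NULL;
}

/* Main-thread side: wait for the pause, swap in the recovered channel
 * under the mutex, then wake the fast-load thread. */
static void recover_channel(FastLoad *fl, const int *new_channel)
{
    sem_wait(&fl->paused_notify);
    pthread_mutex_lock(&fl->chan_mutex);
    fl->channel = new_channel;   /* the broken channel would be closed here */
    fl->pos = 0;
    pthread_mutex_unlock(&fl->chan_mutex);
    sem_post(&fl->pause_sem);
}

/* Returns the number of pages loaded across the failure, or -1. */
static int run_pause_resume_demo(void)
{
    static const int broken[] = { 1, 2, -1 };
    static const int recovered[] = { 3, 4, 5, 0 };
    FastLoad fl = { .channel = broken };
    pthread_t tid;

    pthread_mutex_init(&fl.chan_mutex, NULL);
    sem_init(&fl.pause_sem, 0, 0);
    sem_init(&fl.paused_notify, 0, 0);
    if (pthread_create(&tid, NULL, fast_load_thread, &fl) != 0) {
        return -1;
    }
    recover_channel(&fl, recovered);
    pthread_join(tid, NULL);
    sem_destroy(&fl.pause_sem);
    sem_destroy(&fl.paused_notify);
    pthread_mutex_destroy(&fl.chan_mutex);
    return fl.pages_loaded;
}
```

Because the main thread only touches the channel after the fast-load thread has dropped the mutex and announced the pause, the two never operate on the socket concurrently, which is the property the new mutex is meant to provide.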

[PATCH v7 03/14] migration: Postcopy preemption enablement

2022-05-24 Thread Peter Xu

This patch enables the postcopy-preempt feature.

It contains two major changes to the migration logic:

(1) Postcopy requests are now sent via a different socket from the precopy
background migration stream, so that they are isolated from very high page
request delays.

(2) For hosts with huge pages enabled: when there are postcopy requests, they
can now interrupt the partial sending of a huge host page on the src QEMU.

After this patch, we'll live migrate a VM with two channels for postcopy: (1)
PRECOPY channel, which is the default channel that transfers background pages;
and (2) POSTCOPY channel, which only transfers requested pages.

There's no strict rule about which channel to use.  E.g., if a requested page
is already being transferred on the precopy channel, then we will keep using
the same precopy channel to transfer the page even if it's explicitly
requested.  In 99% of cases, though, we send requested pages via the postcopy
channel whenever possible.

On the source QEMU, when we find a postcopy request, we'll interrupt the
PRECOPY channel sending process and quickly switch to the POSTCOPY channel.
After we have serviced all the high priority postcopy pages, we'll switch back
to the PRECOPY channel and continue sending the interrupted huge page.
There's no new thread introduced on src QEMU.

On the destination QEMU, one new thread is introduced to receive page data from
the postcopy specific socket (done in the preparation patch).

This patch has a side effect: previously, after sending postcopy pages, we
assumed the guest would access the following pages, so we kept sending from
there.  Now that has changed.  Instead of continuing from a postcopy requested
page, we go back and continue sending the precopy huge page (which may have
been interrupted by a postcopy request, so part of it may already be sent).

Whether that's a problem is debatable, because "assuming the guest will
continue to access the next page" may not really suit when huge pages are
used, especially if the huge page is large (e.g. 1GB pages).  So that locality
hint is largely meaningless when huge pages are used.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/migration.c  |   2 +
 migration/migration.h  |   2 +-
 migration/ram.c| 251 +++--
 migration/trace-events |   7 ++
 4 files changed, 253 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index bedb81849c..57cc8bc029 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3200,6 +3200,8 @@ static int postcopy_start(MigrationState *ms)
   MIGRATION_STATUS_FAILED);
 }
 
+trace_postcopy_preempt_enabled(migrate_postcopy_preempt());
+
 return ret;
 
 fail_closefb:
diff --git a/migration/migration.h b/migration/migration.h
index 941c61e543..ff714c235f 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -68,7 +68,7 @@ typedef struct {
 struct MigrationIncomingState {
 QEMUFile *from_src_file;
 /* Previously received RAM's RAMBlock pointer */
-RAMBlock *last_recv_block;
+RAMBlock *last_recv_block[RAM_CHANNEL_MAX];
 /* A hook to allow cleanup at the end of incoming migration */
 void *transport_data;
 void (*transport_cleanup)(void *data);
diff --git a/migration/ram.c b/migration/ram.c
index 992bc44f1b..344c20f56f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -295,6 +295,20 @@ struct RAMSrcPageRequest {
 QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
 };
 
+typedef struct {
+/*
+ * Cached ramblock/offset values if preempted.  They're only meaningful if
+ * preempted==true below.
+ */
+RAMBlock *ram_block;
+unsigned long ram_page;
+/*
+ * Whether a postcopy preemption just happened.  Will be reset after
+ * precopy recovered to background migration.
+ */
+bool preempted;
+} PostcopyPreemptState;
+
 /* State of RAM for migration */
 struct RAMState {
 /* QEMUFile used for this migration */
@@ -349,6 +363,14 @@ struct RAMState {
 /* Queue of outstanding page requests from the destination */
 QemuMutex src_page_req_mutex;
 QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests;
+
+/* Postcopy preemption informations */
+PostcopyPreemptState postcopy_preempt_state;
+/*
+ * Current channel we're using on src VM.  Only valid if postcopy-preempt
+ * is enabled.
+ */
+unsigned int postcopy_channel;
 };
 typedef struct RAMState RAMState;
 
@@ -356,6 +378,11 @@ static RAMState *ram_state;
 
 static NotifierWithReturnList precopy_notifier_list;
 
+static void postcopy_preempt_reset(RAMState *rs)
+{
+memset(&rs->postcopy_preempt_state, 0, sizeof(PostcopyPreemptState));
+}
+
 /* Whether postcopy has queued requests? */
 static bool postcopy_has_request(RAMState *rs)
 {
@@ -1947,6 +1974,55 @@ void ram_write_tracking_stop(void)
 }
 #endif /* defined(__linux__) */
 
+/*
+ * Check whether tw
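
The source-side preemption flow described in this commit message (cache where the precopy huge page was interrupted, serve the request on the POSTCOPY channel, then resume the cached position on PRECOPY) can be modelled as a tiny scheduler. This is a hypothetical sketch, not `ram.c`: `Src`, `PreemptState`, `sketch_step()` and `PAGES_PER_HUGE` are invented for illustration, only the `RAM_CHANNEL_*` names mirror the patch, and the cached state is reduced to a page number.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { RAM_CHANNEL_PRECOPY = 0, RAM_CHANNEL_POSTCOPY = 1, RAM_CHANNEL_MAX };

#define PAGES_PER_HUGE 4   /* hypothetical: small pages per huge page */
#define LOG_MAX 32

/* Reduced PostcopyPreemptState: only the page position is cached. */
typedef struct {
    unsigned long ram_page;   /* where precopy was interrupted */
    bool preempted;
} PreemptState;

/* Hypothetical source-side state. */
typedef struct {
    PreemptState preempt;
    unsigned int channel;     /* channel currently in use */
    unsigned long cursor;     /* next background (precopy) page */
    long pending_req;         /* requested page, or -1 if none */
    unsigned int log_chan[LOG_MAX];
    unsigned long log_page[LOG_MAX];
    int nlog;
} Src;

/* Record which channel each page went out on. */
static void send_page(Src *s, unsigned long page)
{
    if (s->nlog < LOG_MAX) {
        s->log_chan[s->nlog] = s->channel;
        s->log_page[s->nlog] = page;
        s->nlog++;
    }
}

/* One scheduling step: an urgent request goes first on POSTCOPY;
 * otherwise continue on PRECOPY, resuming a preempted huge page if any. */
static void sketch_step(Src *s)
{
    if (s->pending_req >= 0) {
        if (s->cursor % PAGES_PER_HUGE != 0) {
            /* Mid huge page on precopy: remember where we stopped. */
            s->preempt.ram_page = s->cursor;
            s->preempt.preempted = true;
        }
        s->channel = RAM_CHANNEL_POSTCOPY;
        send_page(s, (unsigned long)s->pending_req);
        /* Old locality behaviour: the search moves next to the request. */
        s->cursor = (unsigned long)s->pending_req + 1;
        s->pending_req = -1;
        return;
    }

    s->channel = RAM_CHANNEL_PRECOPY;
    if (s->preempt.preempted) {
        /* Go back to the partially sent huge page instead. */
        s->cursor = s->preempt.ram_page;
        memset(&s->preempt, 0, sizeof(s->preempt));
    }
    send_page(s, s->cursor++);
}
```

With 4-page huge pages, sending pages 0 and 1, then receiving a request for page 10, yields the order PRECOPY 0, PRECOPY 1, POSTCOPY 10, PRECOPY 2, PRECOPY 3: the interrupted huge page is finished before anything near page 10, which is the side effect discussed above.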

[PATCH v7 09/14] migration: Enable TLS for preempt channel

2022-05-24 Thread Peter Xu

This patch is based on the async preempt channel creation.  It continues
wiring up the new channel with a TLS handshake to the destination when enabled.

Note that only the src QEMU needs such operation; the dest QEMU does not
need any change for TLS support due to the fact that all channels are
established synchronously there, so all the TLS magic is already properly
handled by migration_tls_channel_process_incoming().

Reviewed-by: Daniel P. Berrange 
Signed-off-by: Peter Xu 
---
 migration/postcopy-ram.c | 57 ++--
 migration/trace-events   |  1 +
 2 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 1bb603051a..54f05fc2fb 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -36,6 +36,7 @@
 #include "socket.h"
 #include "qemu-file-channel.h"
 #include "yank_functions.h"
+#include "tls.h"
 
 /* Arbitrary limit on size of each discard command,
  * keeps them around ~200 bytes
@@ -1552,15 +1553,15 @@ bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
 return true;
 }
 
+/*
+ * Setup the postcopy preempt channel with the IOC.  If ERROR is specified,
+ * setup the error instead.  This helper will free the ERROR if specified.
+ */
 static void
-postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
+postcopy_preempt_send_channel_done(MigrationState *s,
+   QIOChannel *ioc, Error *local_err)
 {
-MigrationState *s = opaque;
-QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
-Error *local_err = NULL;
-
-if (qio_task_propagate_error(task, &local_err)) {
-/* Something wrong happened.. */
+if (local_err) {
 migrate_set_error(s, local_err);
 error_free(local_err);
 } else {
@@ -1574,7 +1575,47 @@ postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
  * postcopy_qemufile_src to know whether it failed or not.
  */
 qemu_sem_post(&s->postcopy_qemufile_src_sem);
-object_unref(OBJECT(ioc));
+}
+
+static void
+postcopy_preempt_tls_handshake(QIOTask *task, gpointer opaque)
+{
+g_autoptr(QIOChannel) ioc = QIO_CHANNEL(qio_task_get_source(task));
+MigrationState *s = opaque;
+Error *local_err = NULL;
+
+qio_task_propagate_error(task, &local_err);
+postcopy_preempt_send_channel_done(s, ioc, local_err);
+}
+
+static void
+postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
+{
+g_autoptr(QIOChannel) ioc = QIO_CHANNEL(qio_task_get_source(task));
+MigrationState *s = opaque;
+QIOChannelTLS *tioc;
+Error *local_err = NULL;
+
+if (qio_task_propagate_error(task, &local_err)) {
+goto out;
+}
+
+if (migrate_channel_requires_tls_upgrade(ioc)) {
+tioc = migration_tls_client_create(s, ioc, s->hostname, &local_err);
+if (!tioc) {
+goto out;
+}
+trace_postcopy_preempt_tls_handshake();
+qio_channel_set_name(QIO_CHANNEL(tioc), "migration-tls-preempt");
+qio_channel_tls_handshake(tioc, postcopy_preempt_tls_handshake,
+  s, NULL, NULL);
+/* Setup the channel until TLS handshake finished */
+return;
+}
+
+out:
+/* This handles both good and error cases */
+postcopy_preempt_send_channel_done(s, ioc, local_err);
 }
 
 /* Returns 0 if channel established, -1 for error. */
diff --git a/migration/trace-events b/migration/trace-events
index 0e385c3a07..a34afe7b85 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -287,6 +287,7 @@ postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_off
 postcopy_request_shared_page_present(const char *sharer, const char *rb, uint64_t rb_offset) "%s already %s offset 0x%"PRIx64
 postcopy_wake_shared(uint64_t client_addr, const char *rb) "at 0x%"PRIx64" in %s"
 postcopy_page_req_del(void *addr, int count) "resolved page req %p total %d"
+postcopy_preempt_tls_handshake(void) ""
 postcopy_preempt_new_channel(void) ""
 postcopy_preempt_thread_entry(void) ""
 postcopy_preempt_thread_exit(void) ""
-- 
2.32.0
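
The control flow this patch introduces (the connect callback either completes immediately via `goto out`, or hands off to a TLS handshake whose callback completes later, with both paths funnelling into one "done" helper) can be sketched without QIOChannel. This is a hypothetical sketch: `FakeMigState`, `channel_new()`, `channel_done()` and `fake_tls_handshake()` are invented names, and the handshake "completes" synchronously instead of asynchronously.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced MigrationState. */
typedef struct {
    const char *error;   /* error reported, NULL on success */
    bool channel_ready;  /* preempt channel installed */
    int done_calls;      /* times the completion path ran */
} FakeMigState;

typedef void (*HandshakeCb)(FakeMigState *s, const char *err);

/* Single completion path for every outcome, in the spirit of
 * postcopy_preempt_send_channel_done(); in QEMU this is also where the
 * semaphore that the waiter blocks on gets posted. */
static void channel_done(FakeMigState *s, const char *err)
{
    if (err) {
        s->error = err;
    } else {
        s->channel_ready = true;
    }
    s->done_calls++;
}

static void tls_handshake_cb(FakeMigState *s, const char *err)
{
    channel_done(s, err);
}

/* Stand-in for an asynchronous TLS handshake: completes immediately. */
static void fake_tls_handshake(FakeMigState *s, HandshakeCb cb,
                               const char *handshake_err)
{
    cb(s, handshake_err);
}

/* Connect callback: on error or without TLS, finish now; with TLS,
 * defer completion to the handshake callback. */
static void channel_new(FakeMigState *s, const char *connect_err,
                        bool needs_tls, const char *handshake_err)
{
    const char *err = NULL;

    if (connect_err) {
        err = connect_err;
        goto out;
    }
    if (needs_tls) {
        fake_tls_handshake(s, tls_handshake_cb, handshake_err);
        return;   /* the channel is set up once the handshake finishes */
    }
out:
    channel_done(s, err);
}
```

Whichever of the four paths is taken (plain success, connect error, TLS success, handshake error), the completion helper runs exactly once, which is what lets the waiter rely on a single semaphore post.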

[PATCH v7 10/14] migration: Respect postcopy request order in preemption mode

2022-05-24 Thread Peter Xu

With preemption mode on, when we see a postcopy request for exactly the page
that we preempted before (so we've already partially sent the page via the
PRECOPY channel and it got preempted by another postcopy request), currently
we drop the request, so that only after all the other postcopy requests are
serviced do we go back to the precopy stream and handle it.

We dropped the request because we can't send it via the postcopy channel,
since the precopy channel already contains part of the data, and we can only
send a huge page via one channel as a whole.  We can't split a huge page
across two channels.

That's a corner case and it works, but it changes the order in which we
handle postcopy requests, since we're postponing this (unlucky) postcopy
request until after the other queued postcopy requests.  The problem is that
when the guest is very busy, the postcopy queue may never become empty, which
means this dropped request may not be handled until the end of postcopy
migration.  So there's a chance that a dest QEMU vcpu thread waits on a page
fault for an extremely long time just because it's unluckily accessing the
specific page that was preempted before.

In the worst case, the wait can be as long as the whole postcopy migration
procedure.  It's extremely unlikely to happen, but when it happens it's not
good.

The root cause of this problem is that the pss->postcopy_requested
variable carries two meanings bound together:

  1. Whether this page request is urgent, and,
  2. Which channel we should use for this page request.

With the old code, when we set postcopy_requested it means either both (1)
and (2) are true, or both (1) and (2) are false.  (1) and (2) can never
have different values.

However, it doesn't necessarily need to be like that.  It's perfectly
legal for a request to have (1) very high urgency, while (2) we'd like to
send it over the precopy channel, just like the corner case discussed
above.

To differentiate the two meanings better, introduce a new field called
postcopy_target_channel, showing which channel we should use for this page
request, so as to cover only the old meaning (2).  Then we leave the
postcopy_requested variable to stand only for meaning (1), which is the
urgency of this page request.

With this change, we can easily boost the priority of a preempted precopy
page as long as we know that the page is also requested as a postcopy page.
So with the new approach, instead of dropping that request in
get_queued_page(), we send it right away via the precopy channel, which
restores the ordering of the page faults as they were requested on dest.
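To make the split concrete, here is a minimal, hypothetical C model (not QEMU code; `PageFlags`, `classify_postcopy_request()` and `pick_channel()` are invented for illustration) of how the two flags can now diverge: a request for the preempted page stays urgent, but is routed to the precopy channel that already holds part of it.

```c
#include <stdbool.h>

/* Hypothetical model of the two PageSearchStatus flags; not QEMU code. */
typedef enum { CHANNEL_PRECOPY, CHANNEL_POSTCOPY } Channel;

typedef struct {
    bool postcopy_requested;      /* (1) urgent: dest threads are waiting */
    bool postcopy_target_channel; /* (2) send via the postcopy channel    */
} PageFlags;

/*
 * Flags for an incoming postcopy request.  @partially_sent is true when
 * the requested page is the huge page whose precopy send was preempted,
 * i.e. part of it already sits in the precopy channel.
 */
static PageFlags classify_postcopy_request(bool partially_sent)
{
    PageFlags f = {
        /* Every postcopy request is urgent */
        .postcopy_requested = true,
        /* Finish a partially sent page on the channel that already has it */
        .postcopy_target_channel = !partially_sent,
    };
    return f;
}

/* Only meaning (2) decides the channel; meaning (1) decides priority */
static Channel pick_channel(PageFlags f)
{
    return f.postcopy_target_channel ? CHANNEL_POSTCOPY : CHANNEL_PRECOPY;
}
```

With the old single flag, the second case (urgent but precopy channel) could not be expressed, which is why the request had to be dropped and re-queued.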

Reported-by: Manish Mishra 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Manish Mishra 
Signed-off-by: Peter Xu 
---
 migration/ram.c | 65 +++--
 1 file changed, 52 insertions(+), 13 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 9d76db8491..fe302e7734 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -441,8 +441,28 @@ struct PageSearchStatus {
 unsigned long page;
 /* Set once we wrap around */
 bool complete_round;
-/* Whether current page is explicitly requested by postcopy */
+/*
+ * [POSTCOPY-ONLY] Whether current page is explicitly requested by
+ * postcopy.  When set, the request is "urgent" because the dest QEMU
+ * threads are waiting for us.
+ */
 bool postcopy_requested;
+/*
+ * [POSTCOPY-ONLY] The target channel to use to send current page.
+ *
+ * Note: This may _not_ match with the value in postcopy_requested
+ * above. Let's imagine the case where the postcopy request is exactly
+ * the page that we're sending in progress during precopy. In this case
+ * we'll have postcopy_requested set to true but the target channel
+ * will be the precopy channel (so that we don't split brain on that
+ * specific page since the precopy channel already contains partial of
+ * that page data).
+ *
+ * Besides that specific use case, postcopy_target_channel should
+ * always be equal to postcopy_requested, because by default we send
+ * postcopy pages via postcopy preempt channel.
+ */
+bool postcopy_target_channel;
 };
 typedef struct PageSearchStatus PageSearchStatus;
 
@@ -496,6 +516,9 @@ static QemuCond decomp_done_cond;
 static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
  ram_addr_t offset, uint8_t *source_buf);
 
+static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss,
+ bool postcopy_requested);
+
 static void *do_data_compress(void *opaque)
 {
 CompressParam *param = opaque;
@@ -1516,8 +1539,12 @@ retry:
  */
 static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again

[PATCH v7 13/14] tests: Add postcopy tls recovery migration test

2022-05-24 Thread Peter Xu

It's easy to build this upon the postcopy tls test.  Rename the old
postcopy recovery test to postcopy/recovery/plain.

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 38 +++-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 03f7bb0d96..12f1e3a751 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1131,17 +1131,15 @@ static void test_postcopy_tls_psk(void)
 test_postcopy_common(&args);
 }
 
-static void test_postcopy_recovery(void)
+static void test_postcopy_recovery_common(MigrateCommon *args)
 {
-MigrateCommon args = {
-.start = {
-.hide_stderr = true,
-},
-};
 QTestState *from, *to;
 g_autofree char *uri = NULL;
 
-if (migrate_postcopy_prepare(&from, &to, &args)) {
+/* Always hide errors for postcopy recover tests since they're expected */
+args->start.hide_stderr = true;
+
+if (migrate_postcopy_prepare(&from, &to, args)) {
 return;
 }
 
@@ -1192,7 +1190,24 @@ static void test_postcopy_recovery(void)
 /* Restore the postcopy bandwidth to unlimited */
 migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
 
-migrate_postcopy_complete(from, to, &args);
+migrate_postcopy_complete(from, to, args);
+}
+
+static void test_postcopy_recovery(void)
+{
+MigrateCommon args = { };
+
+test_postcopy_recovery_common(&args);
+}
+
+static void test_postcopy_recovery_tls_psk(void)
+{
+MigrateCommon args = {
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_recovery_common(&args);
 }
 
 static void test_baddest(void)
@@ -2173,7 +2188,12 @@ int main(int argc, char **argv)
  */
 qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
 #endif /* CONFIG_GNUTLS */
-qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
+qtest_add_func("/migration/postcopy/recovery/plain",
+   test_postcopy_recovery);
+#ifdef CONFIG_GNUTLS
+qtest_add_func("/migration/postcopy/recovery/tls/psk",
+   test_postcopy_recovery_tls_psk);
+#endif /* CONFIG_GNUTLS */
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
 qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle);
-- 
2.32.0

[PATCH v7 11/14] tests: Move MigrateCommon upper

2022-05-24 Thread Peter Xu

So that it can be used by the postcopy tests soon, too.

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 144 +--
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index d33e8060f9..cb53846114 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -485,6 +485,78 @@ typedef struct {
 const char *opts_target;
 } MigrateStart;
 
+/*
+ * A hook that runs after the src and dst QEMUs have been
+ * created, but before the migration is started. This can
+ * be used to set migration parameters and capabilities.
+ *
+ * Returns: NULL, or a pointer to opaque state to be
+ *  later passed to the TestMigrateFinishHook
+ */
+typedef void * (*TestMigrateStartHook)(QTestState *from,
+   QTestState *to);
+
+/*
+ * A hook that runs after the migration has finished,
+ * regardless of whether it succeeded or failed, but
+ * before QEMU has terminated (unless it self-terminated
+ * due to migration error)
+ *
+ * @opaque is a pointer to state previously returned
+ * by the TestMigrateStartHook if any, or NULL.
+ */
+typedef void (*TestMigrateFinishHook)(QTestState *from,
+  QTestState *to,
+  void *opaque);
+
+typedef struct {
+/* Optional: fine tune start parameters */
+MigrateStart start;
+
+/* Required: the URI for the dst QEMU to listen on */
+const char *listen_uri;
+
+/*
+ * Optional: the URI for the src QEMU to connect to
+ * If NULL, then it will query the dst QEMU for its actual
+ * listening address and use that as the connect address.
+ * This allows for dynamically picking a free TCP port.
+ */
+const char *connect_uri;
+
+/* Optional: callback to run at start to set migration parameters */
+TestMigrateStartHook start_hook;
+/* Optional: callback to run at finish to cleanup */
+TestMigrateFinishHook finish_hook;
+
+/*
+ * Optional: normally we expect the migration process to complete.
+ *
+ * There can be a variety of reasons and stages in which failure
+ * can happen during tests.
+ *
+ * If a failure is expected to happen at time of establishing
+ * the connection, then MIG_TEST_FAIL will indicate that the dst
+ * QEMU is expected to stay running and accept future migration
+ * connections.
+ *
+ * If a failure is expected to happen while processing the
+ * migration stream, then MIG_TEST_FAIL_DEST_QUIT_ERR will indicate
+ * that the dst QEMU is expected to quit with non-zero exit status
+ */
+enum {
+/* This test should succeed, the default */
+MIG_TEST_SUCCEED = 0,
+/* This test should fail, dest qemu should keep alive */
+MIG_TEST_FAIL,
+/* This test should fail, dest qemu should fail with abnormal status */
+MIG_TEST_FAIL_DEST_QUIT_ERR,
+} result;
+
+/* Optional: set number of migration passes to wait for */
+unsigned int iterations;
+} MigrateCommon;
+
 static int test_migrate_start(QTestState **from, QTestState **to,
   const char *uri, MigrateStart *args)
 {
@@ -1107,78 +1179,6 @@ static void test_baddest(void)
 test_migrate_end(from, to, false);
 }
 
-/*
- * A hook that runs after the src and dst QEMUs have been
- * created, but before the migration is started. This can
- * be used to set migration parameters and capabilities.
- *
- * Returns: NULL, or a pointer to opaque state to be
- *  later passed to the TestMigrateFinishHook
- */
-typedef void * (*TestMigrateStartHook)(QTestState *from,
-   QTestState *to);
-
-/*
- * A hook that runs after the migration has finished,
- * regardless of whether it succeeded or failed, but
- * before QEMU has terminated (unless it self-terminated
- * due to migration error)
- *
- * @opaque is a pointer to state previously returned
- * by the TestMigrateStartHook if any, or NULL.
- */
-typedef void (*TestMigrateFinishHook)(QTestState *from,
-  QTestState *to,
-  void *opaque);
-
-typedef struct {
-/* Optional: fine tune start parameters */
-MigrateStart start;
-
-/* Required: the URI for the dst QEMU to listen on */
-const char *listen_uri;
-
-/*
- * Optional: the URI for the src QEMU to connect to
- * If NULL, then it will query the dst QEMU for its actual
- * listening address and use that as the connect address.
- * This allows for dynamically picking a free TCP port.
- */
-const char *connect_uri;
-
-/* Optional: callback to run at start to set migration parameters */
-TestMigrateStartHook start_hook;
-/* Optional: callback to run at finish to cleanup */
-TestMigrateFinishHook finish_hook;
-
-/*
- * Optional: normally we

[PATCH v7 01/14] migration: Add postcopy-preempt capability

2022-05-24 Thread Peter Xu

Firstly, postcopy already preempts precopy, because we call unqueue_page()
before looking into the dirty bits.

However, that's not enough.  For example, when host huge pages are enabled
and a precopy huge page is being sent, a postcopy request needs to wait
until the whole huge page has been sent.  That can introduce quite some
delay: the bigger the huge page, the larger the delay.

This patch adds a new capability to allow postcopy requests to preempt an
existing precopy page while a huge page is being sent, so that postcopy
requests can be serviced even faster.

Meanwhile, to send them even faster, bypass the precopy stream by providing
a standalone postcopy socket for sending requested pages.

Since the new behavior is not compatible with the old behavior, it is not
the default; it's enabled only when the new capability is set on both
src/dst QEMUs.

This patch only adds the capability itself; the logic will be added in
follow-up patches.
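A toy model of the difference, assuming a huge page is sent as a run of target-size pages and that the sender checks for queued postcopy requests between them. `send_huge_page()` and its parameters are hypothetical and only illustrate the idea; they are not QEMU's API.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: a huge page is sent as @small_pages target-size
 * pages.  @request_pending_after[i] says whether a postcopy request is
 * queued right after small page i has been sent.  Returns how many small
 * pages went out before the sender stopped working on this huge page.
 */
static size_t send_huge_page(size_t small_pages,
                             const bool *request_pending_after,
                             bool preempt_enabled)
{
    size_t sent = 0;

    while (sent < small_pages) {
        sent++;  /* pretend one target page went out on the wire */

        /* Without preemption, the whole huge page is sent in one go */
        if (preempt_enabled && sent < small_pages &&
            request_pending_after[sent - 1]) {
            break;  /* yield so the postcopy request is serviced first */
        }
    }
    return sent;
}
```

For example, with four small pages and a request arriving after the second one, the vanilla path sends all four before looking at the queue, while the preempt path stops after two.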

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/migration.c | 23 +++
 migration/migration.h |  1 +
 qapi/migration.json   |  8 +++-
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 31739b2af9..f15e1593ac 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1240,6 +1240,11 @@ static bool migrate_caps_check(bool *cap_list,
 error_setg(errp, "Postcopy is not compatible with ignore-shared");
 return false;
 }
+
+if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
+error_setg(errp, "Multifd is not supported in postcopy");
+return false;
+}
 }
 
 if (cap_list[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
@@ -1283,6 +1288,13 @@ static bool migrate_caps_check(bool *cap_list,
 return false;
 }
 
+if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT]) {
+if (!cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+error_setg(errp, "Postcopy preempt requires postcopy-ram");
+return false;
+}
+}
+
 return true;
 }
 
@@ -2669,6 +2681,15 @@ bool migrate_background_snapshot(void)
 return s->enabled_capabilities[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT];
 }
 
+bool migrate_postcopy_preempt(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT];
+}
+
 /* migration thread support */
 /*
  * Something bad happened to the RP stream, mark an error
@@ -4283,6 +4304,8 @@ static Property migration_properties[] = {
 DEFINE_PROP_MIG_CAP("x-compress", MIGRATION_CAPABILITY_COMPRESS),
 DEFINE_PROP_MIG_CAP("x-events", MIGRATION_CAPABILITY_EVENTS),
 DEFINE_PROP_MIG_CAP("x-postcopy-ram", MIGRATION_CAPABILITY_POSTCOPY_RAM),
+DEFINE_PROP_MIG_CAP("x-postcopy-preempt",
+MIGRATION_CAPABILITY_POSTCOPY_PREEMPT),
 DEFINE_PROP_MIG_CAP("x-colo", MIGRATION_CAPABILITY_X_COLO),
 DEFINE_PROP_MIG_CAP("x-release-ram", MIGRATION_CAPABILITY_RELEASE_RAM),
 DEFINE_PROP_MIG_CAP("x-block", MIGRATION_CAPABILITY_BLOCK),
diff --git a/migration/migration.h b/migration/migration.h
index 485d58b95f..d2269c826c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -400,6 +400,7 @@ int migrate_decompress_threads(void);
 bool migrate_use_events(void);
 bool migrate_postcopy_blocktime(void);
 bool migrate_background_snapshot(void);
+bool migrate_postcopy_preempt(void);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_shut(MigrationIncomingState *mis,
diff --git a/qapi/migration.json b/qapi/migration.json
index 6130cd9fae..d8c3810ba2 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -461,6 +461,12 @@
 #   procedure starts. The VM RAM is saved with running VM.
 #   (since 6.0)
 #
+# @postcopy-preempt: If enabled, the migration process will allow postcopy
+#requests to preempt precopy stream, so postcopy requests
+#will be handled faster.  This is a performance feature and
+#should not affect the correctness of postcopy migration.
+#(since 7.1)
+#
 # Features:
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
 #
@@ -474,7 +480,7 @@
'block', 'return-path', 'pause-before-switchover', 'multifd',
'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
{ 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
-   'validate-uuid', 'background-snapshot'] }
+   'validate-uuid', 'background-snapshot', 'postcopy-preempt'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.32.0

[PATCH v7 06/14] migration: Add property x-postcopy-preempt-break-huge

2022-05-24 Thread Peter Xu

Add a property field that can conditionally disable the "break sending huge
page" behavior in postcopy preemption.  By default it's enabled.

It should only be used for debugging purposes, and we should never remove
the "x-" prefix.

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Manish Mishra 
Signed-off-by: Peter Xu 
---
 migration/migration.c | 2 ++
 migration/migration.h | 7 +++
 migration/ram.c   | 7 +++
 3 files changed, 16 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index e8ab876c8d..f5f7a0f91f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4376,6 +4376,8 @@ static Property migration_properties[] = {
 DEFINE_PROP_SIZE("announce-step", MigrationState,
   parameters.announce_step,
   DEFAULT_MIGRATE_ANNOUNCE_STEP),
+DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
+  postcopy_preempt_break_huge, true),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
diff --git a/migration/migration.h b/migration/migration.h
index ae4ffd3454..cdad8aceaa 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -340,6 +340,13 @@ struct MigrationState {
 bool send_configuration;
 /* Whether we send section footer during migration */
 bool send_section_footer;
+/*
+ * Whether we allow break sending huge pages when postcopy preempt is
+ * enabled.  When disabled, we won't interrupt precopy within sending a
+ * host huge page, which is the old behavior of vanilla postcopy.
+ * NOTE: this parameter is ignored if postcopy preempt is not enabled.
+ */
+bool postcopy_preempt_break_huge;
 
 /* Needed by postcopy-pause state */
 QemuSemaphore postcopy_pause_sem;
diff --git a/migration/ram.c b/migration/ram.c
index 344c20f56f..9d76db8491 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2266,11 +2266,18 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
 
 static bool postcopy_needs_preempt(RAMState *rs, PageSearchStatus *pss)
 {
+MigrationState *ms = migrate_get_current();
+
 /* Not enabled eager preempt?  Then never do that. */
 if (!migrate_postcopy_preempt()) {
 return false;
 }
 
+/* If the user explicitly disabled breaking of huge page, skip */
+if (!ms->postcopy_preempt_break_huge) {
+return false;
+}
+
 /* If the ramblock we're sending is a small page?  Never bother. */
 if (qemu_ram_pagesize(pss->block) == TARGET_PAGE_SIZE) {
 return false;
-- 
2.32.0

[PATCH v7 08/14] migration: Export tls-[creds|hostname|authz] params to cmdline too

2022-05-24 Thread Peter Xu

It's useful for specifying tls credentials all in the cmdline (along with
the -object tls-creds-*), especially for debugging purposes.

The trick here is that we must remember not to free these fields again in
the finalize() function of the migration object, otherwise it'll cause a
double free.

The thing is, when destroying an object, we first destroy the properties
bound to the object, then the object itself.  To be explicit, when
destroying the object, object_finalize() runs this sequence of
operations:

object_property_del_all(obj);
object_deinit(obj, ti);

So after this change, the two fields are already properly released in
object_property_del_all(), even before reaching the finalize() function;
hence we must not free them again in finalize(), or it would be a double
free.

This also fixes a trivial memory leak of tls-authz, since we forgot to free
it before this patch.
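A minimal sketch of why the old g_free() calls had to go, using plain malloc()/free(). The names (`MigObj`, `model_property_del_all()`, `model_finalize()`, `model_object_finalize()`) are hypothetical and only mimic the ordering described above; this is not QEMU's QOM code.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *tls_creds;     /* exposed as a string property */
    char *tls_authz;     /* exposed as a string property */
    int props_released;  /* how many property strings were freed */
    int finalized;       /* set once the finalize hook has run */
} MigObj;

/* Hypothetical helper: heap-copy a string, like g_strdup() */
static char *model_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p) {
        memcpy(p, s, n);
    }
    return p;
}

/*
 * Analogue of object_property_del_all(): the string properties release
 * their storage.  The pointers are deliberately left dangling to mimic a
 * release step that frees without resetting the field.
 */
static void model_property_del_all(MigObj *obj)
{
    free(obj->tls_creds);
    obj->props_released++;
    free(obj->tls_authz);
    obj->props_released++;
}

/*
 * Analogue of the migration object's finalize() hook.  It must NOT free
 * tls_creds or tls_authz again: they were already released above, so
 * freeing the stale pointers here would be a double free.
 */
static void model_finalize(MigObj *obj)
{
    obj->finalized = 1;
}

/* Same order as object_finalize(): properties first, then the object */
static void model_object_finalize(MigObj *obj)
{
    model_property_del_all(obj);
    model_finalize(obj);
}
```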

Reviewed-by: Daniel P. Berrange 
Signed-off-by: Peter Xu 
---
 migration/migration.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d17f435d08..aa4185148c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4379,6 +4379,9 @@ static Property migration_properties[] = {
   DEFAULT_MIGRATE_ANNOUNCE_STEP),
 DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
   postcopy_preempt_break_huge, true),
+DEFINE_PROP_STRING("tls-creds", MigrationState, parameters.tls_creds),
+DEFINE_PROP_STRING("tls-hostname", MigrationState, parameters.tls_hostname),
+DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
@@ -4412,12 +4415,9 @@ static void migration_class_init(ObjectClass *klass, void *data)
 static void migration_instance_finalize(Object *obj)
 {
 MigrationState *ms = MIGRATION_OBJ(obj);
-MigrationParameters *params = &ms->parameters;
 
 qemu_mutex_destroy(&ms->error_mutex);
 qemu_mutex_destroy(&ms->qemu_file_lock);
-g_free(params->tls_hostname);
-g_free(params->tls_creds);
 qemu_sem_destroy(&ms->wait_unplug_sem);
 qemu_sem_destroy(&ms->rate_limit_sem);
 qemu_sem_destroy(&ms->pause_sem);
-- 
2.32.0

[PATCH v7 05/14] migration: Create the postcopy preempt channel asynchronously

2022-05-24 Thread Peter Xu

This patch allows the postcopy preempt channel to be created
asynchronously.  The benefit is that when the connection is slow, we won't
hold the BQL (and potentially block everything, such as QMP) for a long
time without releasing it.

A function postcopy_preempt_wait_channel() is introduced, allowing the
migration thread to wait on the channel creation.  The channel is always
created by the main thread, which kicks a new semaphore to tell the
migration thread that the channel has been created.

We'll need to wait for the new channel in two places: (1) when there's a
new postcopy migration that is starting, or (2) when there's a postcopy
migration to resume.

For the start of migration, we don't need to wait for this channel until
we want to start postcopy, i.e., in postcopy_start().  We'll fail the
migration if we find that the channel creation failed (which should not
happen in 99% of the cases, because the main channel is using the same
network topology).

For a postcopy recovery, we'll need to wait in postcopy_pause().  In that
case, if the channel creation fails, we can't fail the migration or we'll
crash the VM; instead we stay in the PAUSED state, waiting for yet another
recovery.
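The post()/wait() pairing can be sketched with POSIX threads and semaphores. This is a hypothetical model, not QEMU code: the real series uses QemuSemaphore and the main loop, while `PreemptState`, `create_channel_async()`, `wait_channel()`, `attempt_channel()` and `recover_until_channel()` are invented here. One post per creation attempt keeps post() and wait() balanced, and the two callers react differently to a failed attempt.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

typedef struct {
    sem_t channel_sem;      /* posted exactly once per creation attempt */
    bool simulate_success;  /* model knob: should this attempt succeed? */
    bool channel_ok;        /* outcome, published before sem_post() */
} PreemptState;

static int preempt_state_init(PreemptState *s)
{
    s->simulate_success = false;
    s->channel_ok = false;
    return sem_init(&s->channel_sem, 0, 0);
}

static void preempt_state_destroy(PreemptState *s)
{
    sem_destroy(&s->channel_sem);
}

/* "Main thread" side: finish creating the channel, then kick the sem */
static void *create_channel_async(void *opaque)
{
    PreemptState *s = opaque;

    s->channel_ok = s->simulate_success;  /* pretend we connected (or not) */
    sem_post(&s->channel_sem);            /* pairs with wait_channel() */
    return NULL;
}

/* "Migration thread" side: 0 if the channel is usable, -1 otherwise */
static int wait_channel(PreemptState *s)
{
    sem_wait(&s->channel_sem);
    return s->channel_ok ? 0 : -1;
}

/* One creation attempt; a -1 here on the start path would fail migration */
static int attempt_channel(PreemptState *s, bool succeed)
{
    pthread_t tid;
    int ret;

    s->simulate_success = succeed;
    if (pthread_create(&tid, NULL, create_channel_async, s) != 0) {
        return -1;
    }
    ret = wait_channel(s);
    pthread_join(tid, NULL);
    return ret;
}

/*
 * Recovery path: a failed attempt does not fail the migration; stay
 * "paused" and try again.  Returns the number of attempts used, or -1.
 */
static int recover_until_channel(PreemptState *s, const bool *outcomes, int n)
{
    for (int i = 0; i < n; i++) {
        if (attempt_channel(s, outcomes[i]) == 0) {
            return i + 1;
        }
    }
    return -1;
}
```

Because sem_post() and sem_wait() synchronize memory, the `channel_ok` value written before the post is visible to the waiter after the wait returns.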

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Manish Mishra 
Signed-off-by: Peter Xu 
---
 migration/migration.c| 16 
 migration/migration.h|  7 +
 migration/postcopy-ram.c | 56 +++-
 migration/postcopy-ram.h |  1 +
 4 files changed, 68 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 8679fc6407..e8ab876c8d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3063,6 +3063,12 @@ static int postcopy_start(MigrationState *ms)
 int64_t bandwidth = migrate_max_postcopy_bandwidth();
 bool restart_block = false;
 int cur_state = MIGRATION_STATUS_ACTIVE;
+
+if (postcopy_preempt_wait_channel(ms)) {
+migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+return -1;
+}
+
 if (!migrate_pause_before_switchover()) {
 migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -3544,6 +3550,14 @@ static MigThrError postcopy_pause(MigrationState *s)
 if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
 /* Woken up by a recover procedure. Give it a shot */
 
+if (postcopy_preempt_wait_channel(s)) {
+/*
+ * Preempt enabled, and new channel create failed; loop
+ * back to wait for another recovery.
+ */
+continue;
+}
+
 /*
  * Firstly, let's wake up the return path now, with a new
  * return path channel.
@@ -4407,6 +4421,7 @@ static void migration_instance_finalize(Object *obj)
 qemu_sem_destroy(&ms->postcopy_pause_sem);
 qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
 qemu_sem_destroy(&ms->rp_state.rp_sem);
+qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
 error_free(ms->error);
 }
 
@@ -4456,6 +4471,7 @@ static void migration_instance_init(Object *obj)
 qemu_sem_init(&ms->rp_state.rp_sem, 0);
 qemu_sem_init(&ms->rate_limit_sem, 0);
 qemu_sem_init(&ms->wait_unplug_sem, 0);
+qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);
 qemu_mutex_init(&ms->qemu_file_lock);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index 9220cec6bd..ae4ffd3454 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -219,6 +219,13 @@ struct MigrationState {
 QEMUFile *to_dst_file;
 /* Postcopy specific transfer channel */
 QEMUFile *postcopy_qemufile_src;
+/*
+ * It is posted when the preempt channel is established.  Note: this is
+ * used for both the start or recover of a postcopy migration.  We'll
+ * post to this sem every time a new preempt channel is created in the
+ * main thread, and we keep post() and wait() in pair.
+ */
+QemuSemaphore postcopy_qemufile_src_sem;
 QIOChannelBuffer *bioc;
 /*
  * Protects to_dst_file/from_dst_file pointers.  We need to make sure we
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b3c81b46f6..1bb603051a 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1552,10 +1552,50 @@ bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
 return true;
 }
 
-int postcopy_preempt_setup(MigrationState *s, Error **errp)
+static void
+postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
 {
-QIOChannel *ioc;
+MigrationState *s = opaque;
+QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
+Error *local_err = NULL;
+
+if (qio_task_propagate_error(task, &local_err)) {
+/* Something wrong happened.. */
+migrate_set_error(s, local_err);
+error_free(local_err);
+} else

[PATCH v7 07/14] migration: Add helpers to detect TLS capability

2022-05-24 Thread Peter Xu

Add migrate_channel_requires_tls_upgrade() to detect whether the specific
channel requires a TLS upgrade, leveraging the recently introduced
migrate_use_tls().  No functional change intended.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/channel.c   | 9 ++---
 migration/migration.c | 1 +
 migration/multifd.c   | 4 +---
 migration/tls.c   | 9 +
 migration/tls.h   | 4 
 5 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/migration/channel.c b/migration/channel.c
index a162d00fea..bf1ff1f2a5 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -38,9 +38,7 @@ void migration_channel_process_incoming(QIOChannel *ioc)
 trace_migration_set_incoming_channel(
 ioc, object_get_typename(OBJECT(ioc)));
 
-if (migrate_use_tls() &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls_upgrade(ioc)) {
 migration_tls_channel_process_incoming(s, ioc, &local_err);
 } else {
 migration_ioc_register_yank(ioc);
@@ -70,10 +68,7 @@ void migration_channel_connect(MigrationState *s,
 ioc, object_get_typename(OBJECT(ioc)), hostname, error);
 
 if (!error) {
-if (s->parameters.tls_creds &&
-*s->parameters.tls_creds &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls_upgrade(ioc)) {
 migration_tls_channel_connect(s, ioc, hostname, &error);
 
 if (!error) {
diff --git a/migration/migration.c b/migration/migration.c
index f5f7a0f91f..d17f435d08 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -49,6 +49,7 @@
 #include "trace.h"
 #include "exec/target_page.h"
 #include "io/channel-buffer.h"
+#include "io/channel-tls.h"
 #include "migration/colo.h"
 #include "hw/boards.h"
 #include "hw/qdev-properties.h"
diff --git a/migration/multifd.c b/migration/multifd.c
index 9282ab6aa4..265a169c5c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -831,9 +831,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
 migrate_get_current()->hostname, error);
 
 if (!error) {
-if (migrate_use_tls() &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls_upgrade(ioc)) {
 multifd_tls_channel_connect(p, ioc, &error);
 if (!error) {
 /*
diff --git a/migration/tls.c b/migration/tls.c
index 32c384a8b6..73e8c9d3c2 100644
--- a/migration/tls.c
+++ b/migration/tls.c
@@ -166,3 +166,12 @@ void migration_tls_channel_connect(MigrationState *s,
   NULL,
   NULL);
 }
+
+bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
+{
+if (!migrate_use_tls()) {
+return false;
+}
+
+return !object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS);
+}
diff --git a/migration/tls.h b/migration/tls.h
index de4fe2cafd..98e23c9b0e 100644
--- a/migration/tls.h
+++ b/migration/tls.h
@@ -37,4 +37,8 @@ void migration_tls_channel_connect(MigrationState *s,
QIOChannel *ioc,
const char *hostname,
Error **errp);
+
+/* Whether the QIO channel requires further TLS handshake? */
+bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
+
 #endif
-- 
2.32.0

[PATCH v7 00/14] migration: Postcopy Preemption

2022-05-24 Thread Peter Xu

This is v7 of postcopy preempt series.  It can also be found here:

  https://github.com/xzpeter/qemu/tree/postcopy-preempt

RFC: https://lore.kernel.org/qemu-devel/20220119080929.39485-1-pet...@redhat.com
V1:  https://lore.kernel.org/qemu-devel/20220216062809.57179-1-pet...@redhat.com
V2:  https://lore.kernel.org/qemu-devel/20220301083925.33483-1-pet...@redhat.com
V3:  https://lore.kernel.org/qemu-devel/20220330213908.26608-1-pet...@redhat.com
V4:  https://lore.kernel.org/qemu-devel/20220331150857.74406-1-pet...@redhat.com
V5:  https://lore.kernel.org/qemu-devel/20220425233847.10393-1-pet...@redhat.com
V6:  https://lore.kernel.org/qemu-devel/20220517195730.32312-1-pet...@redhat.com

v7:
- Add more R-bs
- Drop "if" optimization in find_dirty_block() to make sure both fields are
  reset properly [Dave]
- s/migrate_channel_requires_tls/migrate_channel_requires_tls_upgrade/ [Dan]
- Rewrite the test case to use [start|finish]_hook [Dan]

Abstract
========

This series adds a new migration capability called "postcopy-preempt".  It
can be enabled when postcopy is enabled, and it simply (but greatly) speeds
up the handling of postcopy page requests.

Below are some initial postcopy page request latency measurements with the
new series applied.

For each page size, I measured page request latency for three cases:

  (a) Vanilla:the old postcopy
  (b) Preempt no-break-huge:  preempt enabled, x-postcopy-preempt-break-huge=off
  (c) Preempt full:   preempt enabled, x-postcopy-preempt-break-huge=on
  (this is the default option when preempt enabled)

Here the x-postcopy-preempt-break-huge parameter was added in v2 so as to
conditionally disable breaking the sending of a precopy huge page, for
debugging purposes.  So when it's off, postcopy will not preempt precopy
while it's sending a huge page, but postcopy will still use its own
channel.

I tested the cases separately to give a rough idea of how much each part of
the change helped.  The overall benefit should be the comparison between
case (a) and (c).

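  |-----------+---------+-----------------------+--------------|
  | Page size | Vanilla | Preempt no-break-huge | Preempt full |
  |-----------+---------+-----------------------+--------------|
  | 4K        |   10.68 |               N/A [*] |         0.57 |
  | 2M        |   10.58 |                  5.49 |         5.02 |
  | 1G        | 2046.65 |               933.185 |      649.445 |
  |-----------+---------+-----------------------+--------------|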
  [*]: This case is N/A because 4K pages do not involve huge pages at all
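Dividing the Vanilla column by the Preempt full column puts the overall (a) vs (c) benefit in numbers: roughly 18.7x lower latency for 4K, 2.1x for 2M and 3.2x for 1G pages. A small helper that computes these ratios from the table values (`LatencyRow` and `full_speedup()` are names made up for this sketch; the unit is whatever the table uses, and the ratio is unitless):

```c
/* One row of the latency table above (unit as in the table) */
typedef struct {
    const char *page_size;
    double vanilla;       /* case (a) */
    double preempt_full;  /* case (c) */
} LatencyRow;

static const LatencyRow latency_rows[] = {
    { "4K", 10.68, 0.57 },
    { "2M", 10.58, 5.02 },
    { "1G", 2046.65, 649.445 },
};

/* How many times lower the latency is in case (c) than in case (a) */
static double full_speedup(const LatencyRow *r)
{
    return r->vanilla / r->preempt_full;
}
```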

[1] https://github.com/xzpeter/small-stuffs/blob/master/tools/huge_vm/uffd-latency.bpf

TODO List
=========

Avoid precopy write() blocking postcopy
---------------------------------------

I didn't prove this, but I always think that write() syscalls blocked on
precopy pages can affect postcopy services.  If we can solve this problem,
then my wild guess is that we can further reduce the average page latency.

There are at least two solutions in mind: (1) we could make the write side
of the migration channel NON_BLOCK too, or (2) use multiple threads on the
send side, just like multifd, but we may need a lock to protect which page
to send too (i.e., the core idea is that we should _never_ rely on anything
in the main thread; multifd has that dependency because it queues pages
only on the main thread).

That can definitely be done and thought about later.
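Option (1) relies on a standard POSIX property, shown here with a pipe standing in for the migration socket. This is an illustration only, not QEMU's QIOChannel API, and `set_nonblocking()` / `fill_until_would_block()` are hypothetical helpers: once O_NONBLOCK is set, write() on a full channel fails with EAGAIN instead of stalling the sender, which in principle would let it go service a postcopy request and retry the precopy data later.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Switch @fd to non-blocking mode; returns 0 on success, -1 on error */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Keep writing 4K chunks until the kernel refuses more data.  Returns
 * true if write() reported EAGAIN/EWOULDBLOCK, i.e. a blocking write
 * would have stalled the sender at this point.
 */
static bool fill_until_would_block(int fd, size_t *written)
{
    char buf[4096];

    memset(buf, 'p', sizeof(buf));
    *written = 0;
    for (;;) {
        ssize_t n = write(fd, buf, sizeof(buf));

        if (n > 0) {
            *written += (size_t)n;
            continue;
        }
        return n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
    }
}
```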

Multi-channel for preemption threads
------------------------------------

Currently the postcopy preempt feature uses only one extra channel and one
extra thread on dest (no new thread on src QEMU).  It should be mostly good
enough for major use cases, but when the postcopy queue is long enough
(e.g. hundreds of vCPUs faulted on different pages), logically we could
still observe more delays on average.  Whether growing threads/channels can
solve it is debatable, but it sounds worth a try.  That's yet another
thing we can think about after this patchset lands.

Logically the design leaves room for that: the receiving postcopy preempt
thread can understand the whole ram-layer migration protocol, and for
multiple channels and threads we could simply grow that into multiple
threads handling the same protocol (with multiple PostcopyTmpPage).  The
source needs more thought on synchronization, though, but it shouldn't
affect the whole protocol layer, so it should be easy to keep compatible.

Please review, thanks.

Peter Xu (14):
  migration: Add postcopy-preempt capability
  migration: Postcopy preemption preparation on channel creation
  migration: Postcopy preemption enablement
  migration: Postcopy recover with preempt enabled
  migration: Create the postcopy preempt channel asynchronously
  migration: Add property x-postcopy-preempt-break-huge
  migration: Add helpers to detect TLS capability
  migration: Export tls-[creds|hostname|authz] params to cmdline too
  migration: Enable TLS for preempt channel
  migration: Respect postcopy request

[PATCH v7 02/14] migration: Postcopy preemption preparation on channel creation

2022-05-24 Thread Peter Xu

Create a new socket for postcopy so that postcopy-requested pages can be
sent via this dedicated channel without getting blocked by precopy pages.

A new thread is also created on the destination QEMU to receive data from
this new channel, based on the ram_load_postcopy() routine.

The ram_load_postcopy(POSTCOPY) branch and the thread are not functional
yet; that will be done in follow-up patches.

Clean up the new sockets on both the source and destination QEMUs, and look
after the new thread too so that it is recycled properly.

Signed-off-by: Peter Xu 
Reviewed-by: Daniel P. Berrangé 
---
 migration/migration.c| 62 +++
 migration/migration.h|  8 
 migration/postcopy-ram.c | 92 ++--
 migration/postcopy-ram.h | 10 +
 migration/ram.c  | 25 ---
 migration/ram.h  |  4 +-
 migration/savevm.c   | 20 -
 migration/socket.c   | 22 +-
 migration/socket.h   |  1 +
 migration/trace-events   |  5 ++-
 10 files changed, 218 insertions(+), 31 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f15e1593ac..bedb81849c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -321,6 +321,12 @@ void migration_incoming_state_destroy(void)
         mis->page_requested = NULL;
     }
 
+    if (mis->postcopy_qemufile_dst) {
+        migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst);
+        qemu_fclose(mis->postcopy_qemufile_dst);
+        mis->postcopy_qemufile_dst = NULL;
+    }
+
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
@@ -714,15 +720,21 @@ void migration_fd_process_incoming(QEMUFile *f, Error **errp)
     migration_incoming_process();
 }
 
+static bool migration_needs_multiple_sockets(void)
+{
+    return migrate_use_multifd() || migrate_postcopy_preempt();
+}
+
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     Error *local_err = NULL;
     bool start_migration;
+    QEMUFile *f;
 
     if (!mis->from_src_file) {
         /* The first connection (multifd may have multiple) */
-        QEMUFile *f = qemu_fopen_channel_input(ioc);
+        f = qemu_fopen_channel_input(ioc);
 
         if (!migration_incoming_setup(f, errp)) {
             return;
@@ -730,13 +742,18 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 
         /*
          * Common migration only needs one channel, so we can start
-         * right now.  Multifd needs more than one channel, we wait.
+         * right now.  Some features need more than one channel, we wait.
          */
-        start_migration = !migrate_use_multifd();
+        start_migration = !migration_needs_multiple_sockets();
     } else {
         /* Multiple connections */
-        assert(migrate_use_multifd());
-        start_migration = multifd_recv_new_channel(ioc, &local_err);
+        assert(migration_needs_multiple_sockets());
+        if (migrate_use_multifd()) {
+            start_migration = multifd_recv_new_channel(ioc, &local_err);
+        } else if (migrate_postcopy_preempt()) {
+            f = qemu_fopen_channel_input(ioc);
+            start_migration = postcopy_preempt_new_channel(mis, f);
+        }
         if (local_err) {
             error_propagate(errp, local_err);
             return;
@@ -761,11 +778,20 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 bool migration_has_all_channels(void)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
-    bool all_channels;
 
-    all_channels = multifd_recv_all_channels_created();
+    if (!mis->from_src_file) {
+        return false;
+    }
+
+    if (migrate_use_multifd()) {
+        return multifd_recv_all_channels_created();
+    }
+
+    if (migrate_postcopy_preempt()) {
+        return mis->postcopy_qemufile_dst != NULL;
+    }
 
-    return all_channels && mis->from_src_file != NULL;
+    return true;
 }
 
 /*
@@ -1885,6 +1911,12 @@ static void migrate_fd_cleanup(MigrationState *s)
         qemu_fclose(tmp);
     }
 
+    if (s->postcopy_qemufile_src) {
+        migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+        qemu_fclose(s->postcopy_qemufile_src);
+        s->postcopy_qemufile_src = NULL;
+    }
+
     assert(!migration_is_active(s));
 
     if (s->state == MIGRATION_STATUS_CANCELLING) {
@@ -3280,6 +3312,11 @@ static void migration_completion(MigrationState *s)
         qemu_savevm_state_complete_postcopy(s->to_dst_file);
         qemu_mutex_unlock_iothread();
 
+        /* Shutdown the postcopy fast path thread */
+        if (migrate_postcopy_preempt()) {
+            postcopy_preempt_shutdown_file(s);
+        }
+
         trace_migration_completion_postcopy_end_after_complete();
     } else {
         goto fail;
@@ -4167,6 +4204,15 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
         }
     }
 
+    /* This needs

Re: [PATCH 02/20] migration: switch to use QIOChannelNull for dummy channel

2022-05-24 Thread Eric Blake

On Tue, May 24, 2022 at 12:02:17PM +0100, Daniel P. Berrangé wrote:
> This removes one further custom impl of QEMUFile, in favour of a
> QIOChannel based impl.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  migration/ram.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v3] block/gluster: correctly set max_pdiscard

2022-05-24 Thread Eric Blake

On Fri, May 20, 2022 at 09:59:22AM +0200, Fabian Ebner wrote:
> On 64-bit platforms, assigning SIZE_MAX to the int64_t max_pdiscard
> results in a negative value, and the following assertion would trigger
> down the line (it's not the same max_pdiscard, but computed from the
> other one):
> qemu-system-x86_64: ../block/io.c:3166: bdrv_co_pdiscard: Assertion
> `max_pdiscard >= bs->bl.request_alignment' failed.
> 
> On 32-bit platforms, it's fine to keep using SIZE_MAX.
> 
> The assertion in qemu_gluster_co_pdiscard() is checking that the value
> of 'bytes' can safely be passed to glfs_discard_async(), which takes a
> size_t for the argument in question, so it is kept as is. And since
> max_pdiscard is still <= SIZE_MAX, relying on max_pdiscard is still
> fine.
> 
> Fixes: 0c8022876f ("block: use int64_t instead of int in driver discard 
> handlers")
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Fabian Ebner 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 01/20] io: add a QIOChannelNull equivalent to /dev/null

2022-05-24 Thread Eric Blake

On Tue, May 24, 2022 at 12:02:16PM +0100, Daniel P. Berrangé wrote:
> This is for code which needs a portable equivalent to a QIOChannelFile
> connected to /dev/null.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  include/io/channel-null.h |  55 +++
>  io/channel-null.c | 237 ++
>  io/meson.build|   1 +
>  io/trace-events   |   3 +
>  tests/unit/meson.build|   1 +
>  tests/unit/test-io-channel-null.c |  95 
>  6 files changed, 392 insertions(+)
>  create mode 100644 include/io/channel-null.h
>  create mode 100644 io/channel-null.c
>  create mode 100644 tests/unit/test-io-channel-null.c

> +/**
> + * QIOChannelNull:
> + *
> + * The QIOChannelNull object provides a channel implementation
> + * that discards all writes and returns zero bytes for all reads.

That describes the behavior of /dev/zero, not /dev/null, where reads
always fail with EOF.

> + */
> +
> +struct QIOChannelNull {
> +    QIOChannel parent;
> +    bool closed;
> +};
> +

> diff --git a/io/channel-null.c b/io/channel-null.c

> +
> +static ssize_t
> +qio_channel_null_readv(QIOChannel *ioc,
> +                       const struct iovec *iov,
> +                       size_t niov,
> +                       int **fds G_GNUC_UNUSED,
> +                       size_t *nfds G_GNUC_UNUSED,
> +                       Error **errp)
> +{
> +    QIOChannelNull *nioc = QIO_CHANNEL_NULL(ioc);
> +
> +    if (nioc->closed) {
> +        error_setg_errno(errp, EINVAL,
> +                         "Channel is closed");
> +        return -1;
> +    }
> +
> +    return 0;
> +}

But this behavior is returning early EOF instead of using iov_memset()
to read all zeroes the way /dev/zero would.

> +++ b/tests/unit/test-io-channel-null.c

> +static void test_io_channel_null_io(void)
> +{
> +    g_autoptr(QIOChannelNull) null = qio_channel_null_new();
> +    char buf[1024];
> +    GIOCondition gotcond = 0;
> +    Error *local_err = NULL;
> +
> +    g_assert(qio_channel_write(QIO_CHANNEL(null),
> +                               "Hello World", 11,
> +                               &error_abort) == 11);

I still cringe seeing tests inside g_assert(), but this is not the
first instance of it.

> +
> +    g_assert(qio_channel_read(QIO_CHANNEL(null),
> +                              buf, sizeof(buf),
> +                              &error_abort) == 0);

Okay, you're testing for /dev/null behavior of early EOF.

Other than misleading comments, this looks reasonable.  But those
comments are core enough as to what this channel does that I don't
feel comfortable giving R-b yet.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v3 06/10] block: Make 'bytes' param of bdrv_co_{pread,pwrite,preadv,pwritev}() an int64_t

2022-05-24 Thread Eric Blake

On Thu, May 19, 2022 at 03:48:36PM +0100, Alberto Faria wrote:
> For consistency with other I/O functions, and in preparation to
> implement bdrv_{pread,pwrite}() using generated_co_wrapper.
> 
> unsigned int fits in int64_t, so all callers remain correct.
> 
> Signed-off-by: Alberto Faria 
> ---
>  block/coroutines.h   | 4 ++--
>  include/block/block_int-io.h | 8 ++--
>  2 files changed, 8 insertions(+), 4 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v6 11/13] tests: Add postcopy tls migration test

2022-05-24 Thread Peter Xu

On Thu, May 19, 2022 at 11:11:34AM +0100, Daniel P. Berrangé wrote:
> On Tue, May 17, 2022 at 03:57:28PM -0400, Peter Xu wrote:
> > We just added TLS tests for precopy but not postcopy.  Add the
> > corresponding test for vanilla postcopy.
> > 
> > Rename the vanilla postcopy to "postcopy/plain" because all postcopy tests
> > will only use unix sockets as channel.
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >  tests/qtest/migration-test.c | 50 +++-
> >  1 file changed, 43 insertions(+), 7 deletions(-)
> > 
> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > index d33e8060f9..e8304aa454 100644
> > --- a/tests/qtest/migration-test.c
> > +++ b/tests/qtest/migration-test.c
> > @@ -481,6 +481,10 @@ typedef struct {
> >      bool only_target;
> >      /* Use dirty ring if true; dirty logging otherwise */
> >      bool use_dirty_ring;
> > +    /* Whether use TLS channels for postcopy test? */
> > +    bool postcopy_tls;
> > +    /* Used only if postcopy_tls==true, to cache the data object */
> > +    void *postcopy_tls_data;
> 
> Rather than adding these fields, I think it would be preferable to
> pass the hooks in the same way I did for the precopy tests.

I can give it a shot.

Ideally I think we should rename MigrationCommon to MigrationPrecopy and
keep all the precopy stuff there, while adding a MigrationPostcopy that also
includes MigrationStart but keeps the postcopy bits.  Then I'd need to move
start_hook and friends into MigrationStart.  But let me start simple.

-- 
Peter Xu

Re: [PATCH] tests: Bump Fedora image version for cross-compilation

2022-05-24 Thread Marc-André Lureau

On Tue, May 24, 2022 at 8:11 PM Konstantin Kostiuk  wrote:
>
> There are 2 reasons for the bump:
>  - Fedora 33 is no longer supported
>  - Some changes in the guest agent required updates of
>    mingw-headers
>
> Signed-off-by: Konstantin Kostiuk 

Reviewed-by: Marc-André Lureau 

> ---
>  tests/docker/dockerfiles/fedora-win32-cross.docker | 2 +-
>  tests/docker/dockerfiles/fedora-win64-cross.docker | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tests/docker/dockerfiles/fedora-win32-cross.docker b/tests/docker/dockerfiles/fedora-win32-cross.docker
> index 84a8f5524d..a06bd29e8e 100644
> --- a/tests/docker/dockerfiles/fedora-win32-cross.docker
> +++ b/tests/docker/dockerfiles/fedora-win32-cross.docker
> @@ -1,4 +1,4 @@
> -FROM registry.fedoraproject.org/fedora:33
> +FROM registry.fedoraproject.org/fedora:35
>
>  # Please keep this list sorted alphabetically
>  ENV PACKAGES \
> diff --git a/tests/docker/dockerfiles/fedora-win64-cross.docker b/tests/docker/dockerfiles/fedora-win64-cross.docker
> index d7ed8eb1cf..b71624330f 100644
> --- a/tests/docker/dockerfiles/fedora-win64-cross.docker
> +++ b/tests/docker/dockerfiles/fedora-win64-cross.docker
> @@ -1,4 +1,4 @@
> -FROM registry.fedoraproject.org/fedora:33
> +FROM registry.fedoraproject.org/fedora:35
>
>  # Please keep this list sorted alphabetically
>  ENV PACKAGES \
> --
> 2.25.1
>

Re: [PATCH v4 11/14] softmmu/memory: add memory_region_try_add_subregion function

2022-05-24 Thread Jim Shu

Tested-by: Jim Shu 


On Fri, Mar 4, 2022 at 7:00 PM Damien Hedde wrote:

>
>
>
> On 3/3/22 14:32, Philippe Mathieu-Daudé wrote:
> > On 23/2/22 10:12, Damien Hedde wrote:
> >> Hi Philippe,
> >>
> >> I suppose it is ok if I change your mail in the reviewed by ?
> >
> > No, the email is fine (git tools should take care of using the
> > correct email via the .mailmap entry, see commit 90f285fd83).
> >
> >> Thanks,
> >> Damien
> >>
>
> ok.
>
> Looks like git keeps the "*-by:" entries as-is when cc-ing them.
>
> --
> Damien
>
>

Re: [PATCH v4 09/14] none-machine: allow cold plugging sysbus devices

2022-05-24 Thread Jim Shu

Tested-by: Jim Shu 

On Thu, Mar 3, 2022 at 10:46 PM Philippe Mathieu-Daudé <
philippe.mathieu.da...@gmail.com> wrote:

> On 23/2/22 10:07, Damien Hedde wrote:
> > Allow plugging any sysbus device on this machine (the sysbus
> > devices still need to be 'user-creatable').
> >
> > This commit is needed to use the 'none' machine as a base, and
> > subsequently to dynamically populate it with sysbus devices using
> > qapi commands.
> >
> > Note that this only concern cold-plug: sysbus devices cann't be hot
>
> "can not" is easier to understand for non-native / not good level of
> English speakers IMHO.
>
> > plugged because the sysbus bus does not support it.
> >
> > Signed-off-by: Damien Hedde 
> > ---
> >   hw/core/null-machine.c | 4 ++++
> >   1 file changed, 4 insertions(+)
>
> Reviewed-by: Philippe Mathieu-Daudé 
>
>

Re: [PATCH v4 13/14] hw/mem/system-memory: add a memory sysbus device

2022-05-24 Thread Jim Shu

Tested-by: Jim Shu 

On Wed, Feb 23, 2022 at 5:14 PM Damien Hedde wrote:

> This device can be used to create a memory region wrapped in a
> sysbus device.
> Its 'readonly' property allows choosing between a RAM or
> a ROM.
>
> The purpose for this device is to be used with qapi command
> device_add.
>
> Signed-off-by: Damien Hedde 
> ---
>  include/hw/mem/sysbus-memory.h | 28 
>  hw/mem/sysbus-memory.c | 80 ++
>  hw/mem/meson.build |  2 +
>  3 files changed, 110 insertions(+)
>  create mode 100644 include/hw/mem/sysbus-memory.h
>  create mode 100644 hw/mem/sysbus-memory.c
>
> diff --git a/include/hw/mem/sysbus-memory.h b/include/hw/mem/sysbus-memory.h
> new file mode 100644
> index 0000000000..5c596f8b4f
> --- /dev/null
> +++ b/include/hw/mem/sysbus-memory.h
> @@ -0,0 +1,28 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * SysBusDevice Memory
> + *
> + * Copyright (c) 2021 Greensocs
> + */
> +
> +#ifndef HW_SYSBUS_MEMORY_H
> +#define HW_SYSBUS_MEMORY_H
> +
> +#include "hw/sysbus.h"
> +#include "qom/object.h"
> +
> +#define TYPE_SYSBUS_MEMORY "sysbus-memory"
> +OBJECT_DECLARE_SIMPLE_TYPE(SysBusMemoryState, SYSBUS_MEMORY)
> +
> +struct SysBusMemoryState {
> +    /*  */
> +    SysBusDevice parent_obj;
> +    uint64_t size;
> +    bool readonly;
> +
> +    /*  */
> +    MemoryRegion mem;
> +};
> +
> +#endif /* HW_SYSBUS_MEMORY_H */
> diff --git a/hw/mem/sysbus-memory.c b/hw/mem/sysbus-memory.c
> new file mode 100644
> index 0000000000..f1ad7ba7ec
> --- /dev/null
> +++ b/hw/mem/sysbus-memory.c
> @@ -0,0 +1,80 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * SysBusDevice Memory
> + *
> + * Copyright (c) 2021 Greensocs
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/mem/sysbus-memory.h"
> +#include "hw/qdev-properties.h"
> +#include "qemu/log.h"
> +#include "qemu/module.h"
> +#include "qapi/error.h"
> +
> +static Property sysbus_memory_properties[] = {
> +    DEFINE_PROP_UINT64("size", SysBusMemoryState, size, 0),
> +    DEFINE_PROP_BOOL("readonly", SysBusMemoryState, readonly, false),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void sysbus_memory_realize(DeviceState *dev, Error **errp)
> +{
> +    SysBusMemoryState *s = SYSBUS_MEMORY(dev);
> +    gchar *name;
> +
> +    if (!s->size) {
> +        error_setg(errp, "'size' must be non-zero.");
> +        return;
> +    }
> +
> +    /*
> +     * We impose having an id (which is unique) because we need to generate
> +     * a unique name for the memory region.
> +     * memory_region_init_ram/rom() will abort() (in qemu_ram_set_idstr()
> +     * function if 2 system-memory devices are created with the same name
> +     * for the memory region).
> +     */
> +    if (!dev->id) {
> +        error_setg(errp, "system-memory device must have an id.");
> +        return;
> +    }
> +    name = g_strdup_printf("%s.region", dev->id);
> +
> +    if (s->readonly) {
> +        memory_region_init_rom(&s->mem, OBJECT(dev), name, s->size, errp);
> +    } else {
> +        memory_region_init_ram(&s->mem, OBJECT(dev), name, s->size, errp);
> +    }
> +
> +    g_free(name);
> +    if (*errp) {
> +        return;
> +    }
> +
> +    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->mem);
> +}
> +
> +static void sysbus_memory_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->user_creatable = true;
> +    dc->realize = sysbus_memory_realize;
> +    device_class_set_props(dc, sysbus_memory_properties);
> +}
> +
> +static const TypeInfo sysbus_memory_info = {
> +    .name          = TYPE_SYSBUS_MEMORY,
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(SysBusMemoryState),
> +    .class_init    = sysbus_memory_class_init,
> +};
> +
> +static void sysbus_memory_register_types(void)
> +{
> +    type_register_static(&sysbus_memory_info);
> +}
> +
> +type_init(sysbus_memory_register_types)
> diff --git a/hw/mem/meson.build b/hw/mem/meson.build
> index 82f86d117e..04c74e12f2 100644
> --- a/hw/mem/meson.build
> +++ b/hw/mem/meson.build
> @@ -7,3 +7,5 @@ mem_ss.add(when: 'CONFIG_NVDIMM', if_true: files('nvdimm.c'))
>  softmmu_ss.add_all(when: 'CONFIG_MEM_DEVICE', if_true: mem_ss)
>
>  softmmu_ss.add(when: 'CONFIG_SPARSE_MEM', if_true: files('sparse-mem.c'))
> +
> +softmmu_ss.add(files('sysbus-memory.c'))
> --
> 2.35.1
>
>
>

Re: [PATCH v4 14/14] hw: set user_creatable on opentitan/sifive_e devices

2022-05-24 Thread Jim Shu

Tested-by: Jim Shu 

On Fri, Mar 4, 2022 at 11:23 PM Philippe Mathieu-Daudé <
philippe.mathieu.da...@gmail.com> wrote:

> On 23/2/22 10:07, Damien Hedde wrote:
> > The devices are:
> > + ibex-timer
> > + ibex-uart
> > + riscv.aclint.swi
> > + riscv.aclint.mtimer
> > + riscv.hart_array
> > + riscv.sifive.e.prci
> > + riscv.sifive.plic
> > + riscv.sifive.uart
> > + sifive_soc.gpio
> > + unimplemented-device
> >
> > These devices are clean regarding error handling in realize.
> >
> > They are all sysbus devices, so setting user-creatable will only
> > enable cold-plugging them on machines that have explicitly allowed them
> > (only the _none_ machine does that).
> >
> > Note that this commit include the ricv_array which embeds cpus. There
>
> Typo "includes" I guess.
>
> Reviewed-by: Philippe Mathieu-Daudé 
>
> > are some deep internal constraints about them: you cannot create more
> > cpus than the machine's maxcpus. The TCG accelerator's code will, for
> > example, assert if a user tries to create too many cpus.
> >
> > Signed-off-by: Damien Hedde 
> > ---
> >
> > I can also split this patch if you think it's better.
> > But it is mostly a one-line fix per file.
> >
> > This patch first requires some cleanups in order to fix errors
> > and some more memory leaks that could happen in legitimate user-related
> > life cycles: a misconfiguration should no longer be a fatal error.
> >
> https://lore.kernel.org/qemu-devel/20220218164646.132112-1-damien.he...@greensocs.com
> > ---
> >   hw/char/ibex_uart.c | 1 +
> >   hw/char/sifive_uart.c   | 1 +
> >   hw/gpio/sifive_gpio.c   | 1 +
> >   hw/intc/riscv_aclint.c  | 2 ++
> >   hw/intc/sifive_plic.c   | 1 +
> >   hw/misc/sifive_e_prci.c | 8 ++++++++
> >   hw/misc/unimp.c | 1 +
> >   hw/riscv/riscv_hart.c   | 1 +
> >   hw/timer/ibex_timer.c   | 1 +
> >   9 files changed, 17 insertions(+)
>
>

Re: [PATCH v4 08/14] none-machine: add 'ram-addr' property

2022-05-24 Thread Jim Shu

Tested-by: Jim Shu 

On Fri, Mar 4, 2022 at 12:36 AM Damien Hedde wrote:

>
>
> On 3/3/22 15:41, Philippe Mathieu-Daudé wrote:
> > On 23/2/22 10:07, Damien Hedde wrote:
> > >> Add the property to configure the base address of the ram.
> >> The default value remains zero.
> >>
> >> This commit is needed to use the 'none' machine as a base, and
> >> subsequently to dynamically populate it using qapi commands. Having
> >> a non null 'ram' is really hard to workaround because of the actual
> >> constraints on the generic loader: it prevents loading binaries
> >> bigger than ram_size (with a null ram, we cannot load anything).
> >> For now we need to be able to use the existing ram creation
> >> feature of the none machine with a configurable base address.
> >>
> >> Signed-off-by: Damien Hedde 
> >> ---
> > >>   hw/core/null-machine.c | 34 ++++++++++++++++++++++++++++++++--
> >>   1 file changed, 32 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/core/null-machine.c b/hw/core/null-machine.c
> >> index 7eb258af07..5fd1cc0218 100644
> >> --- a/hw/core/null-machine.c
> >> +++ b/hw/core/null-machine.c
> >> @@ -16,9 +16,11 @@
> >>   #include "hw/boards.h"
> >>   #include "exec/address-spaces.h"
> >>   #include "hw/core/cpu.h"
> >> +#include "qapi/visitor.h"
> >>   struct NoneMachineState {
> >>   MachineState parent;
> > >> +    uint64_t ram_addr;
> > >>   };
> > >>   #define TYPE_NONE_MACHINE MACHINE_TYPE_NAME("none")
> > >> @@ -26,6 +28,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(NoneMachineState, NONE_MACHINE)
> > >>   static void machine_none_init(MachineState *mch)
> > >>   {
> > >> +    NoneMachineState *nms = NONE_MACHINE(mch);
> > >>       CPUState *cpu = NULL;
> > >>       /* Initialize CPU (if user asked for it) */
> > >> @@ -37,9 +40,13 @@ static void machine_none_init(MachineState *mch)
> > >>           }
> > >>       }
> > >> -    /* RAM at address zero */
> > >> +    /* RAM at configured address (default: 0) */
> > >>       if (mch->ram) {
> > >> -        memory_region_add_subregion(get_system_memory(), 0, mch->ram);
> > >> +        memory_region_add_subregion(get_system_memory(), nms->ram_addr,
> > >> +                                    mch->ram);
> > >> +    } else if (nms->ram_addr) {
> > >> +        error_report("'ram-addr' has been specified but the size is zero");
> >
> > I'm not sure about this error message, IIUC we can get here if no ram
> > backend is provided, not if we have one zero-sized. Otherwise LGTM.
>
> You're most probably right. Keeping the ram_size to 0 is just one way of
> getting here. I can replace the message by a more generic formulation
> "'ram-addr' has been specified but the machine has no ram"
>
>
>

Re: [RFC PATCH 2/2] arm/virt: Add aspeed-i2c controller and MCTP EP to enable MCTP testing

2022-05-24 Thread Ben Widawsky

On 22-05-20 18:01:28, Jonathan Cameron wrote:
> As the only I2C emulation in QEMU that supports being both
> a master and a slave, suitable for MCTP over i2c is aspeed-i2c
> add this controller to the arm virt model and hook up our new
> i2c_mctp_cxl_fmapi device.
> 
> The current Linux driver for aspeed-i2c has a hard requirement on
> a reset controller.  Throw down the simplest reset controller
> I could find so as to avoid need to make any chance to the kernel
> code.

s/chance/change

> 
> Patch also builds appropriate device tree.  Unfortunately for CXL
> we need to use ACPI (no DT bindings yet defined). Enabling this will
> either require appropriate support for MCTP on an i2c master that
> has ACPI bindings, or modifications of the kernel driver to support
> ACPI with aspeed-i2c (which might be a little controversial ;)

I'm naive to what DT defines, but I assume what's there already is insufficient
to make the bindings for CXL. I say this because I believe it wouldn't be too
bad at all to make a cxl_dt.ko, and it's certainly less artificial than
providing ACPI support for things which don't naturally have ACPI support.

> 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/arm/Kconfig|  1 +
>  hw/arm/virt.c | 77 +++
>  include/hw/arm/virt.h |  2 ++
>  3 files changed, 80 insertions(+)
> 
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index 219262a8da..4a733298cd 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -30,6 +30,7 @@ config ARM_VIRT
>      select ACPI_VIOT
>      select VIRTIO_MEM_SUPPORTED
>      select ACPI_CXL
> +    select I2C_MCTP_CXL_FMAPI
>  
>  config CHEETAH
>      bool
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index d818131b57..ea04279515 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -80,6 +80,9 @@
>  #include "hw/char/pl011.h"
>  #include "hw/cxl/cxl.h"
>  #include "qemu/guest-random.h"
> +#include "hw/i2c/i2c.h"
> +#include "hw/i2c/aspeed_i2c.h"
> +#include "hw/misc/i2c_mctp_cxl_fmapi.h"
>  
>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> @@ -156,6 +159,8 @@ static const MemMapEntry base_memmap[] = {
>      [VIRT_PVTIME] =             { 0x090a0000, 0x00010000 },
>      [VIRT_SECURE_GPIO] =        { 0x090b0000, 0x00001000 },
>      [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
> +    [VIRT_I2C] =                { 0x0b000000, 0x00004000 },
> +    [VIRT_RESET_FAKE] =         { 0x0b004000, 0x00000010 },
>      /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
>      [VIRT_PLATFORM_BUS] =       { 0x0c000000, 0x02000000 },
>      [VIRT_SECURE_MEM] =         { 0x0e000000, 0x01000000 },
> @@ -192,6 +197,7 @@ static const int a15irqmap[] = {
>      [VIRT_GPIO] = 7,
>      [VIRT_SECURE_UART] = 8,
>      [VIRT_ACPI_GED] = 9,
> +    [VIRT_I2C] = 10,
>      [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */
>      [VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
>      [VIRT_SMMU] = 74,    /* ...to 74 + NUM_SMMU_IRQS - 1 */
> @@ -1996,6 +2002,75 @@ static void virt_cpu_post_init(VirtMachineState *vms, MemoryRegion *sysmem)
>      }
>  }
>  
> +static void create_mctp_test(MachineState *ms)
> +{
> +    VirtMachineState *vms = VIRT_MACHINE(ms);
> +    MemoryRegion *sysmem = get_system_memory();
> +    AspeedI2CState *aspeedi2c;
> +    struct DeviceState  *dev;
> +    char *nodename_i2c_master;
> +    char *nodename_i2c_sub;
> +    char *nodename_reset;
> +    uint32_t clk_phandle, reset_phandle;
> +    MemoryRegion *sysmem2;
> +   
> +    dev = qdev_new("aspeed.i2c-ast2600");
> +    aspeedi2c = ASPEED_I2C(dev);
> +    object_property_set_link(OBJECT(dev), "dram", OBJECT(ms->ram), &error_fatal);
> +    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
> +    sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, vms->memmap[VIRT_I2C].base);
> +    sysbus_connect_irq(SYS_BUS_DEVICE(&aspeedi2c->busses[0]), 0, qdev_get_gpio_in(vms->gic, vms->irqmap[VIRT_I2C]));
> +
> +    /* I2C bus DT */
> +    reset_phandle = qemu_fdt_alloc_phandle(ms->fdt);
> +    nodename_reset = g_strdup_printf("/reset@%" PRIx64, vms->memmap[VIRT_RESET_FAKE].base);
> +    qemu_fdt_add_subnode(ms->fdt, nodename_reset);
> +    qemu_fdt_setprop_string(ms->fdt, nodename_reset, "compatible", "snps,dw-low-reset");
> +    qemu_fdt_setprop_sized_cells(ms->fdt, nodename_reset, "reg",
> +                                 2, vms->memmap[VIRT_RESET_FAKE].base,
> +                                 2, vms->memmap[VIRT_RESET_FAKE].size);
> +    qemu_fdt_setprop_cell(ms->fdt, nodename_reset, "#reset-cells", 0x1);
> +    qemu_fdt_setprop_cell(ms->fdt, nodename_reset, "phandle", reset_phandle);
> +    sysmem2 =  g_new(MemoryRegion, 1);
> +    memory_region_init_ram(sysmem2, NULL, "reset", vms->memmap[VIRT_RESET_FAKE].size, NULL);
> +    memory_region_add_subregion(sysmem, vms->memmap[VIRT_RESET_FAKE].base, s

Re: [PATCH v4 12/14] add sysbus-mmio-map qapi command

2022-05-24 Thread Jim Shu

Tested-by: Jim Shu 

On Wed, Feb 23, 2022 at 5:37 PM Damien Hedde wrote:

> This command allows mapping an mmio region of a sysbus device onto
> the system memory. Its behavior mimics the sysbus_mmio_map()
> function apart from the automatic unmap (the C function unmaps
> the region if it is already mapped).
> For the qapi function we consider it an error to try to map
> an already mapped region. If unmapping is required, it is
> probably better to add a sysbus-mmio-unmap command.
>
> This command is still experimental (hence the 'unstable' feature),
> as it is related to the sysbus device creation through qapi commands.
>
> This command is required to be able to dynamically build a machine
> from scratch as there is no qapi-way of doing a memory mapping.
>
> Signed-off-by: Damien Hedde 
> ---
> Cc: Alistair Francis 
>
> v4:
>  + integrate priority parameter
>  + use 'unstable' feature flag instead of 'x-' prefix
>  + bump version to 7.0
>  + dropped Alistair's reviewed-by as a consequence
> ---
>  qapi/qdev.json   | 31 ++
>  hw/core/sysbus.c | 49 
>  2 files changed, 80 insertions(+)
>
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index 2e2de41499..4830e87a90 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -160,3 +160,34 @@
>  ##
>  { 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
>    'data': { '*device': 'str', 'path': 'str' } }
> +
> +##
> +# @sysbus-mmio-map:
> +#
> +# Map a sysbus device mmio onto the main system bus.
> +#
> +# @device: the device's QOM path
> +#
> +# @mmio: The mmio number to be mapped (defaults to 0).
> +#
> +# @addr: The base address for the mapping.
> +#
> +# @priority: The priority of the mapping (defaults to 0).
> +#
> +# Features:
> +# @unstable: Command is meant to map sysbus devices
> +#            while in preconfig mode.
> +#
> +# Since: 7.0
> +#
> +# Returns: Nothing on success
> +#
> +##
> +
> +{ 'command': 'sysbus-mmio-map',
> +  'data': { 'device': 'str',
> +            '*mmio': 'uint8',
> +            'addr': 'uint64',
> +            '*priority': 'int32' },
> +  'features': ['unstable'],
> +  'allow-preconfig' : true }
> diff --git a/hw/core/sysbus.c b/hw/core/sysbus.c
> index 05c1da3d31..df1f1f43a5 100644
> --- a/hw/core/sysbus.c
> +++ b/hw/core/sysbus.c
> @@ -23,6 +23,7 @@
>  #include "hw/sysbus.h"
>  #include "monitor/monitor.h"
>  #include "exec/address-spaces.h"
> +#include "qapi/qapi-commands-qdev.h"
>
>  static void sysbus_dev_print(Monitor *mon, DeviceState *dev, int indent);
>  static char *sysbus_get_fw_dev_path(DeviceState *dev);
> @@ -154,6 +155,54 @@ static void sysbus_mmio_map_common(SysBusDevice *dev, int n, hwaddr addr,
>      }
>  }
>
> +void qmp_sysbus_mmio_map(const char *device,
> +                         bool has_mmio, uint8_t mmio,
> +                         uint64_t addr,
> +                         bool has_priority, int32_t priority,
> +                         Error **errp)
> +{
> +    Object *obj = object_resolve_path_type(device, TYPE_SYS_BUS_DEVICE, NULL);
> +    SysBusDevice *dev;
> +
> +    if (phase_get() != PHASE_MACHINE_INITIALIZED) {
> +        error_setg(errp, "The command is permitted only when "
> +                         "the machine is in initialized phase");
> +        return;
> +    }
> +
> +    if (obj == NULL) {
> +        error_setg(errp, "Device '%s' not found", device);
> +        return;
> +    }
> +    dev = SYS_BUS_DEVICE(obj);
> +
> +    if (!has_mmio) {
> +        mmio = 0;
> +    }
> +    if (!has_priority) {
> +        priority = 0;
> +    }
> +
> +    if (mmio >= dev->num_mmio) {
> +        error_setg(errp, "MMIO index '%u' does not exist in '%s'",
> +                   mmio, device);
> +        return;
> +    }
> +
> +    if (dev->mmio[mmio].addr != (hwaddr)-1) {
> +        error_setg(errp, "MMIO index '%u' is already mapped", mmio);
> +        return;
> +    }
> +
> +    if (!memory_region_try_add_subregion(get_system_memory(), addr,
> +                                         dev->mmio[mmio].memory, priority,
> +                                         errp)) {
> +        return;
> +    }
> +
> +    dev->mmio[mmio].addr = addr;
> +}
> +
>  void sysbus_mmio_unmap(SysBusDevice *dev, int n)
>  {
>      assert(n >= 0 && n < dev->num_mmio);
> --
> 2.35.1
>
>
>

Re: Problem running qos-test when building with gcc12 and LTO

2022-05-24 Thread Dario Faggioli

On Mon, 2022-05-23 at 19:19 +, Dario Faggioli wrote:
> As soon as I get rid of _both_ "-flto=auto" _and_ "--enable-lto", the
> above tests seem to work fine.
> 
> When they fail, they fail immediately, while creating the graph, like
> this:
> 
> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> QTEST_QEMU_IMG=./qemu-img G_TEST_DBUS_DAEMON=../tests/dbus-vmstate-
> daemon.sh QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-
> storage-daemon QTEST_QEMU_BINARY=./qemu-system-x86_64
> ./tests/qtest/qos-test --tap -k
> # random seed: R02S90d4b61102dd94459f986c2367d6d375
> # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-
> 28822.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-
> 28822.qmp,id=char0 -mon chardev=char0,mode=control -display none -
> machine none -accel qtest
> QOSStack: full stack, cannot pushAborted
> 
Ok, apparently, v6.2.0 works (with GCC 12 and LTO), while as said
v7.0.0 doesn't.

Therefore, I ran a bisect, and it pointed at:

8dcb404bff6d9147765d7dd3e9c8493372186420
tests/qtest: enable more vhost-user tests by default

I've also confirmed that on v7.0.0 with 8dcb404bff6d914 reverted, the
test actually works.

As far as downstream packaging is concerned, I'll revert it locally.
But I'd be happy to help figure out what is actually going wrong.

I'll try to dig further. Any idea/suggestion anyone has, feel free. :-)

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)



Re: [PATCH v4 07/14] none-machine: add the NoneMachineState structure

2022-05-24 Thread Jim Shu

Tested-by: Jim Shu 

On Wed, Feb 23, 2022 at 5:59 PM Damien Hedde 
wrote:

> The none machine was using the parent state structure.
> We'll need a custom state to add a field in the following commit.
>
> Signed-off-by: Damien Hedde 
> ---
>  hw/core/null-machine.c | 24 ++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/hw/core/null-machine.c b/hw/core/null-machine.c
> index f586a4bef5..7eb258af07 100644
> --- a/hw/core/null-machine.c
> +++ b/hw/core/null-machine.c
> @@ -17,6 +17,13 @@
>  #include "exec/address-spaces.h"
>  #include "hw/core/cpu.h"
>
> +struct NoneMachineState {
> +MachineState parent;
> +};
> +
> +#define TYPE_NONE_MACHINE MACHINE_TYPE_NAME("none")
> +OBJECT_DECLARE_SIMPLE_TYPE(NoneMachineState, NONE_MACHINE)
> +
>  static void machine_none_init(MachineState *mch)
>  {
>  CPUState *cpu = NULL;
> @@ -42,8 +49,10 @@ static void machine_none_init(MachineState *mch)
>  }
>  }
>
> -static void machine_none_machine_init(MachineClass *mc)
> +static void machine_none_class_init(ObjectClass *oc, void *data)
>  {
> +MachineClass *mc = MACHINE_CLASS(oc);
> +
>  mc->desc = "empty machine";
>  mc->init = machine_none_init;
>  mc->max_cpus = 1;
> @@ -56,4 +65,15 @@ static void machine_none_machine_init(MachineClass *mc)
>  mc->no_sdcard = 1;
>  }
>
> -DEFINE_MACHINE("none", machine_none_machine_init)
> +static const TypeInfo none_machine_info = {
> +.name  = TYPE_NONE_MACHINE,
> +.parent= TYPE_MACHINE,
> +.instance_size = sizeof(NoneMachineState),
> +.class_init= machine_none_class_init,
> +};
> +
> +static void none_machine_register_types(void)
> +{
> +type_register_static(&none_machine_info);
> +}
> +type_init(none_machine_register_types);
> --
> 2.35.1
>
>
>
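The new NoneMachineState embeds MachineState as its first member, and .instance_size makes QOM allocate the larger struct. A plain C sketch of why the parent must come first; all type and function names here are hypothetical, and QOM's generated cast macros add type checking on top of this layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MachineState. */
typedef struct {
    const char *desc;
    int max_cpus;
} FakeMachineState;

/* Subclass state: the parent must be the first member. */
typedef struct {
    FakeMachineState parent;
    int extra_field; /* room for the field a later commit adds */
} FakeNoneMachineState;

/* The layout guarantee that makes the downcast below valid. */
static_assert(offsetof(FakeNoneMachineState, parent) == 0,
              "parent must sit at offset 0");

/* Upcast: take the address of the embedded parent. */
static FakeMachineState *to_machine(FakeNoneMachineState *s)
{
    return &s->parent;
}

/* Downcast: the parent pointer and the subclass pointer are the same address. */
static FakeNoneMachineState *to_none_machine(FakeMachineState *m)
{
    return (FakeNoneMachineState *)m;
}
```

Code that only knows about the parent (like machine_none_init() receiving a MachineState *) keeps working, while code that knows the subclass can reach the extra field.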

Re: [PATCH v4 05/14] qapi/device_add: handle the rom_order_override when cold-plugging

2022-05-24 Thread Jim Shu

Tested-by: Jim Shu 

On Wed, Feb 23, 2022 at 5:18 PM Damien Hedde 
wrote:

> rom_set_order_override() and rom_reset_order_override() were called
> in qemu_create_cli_devices() to set the rom_order_override value
> once and for all when creating the devices added on CLI.
>
> Unfortunately this won't work with qapi commands.
>
> Move the calls inside device_add so that it will be done in every
> case:
> + CLI option: -device
> + QAPI command: device_add
>
> rom_[set|reset]_order_override() are implemented in hw/core/loader.c
> They either do nothing or call fw_cfg_[set|reset]_order_override().
> The latter functions are implemented in hw/nvram/fw_cfg.c and only
> change an integer value of a "global" variable.
> Consequently, there are no complex side effects involved and we can
> safely move them from outside the -device option loop to the inner
> function.
>
> Signed-off-by: Damien Hedde 
> ---
>  softmmu/qdev-monitor.c | 11 +++
>  softmmu/vl.c   |  2 --
>  2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
> index 47a89aee20..9ec3e0ebff 100644
> --- a/softmmu/qdev-monitor.c
> +++ b/softmmu/qdev-monitor.c
> @@ -43,6 +43,7 @@
>  #include "hw/qdev-properties.h"
>  #include "hw/clock.h"
>  #include "hw/boards.h"
> +#include "hw/loader.h"
>
>  /*
>   * Aliases were a bad idea from the start.  Let's keep them
> @@ -671,6 +672,10 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
>  return NULL;
>  }
>
> +if (!is_hotplug) {
> +rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
> +}
> +
>  /* create device */
>  dev = qdev_new(driver);
>
> @@ -712,6 +717,9 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
>  if (!qdev_realize(DEVICE(dev), bus, errp)) {
>  goto err_del_dev;
>  }
> +if (!is_hotplug) {
> +rom_reset_order_override();
> +}
>  return dev;
>
>  err_del_dev:
> @@ -719,6 +727,9 @@ err_del_dev:
>  object_unparent(OBJECT(dev));
>  object_unref(OBJECT(dev));
>  }
> +if (!is_hotplug) {
> +rom_reset_order_override();
> +}
>  return NULL;
>  }
>
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 50337d68b9..b91ae1b8ae 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -2680,7 +2680,6 @@ static void qemu_create_cli_devices(void)
>  }
>
>  /* init generic devices */
> -rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
>  qemu_opts_foreach(qemu_find_opts("device"),
>device_init_func, NULL, &error_fatal);
>  QTAILQ_FOREACH(opt, &device_opts, next) {
> @@ -2697,7 +2696,6 @@ static void qemu_create_cli_devices(void)
>  object_unref(OBJECT(dev));
>  loc_pop(&opt->loc);
>  }
> -rom_reset_order_override();
>  }
>
>  static void qemu_machine_creation_done(void)
> --
> 2.35.1
>
>
>
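The patch relies on every exit path of qdev_device_add_from_qdict() undoing the override it set. A minimal sketch of that pairing, assuming a single global like the one fw_cfg keeps; fake_device_add(), order_override and ORDER_OVERRIDE_DEVICE are hypothetical, and the assert in the setter is this model's own check that no earlier call leaked an override.

```c
#include <assert.h>
#include <stdbool.h>

#define ORDER_OVERRIDE_DEVICE 1 /* hypothetical value */

static int order_override; /* 0 means "no override active" */

static void fake_set_order_override(int order)
{
    assert(order_override == 0); /* a leaked override would trip this */
    order_override = order;
}

static void fake_reset_order_override(void)
{
    order_override = 0;
}

/*
 * Shape of the patched function: set only for cold plug, and reset on
 * both the success path and the err_del_dev path.
 */
static bool fake_device_add(bool is_hotplug, bool realize_ok)
{
    if (!is_hotplug) {
        fake_set_order_override(ORDER_OVERRIDE_DEVICE);
    }
    /* ... create the device and set its properties ... */
    if (!realize_ok) {
        goto err_del_dev;
    }
    if (!is_hotplug) {
        fake_reset_order_override();
    }
    return true;

err_del_dev:
    /* ... unparent and unref the device ... */
    if (!is_hotplug) {
        fake_reset_order_override();
    }
    return false;
}
```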

Re: [PATCH v5 3/6] vl: support machine-initialized target in phase_until()

2022-05-24 Thread Jim Shu

Tested-by: Jim Shu 

On Thu, May 19, 2022 at 11:36 PM Damien Hedde 
wrote:

> phase_until() now supports the following transitions:
> + accel-created -> machine-initialized
> + machine-initialized -> machine-ready
>
> As a consequence we can now support the use of qmp_exit_preconfig()
> from phases _accel-created_ and _machine-initialized_.
>
> This commit is a preparation to support cold plugging a device
> using qapi (which will be introduced in a following commit). For this
> we need fine-grained control of the phase.
>
> Signed-off-by: Damien Hedde 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
>
> v5: update due to refactor of previous commit
> ---
>  softmmu/vl.c | 26 +-
>  1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 7f8d15b5b8..ea15e37973 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -2698,8 +2698,9 @@ static void qemu_machine_creation_done(void)
>
>  void qmp_x_exit_preconfig(Error **errp)
>  {
> -if (phase_check(PHASE_MACHINE_INITIALIZED)) {
> -error_setg(errp, "The command is permitted only before machine initialization");
> +if (phase_check(PHASE_MACHINE_READY)) {
> +error_setg(errp, "The command is permitted only before"
> + " machine is ready");
>  return;
>  }
>  phase_until(PHASE_MACHINE_READY, errp);
> @@ -2707,9 +2708,6 @@ void qmp_x_exit_preconfig(Error **errp)
>
>  static void qemu_phase_ready(Error **errp)
>  {
> -qemu_init_board();
> -/* phase is now PHASE_MACHINE_INITIALIZED. */
> -qemu_create_cli_devices();
>  cxl_fixed_memory_window_link_targets(errp);
>  qemu_machine_creation_done();
>  /* Phase is now PHASE_MACHINE_READY. */
> @@ -2749,6 +2747,24 @@ bool phase_until(MachineInitPhase phase, Error **errp)
>
>  switch (cur_phase) {
>  case PHASE_ACCEL_CREATED:
> +qemu_init_board();
> +/* Phase is now PHASE_MACHINE_INITIALIZED. */
> +/*
> + * Handle CLI devices now in order to leave this case in a state
> + * where we can cold plug devices with QMP. The following call
> + * handles the CLI options:
> + * + -fw_cfg (has side effects on device cold plug)
> + * + -device
> + */
> +qemu_create_cli_devices();
> +/*
> + * At this point all CLI options are handled except:
> + * + -S (autostart)
> + * + -incoming
> + */
> +break;
> +
> +case PHASE_MACHINE_INITIALIZED:
>  qemu_phase_ready(errp);
>  break;
>
> --
> 2.36.1
>
>
>

Re: [PATCH v5 2/6] machine&vl: introduce phase_until() to handle phase transitions

2022-05-24 Thread Jim Shu

Tested-by: Jim Shu 

On Thu, May 19, 2022 at 11:41 PM Damien Hedde 
wrote:

> phase_until() is implemented in vl.c and is meant to be used
> to make startup progress until a specified phase is reached.
> At this point, no behavior change is introduced: phase_until()
> only supports a single double transition corresponding
> to the functionality of qmp_exit_preconfig():
> + accel-created -> machine-initialized -> machine-ready
>
> As a result qmp_exit_preconfig() now uses phase_until().
>
> This commit is a preparation to support cold plugging a device
> using qapi command (which will be introduced in a following commit).
> For this we need fine-grained control of the phase.
>
> Signed-off-by: Damien Hedde 
> ---
>
> v5:
>   + refactor to avoid indentation change
> ---
>  include/hw/qdev-core.h | 14 +
>  softmmu/vl.c   | 46 ++
>  2 files changed, 60 insertions(+)
>
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index e29c705b74..5f73d06408 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -909,4 +909,18 @@ extern bool phase_check(MachineInitPhase phase);
>   */
>  extern void phase_advance(MachineInitPhase phase);
>
> +/**
> + * @phase_until:
> + * @phase: the target phase
> + * @errp: error report
> + *
> + * Make the machine init progress until the target phase is reached.
> + *
> + * It is a no-op if the target phase is the current or an earlier
> + * phase.
> + *
> + * Returns true in case of success.
> + */
> +extern bool phase_until(MachineInitPhase phase, Error **errp);
> +
>  #endif
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 84a31eba76..7f8d15b5b8 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -2702,11 +2702,17 @@ void qmp_x_exit_preconfig(Error **errp)
>  error_setg(errp, "The command is permitted only before machine initialization");
>  return;
>  }
> +phase_until(PHASE_MACHINE_READY, errp);
> +}
>
> +static void qemu_phase_ready(Error **errp)
> +{
>  qemu_init_board();
> +/* phase is now PHASE_MACHINE_INITIALIZED. */
>  qemu_create_cli_devices();
>  cxl_fixed_memory_window_link_targets(errp);
>  qemu_machine_creation_done();
> +/* Phase is now PHASE_MACHINE_READY. */
>
>  if (loadvm) {
>  load_snapshot(loadvm, NULL, false, NULL, &error_fatal);
> @@ -2729,6 +2735,46 @@ void qmp_x_exit_preconfig(Error **errp)
>  }
>  }
>
> +bool phase_until(MachineInitPhase phase, Error **errp)
> +{
> +ERRP_GUARD();
> +if (!phase_check(PHASE_ACCEL_CREATED)) {
> +error_setg(errp, "Phase transition is not supported until accelerator"
> +   " is created");
> +return false;
> +}
> +
> +while (!phase_check(phase)) {
> +MachineInitPhase cur_phase = phase_get();
> +
> +switch (cur_phase) {
> +case PHASE_ACCEL_CREATED:
> +qemu_phase_ready(errp);
> +break;
> +
> +default:
> +/*
> + * If we end up here, it is because we miss a case above.
> + */
> +error_setg(&error_abort, "Requested phase transition is not"
> +   " implemented");
> +return false;
> +}
> +
> +if (*errp) {
> +return false;
> +}
> +
> +/*
> + * Ensure we made some progress.
> + * With the default case above, it should be enough to prevent
> + * any infinite loop.
> + */
> +assert(cur_phase < phase_get());
> +}
> +return true;
> +}
> +
>  void qemu_init(int argc, char **argv, char **envp)
>  {
>  QemuOpts *opts;
> --
> 2.36.1
>
>
>
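A self-contained model of the loop phase_until() introduces here, extended with the two cases that patch 3/6 adds (accel-created -> machine-initialized, machine-initialized -> machine-ready). The enum is modeled on MachineInitPhase (names assumed from qdev-core.h); the step functions are hypothetical stand-ins for qemu_init_board()/qemu_create_cli_devices() and qemu_phase_ready(), and the default case returns false where the real code aborts.

```c
#include <assert.h>
#include <stdbool.h>

/* Modeled on MachineInitPhase; only the relative order matters here. */
typedef enum {
    PHASE_NO_MACHINE,
    PHASE_MACHINE_CREATED,
    PHASE_ACCEL_CREATED,
    PHASE_MACHINE_INITIALIZED,
    PHASE_MACHINE_READY,
} Phase;

static Phase machine_phase = PHASE_NO_MACHINE;

static Phase phase_get(void) { return machine_phase; }
static bool phase_check(Phase p) { return machine_phase >= p; }

/* Hypothetical single-step transitions. */
static void init_board_and_cli_devices(void) { machine_phase = PHASE_MACHINE_INITIALIZED; }
static void make_machine_ready(void) { machine_phase = PHASE_MACHINE_READY; }

static bool phase_until(Phase target)
{
    if (!phase_check(PHASE_ACCEL_CREATED)) {
        return false; /* no transitions before the accelerator exists */
    }
    while (!phase_check(target)) {
        Phase cur = phase_get();
        switch (cur) {
        case PHASE_ACCEL_CREATED:
            init_board_and_cli_devices();
            break;
        case PHASE_MACHINE_INITIALIZED:
            make_machine_ready();
            break;
        default:
            return false; /* the real code aborts: a case is missing */
        }
        assert(cur < phase_get()); /* each step must make progress */
    }
    return true; /* also the no-op result for an already-reached phase */
}
```

With this shape, device_add in patch 6/6 can call phase_until(PHASE_MACHINE_INITIALIZED) and stop there, leaving x-exit-preconfig to finish the last step.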

Re: [PATCH v5 4/6] qapi/device_add: compute is_hotplug flag

2022-05-24 Thread Jim Shu

Tested-by: Jim Shu 

On Thu, May 19, 2022 at 11:37 PM Damien Hedde 
wrote:

> Instead of checking the phase every time, just store the result
> in a flag. We will use it more in the following commit.
>
> Signed-off-by: Damien Hedde 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
>  softmmu/qdev-monitor.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
> index 12fe60c467..d68ef883b5 100644
> --- a/softmmu/qdev-monitor.c
> +++ b/softmmu/qdev-monitor.c
> @@ -619,6 +619,7 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
>  char *id;
>  DeviceState *dev = NULL;
>  BusState *bus = NULL;
> +bool is_hotplug = phase_check(PHASE_MACHINE_READY);
>
>  driver = qdict_get_try_str(opts, "driver");
>  if (!driver) {
> @@ -662,7 +663,7 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
>  return NULL;
>  }
>
> -if (phase_check(PHASE_MACHINE_READY) && bus && !qbus_is_hotpluggable(bus)) {
> +if (is_hotplug && bus && !qbus_is_hotpluggable(bus)) {
>  error_setg(errp, QERR_BUS_NO_HOTPLUG, bus->name);
>  return NULL;
>  }
> @@ -676,7 +677,7 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
>  dev = qdev_new(driver);
>
>  /* Check whether the hotplug is allowed by the machine */
> -if (phase_check(PHASE_MACHINE_READY)) {
> +if (is_hotplug) {
>  if (!qdev_hotplug_allowed(dev, errp)) {
>  goto err_del_dev;
>  }
> --
> 2.36.1
>
>
>

Re: [PATCH v5 6/6] qapi/device_add: Allow execution in machine initialized phase

2022-05-24 Thread Jim Shu

Tested-by: Jim Shu 

On Thu, May 19, 2022 at 11:37 PM Damien Hedde 
wrote:

> From: Mirela Grujic 
>
> This commit allows using the QMP command to add a cold-plugged
> device, as we can do with the CLI option -device.
>
> Note: for device_add command in qdev.json adding the 'allow-preconfig'
> option has no effect because the command appears to bypass QAPI (see
> TODO at qapi/qdev.json:61). The option is added there solely to
> document the intent.
> For the same reason, the flags have to be explicitly set in
> monitor_init_qmp_commands() when the device_add command is registered.
>
> Signed-off-by: Mirela Grujic 
> Signed-off-by: Damien Hedde 
> ---
>
> v4:
>  + use phase_until()
>  + add missing flag in hmp-commands.hx
> ---
>  qapi/qdev.json | 3 ++-
>  monitor/misc.c | 2 +-
>  softmmu/qdev-monitor.c | 4 
>  hmp-commands.hx| 1 +
>  4 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index 26cd10106b..2e2de41499 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -77,7 +77,8 @@
>  { 'command': 'device_add',
>'data': {'driver': 'str', '*bus': 'str', '*id': 'str'},
>'gen': false, # so we can get the additional arguments
> -  'features': ['json-cli', 'json-cli-hotplug'] }
> +  'features': ['json-cli', 'json-cli-hotplug'],
> +  'allow-preconfig': true }
>
>  ##
>  # @device_del:
> diff --git a/monitor/misc.c b/monitor/misc.c
> index 6c5bb82d3b..d3d413d70c 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -233,7 +233,7 @@ static void monitor_init_qmp_commands(void)
>  qmp_init_marshal(&qmp_commands);
>
>  qmp_register_command(&qmp_commands, "device_add",
> - qmp_device_add, 0, 0);
> + qmp_device_add, QCO_ALLOW_PRECONFIG, 0);
>
>  QTAILQ_INIT(&qmp_cap_negotiation_commands);
>  qmp_register_command(&qmp_cap_negotiation_commands, "qmp_capabilities",
> diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
> index 7cbee2b0d8..c53f62be51 100644
> --- a/softmmu/qdev-monitor.c
> +++ b/softmmu/qdev-monitor.c
> @@ -855,6 +855,10 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
>  QemuOpts *opts;
>  DeviceState *dev;
>
> +if (!phase_until(PHASE_MACHINE_INITIALIZED, errp)) {
> +return;
> +}
> +
>  opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, errp);
>  if (!opts) {
>  return;
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 03e6a73d1f..0091b8e2dd 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -672,6 +672,7 @@ ERST
>  .help   = "add device, like -device on the command line",
>  .cmd= hmp_device_add,
>  .command_completion = device_add_completion,
> +.flags  = "p",
>  },
>
>  SRST
> --
> 2.36.1
>
>
>

Re: [PATCH v5 1/6] machine: add phase_get() and document phase_check()/advance()

2022-05-24 Thread Jim Shu

Tested-by: Jim Shu 

On Thu, May 19, 2022 at 11:41 PM Damien Hedde 
wrote:

> phase_get() returns the current phase; we'll use it in the next
> commit.
>
> Signed-off-by: Damien Hedde 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
>  include/hw/qdev-core.h | 19 +++
>  hw/core/qdev.c |  5 +
>  2 files changed, 24 insertions(+)
>
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 92c3d65208..e29c705b74 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -887,7 +887,26 @@ typedef enum MachineInitPhase {
>  PHASE_MACHINE_READY,
>  } MachineInitPhase;
>
> +/*
> + * phase_get:
> + * Returns the current phase
> + */
> +MachineInitPhase phase_get(void);
> +
> +/**
> + * phase_check:
> + * Test if current phase is at least @phase.
> + *
> + * Returns true if this is the case.
> + */
>  extern bool phase_check(MachineInitPhase phase);
> +
> +/**
> + * @phase_advance:
> + * Update the current phase to @phase.
> + *
> + * Must only be used to make a single phase step.
> + */
>  extern void phase_advance(MachineInitPhase phase);
>
>  #endif
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index 84f3019440..632dc0a4be 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -910,6 +910,11 @@ Object *qdev_get_machine(void)
>
>  static MachineInitPhase machine_phase;
>
> +MachineInitPhase phase_get(void)
> +{
> +return machine_phase;
> +}
> +
>  bool phase_check(MachineInitPhase phase)
>  {
>  return machine_phase >= phase;
> --
> 2.36.1
>
>
>

Re: [PATCH v4 00/14] Initial support for machine creation via QMP

2022-05-24 Thread Jim Shu

Hi all,

Thanks for the work!

I'm from SiFive and we are very interested in this feature.
QMP/QAPI configurable QEMU machine is a useful feature in our use case.
With this feature, we can both model our versatile FPGA-based platforms
more easily and model a new platform without modification of source code.
It is helpful for early software development of SoC prototyping.
We think this feature is also helpful to the QEMU community.

Also, I have tested this patchset (v4) and newer v5 patchset [1] with
Damien's firmware [2] and it works correctly.

P.S. The QMP option "-qmp socket,path=./qmpsocket,server" in the v5 patchset
instructions may not work?
I used the option "-qmp unix:./qmpsocket,server" instead.

[1] [PATCH v5 0/6] QAPI support for device cold-plug
https://lore.kernel.org/qemu-devel/20220519153402.41540-1-damien.he...@greensocs.com/

[2] Test firmware for patchset
v5: https://github.com/GreenSocs/qemu-qmp-machines/tree/master/arm-virt
v4:
https://github.com/GreenSocs/qemu-qmp-machines/tree/eba16dab8b587e624d65c5c302aeef424bece3a0

On Thu, Mar 3, 2022 at 7:02 PM Damien Hedde 
wrote:

> Ping !
>
> It would be good to have some feedback on the 1st and 2nd parts.
>
> Thanks,
> Damien
>
> On 2/23/22 10:06, Damien Hedde wrote:
> > Hi,
> >
> > This series adds initial support to build a machine using QMP/QAPI
> > commands. With this series, one can start from the 'none' machine,
> > create cpus, sysbus devices, memory map them and wire interrupts.
> >
> > Sorry for the huge cc list on this cover-letter. Apart from people
> > who attended the kvm call about this topic, I've cc'ed you only
> > according to MAINTAINERS file.
> >
> > The series is divided in 4 parts which are independent of each other,
> > but we need the 4 parts to be able to use this mechanism:
> > + Patches 1 to 6 allow using the qapi command device_add to cold
> >plug devices (as the CLI option -device does)
> > + Patches 7 to 10 modify the 'none' machine which serves as base
> >machine.
> > + Patches 11 to 13 handle memory mapping and memory creation
> > + Patch 14 allows dynamic cold plug of opentitan/sifive_e machine
> >to build some example. This last patch is based on a cleanup
> >series: it probably works without it, but some config errors are
> >not handled (see based-on below).
> >
> > Only patch 11 has a Reviewed-by.
> >
> > v4:
> > + cold plugging approach changed in order not to conflict with
> >startup. I did not add an additional command to handle this, so
> >that we can change everything easily.
> > + device_add in cold plug context is also now equivalent to -device
> >CLI regarding -fw_cfg. I also added patches to modify the 'none'
> >machine.
> > + reworked most of the none machine part
> > + updated the sysbus-mmio-map command patch
> >
> > Note that there are still a lot of limitations (for example if you try
> > to create more cpus than the _max_cpus_, tcg will abort()).
> > Basically all tasks done by machine init reading some parameters are
> > really tricky: for example, loading complex firmware. But we have to
> > start somewhere, and none of this is accessible unless the user
> > asked for the none machine and -preconfig.
> >
> > I can maintain the code introduced here. I'm not sure what the
> > process is. Is there something else to do besides proposing a patch to
> > MAINTAINERS?
> > If there is general agreement on moving forward with this feature, it
> > would be great to have a login on the QEMU wiki so I can document
> > limitations and the work being done to solve them.
> >
> > A simple test can be done with the following scenario, which builds
> > a subset of the opentitan machine.
> >
> > $ cat commands.qmp
> > // RAM 0x1000
> > device_add driver=sysbus-memory id=ram size=0x4000 readonly=false
> > sysbus-mmio-map device=ram addr=268435456
> > // CPUS
> > device_add driver=riscv.hart_array id=cpus cpu-type=lowrisc-ibex-riscv-cpu num-harts=1 resetvec=0x8080
> > // ROM 0x8000
> > device_add driver=sysbus-memory id=rom size=0x4000 readonly=true
> > sysbus-mmio-map device=rom addr=32768
> > // PLIC 0x4800
> > device_add driver=riscv.sifive.plic id=plic hart-config=M hartid-base=0 num-sources=180 num-priorities=3 priority-base=0x0 pending-base=0x1000 enable-base=0x2000 enable-stride=32 context-base=0x20 context-stride=8 aperture-size=0x4005000
> > sysbus-mmio-map device=plic addr=1207959552
> > qom-set path=plic property=unnamed-gpio-out[1] value=cpus/harts[0]/unnamed-gpio-in[11]
> > // UART 0x4000
> > device_add driver=ibex-uart id=uart chardev=serial0
> > sysbus-mmio-map device=uart addr=1073741824
> > qom-set path=uart property=sysbus-irq[1] value=plic/unnamed-gpio-in[2]
> > // FIRMWARE
> > device_add driver=loader cpu-num=0 file=/path/to/firmware.elf
> > x-exit-preconfig
> >
> > $ qemu-system-riscv32 -display none -M none -preconfig -serial stdio -qmp unix:/tmp/qmp-sock,server
> >
> > In another terminal, you'll need to send the commands with, for example:
> > $ grep -v '^//' commands.qmp
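The sysbus-mmio-map calls in the scenario pass addr= in decimal, while the region comments use hex and appear abbreviated. A small C check converting the decimal arguments (copied verbatim from the scenario) to hex; the Mapping type, the scenario table and print_mappings() are illustrative only.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Decimal addr= values from the sysbus-mmio-map lines above. */
typedef struct {
    const char *id;
    uint64_t addr;
} Mapping;

static const Mapping scenario[] = {
    { "ram",  268435456u },
    { "rom",  32768u },
    { "plic", 1207959552u },
    { "uart", 1073741824u },
};

/* Compile-time proof of the decimal-to-hex correspondence. */
static_assert(268435456u == 0x10000000u, "ram base");
static_assert(32768u == 0x8000u, "rom base");
static_assert(1207959552u == 0x48000000u, "plic base");
static_assert(1073741824u == 0x40000000u, "uart base");

/* Print each base address as zero-padded 32-bit hex. */
static void print_mappings(void)
{
    for (size_t i = 0; i < sizeof(scenario) / sizeof(scenario[0]); i++) {
        printf("%-4s 0x%08" PRIx64 "\n", scenario[i].id, scenario[i].addr);
    }
}
```

print_mappings() prints ram 0x10000000, rom 0x00008000, plic 0x48000000 and uart 0x40000000.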

Re: [PATCH 0/2] i386: fixup number of logical CPUs when host-cache-info=on

2022-05-24 Thread Moger, Babu



On 5/24/22 10:19, Igor Mammedov wrote:
> On Tue, 24 May 2022 11:10:18 -0400
> Igor Mammedov  wrote:
>
> CCing AMD folks as that might be of interest to them

I am trying to recreate the bug on my AMD system here. I'm seeing this message:

qemu-system-x86_64: -numa node,nodeid=0,memdev=ram-node0: memdev=ram-node0
is ambiguous

Here is my command line..

#qemu-system-x86_64 -name rhel8 -m 4096 -hda vdisk.qcow2 -enable-kvm -net
nic  -nographic -machine q35,accel=kvm -cpu
host,host-cache-info=on,l3-cache=off -smp
20,sockets=2,dies=1,cores=10,threads=1 -numa
node,nodeid=0,memdev=ram-node0 -numa node,nodeid=1,memdev=ram-node1 -numa
cpu,socket-id=0,node-id=0 -numa cpu,socket-id=1,node-id=1

Am I missing something?


>
>> Igor Mammedov (2):
>>   x86: cpu: make sure number of addressable IDs for processor cores
>> meets the spec
>>   x86: cpu: fixup number of addressable IDs for logical processors
>> sharing cache
>>
>>  target/i386/cpu.c | 20 
>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>
-- 
Thanks
Babu Moger

Re: [PATCH 2/4] virtio: forward errors into qdev_report_runtime_error()

2022-05-24 Thread Vladimir Sementsov-Ogievskiy


On 5/19/22 17:19, Konstantin Khlebnikov wrote:

Replace virtio_error() with a macro which forms a structured Error and
reports it as a device runtime-error in addition to the present actions.

Signed-off-by: Konstantin Khlebnikov 
---
  hw/virtio/virtio.c |9 +++--
  include/hw/virtio/virtio.h |   10 +-
  2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5d607aeaa0..638d779bf2 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3642,13 +3642,10 @@ void virtio_device_set_child_bus_name(VirtIODevice *vdev, char *bus_name)
  vdev->bus_name = g_strdup(bus_name);
  }
  
-void G_GNUC_PRINTF(2, 3) virtio_error(VirtIODevice *vdev, const char *fmt, ...)
+void virtio_fatal_error(VirtIODevice *vdev, Error *err)
  {
-va_list ap;
-
-va_start(ap, fmt);
-error_vreport(fmt, ap);
-va_end(ap);
+qdev_report_runtime_error(&vdev->parent_obj, err);
+error_report_err(err);
  
  if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
  vdev->status = vdev->status | VIRTIO_CONFIG_S_NEEDS_RESET;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index db1c0ddf6b..a165e35b0b 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -16,6 +16,7 @@
  
  #include "exec/memory.h"
  #include "hw/qdev-core.h"
+#include "qapi/error.h"
  #include "net/net.h"
  #include "migration/vmstate.h"
  #include "qemu/event_notifier.h"
@@ -172,7 +173,14 @@ void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size);
  
  void virtio_cleanup(VirtIODevice *vdev);
  
-void virtio_error(VirtIODevice *vdev, const char *fmt, ...) G_GNUC_PRINTF(2, 3);
+#define virtio_error(vdev, fmt, ...) {  \
+Error *_err = NULL; \
+error_setg(&_err, (fmt), ## __VA_ARGS__);   \
+virtio_fatal_error(vdev, _err); \
+} while (0)
+
+/* Reports and frees error, breaks device */
+void virtio_fatal_error(VirtIODevice *vdev, Error *err);
  
  /* Set the child bus name. */
  void virtio_device_set_child_bus_name(VirtIODevice *vdev, char *bus_name);



Hmm. So we create a temporary Error object just to pass it to
qdev_report_runtime_error()...

I think we can avoid introducing this intermediate Error object, together
with the new macro: just convert the argument list to a string with the help
of g_strdup_vprintf() in the original virtio_error(), error_report() this
string and pass it to qdev_report_runtime_error() (which should be simplified
to take just a string).

--
Best regards,
Vladimir
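A sketch of the alternative suggested above, in portable C: format the variadic arguments once into a heap string, report it, and hand the string to the runtime-error hook. strdup_vprintf() stands in for GLib's g_strdup_vprintf(), and fake_report_runtime_error() for a string-taking qdev_report_runtime_error(); both are hypothetical. Keeping a function also sidesteps the macro as quoted, whose body opens with `{` rather than `do {`, so `if (c) virtio_error(...); else ...` would fail to compile.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Hypothetical sink standing in for a string-based runtime-error report. */
static char last_runtime_error[256];

static void fake_report_runtime_error(const char *msg)
{
    snprintf(last_runtime_error, sizeof(last_runtime_error), "%s", msg);
}

/* Portable equivalent of g_strdup_vprintf(); caller frees the result. */
static char *strdup_vprintf(const char *fmt, va_list ap)
{
    va_list ap2;
    va_copy(ap2, ap);
    int len = vsnprintf(NULL, 0, fmt, ap2); /* measure only */
    va_end(ap2);
    if (len < 0) {
        return NULL;
    }
    char *buf = malloc((size_t)len + 1);
    if (buf) {
        vsnprintf(buf, (size_t)len + 1, fmt, ap);
    }
    return buf;
}

/* Keeps the printf-style signature: no Error object, no macro. */
static void fake_virtio_error(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    char *msg = strdup_vprintf(fmt, ap);
    va_end(ap);
    if (!msg) {
        return;
    }
    fprintf(stderr, "%s\n", msg); /* stands in for error_report() */
    fake_report_runtime_error(msg);
    free(msg);
    /* ... then mark the device broken, as virtio_error() already does ... */
}
```

The real function would keep its G_GNUC_PRINTF(2, 3) annotation so the compiler still checks format strings at every call site.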

Re: [PATCH 3/3] virtio_balloon: Introduce memory recover

2022-05-24 Thread Sean Christopherson

On Fri, May 20, 2022, zhenwei pi wrote:
> @@ -59,6 +60,12 @@ enum virtio_balloon_config_read {
>   VIRTIO_BALLOON_CONFIG_READ_CMD_ID = 0,
>  };
>  
> +/* the request body to communicate with host side */
> +struct __virtio_balloon_recover {
> + struct virtio_balloon_recover vbr;
> + __virtio32 pfns[VIRTIO_BALLOON_PAGES_PER_PAGE];

I assume this is copied from virtio_balloon.pfns, which also uses __virtio32, but
isn't that horribly broken?  PFNs are 'unsigned long', i.e. 64 bits on 64-bit kernels.
x86-64 at least most definitely generates 64-bit PFNs.  Unless there's magic I'm
missing, page_to_balloon_pfn() will truncate PFNs and feed the host bad info.
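To make the truncation concern concrete, a userspace C sketch (not kernel code; store_pfn32() and pfn_fits_in_32() are hypothetical helpers): a 32-bit field keeps only the low 32 bits of a PFN. Assuming 4 KiB balloon pages, any PFN at or above 2^32, i.e. a guest physical address from 16 TiB up, would reach the host with the wrong value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Storing into a 32-bit field silently drops bits 32..63. */
static uint32_t store_pfn32(uint64_t pfn)
{
    return (uint32_t)pfn;
}

/* True only when the PFN survives the narrowing unchanged. */
static bool pfn_fits_in_32(uint64_t pfn)
{
    return pfn <= UINT32_MAX;
}
```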

> @@ -494,6 +511,198 @@ static void update_balloon_size_func(struct work_struct *work)
>   queue_work(system_freezable_wq, work);
>  }
>  
> +/*
> + * virtballoon_memory_failure - notified by memory failure, try to fix the
> + *  corrupted page.
> + * The memory failure notifier is designed to call back when the kernel handled
> + * successfully only, WARN_ON_ONCE on the unlikely condition to find out any
> + * error (memory error handling is a best effort, not 100% covered).
> + */
> +static int virtballoon_memory_failure(struct notifier_block *notifier,
> +   unsigned long pfn, void *parm)
> +{
> + struct virtio_balloon *vb = container_of(notifier, struct virtio_balloon,
> +  memory_failure_nb);
> + struct page *page;
> + struct __virtio_balloon_recover *out_vbr;
> + struct scatterlist sg;
> + unsigned long flags;
> + int err;
> +
> + page = pfn_to_online_page(pfn);
> + if (WARN_ON_ONCE(!page))
> + return NOTIFY_DONE;
> +
> + if (PageHuge(page))
> + return NOTIFY_DONE;
> +
> + if (WARN_ON_ONCE(!PageHWPoison(page)))
> + return NOTIFY_DONE;
> +
> + if (WARN_ON_ONCE(page_count(page) != 1))
> + return NOTIFY_DONE;
> +
> + get_page(page); /* balloon reference */
> +
> + out_vbr = kzalloc(sizeof(*out_vbr), GFP_KERNEL);
> + if (WARN_ON_ONCE(!out_vbr))
> + return NOTIFY_BAD;

Not that it truly matters, but won't failure at this point leak the poisoned page?

Re: [PATCH v3 4/8] hmp: add basic "info stats" implementation

2022-05-24 Thread Dr. David Alan Gilbert

* Paolo Bonzini (pbonz...@redhat.com) wrote:
> From: Mark Kanda 
> 
> Add an HMP command to retrieve statistics collected at run-time.
> The command will retrieve and print either all VM-level statistics,
> or all vCPU-level statistics for the currently selected CPU.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  hmp-commands-info.hx  |  13 +++
>  include/monitor/hmp.h |   1 +
>  monitor/hmp-cmds.c| 187 ++
>  3 files changed, 201 insertions(+)

One minor comment below...

Reviewed-by: Dr. David Alan Gilbert 

> 
> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> index adfa085a9b..221feab8c0 100644
> --- a/hmp-commands-info.hx
> +++ b/hmp-commands-info.hx
> @@ -894,3 +894,16 @@ SRST
>``info via``
>  Show guest mos6522 VIA devices.
>  ERST
> +
> +{
> +.name   = "stats",
> +.args_type  = "target:s",
> +.params = "target",
> +.help   = "show statistics; target is either vm or vcpu",
> +.cmd= hmp_info_stats,
> +},
> +
> +SRST
> +  ``stats``
> +Show runtime-collected statistics
> +ERST
> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
> index 96d014826a..2e89a97bd6 100644
> --- a/include/monitor/hmp.h
> +++ b/include/monitor/hmp.h
> @@ -133,5 +133,6 @@ void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
>  void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
>  void hmp_human_readable_text_helper(Monitor *mon,
>  HumanReadableText *(*qmp_handler)(Error **));
> +void hmp_info_stats(Monitor *mon, const QDict *qdict);
>  
>  #endif
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index 93061a11af..5950133a11 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -40,6 +40,7 @@
>  #include "qapi/qapi-commands-pci.h"
>  #include "qapi/qapi-commands-rocker.h"
>  #include "qapi/qapi-commands-run-state.h"
> +#include "qapi/qapi-commands-stats.h"
>  #include "qapi/qapi-commands-tpm.h"
>  #include "qapi/qapi-commands-ui.h"
>  #include "qapi/qapi-visit-net.h"
> @@ -52,6 +53,7 @@
>  #include "ui/console.h"
>  #include "qemu/cutils.h"
>  #include "qemu/error-report.h"
> +#include "hw/core/cpu.h"
>  #include "hw/intc/intc.h"
>  #include "migration/snapshot.h"
>  #include "migration/misc.h"
> @@ -2233,3 +2235,188 @@ void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict)
>  }
>  hmp_handle_error(mon, err);
>  }
> +
> +static void print_stats_schema_value(Monitor *mon, StatsSchemaValue *value)
> +{
> +const char *prefix = "";
> +monitor_printf(mon, "%s (%s", value->name, StatsType_str(value->type));
> +
> +if (value->has_unit && value->unit == STATS_UNIT_SECONDS &&
> +(value->exponent == 0 || value->base == 10) &&
> +value->exponent >= -9 && value->exponent <= 0 &&
> +value->exponent % 3 == 0) {
> +
> +static const char *si_prefix[] = { "", "milli", "micro", "nano" };
> +prefix = si_prefix[value->exponent / -3];
> +
> +} else if (value->has_unit && value->unit == STATS_UNIT_BYTES &&
> +   (value->exponent == 0 || value->base == 2) &&
> +   value->exponent >= 0 && value->exponent <= 40 &&
> +   value->exponent % 10 == 0) {
> +
> +static const char *si_prefix[] = {
> +"", "kilo", "mega", "giga", "tera" };

Could we add this list, and the second scale one, to something general,
say util/cutils.c?  There's already a size_to_str and freq_to_str in
there.

Dave

> +prefix = si_prefix[value->exponent / 10];
> +
> +} else if (value->exponent) {
> +/* Print the base and exponent as "x ^" */
> +monitor_printf(mon, ", * %d^%d", value->base,
> +   value->exponent);
> +}
> +
> +if (value->has_unit) {
> +monitor_printf(mon, " %s%s", prefix, StatsUnit_str(value->unit));
> +}
> +
> +/* Print bucket size for linear histograms */
> +if (value->type == STATS_TYPE_LINEAR_HISTOGRAM && value->has_bucket_size) {
> +monitor_printf(mon, ", bucket size=%d", value->bucket_size);
> +}
> +monitor_printf(mon, ")");
> +}
> +
> +static StatsSchemaValueList *find_schema_value_list(
> +StatsSchemaList *list, StatsProvider provider,
> +StatsTarget target)
> +{
> +StatsSchemaList *node;
> +
> +for (node = list; node; node = node->next) {
> +if (node->value->provider == provider &&
> +node->value->target == target) {
> +return node->value->stats;
> +}
> +}
> +return NULL;
> +}
> +
> +static void print_stats_results(Monitor *mon, StatsTarget target,
> +StatsResult *result,
> +StatsSchemaList *schema)
> +{
> +/* Find provider schema */
> +StatsSchemaValueList *schema_value_list =
> +find_schema_value_list(schema, result->provider, target);
> +StatsList *stats_list;
> +
> +

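The review comment above suggests moving the prefix tables out of print_stats_schema_value() and into a shared helper next to size_to_str() and freq_to_str() in util/cutils.c. The following is a minimal sketch of such a helper. The names si_prefix_base10() and si_prefix_base2() are hypothetical and are not existing QEMU functions. The tables and range checks are copied from the patch, and a NULL return means "no named prefix", in which case the caller would print the raw base^exponent instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical shared helpers in the spirit of the review suggestion.
 * Tables and range checks mirror print_stats_schema_value().
 */

/* Base-10 exponents 0, -3, -6, -9 map to "", milli, micro, nano. */
const char *si_prefix_base10(int exponent)
{
    static const char *const prefix[] = { "", "milli", "micro", "nano" };

    if (exponent > 0 || exponent < -9 || exponent % 3 != 0) {
        return NULL;            /* no named prefix for this scale */
    }
    return prefix[exponent / -3];
}

/* Base-2 exponents 0, 10, 20, 30, 40 map to "", kilo, mega, giga, tera. */
const char *si_prefix_base2(int exponent)
{
    static const char *const prefix[] = { "", "kilo", "mega", "giga", "tera" };

    if (exponent < 0 || exponent > 40 || exponent % 10 != 0) {
        return NULL;            /* no named prefix for this scale */
    }
    return prefix[exponent / 10];
}
```

The sketch keeps the patch's kilo/mega/giga/tera labels for base-2 byte counts rather than switching to the IEC names (kibi, mebi, ...).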
[PATCH v6 6/8] s390x/pci: enable adapter event notification for interpreted devices

2022-05-24 Thread Matthew Rosato

Use the associated kvm ioctl operation to enable adapter event notification
and forwarding for devices when requested.  This feature will be set up
with or without firmware assist based upon the 'forwarding_assist' setting.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-bus.c | 20 ++---
 hw/s390x/s390-pci-inst.c| 40 +++--
 hw/s390x/s390-pci-kvm.c | 30 +
 include/hw/s390x/s390-pci-bus.h |  1 +
 include/hw/s390x/s390-pci-kvm.h | 14 
 5 files changed, 100 insertions(+), 5 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 816d17af99..e66a0dfbef 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -190,7 +190,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
 rc = SCLP_RC_NO_ACTION_REQUIRED;
 break;
 default:
-if (pbdev->summary_ind) {
+if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
+/* Interpreted devices were using interrupt forwarding */
+s390_pci_kvm_aif_disable(pbdev);
+} else if (pbdev->summary_ind) {
 pci_dereg_irqs(pbdev);
 }
 if (pbdev->iommu->enabled) {
@@ -1082,6 +1085,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 } else {
 DPRINTF("zPCI interpretation facilities missing.\n");
 pbdev->interp = false;
+pbdev->forwarding_assist = false;
 }
 }
 pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
@@ -1090,11 +1094,13 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 if (!pbdev->interp) {
 /* Do vfio passthrough but intercept for I/O */
 pbdev->fh |= FH_SHM_VFIO;
+pbdev->forwarding_assist = false;
 }
 } else {
 pbdev->fh |= FH_SHM_EMUL;
 /* Always intercept emulated devices */
 pbdev->interp = false;
+pbdev->forwarding_assist = false;
 }
 
 if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
@@ -1244,7 +1250,10 @@ static void s390_pcihost_reset(DeviceState *dev)
 /* Process all pending unplug requests */
 QTAILQ_FOREACH_SAFE(pbdev, &s->zpci_devs, link, next) {
 if (pbdev->unplug_requested) {
-if (pbdev->summary_ind) {
+if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
+/* Interpreted devices were using interrupt forwarding */
+s390_pci_kvm_aif_disable(pbdev);
+} else if (pbdev->summary_ind) {
 pci_dereg_irqs(pbdev);
 }
 if (pbdev->iommu->enabled) {
@@ -1382,7 +1391,10 @@ static void s390_pci_device_reset(DeviceState *dev)
 break;
 }
 
-if (pbdev->summary_ind) {
+if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
+/* Interpreted devices were using interrupt forwarding */
+s390_pci_kvm_aif_disable(pbdev);
+} else if (pbdev->summary_ind) {
 pci_dereg_irqs(pbdev);
 }
 if (pbdev->iommu->enabled) {
@@ -1428,6 +1440,8 @@ static Property s390_pci_device_properties[] = {
 DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
 DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
 DEFINE_PROP_BOOL("interpret", S390PCIBusDevice, interp, true),
+DEFINE_PROP_BOOL("forwarding_assist", S390PCIBusDevice, forwarding_assist,
+ true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 651ec38635..20a9bcc7af 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -1066,6 +1066,32 @@ static void fmb_update(void *opaque)
 timer_mod(pbdev->fmb_timer, t + pbdev->pci_group->zpci_group.mui);
 }
 
+static int mpcifc_reg_int_interp(S390PCIBusDevice *pbdev, ZpciFib *fib)
+{
+int rc;
+
+rc = s390_pci_kvm_aif_enable(pbdev, fib, pbdev->forwarding_assist);
+if (rc) {
+DPRINTF("Failed to enable interrupt forwarding\n");
+return rc;
+}
+
+return 0;
+}
+
+static int mpcifc_dereg_int_interp(S390PCIBusDevice *pbdev, ZpciFib *fib)
+{
+int rc;
+
+rc = s390_pci_kvm_aif_disable(pbdev);
+if (rc) {
+DPRINTF("Failed to disable interrupt forwarding\n");
+return rc;
+}
+
+return 0;
+}
+
 int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
 uintptr_t ra)
 {
@@ -1120,7 +1146,12 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
 
 switch (oc) {
 case ZPCI_MOD_FC_REG_INT:
-if (pbdev->summary_ind) {
+if (pbdev->interp) {
+if (mpcifc_reg_int_interp(pbdev, &fib)) {
+cc = ZPCI_PCI_LS_ERR;
+s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
+}
+

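The forwarding_assist choice described in this patch ends up as a flag in the KVM_S390_ZPCI_OP request defined in patch 1/8. The sketch below shows how a REG_AEN request might be populated. It assumes, without confirmation from this excerpt, that the assist-off case is expressed with KVM_S390_ZPCIOP_REGAEN_HOST. build_reg_aen() is a hypothetical helper. The struct and constants are copied locally from the kvm.h hunk so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the uapi definitions from the kvm.h hunk in patch 1/8. */
struct kvm_s390_zpci_op {
    uint32_t fh;                /* target device */
    uint8_t  op;                /* operation to perform */
    uint8_t  pad[3];
    union {
        struct {
            uint64_t ibv;       /* guest addr of interrupt bit vector */
            uint64_t sb;        /* guest addr of summary bit */
            uint32_t flags;
            uint32_t noi;       /* number of interrupts */
            uint8_t  isc;       /* guest interrupt subclass */
            uint8_t  sbo;       /* offset of guest summary bit vector */
            uint16_t pad;
        } reg_aen;
        uint64_t reserved[8];
    } u;
};

#define KVM_S390_ZPCIOP_REG_AEN     0
#define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0)

/*
 * Hypothetical helper: fill in a REG_AEN request for the device with
 * function handle fh. With forwarding_assist == false we assume the host
 * is asked to deliver adapter events itself (KVM_S390_ZPCIOP_REGAEN_HOST).
 */
struct kvm_s390_zpci_op build_reg_aen(uint32_t fh, uint64_t ibv, uint64_t sb,
                                      uint32_t noi, uint8_t isc, uint8_t sbo,
                                      bool forwarding_assist)
{
    struct kvm_s390_zpci_op req;

    memset(&req, 0, sizeof(req));   /* zero pad and reserved bytes */
    req.fh = fh;
    req.op = KVM_S390_ZPCIOP_REG_AEN;
    req.u.reg_aen.ibv = ibv;
    req.u.reg_aen.sb = sb;
    req.u.reg_aen.noi = noi;
    req.u.reg_aen.isc = isc;
    req.u.reg_aen.sbo = sbo;
    req.u.reg_aen.flags = forwarding_assist ? 0 : KVM_S390_ZPCIOP_REGAEN_HOST;
    return req;
}
```

In QEMU the filled structure would presumably be handed to kvm_vm_ioctl() with KVM_S390_ZPCI_OP. That call is left out here because it needs a running KVM VM.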
Re: [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event

2022-05-24 Thread Vladimir Sementsov-Ogievskiy


First, the cover letter is absent. Konstantin, could you please provide a
description of what the whole series does?

Second, add maintainers to CC:
+Michael
+Eric
+Markus

On 5/19/22 17:19, Konstantin Khlebnikov wrote:

This event reports device runtime errors, giving the time and the
reason why a device is broken.

Signed-off-by: Konstantin Khlebnikov 
---


The patch itself seems good to me:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

[PATCH v6 7/8] s390x/pci: let intercept devices have separate PCI groups

2022-05-24 Thread Matthew Rosato

Let's use the reserved pool of simulated PCI groups to allow intercept
devices to have groups separate from those of interpreted devices, as some
group values may differ. If we run out of simulated PCI groups, subsequent
intercept devices just get the default group.
Furthermore, if we encounter any PCI groups from hostdevs that are marked
as simulated, let's just assign them to the default group to avoid
conflicts between host simulated groups and our own simulated groups.

Reviewed-by: Pierre Morel 
Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-bus.c | 19 ++--
 hw/s390x/s390-pci-vfio.c| 40 ++---
 include/hw/s390x/s390-pci-bus.h |  6 -
 3 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index e66a0dfbef..5342f7899f 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -748,13 +748,14 @@ static void s390_pci_iommu_free(S390pciState *s, PCIBus *bus, int32_t devfn)
 object_unref(OBJECT(iommu));
 }
 
-S390PCIGroup *s390_group_create(int id)
+S390PCIGroup *s390_group_create(int id, int host_id)
 {
 S390PCIGroup *group;
 S390pciState *s = s390_get_phb();
 
 group = g_new0(S390PCIGroup, 1);
 group->id = id;
+group->host_id = host_id;
 QTAILQ_INSERT_TAIL(&s->zpci_groups, group, link);
 return group;
 }
@@ -772,12 +773,25 @@ S390PCIGroup *s390_group_find(int id)
 return NULL;
 }
 
+S390PCIGroup *s390_group_find_host_sim(int host_id)
+{
+S390PCIGroup *group;
+S390pciState *s = s390_get_phb();
+
+QTAILQ_FOREACH(group, &s->zpci_groups, link) {
+if (group->id >= ZPCI_SIM_GRP_START && group->host_id == host_id) {
+return group;
+}
+}
+return NULL;
+}
+
 static void s390_pci_init_default_group(void)
 {
 S390PCIGroup *group;
 ClpRspQueryPciGrp *resgrp;
 
-group = s390_group_create(ZPCI_DEFAULT_FN_GRP);
+group = s390_group_create(ZPCI_DEFAULT_FN_GRP, ZPCI_DEFAULT_FN_GRP);
 resgrp = &group->zpci_group;
 resgrp->fr = 1;
 resgrp->dasm = 0;
@@ -825,6 +839,7 @@ static void s390_pcihost_realize(DeviceState *dev, Error **errp)
NULL, g_free);
 s->zpci_table = g_hash_table_new_full(g_int_hash, g_int_equal, NULL, NULL);
 s->bus_no = 0;
+s->next_sim_grp = ZPCI_SIM_GRP_START;
 QTAILQ_INIT(&s->pending_sei);
 QTAILQ_INIT(&s->zpci_devs);
 QTAILQ_INIT(&s->zpci_dma_limit);
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 4bf0a7e22d..985980f021 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -150,13 +150,18 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 {
 struct vfio_info_cap_header *hdr;
 struct vfio_device_info_cap_zpci_group *cap;
+S390pciState *s = s390_get_phb();
 ClpRspQueryPciGrp *resgrp;
 VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+uint8_t start_gid = pbdev->zpci_fn.pfgid;
 
 hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
 
-/* If capability not provided, just use the default group */
-if (hdr == NULL) {
+/*
+ * If capability not provided or the underlying hostdev is simulated, just
+ * use the default group.
+ */
+if (hdr == NULL || pbdev->zpci_fn.pfgid >= ZPCI_SIM_GRP_START) {
 trace_s390_pci_clp_cap(vpci->vbasedev.name,
VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
 pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
@@ -165,11 +170,40 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 }
 cap = (void *) hdr;
 
+/*
+ * For an intercept device, let's use an existing simulated group if one
+ * was already created for other intercept devices in this group.
+ * If not, create a new simulated group if any are still available.
+ * If all else fails, just fall back on the default group.
+ */
+if (!pbdev->interp) {
+pbdev->pci_group = s390_group_find_host_sim(pbdev->zpci_fn.pfgid);
+if (pbdev->pci_group) {
+/* Use existing simulated group */
+pbdev->zpci_fn.pfgid = pbdev->pci_group->id;
+return;
+} else {
+if (s->next_sim_grp == ZPCI_DEFAULT_FN_GRP) {
+/* All out of simulated groups, use default */
+trace_s390_pci_clp_cap(vpci->vbasedev.name,
+   VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
+pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
+pbdev->pci_group = s390_group_find(ZPCI_DEFAULT_FN_GRP);
+return;
+} else {
+/* We can assign a new simulated group */
+pbdev->zpci_fn.pfgid = s->next_sim_grp;
+s->next_sim_grp++;
+/* Fall through to create the new sim group using CLP info */
+}
+}
+}
+
 /* See if the PC

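The group assignment for intercept devices described above can be modelled outside QEMU. This is a simplified sketch, not the patch code. SimGroup, GroupState and assign_intercept_group() are hypothetical. The values 0xF0 and 0xFF for ZPCI_SIM_GRP_START and ZPCI_DEFAULT_FN_GRP are assumptions and should be checked against include/hw/s390x/s390-pci-bus.h. The model does three things:

- It reuses one simulated group per host group.
- It hands out new IDs until next_sim_grp reaches the default group ID.
- It sends hostdevs that already sit in a simulated group to the default group.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; check include/hw/s390x/s390-pci-bus.h for the real ones. */
#define ZPCI_DEFAULT_FN_GRP 0xFF
#define ZPCI_SIM_GRP_START  0xF0
#define MAX_GROUPS          32

/* Hypothetical, simplified stand-ins for S390PCIGroup / S390pciState. */
typedef struct {
    int id;         /* guest-visible group id */
    int host_id;    /* host group id it was derived from */
} SimGroup;

typedef struct {
    SimGroup groups[MAX_GROUPS];
    int ngroups;
    int next_sim_grp;
} GroupState;

void group_state_init(GroupState *s)
{
    s->ngroups = 0;
    s->next_sim_grp = ZPCI_SIM_GRP_START;
}

/* Return the guest group id for an intercept device on host group host_gid. */
int assign_intercept_group(GroupState *s, int host_gid)
{
    /* Hostdevs already in a simulated group go to the default group. */
    if (host_gid >= ZPCI_SIM_GRP_START) {
        return ZPCI_DEFAULT_FN_GRP;
    }
    /* Reuse a simulated group already created for this host group. */
    for (int i = 0; i < s->ngroups; i++) {
        if (s->groups[i].id >= ZPCI_SIM_GRP_START &&
            s->groups[i].host_id == host_gid) {
            return s->groups[i].id;
        }
    }
    /* Out of simulated groups: fall back to the default group. */
    if (s->next_sim_grp == ZPCI_DEFAULT_FN_GRP) {
        return ZPCI_DEFAULT_FN_GRP;
    }
    /* Allocate the next simulated group id for this host group. */
    assert(s->ngroups < MAX_GROUPS);
    s->groups[s->ngroups].id = s->next_sim_grp;
    s->groups[s->ngroups].host_id = host_gid;
    s->ngroups++;
    return s->next_sim_grp++;
}
```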
[PATCH v6 5/8] s390x/pci: don't fence interpreted devices without MSI-X

2022-05-24 Thread Matthew Rosato

Lack of MSI-X support is not an issue for interpreted passthrough
devices, so let's let these in.  This will allow, for example, ISM
devices to be passed through -- but only when interpretation is
available and being used.

Reviewed-by: Thomas Huth 
Reviewed-by: Pierre Morel 
Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-bus.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 156051e6e9..816d17af99 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -881,6 +881,10 @@ static int s390_pci_msix_init(S390PCIBusDevice *pbdev)
 
 static void s390_pci_msix_free(S390PCIBusDevice *pbdev)
 {
+if (pbdev->msix.entries == 0) {
+return;
+}
+
 memory_region_del_subregion(&pbdev->iommu->mr, &pbdev->msix_notify_mr);
 object_unparent(OBJECT(&pbdev->msix_notify_mr));
 }
@@ -1093,7 +1097,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 pbdev->interp = false;
 }
 
-if (s390_pci_msix_init(pbdev)) {
+if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
 error_setg(errp, "MSI-X support is mandatory "
"in the S390 architecture");
 return;
-- 
2.27.0

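The rule in this patch reduces to two small predicates: one for the plug-time check and one for teardown. The sketch below assumes that s390_pci_msix_init() returns non-zero when the device has no MSI-X capability. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical distillation of the plug-time check after this patch:
 * a failed MSI-X init is only fatal for devices that are not interpreted.
 */
bool msix_fence_rejects(int msix_init_rc, bool interp)
{
    return msix_init_rc != 0 && !interp;
}

/* Teardown mirrors it: nothing to free if no MSI-X entries were set up. */
bool msix_free_needed(unsigned int msix_entries)
{
    return msix_entries != 0;
}
```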
[PATCH v6 4/8] s390x/pci: enable for load/store interpretation

2022-05-24 Thread Matthew Rosato

If the appropriate CPU facility is available as well as the necessary
ZPCI_OP ioctl, then the underlying KVM host will enable load/store
interpretation for any guest device without a SHM bit in the guest
function handle.  For a device that will be using interpretation
support, ensure the guest function handle matches the host function
handle; this value is re-checked every time the guest issues a SET PCI FN
to enable the guest device, as it is the only opportunity to reflect
function handle changes.

By default, unless interpret=off is specified, interpretation support will
always be assumed and exploited if the necessary ioctl and features are
available on the host kernel.  When these are unavailable, we will silently
revert to the interception model; this allows existing guest configurations
to work unmodified on hosts with and without zPCI interpretation support,
allowing QEMU to choose the best support model available.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/meson.build|  1 +
 hw/s390x/s390-pci-bus.c | 66 -
 hw/s390x/s390-pci-inst.c| 16 
 hw/s390x/s390-pci-kvm.c | 23 
 include/hw/s390x/s390-pci-bus.h |  1 +
 include/hw/s390x/s390-pci-kvm.h | 24 
 target/s390x/kvm/kvm.c  |  7 
 target/s390x/kvm/kvm_s390x.h|  1 +
 8 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 hw/s390x/s390-pci-kvm.c
 create mode 100644 include/hw/s390x/s390-pci-kvm.h

diff --git a/hw/s390x/meson.build b/hw/s390x/meson.build
index feefe0717e..f291016fee 100644
--- a/hw/s390x/meson.build
+++ b/hw/s390x/meson.build
@@ -23,6 +23,7 @@ s390x_ss.add(when: 'CONFIG_KVM', if_true: files(
   's390-skeys-kvm.c',
   's390-stattrib-kvm.c',
   'pv.c',
+  's390-pci-kvm.c',
 ))
 s390x_ss.add(when: 'CONFIG_TCG', if_true: files(
   'tod-tcg.c',
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 4b2bdd94b3..156051e6e9 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -16,6 +16,7 @@
 #include "qapi/visitor.h"
 #include "hw/s390x/s390-pci-bus.h"
 #include "hw/s390x/s390-pci-inst.h"
+#include "hw/s390x/s390-pci-kvm.h"
 #include "hw/s390x/s390-pci-vfio.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/qdev-properties.h"
@@ -971,12 +972,51 @@ static void s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
 }
 }
 
+static int s390_pci_interp_plug(S390pciState *s, S390PCIBusDevice *pbdev)
+{
+uint32_t idx, fh;
+
+if (!s390_pci_get_host_fh(pbdev, &fh)) {
+return -EPERM;
+}
+
+/*
+ * The host device is already in an enabled state, but we always present
+ * the initial device state to the guest as disabled (ZPCI_FS_DISABLED).
+ * Therefore, mask off the enable bit from the passthrough handle until
+ * the guest issues a CLP SET PCI FN later to enable the device.
+ */
+pbdev->fh = fh & ~FH_MASK_ENABLE;
+
+/* Next, see if the idx is already in-use */
+idx = pbdev->fh & FH_MASK_INDEX;
+if (pbdev->idx != idx) {
+if (s390_pci_find_dev_by_idx(s, idx)) {
+return -EINVAL;
+}
+/*
+ * Update the idx entry with the passed through idx
+ * If the relinquished idx is lower than next_idx, use it
+ * to replace next_idx
+ */
+g_hash_table_remove(s->zpci_table, &pbdev->idx);
+if (idx < s->next_idx) {
+s->next_idx = idx;
+}
+pbdev->idx = idx;
+g_hash_table_insert(s->zpci_table, &pbdev->idx, pbdev);
+}
+
+return 0;
+}
+
 static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
   Error **errp)
 {
 S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
 PCIDevice *pdev = NULL;
 S390PCIBusDevice *pbdev = NULL;
+int rc;
 
 if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
 PCIBridge *pb = PCI_BRIDGE(dev);
@@ -1022,12 +1062,35 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 set_pbdev_info(pbdev);
 
 if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
-pbdev->fh |= FH_SHM_VFIO;
+/*
+ * By default, interpretation is always requested; if the available
+ * facilities indicate it is not available, fallback to the
+ * interception model.
+ */
+if (pbdev->interp) {
+if (s390_pci_kvm_interp_allowed()) {
+rc = s390_pci_interp_plug(s, pbdev);
+if (rc) {
+error_setg(errp, "Plug failed for zPCI device in "
+   "interpretation mode: %d", rc);
+return;
+}
+} else {
+DPRINTF("zPCI interpretation facilities missing.\n");
+pbdev->interp = false;
+}
+}
 pbdev->iommu->dma_limit = s3

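This is a worked example of the handle manipulation in s390_pci_interp_plug(). The mask values below are assumptions: the series uses FH_MASK_ENABLE and FH_MASK_INDEX, but this excerpt does not show their definitions. interp_initial_fh() is a hypothetical helper. With a host handle of 0x81000012, the guest initially sees 0x01000012 because the enable bit is cleared. The device then occupies table index 0x12.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout of a zPCI function handle (check s390-pci-bus.h). */
#define FH_MASK_ENABLE 0x80000000u
#define FH_MASK_INDEX  0x0000ffffu

/*
 * Hypothetical helper modelled on s390_pci_interp_plug(): the guest starts
 * with the host handle minus the enable bit, and the table index comes
 * from the low bits of that handle.
 */
void interp_initial_fh(uint32_t host_fh, uint32_t *guest_fh, uint32_t *idx)
{
    assert(guest_fh && idx);
    *guest_fh = host_fh & ~FH_MASK_ENABLE;  /* present device as disabled */
    *idx = *guest_fh & FH_MASK_INDEX;       /* reuse the host index */
}
```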
[PATCH v6 8/8] s390x/pci: reflect proper maxstbl for groups of interpreted devices

2022-05-24 Thread Matthew Rosato

The maximum supported store block length might be different depending
on whether the instruction is interpretively executed (firmware-reported
maximum) or handled via userspace intercept (host kernel API maximum).
Choose the best available value during group creation.

Reviewed-by: Pierre Morel 
Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-vfio.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 985980f021..212dd053f7 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -213,7 +213,11 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 resgrp->msia = cap->msi_addr;
 resgrp->mui = cap->mui;
 resgrp->i = cap->noi;
-resgrp->maxstbl = cap->maxstbl;
+if (pbdev->interp && hdr->version >= 2) {
+resgrp->maxstbl = cap->imaxstbl;
+} else {
+resgrp->maxstbl = cap->maxstbl;
+}
 resgrp->version = cap->version;
 resgrp->dtsm = ZPCI_DTSM;
 }
-- 
2.27.0

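The selection can be written as a pure function, which makes the version gate explicit. imaxstbl only exists from version 2 of struct vfio_device_info_cap_zpci_group, as shown in the vfio_zdev.h hunk of patch 1/8. choose_maxstbl() is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the maxstbl choice in s390_pci_read_group(): the interpreted
 * limit is only used for interpreted devices and a v2+ capability.
 */
uint16_t choose_maxstbl(bool interp, uint16_t cap_version,
                        uint16_t maxstbl, uint16_t imaxstbl)
{
    if (interp && cap_version >= 2) {
        return imaxstbl;    /* firmware-reported maximum */
    }
    return maxstbl;         /* host kernel API maximum */
}
```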
[PATCH v6 0/8] s390x/pci: zPCI interpretation support

2022-05-24 Thread Matthew Rosato

For QEMU, the majority of the work in enabling instruction interpretation
is handled via SHM bit settings (to indicate to firmware whether or not
interpretive execution facilities are to be used), and a new KVM ioctl is
used to set up firmware-interpreted forwarding of Adapter Event
Notifications.

This series also adds a new, optional 'interpret' parameter to zpci which
can be used to disable interpretation support (interpret=off), as well as
a 'forwarding_assist' parameter to determine whether or not the firmware
assist will be used for adapter event delivery (the default when
interpretation is in use) or whether the host will be responsible for
delivering all adapter event notifications (forwarding_assist=off).

The ZPCI_INTERP CPU feature is added beginning with the z14 model to
enable this support.

As a consequence of implementing zPCI interpretation, ISM devices now
become eligible for passthrough (but only when zPCI interpretation is
available).

From the perspective of guest configuration, you pass through zPCI devices
in the same manner as before, with interpretation support being used by
default if available in kernel+qemu.

Associated kernel series:   
https://lore.kernel.org/kvm/20220524185907.140285-1-mjros...@linux.ibm.com/ 

   
Changelog v5->v6:
- Update linux headers (KVM_CAP_S390_ZPCI_OP changed)
- Move featoff to ccw_machine_7_0_instance_options() (Thomas)
- s390_pci_get_host_fh: s/unsigned int/uint32_t/ (Thomas)
- s390_pci_kvm_interp_allowed: add !s390_is_pv() check (Pierre)
- Fail guest SET PCI FN (enable) if we cannot get the host fh
  or if the retrieved host FH is not enabled (Pierre)
- bugfix: don't free msix if we never initialized it

Matthew Rosato (8):
  Update linux headers
  target/s390x: add zpci-interp to cpu models
  s390x/pci: add routine to get host function handle from CLP info
  s390x/pci: enable for load/store interpretation
  s390x/pci: don't fence interpreted devices without MSI-X
  s390x/pci: enable adapter event notification for interpreted devices
  s390x/pci: let intercept devices have separate PCI groups
  s390x/pci: reflect proper maxstbl for groups of interpreted devices

 hw/s390x/meson.build|   1 +
 hw/s390x/s390-pci-bus.c | 111 ++--
 hw/s390x/s390-pci-inst.c|  56 +++-
 hw/s390x/s390-pci-kvm.c |  53 
 hw/s390x/s390-pci-vfio.c| 129 +++-
 hw/s390x/s390-virtio-ccw.c  |   1 +
 include/hw/s390x/s390-pci-bus.h |   8 +-
 include/hw/s390x/s390-pci-kvm.h |  38 
 include/hw/s390x/s390-pci-vfio.h|   5 ++
 linux-headers/asm-s390/kvm.h|   1 +
 linux-headers/linux/kvm.h   |  32 +++
 linux-headers/linux/vfio.h  |   4 +-
 linux-headers/linux/vfio_zdev.h |   7 ++
 target/s390x/cpu_features_def.h.inc |   1 +
 target/s390x/gen-features.c |   2 +
 target/s390x/kvm/kvm.c  |   8 ++
 target/s390x/kvm/kvm_s390x.h|   1 +
 17 files changed, 426 insertions(+), 32 deletions(-)
 create mode 100644 hw/s390x/s390-pci-kvm.c
 create mode 100644 include/hw/s390x/s390-pci-kvm.h

-- 
2.27.0

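The two properties combine at plug time into the outcome sketched below. This is distilled from the s390_pcihost_plug() hunks in patches 4/8 and 6/8 and is not code from the series. ZpciMode and resolve_zpci_mode() are hypothetical. interp_allowed stands in for the result of s390_pci_kvm_interp_allowed().

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool interp;             /* "interpret" property, default on */
    bool forwarding_assist;  /* "forwarding_assist" property, default on */
} ZpciMode;

/*
 * Hypothetical summary of how the requested properties are resolved.
 * is_vfio: the device is vfio-pci; interp_allowed: host facilities present.
 */
ZpciMode resolve_zpci_mode(bool want_interp, bool want_assist,
                           bool is_vfio, bool interp_allowed)
{
    ZpciMode m = { .interp = want_interp, .forwarding_assist = want_assist };

    if (!is_vfio || !interp_allowed) {
        /* emulated devices, or missing facilities: silently intercept */
        m.interp = false;
    }
    if (!m.interp) {
        /* the firmware assist only applies to interpreted devices */
        m.forwarding_assist = false;
    }
    return m;
}
```

For example, resolve_zpci_mode(true, true, true, false) yields interp=false and forwarding_assist=false. This is the silent fallback to the interception model that the cover letter describes.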
[PATCH v6 3/8] s390x/pci: add routine to get host function handle from CLP info

2022-05-24 Thread Matthew Rosato

In order to interface with the underlying host zPCI device, we need
to know its function handle.  Add a routine to grab this from the
vfio CLP capabilities chain.

Reviewed-by: Pierre Morel 
Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-vfio.c | 83 ++--
 include/hw/s390x/s390-pci-vfio.h |  5 ++
 2 files changed, 72 insertions(+), 16 deletions(-)

diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 6f80a47e29..4bf0a7e22d 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -124,6 +124,27 @@ static void s390_pci_read_base(S390PCIBusDevice *pbdev,
 pbdev->zpci_fn.pft = 0;
 }
 
+static bool get_host_fh(S390PCIBusDevice *pbdev, struct vfio_device_info *info,
+uint32_t *fh)
+{
+struct vfio_info_cap_header *hdr;
+struct vfio_device_info_cap_zpci_base *cap;
+VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+
+hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_BASE);
+
+/* Can only get the host fh with version 2 or greater */
+if (hdr == NULL || hdr->version < 2) {
+trace_s390_pci_clp_cap(vpci->vbasedev.name,
+   VFIO_DEVICE_INFO_CAP_ZPCI_BASE);
+return false;
+}
+cap = (void *) hdr;
+
+*fh = cap->fh;
+return true;
+}
+
 static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 struct vfio_device_info *info)
 {
@@ -217,25 +238,13 @@ static void s390_pci_read_pfip(S390PCIBusDevice *pbdev,
 memcpy(pbdev->zpci_fn.pfip, cap->pfip, CLP_PFIP_NR_SEGMENTS);
 }
 
-/*
- * This function will issue the VFIO_DEVICE_GET_INFO ioctl and look for
- * capabilities that contain information about CLP features provided by the
- * underlying host.
- * On entry, defaults have already been placed into the guest CLP response
- * buffers.  On exit, defaults will have been overwritten for any CLP features
- * found in the capability chain; defaults will remain for any CLP features not
- * found in the chain.
- */
-void s390_pci_get_clp_info(S390PCIBusDevice *pbdev)
+static struct vfio_device_info *get_device_info(S390PCIBusDevice *pbdev,
+uint32_t argsz)
 {
-g_autofree struct vfio_device_info *info = NULL;
+struct vfio_device_info *info = g_malloc0(argsz);
 VFIOPCIDevice *vfio_pci;
-uint32_t argsz;
 int fd;
 
-argsz = sizeof(*info);
-info = g_malloc0(argsz);
-
 vfio_pci = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
 fd = vfio_pci->vbasedev.fd;
 
@@ -250,7 +259,8 @@ retry:
 
 if (ioctl(fd, VFIO_DEVICE_GET_INFO, info)) {
 trace_s390_pci_clp_dev_info(vfio_pci->vbasedev.name);
-return;
+free(info);
+return NULL;
 }
 
 if (info->argsz > argsz) {
@@ -259,6 +269,47 @@ retry:
 goto retry;
 }
 
+return info;
+}
+
+/*
+ * Get the host function handle from the vfio CLP capabilities chain.  Returns
+ * true if a fh value was placed into the provided buffer.  Returns false
+ * if a fh could not be obtained (ioctl failed or capability version does
+ * not include the fh)
+ */
+bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh)
+{
+g_autofree struct vfio_device_info *info = NULL;
+
+assert(fh);
+
+info = get_device_info(pbdev, sizeof(*info));
+if (!info) {
+return false;
+}
+
+return get_host_fh(pbdev, info, fh);
+}
+
+/*
+ * This function will issue the VFIO_DEVICE_GET_INFO ioctl and look for
+ * capabilities that contain information about CLP features provided by the
+ * underlying host.
+ * On entry, defaults have already been placed into the guest CLP response
+ * buffers.  On exit, defaults will have been overwritten for any CLP features
+ * found in the capability chain; defaults will remain for any CLP features not
+ * found in the chain.
+ */
+void s390_pci_get_clp_info(S390PCIBusDevice *pbdev)
+{
+g_autofree struct vfio_device_info *info = NULL;
+
+info = get_device_info(pbdev, sizeof(*info));
+if (!info) {
+return;
+}
+
 /*
  * Find the CLP features provided and fill in the guest CLP responses.
  * Always call s390_pci_read_base first as information from this could
diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
index ff708aef50..ae1b126ff7 100644
--- a/include/hw/s390x/s390-pci-vfio.h
+++ b/include/hw/s390x/s390-pci-vfio.h
@@ -20,6 +20,7 @@ bool s390_pci_update_dma_avail(int fd, unsigned int *avail);
 S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
   S390PCIBusDevice *pbdev);
 void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
+bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh);
 void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
 #else
 static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
@@ -33,6 +34,10 @@ static inline

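get_device_info() keeps the usual VFIO pattern of growing the buffer when the kernel reports a larger argsz. The following is a self-contained sketch of that retry loop. It uses plain malloc()/realloc() instead of GLib. It also uses a hypothetical fake_get_info() in place of the real VFIO_DEVICE_GET_INFO ioctl, which would need an open vfio device fd.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct vfio_device_info; only argsz matters here. */
struct dev_info {
    uint32_t argsz;
    uint32_t flags;
};

/* Size the fake "kernel" needs to return the whole capability chain. */
static const uint32_t needed_argsz = 256;

/*
 * Hypothetical stand-in for ioctl(fd, VFIO_DEVICE_GET_INFO, info): when the
 * caller's buffer is too small it succeeds but reports the size it needs.
 */
static int fake_get_info(struct dev_info *info)
{
    if (info->argsz < needed_argsz) {
        info->argsz = needed_argsz;
    }
    return 0;
}

/* Returns a malloc()ed buffer large enough for all data, or NULL on error. */
struct dev_info *get_info_with_retry(void)
{
    uint32_t argsz = sizeof(struct dev_info);
    struct dev_info *info = calloc(1, argsz);

    while (info) {
        info->argsz = argsz;
        if (fake_get_info(info)) {
            free(info);
            return NULL;
        }
        if (info->argsz <= argsz) {
            return info;            /* everything fitted */
        }
        /* Grow to the reported size and ask again. */
        argsz = info->argsz;
        struct dev_info *bigger = realloc(info, argsz);
        if (!bigger) {
            free(info);
        }
        info = bigger;
    }
    return NULL;
}
```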
[PATCH v6 2/8] target/s390x: add zpci-interp to cpu models

2022-05-24 Thread Matthew Rosato

The zpci-interp feature is used to specify whether zPCI interpretation is
to be used for this guest.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-virtio-ccw.c  | 1 +
 target/s390x/cpu_features_def.h.inc | 1 +
 target/s390x/gen-features.c | 2 ++
 target/s390x/kvm/kvm.c  | 1 +
 4 files changed, 5 insertions(+)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 047cca0487..b33310a135 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -806,6 +806,7 @@ static void ccw_machine_7_0_instance_options(MachineState *machine)
 static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V7_0 };
 
 ccw_machine_7_1_instance_options(machine);
+s390_cpudef_featoff_greater(14, 1, S390_FEAT_ZPCI_INTERP);
 s390_set_qemu_cpu_model(0x8561, 15, 1, qemu_cpu_feat);
 }
 
diff --git a/target/s390x/cpu_features_def.h.inc b/target/s390x/cpu_features_def.h.inc
index e86662bb3b..4ade3182aa 100644
--- a/target/s390x/cpu_features_def.h.inc
+++ b/target/s390x/cpu_features_def.h.inc
@@ -146,6 +146,7 @@ DEF_FEAT(SIE_CEI, "cei", SCLP_CPU, 43, "SIE: Conditional-external-interception f
 DEF_FEAT(DAT_ENH_2, "dateh2", MISC, 0, "DAT-enhancement facility 2")
 DEF_FEAT(CMM, "cmm", MISC, 0, "Collaborative-memory-management facility")
 DEF_FEAT(AP, "ap", MISC, 0, "AP instructions installed")
+DEF_FEAT(ZPCI_INTERP, "zpci-interp", MISC, 0, "zPCI interpretation")
 
 /* Features exposed via the PLO instruction. */
 DEF_FEAT(PLO_CL, "plo-cl", PLO, 0, "PLO Compare and load (32 bit in general registers)")
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index c03ec2c9a9..f991646c01 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -554,6 +554,7 @@ static uint16_t full_GEN14_GA1[] = {
 S390_FEAT_HPMA2,
 S390_FEAT_SIE_KSS,
 S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
+S390_FEAT_ZPCI_INTERP,
 };
 
 #define full_GEN14_GA2 EmptyFeat
@@ -650,6 +651,7 @@ static uint16_t default_GEN14_GA1[] = {
 S390_FEAT_GROUP_MSA_EXT_8,
 S390_FEAT_MULTIPLE_EPOCH,
 S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
+S390_FEAT_ZPCI_INTERP,
 };
 
 #define default_GEN14_GA2 EmptyFeat
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 53098bf541..314b0a9039 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2293,6 +2293,7 @@ static int kvm_to_feat[][2] = {
 { KVM_S390_VM_CPU_FEAT_PFMFI, S390_FEAT_SIE_PFMFI},
 { KVM_S390_VM_CPU_FEAT_SIGPIF, S390_FEAT_SIE_SIGPIF},
 { KVM_S390_VM_CPU_FEAT_KSS, S390_FEAT_SIE_KSS},
+{ KVM_S390_VM_CPU_FEAT_ZPCI_INTERP, S390_FEAT_ZPCI_INTERP },
 };
 
 static int query_cpu_feat(S390FeatBitmap features)
-- 
2.27.0

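kvm_to_feat[] is a translation table from KVM CPU-feature bit numbers to QEMU feature numbers. The sketch below shows how such a table is applied. The enum values and translate_kvm_feats() are hypothetical. For brevity, the KVM bitmap is modelled as one 64-bit word with LSB-0 numbering. The real code instead tests bits in the kvm_s390_vm_cpu_feat.feat[] array using big-endian bit numbering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* KVM feature numbers, from the asm-s390/kvm.h hunk in patch 1/8. */
#define KVM_S390_VM_CPU_FEAT_KSS          13
#define KVM_S390_VM_CPU_FEAT_ZPCI_INTERP  14

/* Hypothetical QEMU-side feature numbers for this sketch. */
enum { FEAT_SIE_KSS = 0, FEAT_ZPCI_INTERP = 1 };

static const int kvm_to_feat[][2] = {
    { KVM_S390_VM_CPU_FEAT_KSS, FEAT_SIE_KSS },
    { KVM_S390_VM_CPU_FEAT_ZPCI_INTERP, FEAT_ZPCI_INTERP },
};

/* Set a QEMU feature bit for every KVM feature bit reported as available. */
uint32_t translate_kvm_feats(uint64_t kvm_feat)
{
    uint32_t qemu_feat = 0;

    for (size_t i = 0; i < sizeof(kvm_to_feat) / sizeof(kvm_to_feat[0]); i++) {
        if (kvm_feat & (UINT64_C(1) << kvm_to_feat[i][0])) {
            qemu_feat |= UINT32_C(1) << kvm_to_feat[i][1];
        }
    }
    return qemu_feat;
}
```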
[PATCH v6 1/8] Update linux headers

2022-05-24 Thread Matthew Rosato

This is a placeholder that pulls in unmerged kernel changes
required by this item.  A proper header sync can be done once the
associated kernel code merges.

Signed-off-by: Matthew Rosato 
---
 linux-headers/asm-s390/kvm.h|  1 +
 linux-headers/linux/kvm.h   | 32 
 linux-headers/linux/vfio.h  |  4 ++--
 linux-headers/linux/vfio_zdev.h |  7 +++
 4 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-s390/kvm.h b/linux-headers/asm-s390/kvm.h
index f053b8304a..d8259ff9a1 100644
--- a/linux-headers/asm-s390/kvm.h
+++ b/linux-headers/asm-s390/kvm.h
@@ -130,6 +130,7 @@ struct kvm_s390_vm_cpu_machine {
 #define KVM_S390_VM_CPU_FEAT_PFMFI 11
 #define KVM_S390_VM_CPU_FEAT_SIGPIF12
 #define KVM_S390_VM_CPU_FEAT_KSS   13
+#define KVM_S390_VM_CPU_FEAT_ZPCI_INTERP 14
 struct kvm_s390_vm_cpu_feat {
__u64 feat[16];
 };
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 0d05d02ee4..3013371078 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1150,6 +1150,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DISABLE_QUIRKS2 213
 /* #define KVM_CAP_VM_TSC_CONTROL 214 */
 #define KVM_CAP_SYSTEM_EVENT_DATA 215
+#define KVM_CAP_S390_ZPCI_OP 216
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -2066,4 +2067,35 @@ struct kvm_stats_desc {
 /* Available with KVM_CAP_XSAVE2 */
 #define KVM_GET_XSAVE2   _IOR(KVMIO,  0xcf, struct kvm_xsave)
 
+/* Available with KVM_CAP_S390_ZPCI_OP */
+#define KVM_S390_ZPCI_OP _IOW(KVMIO,  0xd0, struct kvm_s390_zpci_op)
+
+struct kvm_s390_zpci_op {
+   /* in */
+   __u32 fh;   /* target device */
+   __u8  op;   /* operation to perform */
+   __u8  pad[3];
+   union {
+   /* for KVM_S390_ZPCIOP_REG_AEN */
+   struct {
+   __u64 ibv;  /* Guest addr of interrupt bit vector */
+   __u64 sb;   /* Guest addr of summary bit */
+   __u32 flags;
+   __u32 noi;  /* Number of interrupts */
+   __u8 isc;   /* Guest interrupt subclass */
+   __u8 sbo;   /* Offset of guest summary bit vector */
+   __u16 pad;
+   } reg_aen;
+   __u64 reserved[8];
+   } u;
+};
+
+/* types for kvm_s390_zpci_op->op */
+#define KVM_S390_ZPCIOP_REG_AEN0
+#define KVM_S390_ZPCIOP_DEREG_AEN  1
+
+/* flags for kvm_s390_zpci_op->u.reg_aen.flags */
+#define KVM_S390_ZPCIOP_REGAEN_HOST(1 << 0)
+
+
 #endif /* __LINUX_KVM_H */
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index e9f7795c39..ede44b5572 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -643,7 +643,7 @@ enum {
 };
 
 /**
- * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IORW(VFIO_TYPE, VFIO_BASE + 12,
+ * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
  *   struct vfio_pci_hot_reset_info)
  *
  * Return: 0 on success, -errno on failure:
@@ -770,7 +770,7 @@ struct vfio_device_ioeventfd {
 #define VFIO_DEVICE_IOEVENTFD  _IO(VFIO_TYPE, VFIO_BASE + 16)
 
 /**
- * VFIO_DEVICE_FEATURE - _IORW(VFIO_TYPE, VFIO_BASE + 17,
+ * VFIO_DEVICE_FEATURE - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
  *struct vfio_device_feature)
  *
  * Get, set, or probe feature data of the device.  The feature is selected
diff --git a/linux-headers/linux/vfio_zdev.h b/linux-headers/linux/vfio_zdev.h
index b4309397b6..77f2aff1f2 100644
--- a/linux-headers/linux/vfio_zdev.h
+++ b/linux-headers/linux/vfio_zdev.h
@@ -29,6 +29,9 @@ struct vfio_device_info_cap_zpci_base {
__u16 fmb_length;   /* Measurement Block Length (in bytes) */
__u8 pft;   /* PCI Function Type */
__u8 gid;   /* PCI function group ID */
+   /* End of version 1 */
+   __u32 fh;   /* PCI function handle */
+   /* End of version 2 */
 };
 
 /**
@@ -47,6 +50,10 @@ struct vfio_device_info_cap_zpci_group {
__u16 noi;  /* Maximum number of MSIs */
__u16 maxstbl;  /* Maximum Store Block Length */
__u8 version;   /* Supported PCI Version */
+   /* End of version 1 */
+   __u8 reserved;
+   __u16 imaxstbl; /* Maximum Interpreted Store Block Length */
+   /* End of version 2 */
 };
 
 /**
-- 
2.27.0
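The two vfio.h hunks only fix comments: `_IORW` is not one of the Linux ioctl encoding macros (`_IO`, `_IOR`, `_IOW`, `_IOWR`), and `_IOWR` is the one that marks data as flowing both into and out of the kernel. A minimal sketch of what that encoding carries, assuming a Linux build environment where `<linux/ioctl.h>` is available; `struct demo_arg`, `DEMO_TYPE`, `DEMO_IOCTL` and `demo_is_read_write()` are hypothetical names, not part of VFIO:

```c
#include <assert.h>
#include <linux/ioctl.h>

/* Hypothetical ioctl payload: the caller fills 'in', the kernel fills 'out'. */
struct demo_arg {
    int in;
    int out;
};

/* Hypothetical type character and command number, for illustration only. */
#define DEMO_TYPE  'x'
#define DEMO_IOCTL _IOWR(DEMO_TYPE, 1, struct demo_arg)

/* _IOWR also encodes the payload size in the request number. */
static_assert(_IOC_SIZE(DEMO_IOCTL) == sizeof(struct demo_arg),
              "payload size is part of the request encoding");

/* Nonzero if the request says data flows both to and from the kernel. */
static int demo_is_read_write(unsigned int request)
{
    return _IOC_DIR(request) == (_IOC_READ | _IOC_WRITE);
}
```

A request built with `_IOR` or `_IOW` carries only one of the two direction bits, so `demo_is_read_write()` returns 0 for it.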

Re: [PATCH 0/3] recover hardware corrupted page by virtio balloon

2022-05-24 Thread David Hildenbrand

On 20.05.22 09:06, zhenwei pi wrote:
> Hi,
> 
> I'm trying to recover hardware-corrupted pages via virtio balloon; the
> workflow of this feature is as follows:
> 
> Guest              5.MF -> 6.RVQ FE     10.Unpoison page
>                   /            \          /
> -----------------+--------------+--------+----------------
>                  |              |        |
>                4.MCE         7.RVQ BE   9.RVQ Event
>  QEMU         /                    \     /
>       3.SIGBUS                      8.Remap
>      /
>     +
>     |
>     +--2.MF
>  Host   /
>   1.HW error
> 
> 1, A hardware page error occurs randomly.
> 2, The host handles the corrupted page via the Memory Failure mechanism and
>    sends SIGBUS to the user process if early-kill is enabled.
> 3, QEMU handles SIGBUS; if the address belongs to guest RAM, then:
> 4, QEMU tries to inject an MCE into the guest.
> 5, The guest handles the memory failure again.
> 
> Steps 1-5 have been supported for a long time; the next steps are supported
> in this patch (and the related driver patch):
> 
> 6, The guest balloon driver is notified of the corrupted PFN, and sends a
>    request to the host side via the Recover VQ FrontEnd.
> 7, QEMU handles the request from the Recover VQ BackEnd, then:
> 8, QEMU remaps the corrupted HVA to fix the memory failure, then:
> 9, QEMU acks the result to the guest side via the Recover VQ.
> 10, The guest unpoisons the page if the corrupted page is recovered
> successfully.
> 
> Test:
> This patch set can be tested with QEMU (also in development):
> https://github.com/pizhenwei/qemu/tree/balloon-recover
> 
> Emulate an MCE with QEMU (normal guest RAM pages only; huge pages are not
> supported):
> virsh qemu-monitor-command vm --hmp mce 0 9 0xbdc0 0xd 0x61646678 0x8c
> 
> The guest works fine (on Intel Platinum 8260):
>  mce: [Hardware Error]: Machine check events logged
>  Memory failure: 0x61646: recovery action for dirty LRU page: Recovered
>  virtio_balloon virtio5: recovered pfn 0x61646
>  Unpoison: Unpoisoned page 0x61646 by virtio-balloon
>  MCE: Killing stress:24502 due to hardware memory corruption fault at 7f5be2e5a010
> 
> The 'HardwareCorrupted' field in /proc/meminfo also shows 0 kB.
> 
> The protocol of the virtio balloon recover VQ is not yet defined; it is
> currently under development:
> - 'struct virtio_balloon_recover' defines the structure used to exchange
>   messages between guest and host.
> - '__le32 corrupted_pages' in struct virtio_balloon_config is used for the
>   next step:
>   1, a VM backs its RAM with 2M huge pages; once an MCE occurs, the whole 2M
>      page becomes inaccessible. By reporting 512 * 4K 'corrupted_pages' to
>      the guest, the guest has a chance to isolate the 512 pages ahead of time.
> 
>   2, after migrating to another host, the corrupted pages are actually
>      recovered; once the guest sees 'corrupted_pages' as 0, it can unpoison
>      all the poisoned pages recorded in the balloon driver.
> 

Hi,

I'm still on vacation this week, I'll try to have a look when I'm back
(and flushed out my overflowing inbox :D).


-- 
Thanks,

David / dhildenb
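The first "next step" in zhenwei's cover letter rests on simple page arithmetic: one 2M huge page covers 2M / 4K = 512 base pages, so a single MCE on it can be reported as 512 corrupted 4K pages. A minimal sketch of that arithmetic, assuming 4 KiB base pages and 2 MiB huge pages; the macros and functions are hypothetical and not part of the proposed virtio protocol:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page sizes: 4 KiB base pages, 2 MiB huge pages. */
#define BASE_PAGE_SHIFT 12
#define HUGE_PAGE_SHIFT 21

/* Base pages per huge page: 1 << (21 - 12) = 512. */
static uint64_t pages_per_huge_page(void)
{
    return UINT64_C(1) << (HUGE_PAGE_SHIFT - BASE_PAGE_SHIFT);
}

/*
 * Given the PFN of one corrupted 4K page inside a 2M huge page, compute the
 * inclusive range of 4K PFNs the guest could isolate ahead of time.
 */
static void huge_page_pfn_range(uint64_t pfn, uint64_t *first, uint64_t *last)
{
    uint64_t n = pages_per_huge_page();

    assert(first != NULL && last != NULL);
    *first = pfn & ~(n - 1);  /* align down to the huge-page boundary */
    *last = *first + n - 1;
}
```

For example, taking PFN 0x61646 from the log above purely as an input value, the range is 0x61600 to 0x617ff, i.e. 512 pages.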

Re: [PATCH v6 0/8] VSX MMA Implementation

2022-05-24 Thread Daniel Henrique Barboza


Queued in gitlab.com/danielhb/qemu/tree/ppc-next. Thanks,


Daniel

On 5/24/22 11:05, Lucas Mateus Castro(alqotel) wrote:

From: "Lucas Mateus Castro (alqotel)" 

Based-on: https://gitlab.com/danielhb/qemu/-/tree/ppc-next

This patch series implements the Matrix-Multiply Assist (MMA)
instructions from PowerISA 3.1.

This patch series is based on Victor's "target/ppc: Fix FPSCR.FI bit"
series, because that series changed do_check_float_status, which is
called by the GER helper functions.

These and the VDIV/VMOD implementation are the last new PowerISA 3.1
instructions left to be implemented.

The XVFGER instructions accumulate the exception status and, at the end,
set the FPSCR and take a Program interrupt on a trap-enabled exception.
Previous versions were based on Victor's rework of FPU exceptions, but
as that patch was rejected, this version works around the fact that
OX/UX/XX and invalid-operation exceptions are handled in different
functions by disabling all enable bits, then re-enabling them and calling
the mtfsf deferred exception helper.
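The workaround can be pictured with a toy model: while the enable bits are masked, recording the accumulated OX/UX/XX/VX status cannot trap partway through, and a single deferred check after re-enabling decides whether to take the Program interrupt. This is a hedged C sketch; the bit values, `ToyFPSCR` and the `toy_*` functions are hypothetical and do not use QEMU's FPSCR layout or the real do_fpscr_check_status() and mtfsf helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exception bits; these are NOT the real FPSCR positions. */
enum {
    TOY_OX = 1u << 0,  /* overflow */
    TOY_UX = 1u << 1,  /* underflow */
    TOY_XX = 1u << 2,  /* inexact */
    TOY_VX = 1u << 3,  /* invalid operation */
};

typedef struct {
    uint32_t status;  /* sticky exception status bits */
    uint32_t enable;  /* trap-enable bits, same layout as status */
    bool fex;         /* enabled-exception summary */
} ToyFPSCR;

/* Like an immediate-raise helper: set the bit, report if it would trap now. */
static bool toy_set_and_maybe_trap(ToyFPSCR *f, uint32_t bit)
{
    f->status |= bit;
    return (f->enable & bit) != 0;
}

/*
 * Deferred scheme: mask the enables so every accumulated status bit is
 * recorded without trapping midway, restore them, then check once.
 * Returns true if the caller should take a Program interrupt.
 */
static bool toy_ger_finish(ToyFPSCR *f, uint32_t accumulated)
{
    static const uint32_t kinds[] = { TOY_OX, TOY_UX, TOY_XX, TOY_VX };
    uint32_t saved_enable = f->enable;
    bool trapped_early = false;

    f->enable = 0;
    for (unsigned i = 0; i < sizeof(kinds) / sizeof(kinds[0]); i++) {
        if (accumulated & kinds[i]) {
            trapped_early |= toy_set_and_maybe_trap(f, kinds[i]);
        }
    }
    assert(!trapped_early);  /* enables were masked, so nothing trapped */
    f->enable = saved_enable;

    f->fex = (f->status & f->enable) != 0;
    return f->fex;
}
```

With only overflow enabled, accumulating overflow and inexact records both status bits first and only then reports that an interrupt is due.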

v6 changes:
 - Rebased on ppc-next
 - Wrapped lines to stay <= 80 characters

v5 changes:
 - Changed VSXGER16 accumulation to negate the multiplication and
   accumulation in independent if statements (if necessary) and sum
   their values.

v4 changes:
 - Changed VSXGER16 accumulation to always use float32_sum and negate
   the elements according to the type of accumulation

v3 changes:
 - GER helpers now use ppc_acc_t instead of ppc_vsr_t for passing acc
 - Removed do_ger_XX3 and updated the decodetree to pass the masks in
   32-bit instructions
 - Removed unnecessary rounding mode function
 - Moved float32_neg to fpu_helper.c and renamed it bfp32_negate to
   make it clearer that it's a 32-bit version of the PowerISA
   bfp_NEGATE
 - Negated accumulation is now a subtraction
 - Changed exception handling by disabling all FPSCR enable bits to
   set all FPSCR bits (except FEX) correctly, then re-enabling them
   and calling do_fpscr_check_status to raise the exception
   accordingly and set FEX if necessary

v2 changes:
 - Changed VSXGER, VSXGER16 and XVIGER macros to functions
 - Set rounding mode in floating-point instructions based on RN
   before operations
 - Separated the accumulate and saturating instructions into
   different helpers
 - Used FIELD, FIELD_EX32 and FIELD_DP32 for packing/unpacking masks
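The v5 accumulation rule for VSXGER16 (negate the product and/or the accumulator independently, then sum) is easy to check in isolation. A hedged sketch of one accumulator element, using a single product for brevity; `toy_ger_element()` is hypothetical, and plain `float` replaces QEMU's softfloat, so rounding modes and FPSCR flags are not modelled:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * One accumulator element, following the v5 description: negate the
 * product and/or the previous accumulator value independently, then sum.
 * Plain float stands in for softfloat; rounding and exception flags are
 * ignored in this sketch.
 */
static float toy_ger_element(float a, float b, float acc,
                             bool neg_mul, bool neg_acc)
{
    float prod = a * b;

    if (neg_mul) {
        prod = -prod;
    }
    if (neg_acc) {
        acc = -acc;
    }
    return prod + acc;
}
```

With a = 2, b = 3 and acc = 1, the (neg_mul, neg_acc) combinations (false, false), (false, true), (true, false) and (true, true) give 7, 5, -5 and -7 respectively.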


Joel Stanley (1):
   linux-user: Add PowerPC ISA 3.1 and MMA to hwcap

Lucas Mateus Castro (alqotel) (7):
   target/ppc: Implement xxm[tf]acc and xxsetaccz
   target/ppc: Implemented xvi*ger* instructions
   target/ppc: Implemented pmxvi*ger* instructions
   target/ppc: Implemented xvf*ger*
   target/ppc: Implemented xvf16ger*
   target/ppc: Implemented pmxvf*ger*
   target/ppc: Implemented [pm]xvbf16ger2*

  linux-user/elfload.c                |   4 +
  target/ppc/cpu.h                    |  13 ++
  target/ppc/fpu_helper.c             | 329 +++-
  target/ppc/helper.h                 |  33 +++
  target/ppc/insn32.decode            |  52 +
  target/ppc/insn64.decode            |  79 +++
  target/ppc/int_helper.c             | 130 +++
  target/ppc/internal.h               |  15 ++
  target/ppc/translate/vsx-impl.c.inc | 130 +++
  9 files changed, 783 insertions(+), 2 deletions(-)

Re: [PATCH v3 2/8] kvm: Support for querying fd-based stats

2022-05-24 Thread Dr. David Alan Gilbert

* Paolo Bonzini (pbonz...@redhat.com) wrote:
> From: Mark Kanda 
> 
> Add support for querying fd-based KVM stats - as introduced by Linux kernel
> commit:
> 
> cb082bfab59a ("KVM: stats: Add fd-based API to read binary stats data")
> 
> This allows the user to analyze the behavior of the VM without access
> to debugfs.
> 
> Signed-off-by: Mark Kanda 
> Signed-off-by: Paolo Bonzini 
> ---
>  accel/kvm/kvm-all.c | 403 
>  qapi/stats.json |   2 +-
>  2 files changed, 404 insertions(+), 1 deletion(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 32e177bd26..6a6bbe2994 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -47,6 +47,7 @@
>  #include "kvm-cpus.h"
>  
>  #include "hw/boards.h"
> +#include "monitor/stats.h"
>  
>  /* This check must be after config-host.h is included */
>  #ifdef CONFIG_EVENTFD
> @@ -2310,6 +2311,9 @@ bool kvm_dirty_ring_enabled(void)
>      return kvm_state->kvm_dirty_ring_size ? true : false;
>  }
>  
> +static void query_stats_cb(StatsResultList **result, StatsTarget target, Error **errp);
> +static void query_stats_schemas_cb(StatsSchemaList **result, Error **errp);
> +
>  static int kvm_init(MachineState *ms)
>  {
>      MachineClass *mc = MACHINE_GET_CLASS(ms);
> @@ -2638,6 +2642,10 @@ static int kvm_init(MachineState *ms)
>          }
>      }
>  
> +    if (kvm_check_extension(kvm_state, KVM_CAP_BINARY_STATS_FD)) {
> +        add_stats_callbacks(query_stats_cb, query_stats_schemas_cb);
> +    }
> +
>      return 0;
>  
>  err:
> @@ -3697,3 +3705,398 @@ static void kvm_type_init(void)
>  }
>  
>  type_init(kvm_type_init);
> +
> +typedef struct StatsArgs {
> +    union StatsResultsType {
> +        StatsResultList **stats;
> +        StatsSchemaList **schema;
> +    } result;
> +    Error **errp;
> +} StatsArgs;
> +
> +static StatsList *add_kvmstat_entry(struct kvm_stats_desc *pdesc,
> +                                    uint64_t *stats_data,
> +                                    StatsList *stats_list,
> +                                    Error **errp)
> +{
> +
> +    StatsList *stats_entry;
> +    Stats *stats;
> +    uint64List *val_list = NULL;

A comment here something like:
/* Only add stats that we understand */  ?
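The suggested comment describes the filter implemented by the three switch statements that follow: a descriptor is skipped unless its type, unit and base fields are all recognised. A compact C sketch of that filter, with the flag layout mirrored from `<linux/kvm.h>` (binary stats API) so it builds without kernel headers; the values are worth re-checking against the installed headers, and `kvm_stat_understood()` is a hypothetical helper, not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Binary-stats flag layout, mirrored from <linux/kvm.h>. */
#define KVM_STATS_TYPE_MASK        (0xF << 0)
#define KVM_STATS_TYPE_CUMULATIVE  (0x0 << 0)
#define KVM_STATS_TYPE_INSTANT     (0x1 << 0)
#define KVM_STATS_TYPE_PEAK        (0x2 << 0)
#define KVM_STATS_TYPE_LINEAR_HIST (0x3 << 0)
#define KVM_STATS_TYPE_LOG_HIST    (0x4 << 0)
#define KVM_STATS_UNIT_MASK        (0xF << 4)
#define KVM_STATS_UNIT_NONE        (0x0 << 4)
#define KVM_STATS_UNIT_BYTES       (0x1 << 4)
#define KVM_STATS_UNIT_SECONDS     (0x2 << 4)
#define KVM_STATS_UNIT_CYCLES      (0x3 << 4)
#define KVM_STATS_BASE_MASK        (0xF << 8)
#define KVM_STATS_BASE_POW10       (0x0 << 8)
#define KVM_STATS_BASE_POW2        (0x1 << 8)

/* The range checks below rely on the three fields being disjoint. */
static_assert((KVM_STATS_TYPE_MASK & KVM_STATS_UNIT_MASK) == 0 &&
              (KVM_STATS_UNIT_MASK & KVM_STATS_BASE_MASK) == 0,
              "flag fields must not overlap");

/*
 * Only accept descriptors whose type, unit and base this code understands.
 * Each accepted set is contiguous from 0, so "<= highest known value" is
 * equivalent to listing every case in a switch.
 */
static bool kvm_stat_understood(uint32_t flags)
{
    uint32_t type = flags & KVM_STATS_TYPE_MASK;
    uint32_t unit = flags & KVM_STATS_UNIT_MASK;
    uint32_t base = flags & KVM_STATS_BASE_MASK;

    return type <= KVM_STATS_TYPE_LOG_HIST &&
           unit <= KVM_STATS_UNIT_CYCLES &&
           base <= KVM_STATS_BASE_POW2;
}
```

The switch form in the patch is more explicit and stays correct if a future kernel adds non-contiguous values; the range form is only shown to make the rule visible at a glance.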
> +    switch (pdesc->flags & KVM_STATS_TYPE_MASK) {
> +    case KVM_STATS_TYPE_CUMULATIVE:
> +    case KVM_STATS_TYPE_INSTANT:
> +    case KVM_STATS_TYPE_PEAK:
> +    case KVM_STATS_TYPE_LINEAR_HIST:
> +    case KVM_STATS_TYPE_LOG_HIST:
> +        break;
> +    default:
> +        return stats_list;
> +    }
> +
> +    switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
> +    case KVM_STATS_UNIT_NONE:
> +    case KVM_STATS_UNIT_BYTES:
> +    case KVM_STATS_UNIT_CYCLES:
> +    case KVM_STATS_UNIT_SECONDS:
> +        break;
> +    default:
> +        return stats_list;
> +    }
> +
> +    switch (pdesc->flags & KVM_STATS_BASE_MASK) {
> +    case KVM_STATS_BASE_POW10:
> +    case KVM_STATS_BASE_POW2:
> +        break;
> +    default:
> +        return stats_list;
> +    }
> +
> +    /* Alloc and populate data list */
> +    stats_entry = g_new0(StatsList, 1);
> +    stats = g_new0(Stats, 1);
> +    stats->name = g_strdup(pdesc->name);
> +    stats->value = g_new0(StatsValue, 1);;
> +
> +    if (pdesc->size == 1) {
> +        stats->value->u.scalar = *stats_data;
> +        stats->value->type = QTYPE_QNUM;
> +    } else {
> +        int i;
> +        for (i = 0; i < pdesc->size; i++) {
> +            uint64List *val_entry = g_new0(uint64List, 1);
> +            val_entry->value = stats_data[i];
> +            val_entry->next = val_list;
> +            val_list = val_entry;
> +        }
> +        stats->value->u.list = val_list;
> +        stats->value->type = QTYPE_QLIST;
> +    }
> +
> +    stats_entry->value = stats;
> +    stats_entry->next = stats_list;

Can all that use QAPI_LIST_PREPEND?
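QEMU's QAPI_LIST_PREPEND macro allocates a node, sets its value and next fields, and makes it the new list head, which is what the open-coded loop above does by hand. The sketch below uses a local stand-in macro, `TOY_LIST_PREPEND` (hypothetical), with malloc() instead of GLib so it builds on its own; it relies on the GCC/Clang `__typeof__` extension, and the `uint64List` struct only mimics the shape of a QAPI-generated list node:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Shape of a QAPI-generated list node (local stand-in, not QEMU's header). */
typedef struct uint64List {
    struct uint64List *next;
    uint64_t value;
} uint64List;

/*
 * Hypothetical stand-in for QAPI_LIST_PREPEND: allocate a node, fill in
 * value and next, and make it the new head. malloc() replaces GLib here.
 */
#define TOY_LIST_PREPEND(list, element) do {               \
        __typeof__(list) tmp_ = malloc(sizeof(*(list)));   \
        if (!tmp_) {                                       \
            abort();                                       \
        }                                                  \
        tmp_->value = (element);                           \
        tmp_->next = (list);                               \
        (list) = tmp_;                                     \
    } while (0)

/* The open-coded loop from the patch, rewritten with the prepend macro. */
static uint64List *toy_build_list(const uint64_t *data, size_t n)
{
    uint64List *list = NULL;

    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        TOY_LIST_PREPEND(list, data[i]);
    }
    return list;  /* head holds data[n - 1]: prepending reverses order */
}

static void toy_free_list(uint64List *list)
{
    while (list) {
        uint64List *next = list->next;
        free(list);
        list = next;
    }
}
```

As with the loop in the patch, prepending in index order leaves the values in reverse order. The same macro would likely also cover the final stats_entry bookkeeping, prepending `stats` onto `stats_list`.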

> +    return stats_entry;
> +}
> +
> +static StatsSchemaValueList *add_kvmschema_entry(struct kvm_stats_desc *pdesc,
> +                                                 StatsSchemaValueList *list,
> +                                                 Error **errp)
> +{
> +    StatsSchemaValueList *schema_entry = g_new0(StatsSchemaValueList, 1);
> +    schema_entry->value = g_new0(StatsSchemaValue, 1);
> +
> +    switch (pdesc->flags & KVM_STATS_TYPE_MASK) {
> +    case KVM_STATS_TYPE_CUMULATIVE:
> +        schema_entry->value->type = STATS_TYPE_CUMULATIVE;
> +        break;
> +    case KVM_STATS_TYPE_INSTANT:
> +        schema_entry->value->type = STATS_TYPE_INSTANT;
> +        break;
> +    case KVM_STATS_TYPE_PEAK:
> +        schema_entry->value->type = STATS_TYPE_PEAK;
> +        break;
> +    case KVM_STATS_TYPE_LINEAR_HIST:
> +        schema_entry->value->type = STATS_TYPE_LINEAR_HISTOGRAM;
> +        schema_entry->value->bucket_size = pdesc->bucket_size;
> +        schema_entry->value->has_bucket_size = true;
> +        break;
> +    case KVM_STATS_TYPE_LOG_HIST:
> +sch

Re: [PATCH v6 10/13] migration: Respect postcopy request order in preemption mode

2022-05-24 Thread Peter Xu

On Mon, May 23, 2022 at 11:56:14AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > With preemption mode on, when we see a postcopy request for exactly the
> > page that we have preempted before (so we've already partially sent the
> > page via the PRECOPY channel and it got preempted by another postcopy
> > request), we currently drop the request, so that only after all the
> > other postcopy requests are serviced do we go back to the precopy stream
> > and handle that page.
> > 
> > We drop the request because we can't send it via the postcopy channel:
> > the precopy channel already contains part of the data, and a huge page
> > can only be sent as a whole via one channel.  We can't split a huge page
> > across two channels.
> > 
> > That's a rare corner case and it works, but it changes the order in
> > which we handle postcopy requests, since we're postponing this (unlucky)
> > postcopy request until after the other queued postcopy requests.  The
> > problem is that when the guest is very busy, the postcopy queue can stay
> > non-empty, which means this dropped request may not be handled until the
> > end of postcopy migration.  So there's a chance that a dest QEMU vcpu
> > thread waits on a page fault for an extremely long time just because it
> > unluckily accessed the specific page that was preempted before.
> > 
> > In the worst case, the wait can be as long as the whole postcopy
> > migration procedure.  It's extremely unlikely to happen, but when it
> > happens it's not good.
> > 
> > The root cause of this problem is that we treat the
> > pss->postcopy_requested variable as having two meanings bound together:
> > 
> >   1. Whether this page request is urgent, and,
> >   2. Which channel we should use for this page request.
> > 
> > With the old code, when we set postcopy_requested it means either both (1)
> > and (2) are true, or both (1) and (2) are false.  (1) and (2) can never
> > have different values.
> > 
> > However, it doesn't need to be like that.  It's perfectly legal to have a
> > request where (1) the urgency is very high, but (2) we'd like to use the
> > precopy channel, just like the corner case we were discussing above.
> > 
> > To differentiate the two meanings better, introduce a new field called
> > postcopy_target_channel, which shows which channel we should use for this
> > page request, so that it covers only the old meaning (2).  Then we leave
> > the postcopy_requested variable to stand only for meaning (1), which is
> > the urgency of this page request.
> > 
> > With this change, we can easily boost the priority of a preempted precopy
> > page as long as we know that page is also requested as a postcopy page.
> > So with the new approach in get_queued_page(), instead of dropping that
> > request, we send it right away via the precopy channel, so we get back
> > the ordering of the page faults just like how they're requested on dest.
> > 
> > Additionally, I touched up find_dirty_block() to only set the postcopy
> > fields in the pss section if we're going through a postcopy migration.
> > That's a very light optimization and shouldn't affect much.
> > 
> > Reported-by: manish.mis...@nutanix.com
> > Signed-off-by: Peter Xu 
> 
> So I think this is OK; getting a bit complicated!

Yes, it is.  I added some more comments; hopefully they'll help a little.

> 
> Reviewed-by: Dr. David Alan Gilbert 

Thanks!

> >  static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
> >  {
> > -    /* This is not a postcopy requested page */
> > -    pss->postcopy_requested = false;
> > +    if (migration_in_postcopy()) {
> > +        /*
> > +         * This is not a postcopy requested page, mark it "not urgent", and
> > +         * use precopy channel to send it.
> > +         */
> > +        pss->postcopy_requested = false;
> > +        pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
> > +    }
> 
> Do you need the 'if' here?

Hmm, good question.  Precopy should always have these two fields cleared
anyway, so I wanted to avoid setting them every time, but I just noticed
that pss is not initialized at all when used:

static int ram_find_and_save_block(RAMState *rs)
{
    PageSearchStatus pss;
    ...
}

So either we reset these fields in pss explicitly or, more simply, I drop
the if.

Thanks,

-- 
Peter Xu
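The split Peter describes, urgency in postcopy_requested and channel choice in postcopy_target_channel, can be modelled in a few lines. This is a hedged sketch: `ToyPSS` and the `toy_*` functions are hypothetical and only mirror the two fields named in the patch; `RAM_CHANNEL_PRECOPY` is the name used in the patch, and `RAM_CHANNEL_POSTCOPY` is an assumed counterpart. Because the background path sets both fields unconditionally (the outcome of dropping the 'if'), an uninitialized stack `pss` never carries stale values in these two fields on that path:

```c
#include <assert.h>
#include <stdbool.h>

/* RAM_CHANNEL_PRECOPY mirrors the patch; RAM_CHANNEL_POSTCOPY is assumed. */
typedef enum {
    RAM_CHANNEL_PRECOPY,
    RAM_CHANNEL_POSTCOPY,
} ToyRamChannel;

/* Hypothetical subset of PageSearchStatus with the two decoupled fields. */
typedef struct {
    bool postcopy_requested;               /* (1) is this page urgent? */
    ToyRamChannel postcopy_target_channel; /* (2) which channel sends it */
} ToyPSS;

/* A page the destination asked for: always urgent. */
static void toy_mark_requested(ToyPSS *pss, bool preempted_on_precopy)
{
    assert(pss);
    pss->postcopy_requested = true;
    /*
     * A huge page that was partially sent on the precopy channel must be
     * finished on that same channel; otherwise use the postcopy channel.
     */
    pss->postcopy_target_channel = preempted_on_precopy ?
        RAM_CHANNEL_PRECOPY : RAM_CHANNEL_POSTCOPY;
}

/* Background dirty page (the find_dirty_block() path): set both, no 'if'. */
static void toy_mark_background(ToyPSS *pss)
{
    assert(pss);
    pss->postcopy_requested = false;
    pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
}
```

The combination that the old single flag could not express is "urgent, but sent on the precopy channel", which `toy_mark_requested(pss, true)` produces.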

Re: [PATCH] aio_wait_kick: add missing memory barrier

2022-05-24 Thread Vladimir Sementsov-Ogievskiy


On 5/24/22 20:30, Emanuele Giuseppe Esposito wrote:

It seems that aio_wait_kick always required a memory barrier
or atomic operation in the caller, but nobody actually
took care of doing it.

Let's put the barrier in the function instead, and pair it
with another one in AIO_WAIT_WHILE. Read the aio_wait_kick()
comment for further explanation.

Suggested-by: Paolo Bonzini
Signed-off-by: Emanuele Giuseppe Esposito


Thanks!

Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir
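The pairing described in the patch is the classic store-buffering pattern: each side writes its own variable, issues a full barrier, then reads the other side's variable, so the waiter cannot miss the condition while the kicker also misses the waiter. A hedged C11 sketch of that pattern; `ToyWait` and the `toy_*` functions are hypothetical stand-ins for aio_wait_kick() and AIO_WAIT_WHILE, which use QEMU's own atomics and smp_mb() rather than these calls:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the kick/wait pairing; not QEMU's AioWait API. */
typedef struct {
    atomic_uint num_waiters;  /* threads that are (about to be) waiting */
    atomic_bool condition;    /* the event the waiter is polling for */
} ToyWait;

/* Waiter side (cf. AIO_WAIT_WHILE): announce, full barrier, re-check. */
static bool toy_waiter_must_block(ToyWait *w)
{
    assert(w);
    atomic_fetch_add_explicit(&w->num_waiters, 1, memory_order_relaxed);
    /* Orders the increment before the load below; pairs with the kicker. */
    atomic_thread_fence(memory_order_seq_cst);
    return !atomic_load_explicit(&w->condition, memory_order_relaxed);
}

static void toy_waiter_done(ToyWait *w)
{
    atomic_fetch_sub_explicit(&w->num_waiters, 1, memory_order_relaxed);
}

/* Kicker side (cf. aio_wait_kick): publish, full barrier, check waiters. */
static bool toy_complete_and_kick(ToyWait *w)
{
    assert(w);
    atomic_store_explicit(&w->condition, true, memory_order_relaxed);
    /* Orders the store above before the num_waiters load below. */
    atomic_thread_fence(memory_order_seq_cst);
    /* true: a waiter may be sleeping, so a wakeup must be sent */
    return atomic_load_explicit(&w->num_waiters, memory_order_relaxed) > 0;
}
```

If either fence is removed, both loads may observe the old values on weakly ordered hosts, and the waiter can sleep without ever being kicked.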
