date:20230511

Re: [RFC PATCH 03/18] hw/pci: use PCIDevice gpio for device IRQ

2023-05-11 Thread Michael S. Tsirkin

On Thu, May 11, 2023 at 09:44:51PM +, Bernhard Beschow wrote:
> 
> 
> Am 11. Mai 2023 08:57:16 UTC schrieb Mark Cave-Ayland 
> :
> >Change pci_set_irq() to call qemu_set_irq() on the PCI device IRQ rather than
> >calling PCI bus IRQ handler function directly. In order to preserve the
> >existing behaviour update pci_qdev_realize() so that it automatically 
> >connects
> >the PCI device IRQ to the PCI bus IRQ handler.
> >
> >Finally add a "QEMU interface" description documenting the new PCI device IRQ
> >gpio next to the declaration of TYPE_PCI_DEVICE.
> >
> >Signed-off-by: Mark Cave-Ayland 
> >---
> > hw/pci/pci.c | 12 ++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> >diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >index 9471f996a7..3da1481eb5 100644
> >--- a/hw/pci/pci.c
> >+++ b/hw/pci/pci.c
> >@@ -1680,8 +1680,7 @@ qemu_irq pci_allocate_irq(PCIDevice *pci_dev)
> > 
> > void pci_set_irq(PCIDevice *pci_dev, int level)
> > {
> >-int intx = pci_intx(pci_dev);
> >-pci_irq_handler(pci_dev, intx, level);
> >+qemu_set_irq(pci_dev->irq, level);
> > }
> > 
> > /* Special hooks used by device assignment */
> >@@ -2193,6 +2192,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
> >**errp)
> > pci_set_power(pci_dev, true);
> > 
> > pci_dev->msi_trigger = pci_msi_trigger;
> >+
> >+/* Connect device IRQ to bus */
> >+qdev_connect_gpio_out(DEVICE(pci_dev), 0,
> >+  pci_get_bus(pci_dev)->irq_in[pci_dev->devfn]);
> 
> I think this is confusing a few things. In my understanding -- unlike
> ISA -- PCI considers interrupt lanes only for PCI slots but not for
> buses.
> So for example each PCI slot could have its own direct
> connections (up to four, intA..intD) to the interrupt controller. IOW
> interrupt lanes and PCI buses are unrelated, thus PCIBus shouldn't
> really have IRQs.

True, interrupt lines (not lanes I think - lanes is a PCI express
unrelated to interrupts since interrupts are just messages under PCIe)
bypass the PCI bus. They are in fact even used outside the
normal GNT#/REQ# protocol.

The system vendor is free to combine the various INTx# signals from the 
PCI connector(s)
in any way to connect them to the interrupt controller. They may be 
wire-ORed or
electronically switched under program control, or any combination 
thereof. The system
designer must insure that each INTx# signal from each connector is 
connected to an input
on the interrupt controller. This means the device driver may not make 
any assumptions
about interrupt sharing. All PCI device drivers must be able to share 
an interrupt (chaining)
with any other logical device including devices in the same 
multi-function package.

> 
> Moreover, in case the interrupt lines are shared between multiple PCI slots, 
> a usual pattern is to swizzle these lines such that the intAs from the slots 
> don't all occupy just one IRQ line. That means that depending on the slot the 
> device is plugged into a different lane is triggered. Above code, however, 
> would always trigger the same line and wouldn't even allow for modeling the 
> swizzeling.

the swizzeling always applies in case of PCI bridges:

However, since bridges will be used on add-in cards, the BIOS will assume an 
association
between device location and which INTx# line it uses when requesting an 
interrupt.
...
The BIOS code will assume the following binding behind the bridge and will
write the IRQ number in each device as described in Table 9-1. The interrupt 
binding
defined in this table is mandatory for add-in cards utilizing a bridge.





> Also, above code would cause out of bounds array accesses if a PCI device had 
> more functions than there are on "the bus":
> For example, consider PIIX which has four PIRQs, so ARRAY_SIZE(irq_fn) == 4, 
> right? devfn can be up to 8 according to the PCI spec which would cause an 
> out if bounds array access above.
> 
> I think that this commit does actually re-define how PCI buses work in QEMU 
> although the cover letter claims to save this for another day. We should 
> probably not apply the series in its current form.
> 
> Best regards,
> Bernhard
> 
> > }
> > 
> > static void pci_device_init(Object *obj)
> >@@ -2850,6 +2853,11 @@ void pci_set_power(PCIDevice *d, bool state)
> > }
> > }
> > 
> >+/*
> >+ * QEMU interface:
> >+ * + Unnamed GPIO output: set to 1 if the PCI Device has asserted its irq
> >+ */
> >+
> > static const TypeInfo pci_device_type_info = {
> > .name = TYPE_PCI_DEVICE,
> > .parent = TYPE_DEVICE,

Re: [PULL 03/16] ssi: cache SSIPeripheralClass to avoid GET_CLASS()

2023-05-11 Thread Philippe Mathieu-Daudé


Hi Alex,

On 25/10/22 17:20, Cédric Le Goater wrote:

From: Alex Bennée 

Investigating why some BMC models are so slow compared to a plain ARM
virt machines I did some profiling of:

   ./qemu-system-arm -M romulus-bmc -nic user \
 -drive
 file=obmc-phosphor-image-romulus.static.mtd,format=raw,if=mtd \
 -nographic -serial mon:stdio

And saw that object_class_dynamic_cast_assert was dominating the
profile times. We have a number of cases in this model of the SSI bus.
As the class is static once the object is created we just cache it and
use it instead of the dynamic case macros.

Profiling against:

   ./tests/venv/bin/avocado run \
 tests/avocado/machine_aspeed.py:test_arm_ast2500_romulus_openbmc_v2_9_0

Before: 35.565 s ±  0.087 s
After: 15.713 s ±  0.287 s


Do you remember if you were using --enable-qom-cast-debug when
profiling this?


Signed-off-by: Alex Bennée 
Cc: Cédric Le Goater 
Tested-by: Cédric Le Goater 
Reviewed-by: Cédric Le Goater 
Message-Id: <20220811151413.3350684-6-alex.ben...@linaro.org>
Message-Id: <20220923084803.498337-6-...@kaod.org>
Signed-off-by: Cédric Le Goater 
---
  include/hw/ssi/ssi.h |  3 +++
  hw/ssi/ssi.c | 18 --
  2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/include/hw/ssi/ssi.h b/include/hw/ssi/ssi.h
index f411858ab083..6950f86810d3 100644
--- a/include/hw/ssi/ssi.h
+++ b/include/hw/ssi/ssi.h
@@ -59,6 +59,9 @@ struct SSIPeripheralClass {
  struct SSIPeripheral {
  DeviceState parent_obj;
  
+/* cache the class */

+SSIPeripheralClass *spc;
+
  /* Chip select state */
  bool cs;
  };
diff --git a/hw/ssi/ssi.c b/hw/ssi/ssi.c
index 003931fb509e..d54a109beeb5 100644
--- a/hw/ssi/ssi.c
+++ b/hw/ssi/ssi.c
@@ -38,9 +38,8 @@ static void ssi_cs_default(void *opaque, int n, int level)
  bool cs = !!level;
  assert(n == 0);
  if (s->cs != cs) {
-SSIPeripheralClass *ssc = SSI_PERIPHERAL_GET_CLASS(s);
-if (ssc->set_cs) {
-ssc->set_cs(s, cs);
+if (s->spc->set_cs) {
+s->spc->set_cs(s, cs);
  }
  }
  s->cs = cs;
@@ -48,11 +47,11 @@ static void ssi_cs_default(void *opaque, int n, int level)
  
  static uint32_t ssi_transfer_raw_default(SSIPeripheral *dev, uint32_t val)

  {
-SSIPeripheralClass *ssc = SSI_PERIPHERAL_GET_CLASS(dev);
+SSIPeripheralClass *ssc = dev->spc;
  
  if ((dev->cs && ssc->cs_polarity == SSI_CS_HIGH) ||

-(!dev->cs && ssc->cs_polarity == SSI_CS_LOW) ||
-ssc->cs_polarity == SSI_CS_NONE) {
+(!dev->cs && ssc->cs_polarity == SSI_CS_LOW) ||
+ssc->cs_polarity == SSI_CS_NONE) {
  return ssc->transfer(dev, val);
  }
  return 0;
@@ -67,6 +66,7 @@ static void ssi_peripheral_realize(DeviceState *dev, Error 
**errp)
  ssc->cs_polarity != SSI_CS_NONE) {
  qdev_init_gpio_in_named(dev, ssi_cs_default, SSI_GPIO_CS, 1);
  }
+s->spc = ssc;
  
  ssc->realize(s, errp);

  }
@@ -115,13 +115,11 @@ uint32_t ssi_transfer(SSIBus *bus, uint32_t val)
  {
  BusState *b = BUS(bus);
  BusChild *kid;
-SSIPeripheralClass *ssc;
  uint32_t r = 0;
  
  QTAILQ_FOREACH(kid, >children, sibling) {

-SSIPeripheral *peripheral = SSI_PERIPHERAL(kid->child);
-ssc = SSI_PERIPHERAL_GET_CLASS(peripheral);
-r |= ssc->transfer_raw(peripheral, val);
+SSIPeripheral *p = SSI_PERIPHERAL(kid->child);
+r |= p->spc->transfer_raw(p, val);
  }
  
  return r;

Re: [PULL 04/16] aspeed/smc: Cache AspeedSMCClass

2023-05-11 Thread Philippe Mathieu-Daudé


Hi Cédric,

On 25/10/22 17:20, Cédric Le Goater wrote:

Store a reference on the AspeedSMC class under the flash object and
use it when accessing the flash contents. Avoiding the class cast
checkers in these hot paths improves performance by 10% when running
the aspeed avocado tests.


I doubt you still have your benchmark number, but do you remember
if you were using --enable-qom-cast-debug ?


Message-Id: <20220923084803.498337-7-...@kaod.org>
Signed-off-by: Cédric Le Goater 
---
  include/hw/ssi/aspeed_smc.h | 2 ++
  hw/ssi/aspeed_smc.c | 9 -
  2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/hw/ssi/aspeed_smc.h b/include/hw/ssi/aspeed_smc.h
index 2d5f8f3d8f68..8e1dda556b91 100644
--- a/include/hw/ssi/aspeed_smc.h
+++ b/include/hw/ssi/aspeed_smc.h
@@ -30,6 +30,7 @@
  #include "qom/object.h"
  
  struct AspeedSMCState;

+struct AspeedSMCClass;
  
  #define TYPE_ASPEED_SMC_FLASH "aspeed.smc.flash"

  OBJECT_DECLARE_SIMPLE_TYPE(AspeedSMCFlash, ASPEED_SMC_FLASH)
@@ -37,6 +38,7 @@ struct AspeedSMCFlash {
  SysBusDevice parent_obj;
  
  struct AspeedSMCState *controller;

+struct AspeedSMCClass *asc;
  uint8_t cs;
  
  MemoryRegion mmio;

diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c
index faed7e0cbe17..22df4be528a7 100644
--- a/hw/ssi/aspeed_smc.c
+++ b/hw/ssi/aspeed_smc.c
@@ -388,7 +388,7 @@ static inline int aspeed_smc_flash_cmd(const AspeedSMCFlash 
*fl)
  static inline int aspeed_smc_flash_addr_width(const AspeedSMCFlash *fl)
  {
  const AspeedSMCState *s = fl->controller;
-AspeedSMCClass *asc = ASPEED_SMC_GET_CLASS(s);
+AspeedSMCClass *asc = fl->asc;
  
  if (asc->addr_width) {

  return asc->addr_width(s);
@@ -420,7 +420,7 @@ static uint32_t aspeed_smc_check_segment_addr(const 
AspeedSMCFlash *fl,
uint32_t addr)
  {
  const AspeedSMCState *s = fl->controller;
-AspeedSMCClass *asc = ASPEED_SMC_GET_CLASS(s);
+AspeedSMCClass *asc = fl->asc;
  AspeedSegments seg;
  
  asc->reg_to_segment(s, s->regs[R_SEG_ADDR0 + fl->cs], );

@@ -1234,7 +1234,6 @@ static const TypeInfo aspeed_smc_info = {
  static void aspeed_smc_flash_realize(DeviceState *dev, Error **errp)
  {
  AspeedSMCFlash *s = ASPEED_SMC_FLASH(dev);
-AspeedSMCClass *asc;
  g_autofree char *name = g_strdup_printf(TYPE_ASPEED_SMC_FLASH ".%d", 
s->cs);
  
  if (!s->controller) {

@@ -1242,14 +1241,14 @@ static void aspeed_smc_flash_realize(DeviceState *dev, 
Error **errp)
  return;
  }
  
-asc = ASPEED_SMC_GET_CLASS(s->controller);

+s->asc = ASPEED_SMC_GET_CLASS(s->controller);
  
  /*

   * Use the default segment value to size the memory region. This
   * can be changed by FW at runtime.
   */
  memory_region_init_io(>mmio, OBJECT(s), _smc_flash_ops,
-  s, name, asc->segments[s->cs].size);
+  s, name, s->asc->segments[s->cs].size);
  sysbus_init_mmio(SYS_BUS_DEVICE(dev), >mmio);
  }

Re: [PATCH 2/3] hw/intc: Add NULL pointer check on LoongArch ipi device

2023-05-11 Thread Philippe Mathieu-Daudé


On 12/5/23 05:01, Song Gao wrote:

Hi,  Philippe

在 2023/5/12 上午3:03, Philippe Mathieu-Daudé 写道:

On 6/4/23 12:00, Song Gao wrote:

When ipi mailbox is used, cpu index is decoded from iocsr register.
cpu maybe does not exist. This patch adss NULL pointer check on
ipi device.


How can that happens from a guest vcpu context?


cpuid(cs->cpu_index)  is decoded from iocsr register.

     cpuid = (val >> 16) & 0x3ff;   // ipi_sned [25:16]

The value maybe invalid.  qemu only support 4 vcpu.


What about something like this?

-- >8 --
-static void ipi_send(uint64_t val)
+static void ipi_send(uint32_t val)
 {
-int cpuid, data;
+uint32_t cpuid;
+uint8_t vector;
 CPULoongArchState *env;
 CPUState *cs;
 LoongArchCPU *cpu;

-cpuid = (val >> 16) & 0x3ff;
+cpuid = extract32(val, 16, 10);
+if (cpuid >= MAX_IPI_CORE_NUM) {
+trace_loongarch_ipi_unsupported_cpuid("IOCSR_IPI_SEND", cpuid);
+return;
+}
 /* IPI status vector */
-data = 1 << (val & 0x1f);
+vector = extract8(val, 0, 5);
+
 cs = qemu_get_cpu(cpuid);
 cpu = LOONGARCH_CPU(cs);
 env = >env;
 address_space_stl(>address_space_iocsr, 0x1008,
-  data, MEMTXATTRS_UNSPECIFIED, NULL);
+  BIT(vector), MEMTXATTRS_UNSPECIFIED, NULL);

 }
---


you can find more about ipi_send registers at:
https://github.com/loongson/LoongArch-Documentation/releases/download/2023.04.20/Loongson-3A5000-usermanual-v1.03-EN.pdf
Table 63. Processor core inter-processor communication registers


Signed-off-by: Song Gao 
---
  hw/intc/loongarch_ipi.c | 31 +++
  1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/hw/intc/loongarch_ipi.c b/hw/intc/loongarch_ipi.c
index 0563d83a35..39e899df46 100644
--- a/hw/intc/loongarch_ipi.c
+++ b/hw/intc/loongarch_ipi.c
@@ -86,11 +86,12 @@ static void ipi_send(uint64_t val)
  /* IPI status vector */
  data = 1 << (val & 0x1f);
  cs = qemu_get_cpu(cpuid);
-    cpu = LOONGARCH_CPU(cs);
-    env = >env;
-    address_space_stl(>address_space_iocsr, 0x1008,
-  data, MEMTXATTRS_UNSPECIFIED, NULL);
-
+    if (cs) {
+    cpu = LOONGARCH_CPU(cs);
+    env = >env;
+    address_space_stl(>address_space_iocsr, 0x1008,
+  data, MEMTXATTRS_UNSPECIFIED, NULL);
+    }


Is that the hardware behavior?


Yes.

Could logging the invalid cpuid request be useful?


Sure.

Thanks.
Song Gao

Re: [PATCH v2 12/19] cutils: Allow NULL str in qemu_strtosz

2023-05-11 Thread Philippe Mathieu-Daudé


On 12/5/23 04:10, Eric Blake wrote:

All the other qemu_strto* and parse_uint allow a NULL str.  Having
qemu_strtosz crash on qemu_strtosz(NULL, NULL, ) is an easy fix
that adds some consistency between our string parsers.

Signed-off-by: Eric Blake 
---
  tests/unit/test-cutils.c | 3 +++
  util/cutils.c| 2 +-
  2 files changed, 4 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 01/19] test-cutils: Avoid g_assert in unit tests

2023-05-11 Thread Philippe Mathieu-Daudé


On 12/5/23 04:10, Eric Blake wrote:

glib documentation[1] is clear: g_assert() should be avoided in unit
tests because it is ineffective if G_DISABLE_ASSERT is defined; unit
tests should stick to constructs based on g_assert_true() instead.
Note that since commit 262a69f428, we intentionally state that you
cannot define G_DISABLE_ASSERT that while building qemu; but our code
can be copied to other projects without that restriction, so we should
be consistent.

For most of the replacements in this patch, using g_assert_cmpstr()
would be a regression in quality - although it would helpfully display
the string contents of both pointers on test failure, here, we really
do care about pointer equality, not just string content equality.  But
when a NULL pointer is expected, g_assert_null works fine.

[1] https://libsoup.org/glib/glib-Testing.html#g-assert

Signed-off-by: Eric Blake 
Reviewed-by: Hanna Czenczek 
---
  tests/unit/test-cutils.c | 324 +++
  1 file changed, 162 insertions(+), 162 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 2/3] hw/intc: Add NULL pointer check on LoongArch ipi device

2023-05-11 Thread Song Gao


Hi,  Philippe

在 2023/5/12 上午3:03, Philippe Mathieu-Daudé 写道:

On 6/4/23 12:00, Song Gao wrote:

When ipi mailbox is used, cpu index is decoded from iocsr register.
cpu maybe does not exist. This patch adss NULL pointer check on
ipi device.


How can that happens from a guest vcpu context?


cpuid(cs->cpu_index)  is decoded from iocsr register.

    cpuid = (val >> 16) & 0x3ff;   // ipi_sned [25:16]

The value maybe invalid.  qemu only support 4 vcpu.

you can find more about ipi_send registers at:
https://github.com/loongson/LoongArch-Documentation/releases/download/2023.04.20/Loongson-3A5000-usermanual-v1.03-EN.pdf
Table 63. Processor core inter-processor communication registers


Signed-off-by: Song Gao 
---
  hw/intc/loongarch_ipi.c | 31 +++
  1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/hw/intc/loongarch_ipi.c b/hw/intc/loongarch_ipi.c
index 0563d83a35..39e899df46 100644
--- a/hw/intc/loongarch_ipi.c
+++ b/hw/intc/loongarch_ipi.c
@@ -86,11 +86,12 @@ static void ipi_send(uint64_t val)
  /* IPI status vector */
  data = 1 << (val & 0x1f);
  cs = qemu_get_cpu(cpuid);
-    cpu = LOONGARCH_CPU(cs);
-    env = >env;
-    address_space_stl(>address_space_iocsr, 0x1008,
-  data, MEMTXATTRS_UNSPECIFIED, NULL);
-
+    if (cs) {
+    cpu = LOONGARCH_CPU(cs);
+    env = >env;
+    address_space_stl(>address_space_iocsr, 0x1008,
+  data, MEMTXATTRS_UNSPECIFIED, NULL);
+    }


Is that the hardware behavior?


Yes.

Could logging the invalid cpuid request be useful?


Sure.

Thanks.
Song Gao

[PATCH v2 13/19] numa: Check for qemu_strtosz_MiB error

2023-05-11 Thread Eric Blake

As shown in the previous commit, qemu_strtosz_MiB sometimes leaves the
result value untouched (we have to audit further to learn that in that
case, the QAPI generator says that visit_type_NumaOptions() will have
zero-initialized it), and sometimes leaves it with the value of a
partial parse before -EINVAL occurs because of trailing garbage.
Rather than blindly treating any string the user may throw at us as
valid, we should check for parse failures.

Fixes: cc001888 ("numa: fixup parsed NumaNodeOptions earlier", v2.11.0)
Signed-off-by: Eric Blake 
Reviewed-by: Hanna Czenczek 
---
 hw/core/numa.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index d8d36b16d80..f08956ddb0f 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -531,10 +531,17 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error 
**errp)
 /* Fix up legacy suffix-less format */
 if ((object->type == NUMA_OPTIONS_TYPE_NODE) && object->u.node.has_mem) {
 const char *mem_str = qemu_opt_get(opts, "mem");
-qemu_strtosz_MiB(mem_str, NULL, >u.node.mem);
+int ret = qemu_strtosz_MiB(mem_str, NULL, >u.node.mem);
+
+if (ret < 0) {
+error_setg_errno(, -ret, "could not parse memory size '%s'",
+ mem_str);
+}
 }

-set_numa_options(ms, object, );
+if (!err) {
+set_numa_options(ms, object, );
+}

 qapi_free_NumaOptions(object);
 if (err) {
-- 
2.40.1

[PATCH v2 17/19] cutils: Use parse_uint in qemu_strtosz for negative rejection

2023-05-11 Thread Eric Blake

Rather than open-coding two different ways to check for an unwanted
negative sign, reuse the same code in both functions.  That way, if we
decide down the road to accept "-0" instead of rejecting it, we have
fewer places to change.  Also, it means we now get ERANGE instead of
EINVAL for negative values in qemu_strtosz, which is reasonable for
what it represents.

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 7 +++
 util/cutils.c| 8 ++--
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 1272638582a..b8ad4d7fbac 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -3396,10 +3396,9 @@ static void test_qemu_strtosz_trailing(void)
 static void test_qemu_strtosz_erange(void)
 {
 /* FIXME negative values fit better as ERANGE */
-do_strtosz(" -0", -EINVAL /* FIXME -ERANGE */, 0, 0 /* FIXME 3 */);
-do_strtosz("-1", -EINVAL /* FIXME -ERANGE */, 0, 0 /* FIXME 2 */);
-do_strtosz_full("-2M", qemu_strtosz, -EINVAL /* FIXME -ERANGE */, 0,
-0 /* FIXME 2 */, -EINVAL, 0);
+do_strtosz(" -0", -ERANGE, 0, 3);
+do_strtosz("-1", -ERANGE, 0, 2);
+do_strtosz_full("-2M", qemu_strtosz, -ERANGE, 0, 2, -EINVAL, 0);
 do_strtosz(" -.0", -EINVAL /* FIXME -ERANGE */, 0, 0 /* FIXME 4 */);
 do_strtosz_full("-.1k", qemu_strtosz, -EINVAL /* FIXME -ERANGE */, 0,
 0 /* FIXME 3 */, -EINVAL, 0);
diff --git a/util/cutils.c b/util/cutils.c
index b5a6641fa0f..550abbe5c06 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -201,6 +201,7 @@ static int64_t suffix_mul(char suffix, int64_t unit)
  * - hex with scaling suffix, such as 0x20M
  * - octal, such as 08
  * - fractional hex, such as 0x1.8
+ * - negative values, including -0
  * - floating point exponents, such as 1e3
  *
  * The end pointer will be returned in *end, if not NULL.  If there is
@@ -226,15 +227,10 @@ static int do_strtosz(const char *nptr, const char **end,
 int64_t mul;

 /* Parse integral portion as decimal. */
-retval = qemu_strtou64(nptr, , 10, );
+retval = parse_uint(nptr, , 10, );
 if (retval) {
 goto out;
 }
-if (memchr(nptr, '-', endptr - nptr) != NULL) {
-endptr = nptr;
-retval = -EINVAL;
-goto out;
-}
 if (val == 0 && (*endptr == 'x' || *endptr == 'X')) {
 /* Input looks like hex; reparse, and insist on no fraction or suffix. 
*/
 retval = qemu_strtou64(nptr, , 16, );
-- 
2.40.1

[PATCH v2 12/19] cutils: Allow NULL str in qemu_strtosz

2023-05-11 Thread Eric Blake

All the other qemu_strto* and parse_uint allow a NULL str.  Having
qemu_strtosz crash on qemu_strtosz(NULL, NULL, ) is an easy fix
that adds some consistency between our string parsers.

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 3 +++
 util/cutils.c| 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 5c9ed78be93..1936c7b5795 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -3260,6 +3260,9 @@ static void test_qemu_strtosz_float(void)

 static void test_qemu_strtosz_invalid(void)
 {
+do_strtosz(NULL, -EINVAL, 0xbaadf00d, 0);
+
+/* Must parse at least one digit */
 do_strtosz("", -EINVAL, 0xbaadf00d, 0);
 do_strtosz(" \t ", -EINVAL, 0xbaadf00d, 0);
 do_strtosz("crap", -EINVAL, 0xbaadf00d, 0);
diff --git a/util/cutils.c b/util/cutils.c
index e599924a0c4..91c90673aba 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -306,7 +306,7 @@ static int do_strtosz(const char *nptr, const char **end,
 out:
 if (end) {
 *end = endptr;
-} else if (*endptr) {
+} else if (nptr && *endptr) {
 retval = -EINVAL;
 }
 if (retval == 0) {
-- 
2.40.1

[PATCH v2 07/19] cutils: Adjust signature of parse_uint[_full]

2023-05-11 Thread Eric Blake

It's already confusing that we have two very similar functions for
wrapping the parse of a 64-bit unsigned value, differing mainly on
whether they permit leading '-'.  Adjust the signature of parse_uint()
and parse_uint_full() to be like all of qemu_strto*(): put the result
parameter last, use the same types (uint64_t is not always the same as
unsigned long long, and mark endptr const (only latter affects the
rare caller of parse_uint).  Adjust all callers in the tree.

Signed-off-by: Eric Blake 
---
 include/qemu/cutils.h |   5 +-
 audio/audio_legacy.c  |   4 +-
 block/gluster.c   |   4 +-
 block/nfs.c   |   4 +-
 blockdev.c|   4 +-
 contrib/ivshmem-server/main.c |   4 +-
 qapi/opts-visitor.c   |  10 +--
 tests/unit/test-cutils.c  | 113 +++---
 ui/vnc.c  |   4 +-
 util/cutils.c |  13 ++--
 util/guest-random.c   |   4 +-
 util/qemu-sockets.c   |  10 +--
 12 files changed, 82 insertions(+), 97 deletions(-)

diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index 92c436d8c70..92c927a6a35 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -163,9 +163,8 @@ int qemu_strtou64(const char *nptr, const char **endptr, 
int base,
 int qemu_strtod(const char *nptr, const char **endptr, double *result);
 int qemu_strtod_finite(const char *nptr, const char **endptr, double *result);

-int parse_uint(const char *s, unsigned long long *value, char **endptr,
-   int base);
-int parse_uint_full(const char *s, unsigned long long *value, int base);
+int parse_uint(const char *s, const char **endptr, int base, uint64_t *value);
+int parse_uint_full(const char *s, int base, uint64_t *value);

 int qemu_strtosz(const char *nptr, const char **end, uint64_t *result);
 int qemu_strtosz_MiB(const char *nptr, const char **end, uint64_t *result);
diff --git a/audio/audio_legacy.c b/audio/audio_legacy.c
index b848001ff70..dc72ba55e9a 100644
--- a/audio/audio_legacy.c
+++ b/audio/audio_legacy.c
@@ -35,8 +35,8 @@

 static uint32_t toui32(const char *str)
 {
-unsigned long long ret;
-if (parse_uint_full(str, , 10) || ret > UINT32_MAX) {
+uint64_t ret;
+if (parse_uint_full(str, 10, ) || ret > UINT32_MAX) {
 dolog("Invalid integer value `%s'\n", str);
 exit(1);
 }
diff --git a/block/gluster.c b/block/gluster.c
index 185a83e5e53..ad5fadbe793 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -424,7 +424,7 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
 int ret;
 int old_errno;
 SocketAddressList *server;
-unsigned long long port;
+uint64_t port;

 glfs = glfs_find_preopened(gconf->volume);
 if (glfs) {
@@ -445,7 +445,7 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
server->value->u.q_unix.path, 0);
 break;
 case SOCKET_ADDRESS_TYPE_INET:
-if (parse_uint_full(server->value->u.inet.port, , 10) < 0 ||
+if (parse_uint_full(server->value->u.inet.port, 10, ) < 0 ||
 port > 65535) {
 error_setg(errp, "'%s' is not a valid port number",
server->value->u.inet.port);
diff --git a/block/nfs.c b/block/nfs.c
index 006045d71a6..fabea0386a3 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -114,13 +114,13 @@ static int nfs_parse_uri(const char *filename, QDict 
*options, Error **errp)
 qdict_put_str(options, "path", uri->path);

 for (i = 0; i < qp->n; i++) {
-unsigned long long val;
+uint64_t val;
 if (!qp->p[i].value) {
 error_setg(errp, "Value for NFS parameter expected: %s",
qp->p[i].name);
 goto out;
 }
-if (parse_uint_full(qp->p[i].value, , 0)) {
+if (parse_uint_full(qp->p[i].value, 0, )) {
 error_setg(errp, "Illegal value for NFS parameter: %s",
qp->p[i].name);
 goto out;
diff --git a/blockdev.c b/blockdev.c
index d141ca7a2d5..162eaffae16 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -341,10 +341,10 @@ static bool parse_stats_intervals(BlockAcctStats *stats, 
QList *intervals,
 switch (qobject_type(entry->value)) {

 case QTYPE_QSTRING: {
-unsigned long long length;
+uint64_t length;
 const char *str = qstring_get_str(qobject_to(QString,
  entry->value));
-if (parse_uint_full(str, , 10) == 0 &&
+if (parse_uint_full(str, 10, ) == 0 &&
 length > 0 && length <= UINT_MAX) {
 block_acct_add_interval(stats, (unsigned) length);
 } else {
diff --git a/contrib/ivshmem-server/main.c b/contrib/ivshmem-server/main.c
index 224dbeb547e..5901f17707e 100644
--- a/contrib/ivshmem-server/main.c
+++

[PATCH v2 09/19] test-cutils: Add coverage of qemu_strtod

2023-05-11 Thread Eric Blake

It's hard to tweak code for consistency if I can't prove what will or
won't break from those tweaks.  Time to add unit tests for
qemu_strtod() and qemu_strtod_finite().

Among other things, I wrote a check whether we have C99 semantics for
strtod("0x1") (which MUST parse hex numbers) rather than C89 (which
must stop parsing at 'x').  These days, I suspect that is okay; but if
it fails CI checks, knowing the difference will help us decide what we
want to do about it.  Note that C2x, while not final at the time of
this patch, has been considering whether to make strtol("0b1") parse
as 1 with no slop instead of the C17 parse of 0 with slop "b1"; that
decision may also bleed over to strtod().  But for now, I didn't think
it worth adding unit tests on that front (to strtol or strtod) as
things may still change.

Likewise, there are plenty more corner cases of strtod proper that I
don't explicitly test here, but there are enough unit tests added here
that it covers all the branches reached in our wrappers.  In
particular, it demonstrates the difference on when *value is left
uninitialized, which an upcoming patch will normalize.

Signed-off-by: Eric Blake 

---

v2: Added g_assert_false(signbit(res)) anywhere I used
g_assert_cmpfloat(res,==,0.0); add a test for strtod() hex parsing and
handling of junk after ERANGE, which is major enough that I dropped
R-b
---
 tests/unit/test-cutils.c | 510 +++
 1 file changed, 510 insertions(+)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index d3076c3fec1..1763839a157 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -25,6 +25,8 @@
  * THE SOFTWARE.
  */

+#include 
+
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
 #include "qemu/units.h"
@@ -2668,6 +2670,485 @@ static void test_qemu_strtou64_full_erange_junk(void)
 g_assert_cmpint(res, ==, UINT64_MAX);
 }

+static void test_qemu_strtod_simple(void)
+{
+const char *str;
+const char *endptr;
+int err;
+double res;
+
+/* no radix or exponent */
+str = "1";
+endptr = "somewhere";
+res = 999;
+err = qemu_strtod(str, , );
+g_assert_cmpint(err, ==, 0);
+g_assert_cmpfloat(res, ==, 1.0);
+g_assert_true(endptr == str + 1);
+
+/* leading space and sign */
+str = " -0.0";
+endptr = "somewhere";
+res = 999;
+err = qemu_strtod(str, , );
+g_assert_cmpint(err, ==, 0);
+g_assert_cmpfloat(res, ==, -0.0);
+g_assert_true(signbit(res));
+g_assert_true(endptr == str + 5);
+
+/* fraction only */
+str = "+.5";
+endptr = "somewhere";
+res = 999;
+err = qemu_strtod(str, , );
+g_assert_cmpint(err, ==, 0);
+g_assert_cmpfloat(res, ==, 0.5);
+g_assert_true(endptr == str + 3);
+
+/* exponent */
+str = "1.e+1";
+endptr = "somewhere";
+res = 999;
+err = qemu_strtod(str, , );
+g_assert_cmpint(err, ==, 0);
+g_assert_cmpfloat(res, ==, 10.0);
+g_assert_true(endptr == str + 5);
+
+/* hex without radix */
+str = "0x10";
+endptr = "somewhere";
+res = 999;
+err = qemu_strtod(str, , );
+g_assert_cmpint(err, ==, 0);
+g_assert_cmpfloat(res, ==, 16.0);
+g_assert_true(endptr == str + 4);
+}
+
+static void test_qemu_strtod_einval(void)
+{
+const char *str;
+const char *endptr;
+int err;
+double res;
+
+/* empty */
+str = "";
+endptr = "somewhere";
+res = 999;
+err = qemu_strtod(str, , );
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
+g_assert_true(endptr == str);
+
+/* NULL */
+str = NULL;
+endptr = "random";
+res = 999;
+err = qemu_strtod(str, , );
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpfloat(res, ==, 999.0);
+g_assert_null(endptr);
+
+/* not recognizable */
+str = " junk";
+endptr = "somewhere";
+res = 999;
+err = qemu_strtod(str, , );
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
+g_assert_true(endptr == str);
+}
+
+static void test_qemu_strtod_erange(void)
+{
+const char *str;
+const char *endptr;
+int err;
+double res;
+
+/* overflow */
+str = "9e999";
+endptr = "somewhere";
+res = 999;
+err = qemu_strtod(str, , );
+g_assert_cmpint(err, ==, -ERANGE);
+g_assert_cmpfloat(res, ==, HUGE_VAL);
+g_assert_true(endptr == str + 5);
+
+str = "-9e+999";
+endptr = "somewhere";
+res = 999;
+err = qemu_strtod(str, , );
+g_assert_cmpint(err, ==, -ERANGE);
+g_assert_cmpfloat(res, ==, -HUGE_VAL);
+g_assert_true(endptr == str + 7);
+
+/* underflow */
+str = "-9e-999";
+endptr = "somewhere";
+res = 999;
+err = qemu_strtod(str, , );
+g_assert_cmpint(err, ==, -ERANGE);
+g_assert_cmpfloat(res, >=, -DBL_MIN);
+g_assert_cmpfloat(res, <=, -0.0);
+g_assert_true(signbit(res));
+

[PATCH v2 15/19] cutils: Set value in all qemu_strtosz* error paths

2023-05-11 Thread Eric Blake

Making callers determine whether or not *value was populated on error
is not nice for usability.  Pre-patch, we have unit tests that check
that *result is left unchanged on most EINVAL errors and set to 0 on
many ERANGE errors.  This is subtly different from libc strtoumax()
behavior which returns UINT64_MAX on ERANGE errors, as well as
different from our parse_uint() which slams to 0 on EINVAL on the
grounds that we want our functions to be harder to mis-use than
strtoumax().

Let's audit callers:

- hw/core/numa.c:parse_numa() fixed in the previous patch to check for
  errors

- migration/migration-hmp-cmds.c:hmp_migrate_set_parameter(),
  monitor/hmp.c:monitor_parse_arguments(),
  qapi/opts-visitor.c:opts_type_size(),
  qapi/qobject-input-visitor.c:qobject_input_type_size_keyval(),
  qemu-img.c:cvtnum_full(), qemu-io-cmds.c:cvtnum(),
  target/i386/cpu.c:x86_cpu_parse_featurestr(), and
  util/qemu-option.c:parse_option_size() appear to reject all failures
  (although some with distinct messages for ERANGE as opposed to
  EINVAL), so it doesn't matter what is in the value parameter on
  error.

- All remaining callers are in the testsuite, where we can tweak our
  expectations to match our new desired behavior.

Advancing to the end of the string parsed on overflow (ERANGE), while
still returning 0, makes sense (UINT64_MAX as a size is unlikely to be
useful); likewise, our size parsing code is complex enough that it's
easier to always return 0 when endptr is NULL but trailing garbage was
found, rather than trying to return the value of the prefix actually
parsed (no current caller cared about the value of the prefix).

Signed-off-by: Eric Blake 

---

v2: cutils.c unchanged, but rebasing test suite is significant enough
that I doropped Hanna's R-b
---
 tests/unit/test-cutils.c | 106 +++
 util/cutils.c|  17 +--
 2 files changed, 63 insertions(+), 60 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 7800caf9b0e..4a1baf05ca6 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -3273,7 +3273,7 @@ static void test_qemu_strtosz_float(void)
 do_strtosz("1.k", 0, 1024, 3);

 /* FIXME An empty fraction head should be tolerated */
-do_strtosz(" .5k", -EINVAL /* FIXME 0 */, 0xbaadf00d /* FIXME 512 */,
+do_strtosz(" .5k", -EINVAL /* FIXME 0 */, 0 /* FIXME 512 */,
0 /* FIXME 4 */);

 /* For convenience, we permit values that are not byte-exact */
@@ -3298,29 +3298,29 @@ static void test_qemu_strtosz_float(void)

 static void test_qemu_strtosz_invalid(void)
 {
-do_strtosz(NULL, -EINVAL, 0xbaadf00d, 0);
+do_strtosz(NULL, -EINVAL, 0, 0);

 /* Must parse at least one digit */
-do_strtosz("", -EINVAL, 0xbaadf00d, 0);
-do_strtosz(" \t ", -EINVAL, 0xbaadf00d, 0);
-do_strtosz(".", -EINVAL, 0xbaadf00d, 0);
-do_strtosz(" .", -EINVAL, 0xbaadf00d, 0);
-do_strtosz(" .k", -EINVAL, 0xbaadf00d, 0);
-do_strtosz("inf", -EINVAL, 0xbaadf00d, 0);
-do_strtosz("NaN", -EINVAL, 0xbaadf00d, 0);
+do_strtosz("", -EINVAL, 0, 0);
+do_strtosz(" \t ", -EINVAL, 0, 0);
+do_strtosz(".", -EINVAL, 0, 0);
+do_strtosz(" .", -EINVAL, 0, 0);
+do_strtosz(" .k", -EINVAL, 0, 0);
+do_strtosz("inf", -EINVAL, 0, 0);
+do_strtosz("NaN", -EINVAL, 0, 0);

 /* Lone suffix is not okay */
-do_strtosz("k", -EINVAL, 0xbaadf00d, 0);
-do_strtosz(" M", -EINVAL, 0xbaadf00d, 0);
+do_strtosz("k", -EINVAL, 0, 0);
+do_strtosz(" M", -EINVAL, 0, 0);

 /* Fractional values require scale larger than bytes */
-do_strtosz("1.1B", -EINVAL, 0xbaadf00d, 0);
-do_strtosz("1.1", -EINVAL, 0xbaadf00d, 0);
+do_strtosz("1.1B", -EINVAL, 0, 0);
+do_strtosz("1.1", -EINVAL, 0, 0);

 /* FIXME underflow in the fraction tail should matter for 'B' */
-do_strtosz("1.1B", -EINVAL, 0xbaadf00d, 0);
+do_strtosz("1.1B", -EINVAL, 0, 0);
 do_strtosz("1.0001B", 0 /* FIXME -EINVAL */,
-   1 /* FIXME 0xbaadf00d */, 23 /* FIXME 0 */);
+   1 /* FIXME 0 */, 23 /* FIXME 0 */);
 do_strtosz("1."
"00"
"00"
@@ -3329,62 +3329,60 @@ static void test_qemu_strtosz_invalid(void)
"00"
"00"
"00"
-   "1B", 0 /* FIXME -EINVAL */, 1 /* FIXME 0xbaadf00d */,
+   "1B", 0 /* FIXME -EINVAL */, 1 /* FIXME 0 */,
354 /* FIXME 0 */);

 /* No hex fractions */
-do_strtosz("0x1.8k", -EINVAL, 0xbaadf00d, 0);
-do_strtosz("0x1.k", -EINVAL, 0xbaadf00d, 0);
+do_strtosz("0x1.8k", -EINVAL, 0, 0);
+do_strtosz("0x1.k", -EINVAL, 0, 0);

 /* No hex suffixes */
-

[PATCH v2 04/19] test-cutils: Test more integer corner cases

2023-05-11 Thread Eric Blake

We have quite a few undertested and underdocumented integer parsing
corner cases.  To ensure that any changes we make in the code are
intentional rather than accidental semantic changes, it is time to add
more unit tests of existing behavior.

In particular, this demonstrates that parse_uint() and qemu_strtou64()
behave differently.  For "-0", it's hard to argue why parse_uint needs
to reject it (it's not a negative integer), but the documentation sort
of mentions it; but it is intentional that all other negative values
are treated as ERANGE with value 0 (compared to qemu_strtou64()
treating "-2" as success and UINT64_MAX-1, for example).

Also, when mixing overflow/underflow with a check for no trailing
junk, parse_uint_full favors ERANGE over EINVAL, while qemu_strto[iu]*
favor EINVAL.  This behavior is outside the C standard, so we can pick
whatever we want, but it would be nice to be consistent.

Note that C requires that "9223372036854775808" fail strtoll() with
ERANGE/INT64_MAX, but "-9223372036854775808" pass with INT64_MIN; we
weren't testing this.  For strtol(), the behavior depends on whether
long is 32- or 64-bits (the cutoff point either being the same as
strtoll() or at "-2147483648").  Meanwhile, C is clear that
"-18446744073709551615" pass stroull() (but not strtoll) with value 1,
even though we want it to fail parse_uint().  And although
qemu_strtoui() has no C counterpart, it makes more sense if we design
it like 32-bit strtoul() (that is, where "-4294967296" be an alternate
acceptable spelling for "1".  We aren't there yet, so some of the
tests added in this patch have FIXME comments.

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 799 ---
 1 file changed, 738 insertions(+), 61 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 1eeaf21ae22..89c10f5307a 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -150,7 +150,6 @@ static void test_parse_uint_decimal(void)
 g_assert_true(endptr == str + strlen(str));
 }

-
 static void test_parse_uint_llong_max(void)
 {
 unsigned long long i = 999;
@@ -168,16 +167,51 @@ static void test_parse_uint_llong_max(void)
 g_free(str);
 }

+static void test_parse_uint_max(void)
+{
+unsigned long long i = 999;
+char f = 'X';
+char *endptr = 
+char *str = g_strdup_printf("%llu", ULLONG_MAX);
+int r;
+
+r = parse_uint(str, , , 0);
+
+g_assert_cmpint(r, ==, 0);
+g_assert_cmpuint(i, ==, ULLONG_MAX);
+g_assert_true(endptr == str + strlen(str));
+
+g_free(str);
+}
+
 static void test_parse_uint_overflow(void)
 {
-unsigned long long i = 999;
+unsigned long long i;
 char f = 'X';
-char *endptr = 
-const char *str = "99";
+char *endptr;
+const char *str;
 int r;

+i = 999;
+endptr = 
+str = "99";
 r = parse_uint(str, , , 0);
+g_assert_cmpint(r, ==, -ERANGE);
+g_assert_cmpuint(i, ==, ULLONG_MAX);
+g_assert_true(endptr == str + strlen(str));

+i = 999;
+endptr = 
+str = "0x1"; /* 65 bits, 64-bit sign bit clear */
+r = parse_uint(str, , , 0);
+g_assert_cmpint(r, ==, -ERANGE);
+g_assert_cmpuint(i, ==, ULLONG_MAX);
+g_assert_true(endptr == str + strlen(str));
+
+i = 999;
+endptr = 
+str = "0x180008000"; /* 65 bits, 64-bit sign bit set */
+r = parse_uint(str, , , 0);
 g_assert_cmpint(r, ==, -ERANGE);
 g_assert_cmpuint(i, ==, ULLONG_MAX);
 g_assert_true(endptr == str + strlen(str));
@@ -198,6 +232,20 @@ static void test_parse_uint_negative(void)
 g_assert_true(endptr == str + strlen(str));
 }

+static void test_parse_uint_negzero(void)
+{
+unsigned long long i = 999;
+char f = 'X';
+char *endptr = 
+const char *str = " -0";
+int r;
+
+r = parse_uint(str, , , 0);
+
+g_assert_cmpint(r, ==, -ERANGE);
+g_assert_cmpuint(i, ==, 0);
+g_assert_true(endptr == str + strlen(str));
+}

 static void test_parse_uint_full_trailing(void)
 {
@@ -223,6 +271,19 @@ static void test_parse_uint_full_correct(void)
 g_assert_cmpuint(i, ==, 123);
 }

+static void test_parse_uint_full_erange_junk(void)
+{
+/* FIXME - inconsistent with qemu_strto* which favors EINVAL */
+unsigned long long i = 999;
+const char *str = "-2junk";
+int r;
+
+r = parse_uint_full(str, , 0);
+
+g_assert_cmpint(r, ==, -ERANGE /* FIXME -EINVAL */);
+g_assert_cmpuint(i, ==, 0);
+}
+
 static void test_qemu_strtoi_correct(void)
 {
 const char *str = "12345 foo";
@@ -410,7 +471,39 @@ static void test_qemu_strtoi_max(void)

 static void test_qemu_strtoi_overflow(void)
 {
-char *str = g_strdup_printf("%lld", (long long)INT_MAX + 1ll);
+const char *str;
+const char *endptr;
+int res;
+int err;
+
+str = "2147483648"; /* INT_MAX + 1ll */
+endptr = "somewhere";
+res =

[PATCH v2 08/19] cutils: Allow NULL endptr in parse_uint()

2023-05-11 Thread Eric Blake

All the qemu_strto*() functions permit a NULL endptr, just like their
libc counterparts, leaving parse_uint() as the oddball that caused
SEGFAULT on NULL and required the user to call parse_uint_full()
instead.  Relax things for consistency, even though the testsuite is
the only impacted caller.  Add one more unit test to ensure even
parse_uint_full(NULL, 0, ) works.  This also fixes our code to
uniformly favor EINVAL over ERANGE when both apply.

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 18 --
 util/cutils.c| 34 --
 2 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 0c7d07b3297..d3076c3fec1 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -260,14 +260,26 @@ static void test_parse_uint_full_correct(void)

 static void test_parse_uint_full_erange_junk(void)
 {
-/* FIXME - inconsistent with qemu_strto* which favors EINVAL */
+/* EINVAL has priority over ERANGE */
 uint64_t i = 999;
 const char *str = "-2junk";
 int r;

 r = parse_uint_full(str, 0, );

-g_assert_cmpint(r, ==, -ERANGE /* FIXME -EINVAL */);
+g_assert_cmpint(r, ==, -EINVAL);
+g_assert_cmpuint(i, ==, 0);
+}
+
+static void test_parse_uint_full_null(void)
+{
+uint64_t i = 999;
+const char *str = NULL;
+int r;
+
+r = parse_uint_full(str, 0, );
+
+g_assert_cmpint(r, ==, -EINVAL);
 g_assert_cmpuint(i, ==, 0);
 }

@@ -3207,6 +3219,8 @@ int main(int argc, char **argv)
 test_parse_uint_full_correct);
 g_test_add_func("/cutils/parse_uint_full/erange_junk",
 test_parse_uint_full_erange_junk);
+g_test_add_func("/cutils/parse_uint_full/null",
+test_parse_uint_full_null);

 /* qemu_strtoi() tests */
 g_test_add_func("/cutils/qemu_strtoi/correct",
diff --git a/util/cutils.c b/util/cutils.c
index 8ab0cae5e75..e599924a0c4 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -715,8 +715,7 @@ const char *qemu_strchrnul(const char *s, int c)
  * parse_uint:
  *
  * @s: String to parse
- * @endptr: Destination for pointer to first character not consumed, must
- * not be %NULL
+ * @endptr: Destination for pointer to first character not consumed
  * @base: integer base, between 2 and 36 inclusive, or 0
  * @value: Destination for parsed integer value
  *
@@ -730,7 +729,8 @@ const char *qemu_strchrnul(const char *s, int c)
  *
  * Set *@endptr to point right beyond the parsed integer (even if the integer
  * overflows or is negative, all digits will be parsed and *@endptr will
- * point right beyond them).
+ * point right beyond them).  If @endptr is %NULL, any trailing character
+ * instead causes a result of -EINVAL with *@value of 0.
  *
  * If the integer is negative, set *@value to 0, and return -ERANGE.
  * (If you want to allow negative numbers that wrap around within
@@ -777,7 +777,12 @@ int parse_uint(const char *s, const char **endptr, int 
base, uint64_t *value)

 out:
 *value = val;
-*endptr = endp;
+if (endptr) {
+*endptr = endp;
+} else if (s && *endp) {
+r = -EINVAL;
+*value = 0;
+}
 return r;
 }

@@ -788,28 +793,13 @@ out:
  * @base: integer base, between 2 and 36 inclusive, or 0
  * @value: Destination for parsed integer value
  *
- * Parse unsigned integer from entire string
+ * Parse unsigned integer from entire string, rejecting any trailing slop.
  *
- * Have the same behavior of parse_uint(), but with an additional
- * check for additional data after the parsed number. If extra
- * characters are present after a non-overflowing parsed number, the
- * function will return -EINVAL, and *@v will be set to 0.
+ * Shorthand for parse_uint(s, NULL, base, value).
  */
 int parse_uint_full(const char *s, int base, uint64_t *value)
 {
-const char *endp;
-int r;
-
-r = parse_uint(s, , base, value);
-if (r < 0) {
-return r;
-}
-if (*endp) {
-*value = 0;
-return -EINVAL;
-}
-
-return 0;
+return parse_uint(s, NULL, base, value);
 }

 int qemu_parse_fd(const char *param)
-- 
2.40.1

[PATCH v2 16/19] cutils: Set value in all integral qemu_strto* error paths

2023-05-11 Thread Eric Blake

Our goal in writing qemu_strtoi() and friends is to have an interface
harder to abuse than libc's strtol().  Leaving the return value
uninitialized on some but not all error paths does not lend itself
well to this goal; and our documentation wasn't helpful on what to
expect.

Note that the previous patch changed all qemu_strtosz() EINVAL error
paths to slam value to 0 rather than stay uninitialized, even when the
EINVAL eror occurs because of trailing junk.  But for the remaining
integral qemu_strto*, it's easier to return the parsed value than to
force things back to zero, in part because of how check_strtox_error
works; in part because people expect that from libc strto* (while
there is no libc strtosz to compare to), and in part because doing so
creates less churn in the testsuite.

Here, the list of affected callers is much longer ('git grep
"qemu_strto[ui]" *.c **/*.c | grep -v tests/ |wc -l' outputs 87,
although a few of those are the implementation in in cutils.c), so
touching as little as possible is the wisest course of action.

Signed-off-by: Eric Blake 
Reviewed-by: Hanna Czenczek 

---

v2: commit message tweaked, but code unchanged, so R-b applied
---
 tests/unit/test-cutils.c | 24 +++
 util/cutils.c| 42 +---
 2 files changed, 38 insertions(+), 28 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 4a1baf05ca6..1272638582a 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -310,7 +310,7 @@ static void test_qemu_strtoi_null(void)
 err = qemu_strtoi(NULL, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpint(res, ==, 999);
+g_assert_cmpint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -610,7 +610,7 @@ static void test_qemu_strtoi_full_null(void)
 err = qemu_strtoi(NULL, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpint(res, ==, 999);
+g_assert_cmpint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -713,7 +713,7 @@ static void test_qemu_strtoui_null(void)
 err = qemu_strtoui(NULL, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpuint(res, ==, 999);
+g_assert_cmpuint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -1011,7 +1011,7 @@ static void test_qemu_strtoui_full_null(void)
 err = qemu_strtoui(NULL, NULL, 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpuint(res, ==, 999);
+g_assert_cmpuint(res, ==, 0);
 }

 static void test_qemu_strtoui_full_empty(void)
@@ -,7 +,7 @@ static void test_qemu_strtol_null(void)
 err = qemu_strtol(NULL, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpint(res, ==, 999);
+g_assert_cmpint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -1415,7 +1415,7 @@ static void test_qemu_strtol_full_null(void)
 err = qemu_strtol(NULL, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpint(res, ==, 999);
+g_assert_cmpint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -1518,7 +1518,7 @@ static void test_qemu_strtoul_null(void)
 err = qemu_strtoul(NULL, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpuint(res, ==, 999);
+g_assert_cmpuint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -1811,7 +1811,7 @@ static void test_qemu_strtoul_full_null(void)
 err = qemu_strtoul(NULL, NULL, 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpuint(res, ==, 999);
+g_assert_cmpuint(res, ==, 0);
 }

 static void test_qemu_strtoul_full_empty(void)
@@ -1911,7 +1911,7 @@ static void test_qemu_strtoi64_null(void)
 err = qemu_strtoi64(NULL, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpint(res, ==, 999);
+g_assert_cmpint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -2201,7 +2201,7 @@ static void test_qemu_strtoi64_full_null(void)
 err = qemu_strtoi64(NULL, NULL, 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpint(res, ==, 999);
+g_assert_cmpint(res, ==, 0);
 }

 static void test_qemu_strtoi64_full_empty(void)
@@ -2304,7 +2304,7 @@ static void test_qemu_strtou64_null(void)
 err = qemu_strtou64(NULL, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpuint(res, ==, 999);
+g_assert_cmpuint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -2593,7 +2593,7 @@ static void test_qemu_strtou64_full_null(void)
 err = qemu_strtou64(NULL, NULL, 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpuint(res, ==, 999);
+g_assert_cmpuint(res, ==, 0);
 }

 static void test_qemu_strtou64_full_empty(void)
diff --git a/util/cutils.c b/util/cutils.c
index 24955d3ca94..b5a6641fa0f 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -384,12 +384,13 @@ static int check_strtox_error(const char *nptr, char *ep,
  *
  * @nptr may be null, and no conversion is performed then.
  *
- * If no conversion is performed, store @nptr in *@endptr and return
- * -EINVAL.
+ * If no conversion is performed, store @nptr in

[PATCH v2 18/19] cutils: Improve qemu_strtod* error paths

2023-05-11 Thread Eric Blake

Previous patches changed all integral qemu_strto*() error paths to
guarantee that *value is never left uninitialized.  Do likewise for
qemu_strtod.  Also, tighten qemu_strtod_finite() to never return a
non-finite value (prior to this patch, we were rejecting "inf" with
-EINVAL and unspecified result 0.0, but failing "9e999" with -ERANGE
and HUGE_VAL - which is infinite on IEEE machines - despite our
function claiming to recognize only finite values).

Auditing callers, we have no external callers of qemu_strtod, and
among the callers of qemu_strtod_finite:

- qapi/qobject-input-visitor.c:qobject_input_type_number_keyval() and
  qapi/string-input-visitor.c:parse_type_number() which reject all
  errors (does not matter what we store)

- utils/cutils.c:do_strtosz() incorrectly assumes that *endptr points
  to '.' on all failures (that is, it is not distinguishing between
  EINVAL and ERANGE; and therefore still does the WRONG THING for
  "9.9e999".  The change here does not entirely fix that (a later
  patch will tackle this more systematically), but at least the value
  of endptr is now less likely to be out of bounds on overflow

- our testsuite, which we can update to match what we document

Signed-off-by: Eric Blake 
Reviewed-by: Hanna Czenczek 

---

v2: no change to cutils.c; commit message tweaks and testsuite rebase
were minor enough to keep R-b
---
 tests/unit/test-cutils.c | 63 +++-
 util/cutils.c| 32 +++-
 2 files changed, 55 insertions(+), 40 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index b8ad4d7fbac..8f2dd335f13 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -2747,7 +2747,8 @@ static void test_qemu_strtod_einval(void)
 res = 999;
 err = qemu_strtod(str, , );
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpfloat(res, ==, 999.0);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
 g_assert_null(endptr);

 /* not recognizable */
@@ -2979,7 +2980,8 @@ static void test_qemu_strtod_finite_einval(void)
 res = 999;
 err = qemu_strtod_finite(str, , );
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpfloat(res, ==, 999.0);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
 g_assert_true(endptr == str);

 /* NULL */
@@ -2988,7 +2990,8 @@ static void test_qemu_strtod_finite_einval(void)
 res = 999;
 err = qemu_strtod_finite(str, , );
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpfloat(res, ==, 999.0);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
 g_assert_null(endptr);

 /* not recognizable */
@@ -2997,7 +3000,8 @@ static void test_qemu_strtod_finite_einval(void)
 res = 999;
 err = qemu_strtod_finite(str, , );
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpfloat(res, ==, 999.0);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
 g_assert_true(endptr == str);
 }

@@ -3008,24 +3012,26 @@ static void test_qemu_strtod_finite_erange(void)
 int err;
 double res;

-/* overflow */
+/* overflow turns into EINVAL */
 str = "9e999";
 endptr = "somewhere";
 res = 999;
 err = qemu_strtod_finite(str, , );
-g_assert_cmpint(err, ==, -ERANGE);
-g_assert_cmpfloat(res, ==, HUGE_VAL);
-g_assert_true(endptr == str + 5);
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
+g_assert_true(endptr == str);

 str = "-9e+999";
 endptr = "somewhere";
 res = 999;
 err = qemu_strtod_finite(str, , );
-g_assert_cmpint(err, ==, -ERANGE);
-g_assert_cmpfloat(res, ==, -HUGE_VAL);
-g_assert_true(endptr == str + 7);
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
+g_assert_true(endptr == str);

-/* underflow */
+/* underflow is still possible */
 str = "-9e-999";
 endptr = "somewhere";
 res = 999;
@@ -3050,7 +3056,8 @@ static void test_qemu_strtod_finite_nonfinite(void)
 res = 999;
 err = qemu_strtod_finite(str, , );
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpfloat(res, ==, 999.0);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
 g_assert_true(endptr == str);

 str = "-infinity";
@@ -3058,7 +3065,8 @@ static void test_qemu_strtod_finite_nonfinite(void)
 res = 999;
 err = qemu_strtod_finite(str, , );
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpfloat(res, ==, 999.0);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
 g_assert_true(endptr == str);

 /* not a number */
@@ -3067,7 +3075,8 @@ static void test_qemu_strtod_finite_nonfinite(void)
 res = 999;
 err = qemu_strtod_finite(str, , );
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpfloat(res, ==, 999.0);
+g_assert_cmpfloat(res, ==,

[PATCH v2 19/19] cutils: Improve qemu_strtosz handling of fractions

2023-05-11 Thread Eric Blake

We have several limitations and bugs worth fixing; they are
inter-related enough that it is not worth splitting this patch into
smaller pieces:

* ".5k" should work to specify 512, just as "0.5k" does
* "1.k" and "1." + "9"*50 + "k" should both produce the same
  result of 2048 after rounding
* "1." + "0"*350 + "1B" should not be treated the same as "1.0B";
  underflow in the fraction should not be lost
* "7.99e99" and "7.99e999" look similar, but our code was doing a
  read-out-of-bounds on the latter because it was not expecting ERANGE
  due to overflow. While we document that scientific notation is not
  supported, and the previous patch actually fixed
  qemu_strtod_finite() to no longer return ERANGE overflows, it is
  easier to pre-filter than to try and determine after the fact if
  strtod() consumed more than we wanted.  Note that this is a
  low-level semantic change (when endptr is not NULL, we can now
  successfully parse with a scale of 'E' and then report trailing
  junk, instead of failing outright with EINVAL); but an earlier
  commit already argued that this is not a high-level semantic change
  since the only caller passing in a non-NULL endptr also checks that
  the tail is whitespace-only.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1629
Fixes: cf923b78 ("utils: Improve qemu_strtosz() to have 64 bits of precision", 
6.0.0)
Fixes: 7625a1ed ("utils: Use fixed-point arithmetic in qemu_strtosz", 6.0.0)
Signed-off-by: Eric Blake 

---

v2: more changes, handle negatives differently, catch fractions that
round to 0 but don't underflow [Hanna]
---
 tests/unit/test-cutils.c | 50 +-
 util/cutils.c| 89 ++--
 2 files changed, 86 insertions(+), 53 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 8f2dd335f13..67c3de00c82 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -3285,19 +3285,18 @@ static void test_qemu_strtosz_float(void)
 /* An empty fraction tail is tolerated */
 do_strtosz("1.k", 0, 1024, 3);

-/* FIXME An empty fraction head should be tolerated */
-do_strtosz(" .5k", -EINVAL /* FIXME 0 */, 0 /* FIXME 512 */,
-   0 /* FIXME 4 */);
+/* An empty fraction head is tolerated */
+do_strtosz(" .5k", 0, 512, 4);

 /* For convenience, we permit values that are not byte-exact */
 do_strtosz("12.345M", 0, (uint64_t) (12.345 * MiB + 0.5), 7);

-/* FIXME Fraction tail should round correctly */
+/* Fraction tail can round up */
 do_strtosz("1.k", 0, 2048, 7);
 do_strtosz("1.k", 0,
-   1024 /* FIXME 2048 */, 55);
+   2048, 55);

-/* FIXME ERANGE underflow in the fraction tail should not matter for 'k' */
+/* ERANGE underflow in the fraction tail does not matter for 'k' */
 do_strtosz("1."
"00"
"00"
@@ -3306,7 +3305,7 @@ static void test_qemu_strtosz_float(void)
"00"
"00"
"00"
-   "1k", 0, 1 /* FIXME 1024 */, 354);
+   "1k", 0, 1024, 354);
 }

 static void test_qemu_strtosz_invalid(void)
@@ -3330,10 +3329,9 @@ static void test_qemu_strtosz_invalid(void)
 do_strtosz("1.1B", -EINVAL, 0, 0);
 do_strtosz("1.1", -EINVAL, 0, 0);

-/* FIXME underflow in the fraction tail should matter for 'B' */
+/* 'B' cannot have any nonzero fraction, even with rounding or underflow */
 do_strtosz("1.1B", -EINVAL, 0, 0);
-do_strtosz("1.0001B", 0 /* FIXME -EINVAL */,
-   1 /* FIXME 0 */, 23 /* FIXME 0 */);
+do_strtosz("1.0001B", -EINVAL, 0, 0);
 do_strtosz("1."
"00"
"00"
@@ -3342,8 +3340,7 @@ static void test_qemu_strtosz_invalid(void)
"00"
"00"
"00"
-   "1B", 0 /* FIXME -EINVAL */, 1 /* FIXME 0 */,
-   354 /* FIXME 0 */);
+   "1B", -EINVAL, 0, 0);

 /* No hex fractions */
 do_strtosz("0x1.8k", -EINVAL, 0, 0);
@@ -3389,28 +3386,20 @@ static void test_qemu_strtosz_trailing(void)
 do_strtosz_full("123-45", qemu_strtosz, 0, 123, 3, -EINVAL, 0);
 do_strtosz_full(" 123 - 45", qemu_strtosz, 0, 123, 4, -EINVAL, 0);

-/* FIXME should stop parse after 'e'. No floating point exponents */
-do_strtosz_full("1.5e1k", qemu_strtosz, -EINVAL /* FIXME 0 */,
-

[PATCH v2 14/19] test-cutils: Add more coverage to qemu_strtosz11; rgb:1e1e/1e1e/1e1e

2023-05-11 Thread Eric Blake

Add some more strings that the user might send our way.  In
particular, some of these additions include FIXME comments showing
where our parser doesn't quite behave the way we want.

Signed-off-by: Eric Blake 

---

v2: even more tests added, pad a string to avoid out-of-bounds
randomness [Hanna]
---
 tests/unit/test-cutils.c | 147 +++
 1 file changed, 135 insertions(+), 12 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 1936c7b5795..7800caf9b0e 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -3162,7 +3162,12 @@ static void do_strtosz_full(const char *str, 
qemu_strtosz_fn fn,
 ret = fn(str, , );
 g_assert_cmpint(ret, ==, exp_ptr_ret);
 g_assert_cmpuint(val, ==, exp_ptr_val);
-g_assert_true(endptr == str + exp_ptr_offset);
+if (str) {
+g_assert_true(endptr == str + exp_ptr_offset);
+} else {
+g_assert_cmpint(exp_ptr_offset, ==, 0);
+g_assert_null(endptr);
+}

 val = 0xbaadf00d;
 ret = fn(str, NULL, );
@@ -3198,8 +3203,8 @@ static void test_qemu_strtosz_simple(void)
 /* Leading 0 gives decimal results, not octal */
 do_strtosz("08", 0, 8, 2);

-/* Leading space is ignored */
-do_strtosz(" 12345", 0, 12345, 6);
+/* Leading space and + are ignored */
+do_strtosz(" +12345", 0, 12345, 7);

 /* 2^53-1 */
 do_strtosz("9007199254740991", 0, 0x1fULL, 16);
@@ -3226,17 +3231,27 @@ static void test_qemu_strtosz_hex(void)

 do_strtosz("0xab", 0, 171, 4);

-do_strtosz("0xae", 0, 174, 4);
+do_strtosz(" +0xae", 0, 174, 6);
 }

 static void test_qemu_strtosz_units(void)
 {
-/* default is M */
+/* default scale depends on function */
+do_strtosz("1", 0, 1, 1);
 do_strtosz_MiB("1", 0, MiB, 1);
+do_strtosz_metric("1", 0, 1, 1);

+/* Explicit byte suffix works for all functions */
 do_strtosz("1B", 0, 1, 2);
+do_strtosz_MiB("1B", 0, 1, 2);
+do_strtosz_metric("1B", 0, 1, 2);

+/* Expose the scale */
 do_strtosz("1K", 0, KiB, 2);
+do_strtosz_MiB("1K", 0, KiB, 2);
+do_strtosz_metric("1K", 0, 1000, 2);
+
+/* Other suffixes, see also test_qemu_strtosz_metric */
 do_strtosz("1M", 0, MiB, 2);
 do_strtosz("1G", 0, GiB, 2);
 do_strtosz("1T", 0, TiB, 2);
@@ -3248,14 +3263,37 @@ static void test_qemu_strtosz_float(void)
 {
 do_strtosz("0.5E", 0, EiB / 2, 4);

+/* Implied M suffix okay */
+do_strtosz_MiB("0.5", 0, MiB / 2, 3);
+
 /* For convenience, a fraction of 0 is tolerated even on bytes */
 do_strtosz("1.0B", 0, 1, 4);

-/* An empty fraction is tolerated */
+/* An empty fraction tail is tolerated */
 do_strtosz("1.k", 0, 1024, 3);

+/* FIXME An empty fraction head should be tolerated */
+do_strtosz(" .5k", -EINVAL /* FIXME 0 */, 0xbaadf00d /* FIXME 512 */,
+   0 /* FIXME 4 */);
+
 /* For convenience, we permit values that are not byte-exact */
 do_strtosz("12.345M", 0, (uint64_t) (12.345 * MiB + 0.5), 7);
+
+/* FIXME Fraction tail should round correctly */
+do_strtosz("1.k", 0, 2048, 7);
+do_strtosz("1.k", 0,
+   1024 /* FIXME 2048 */, 55);
+
+/* FIXME ERANGE underflow in the fraction tail should not matter for 'k' */
+do_strtosz("1."
+   "00"
+   "00"
+   "00"
+   "00"
+   "00"
+   "00"
+   "00"
+   "1k", 0, 1 /* FIXME 1024 */, 354);
 }

 static void test_qemu_strtosz_invalid(void)
@@ -3265,57 +3303,142 @@ static void test_qemu_strtosz_invalid(void)
 /* Must parse at least one digit */
 do_strtosz("", -EINVAL, 0xbaadf00d, 0);
 do_strtosz(" \t ", -EINVAL, 0xbaadf00d, 0);
-do_strtosz("crap", -EINVAL, 0xbaadf00d, 0);
+do_strtosz(".", -EINVAL, 0xbaadf00d, 0);
+do_strtosz(" .", -EINVAL, 0xbaadf00d, 0);
+do_strtosz(" .k", -EINVAL, 0xbaadf00d, 0);
 do_strtosz("inf", -EINVAL, 0xbaadf00d, 0);
 do_strtosz("NaN", -EINVAL, 0xbaadf00d, 0);

+/* Lone suffix is not okay */
+do_strtosz("k", -EINVAL, 0xbaadf00d, 0);
+do_strtosz(" M", -EINVAL, 0xbaadf00d, 0);
+
 /* Fractional values require scale larger than bytes */
 do_strtosz("1.1B", -EINVAL, 0xbaadf00d, 0);
 do_strtosz("1.1", -EINVAL, 0xbaadf00d, 0);

+/* FIXME underflow in the fraction tail should matter for 'B' */
+do_strtosz("1.1B", -EINVAL, 0xbaadf00d, 0);
+do_strtosz("1.0001B", 0 /* FIXME -EINVAL */,
+   1 /* FIXME 0xbaadf00d */, 23 /* FIXME 0

[PATCH v2 06/19] cutils: Document differences between parse_uint and qemu_strtou64

2023-05-11 Thread Eric Blake

These two functions are subtly different, and not just because of
swapped parameter order.  It took me adding better unit tests to
figure out why.  Document the differences to make it more obvious to
developers trying to pick which one to use, as well as to aid in
upcoming semantic changes.

While touching the documentation, adjust a mis-statement: parse_uint
does not return -EINVAL on invalid base, but assert()s, like all the
other qemu_strto* functions that take a base argument.

Signed-off-by: Eric Blake 
---
 util/cutils.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/util/cutils.c b/util/cutils.c
index 997ddcd09e5..4e3cc6e3605 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -604,6 +604,8 @@ int qemu_strtoi64(const char *nptr, const char **endptr, 
int base,
  * Convert string @nptr to an uint64_t.
  *
  * Works like qemu_strtoul(), except it stores UINT64_MAX on overflow.
+ * (If you want to prohibit negative numbers that wrap around to
+ * positive, use parse_uint()).
  */
 int qemu_strtou64(const char *nptr, const char **endptr, int base,
   uint64_t *result)
@@ -714,7 +716,8 @@ const char *qemu_strchrnul(const char *s, int c)
  *
  * @s: String to parse
  * @value: Destination for parsed integer value
- * @endptr: Destination for pointer to first character not consumed
+ * @endptr: Destination for pointer to first character not consumed, must
+ * not be %NULL
  * @base: integer base, between 2 and 36 inclusive, or 0
  *
  * Parse unsigned integer
@@ -722,15 +725,16 @@ const char *qemu_strchrnul(const char *s, int c)
  * Parsed syntax is like strtoull()'s: arbitrary whitespace, a single optional
  * '+' or '-', an optional "0x" if @base is 0 or 16, one or more digits.
  *
- * If @s is null, or @base is invalid, or @s doesn't start with an
- * integer in the syntax above, set *@value to 0, *@endptr to @s, and
- * return -EINVAL.
+ * If @s is null, or @s doesn't start with an integer in the syntax
+ * above, set *@value to 0, *@endptr to @s, and return -EINVAL.
  *
  * Set *@endptr to point right beyond the parsed integer (even if the integer
  * overflows or is negative, all digits will be parsed and *@endptr will
  * point right beyond them).
  *
  * If the integer is negative, set *@value to 0, and return -ERANGE.
+ * (If you want to allow negative numbers that wrap around within
+ * bounds, use qemu_strtou64()).
  *
  * If the integer overflows unsigned long long, set *@value to
  * ULLONG_MAX, and return -ERANGE.
@@ -787,10 +791,10 @@ out:
  *
  * Parse unsigned integer from entire string
  *
- * Have the same behavior of parse_uint(), but with an additional check
- * for additional data after the parsed number. If extra characters are present
- * after the parsed number, the function will return -EINVAL, and *@v will
- * be set to 0.
+ * Have the same behavior of parse_uint(), but with an additional
+ * check for additional data after the parsed number. If extra
+ * characters are present after a non-overflowing parsed number, the
+ * function will return -EINVAL, and *@v will be set to 0.
  */
 int parse_uint_full(const char *s, unsigned long long *value, int base)
 {
-- 
2.40.1

[PATCH v2 01/19] test-cutils: Avoid g_assert in unit tests

2023-05-11 Thread Eric Blake

glib documentation[1] is clear: g_assert() should be avoided in unit
tests because it is ineffective if G_DISABLE_ASSERT is defined; unit
tests should stick to constructs based on g_assert_true() instead.
Note that since commit 262a69f428, we intentionally state that you
cannot define G_DISABLE_ASSERT that while building qemu; but our code
can be copied to other projects without that restriction, so we should
be consistent.

For most of the replacements in this patch, using g_assert_cmpstr()
would be a regression in quality - although it would helpfully display
the string contents of both pointers on test failure, here, we really
do care about pointer equality, not just string content equality.  But
when a NULL pointer is expected, g_assert_null works fine.

[1] https://libsoup.org/glib/glib-Testing.html#g-assert

Signed-off-by: Eric Blake 
Reviewed-by: Hanna Czenczek 
---
 tests/unit/test-cutils.c | 324 +++
 1 file changed, 162 insertions(+), 162 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 3c4f8754202..0202ac0d5b3 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -1,7 +1,7 @@
 /*
  * cutils.c unit-tests
  *
- * Copyright (C) 2013 Red Hat Inc.
+ * Copyright Red Hat
  *
  * Authors:
  *  Eduardo Habkost 
@@ -40,7 +40,7 @@ static void test_parse_uint_null(void)

 g_assert_cmpint(r, ==, -EINVAL);
 g_assert_cmpint(i, ==, 0);
-g_assert(endptr == NULL);
+g_assert_null(endptr);
 }

 static void test_parse_uint_empty(void)
@@ -55,7 +55,7 @@ static void test_parse_uint_empty(void)

 g_assert_cmpint(r, ==, -EINVAL);
 g_assert_cmpint(i, ==, 0);
-g_assert(endptr == str);
+g_assert_true(endptr == str);
 }

 static void test_parse_uint_whitespace(void)
@@ -70,7 +70,7 @@ static void test_parse_uint_whitespace(void)

 g_assert_cmpint(r, ==, -EINVAL);
 g_assert_cmpint(i, ==, 0);
-g_assert(endptr == str);
+g_assert_true(endptr == str);
 }


@@ -86,7 +86,7 @@ static void test_parse_uint_invalid(void)

 g_assert_cmpint(r, ==, -EINVAL);
 g_assert_cmpint(i, ==, 0);
-g_assert(endptr == str);
+g_assert_true(endptr == str);
 }


@@ -102,7 +102,7 @@ static void test_parse_uint_trailing(void)

 g_assert_cmpint(r, ==, 0);
 g_assert_cmpint(i, ==, 123);
-g_assert(endptr == str + 3);
+g_assert_true(endptr == str + 3);
 }

 static void test_parse_uint_correct(void)
@@ -117,7 +117,7 @@ static void test_parse_uint_correct(void)

 g_assert_cmpint(r, ==, 0);
 g_assert_cmpint(i, ==, 123);
-g_assert(endptr == str + strlen(str));
+g_assert_true(endptr == str + strlen(str));
 }

 static void test_parse_uint_octal(void)
@@ -132,7 +132,7 @@ static void test_parse_uint_octal(void)

 g_assert_cmpint(r, ==, 0);
 g_assert_cmpint(i, ==, 0123);
-g_assert(endptr == str + strlen(str));
+g_assert_true(endptr == str + strlen(str));
 }

 static void test_parse_uint_decimal(void)
@@ -147,7 +147,7 @@ static void test_parse_uint_decimal(void)

 g_assert_cmpint(r, ==, 0);
 g_assert_cmpint(i, ==, 123);
-g_assert(endptr == str + strlen(str));
+g_assert_true(endptr == str + strlen(str));
 }


@@ -163,7 +163,7 @@ static void test_parse_uint_llong_max(void)

 g_assert_cmpint(r, ==, 0);
 g_assert_cmpint(i, ==, (unsigned long long)LLONG_MAX + 1);
-g_assert(endptr == str + strlen(str));
+g_assert_true(endptr == str + strlen(str));

 g_free(str);
 }
@@ -180,7 +180,7 @@ static void test_parse_uint_overflow(void)

 g_assert_cmpint(r, ==, -ERANGE);
 g_assert_cmpint(i, ==, ULLONG_MAX);
-g_assert(endptr == str + strlen(str));
+g_assert_true(endptr == str + strlen(str));
 }

 static void test_parse_uint_negative(void)
@@ -195,7 +195,7 @@ static void test_parse_uint_negative(void)

 g_assert_cmpint(r, ==, -ERANGE);
 g_assert_cmpint(i, ==, 0);
-g_assert(endptr == str + strlen(str));
+g_assert_true(endptr == str + strlen(str));
 }


@@ -235,7 +235,7 @@ static void test_qemu_strtoi_correct(void)

 g_assert_cmpint(err, ==, 0);
 g_assert_cmpint(res, ==, 12345);
-g_assert(endptr == str + 5);
+g_assert_true(endptr == str + 5);
 }

 static void test_qemu_strtoi_null(void)
@@ -248,7 +248,7 @@ static void test_qemu_strtoi_null(void)
 err = qemu_strtoi(NULL, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert(endptr == NULL);
+g_assert_null(endptr);
 }

 static void test_qemu_strtoi_empty(void)
@@ -262,7 +262,7 @@ static void test_qemu_strtoi_empty(void)
 err = qemu_strtoi(str, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert(endptr == str);
+g_assert_true(endptr == str);
 }

 static void test_qemu_strtoi_whitespace(void)
@@ -276,7 +276,7 @@ static void test_qemu_strtoi_whitespace(void)
 err = qemu_strtoi(str, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert(endptr == str);
+g_assert_true(endptr == str);
 }

 static void

[PATCH v2 11/19] test-cutils: Refactor qemu_strtosz tests for less boilerplate

2023-05-11 Thread Eric Blake

No need to copy-and-paste lots of boilerplate per string tested, when
we can consolidate that behind helper functions.  Plus, this adds a
bit more coverage (we now test all strings both with and without
endptr, whereas before some tests skipped the NULL endptr case), which
exposed a SEGFAULT on qemu_strtosz(NULL, NULL, ) that will be
parsed in an upcoming patch.

Note that duplicating boilerplate has one advantage lost here - a
failed test tells you which line number failed; but a helper function
does not show the call stack that reached the failure.  Since we call
the helper more than once within many of the "unit tests", even the
unit test name doesn't point out which call is failing.  But that only
matters when tests fail (they normally pass); at which point I'm
debugging the failures under gdb anyways, so I'm not too worried about
it.

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 503 ---
 1 file changed, 100 insertions(+), 403 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index fe9be690faf..5c9ed78be93 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -3149,473 +3149,170 @@ static void test_qemu_strtod_finite_erange_junk(void)
 g_assert_cmpfloat(res, ==, 999.0);
 }

+typedef int (*qemu_strtosz_fn)(const char *, const char **, uint64_t *);
+static void do_strtosz_full(const char *str, qemu_strtosz_fn fn,
+int exp_ptr_ret, uint64_t exp_ptr_val,
+size_t exp_ptr_offset, int exp_null_ret,
+uint64_t exp_null_val)
+{
+const char *endptr = "somewhere";
+uint64_t val = 0xbaadf00d;
+int ret;
+
+ret = fn(str, , );
+g_assert_cmpint(ret, ==, exp_ptr_ret);
+g_assert_cmpuint(val, ==, exp_ptr_val);
+g_assert_true(endptr == str + exp_ptr_offset);
+
+val = 0xbaadf00d;
+ret = fn(str, NULL, );
+g_assert_cmpint(ret, ==, exp_null_ret);
+g_assert_cmpuint(val, ==, exp_null_val);
+}
+
+static void do_strtosz(const char *str, int exp_ret, uint64_t exp_val,
+   size_t exp_offset)
+{
+do_strtosz_full(str, qemu_strtosz, exp_ret, exp_val, exp_offset,
+exp_ret, exp_val);
+}
+
+static void do_strtosz_MiB(const char *str, int exp_ret, uint64_t exp_val,
+   size_t exp_offset)
+{
+do_strtosz_full(str, qemu_strtosz_MiB, exp_ret, exp_val, exp_offset,
+exp_ret, exp_val);
+}
+
+static void do_strtosz_metric(const char *str, int exp_ret, uint64_t exp_val,
+  size_t exp_offset)
+{
+do_strtosz_full(str, qemu_strtosz_metric, exp_ret, exp_val, exp_offset,
+exp_ret, exp_val);
+}
+
 static void test_qemu_strtosz_simple(void)
 {
-const char *str;
-const char *endptr;
-int err;
-uint64_t res;
-
-str = "0";
-endptr = str;
-res = 0xbaadf00d;
-err = qemu_strtosz(str, , );
-g_assert_cmpint(err, ==, 0);
-g_assert_cmpuint(res, ==, 0);
-g_assert_true(endptr == str + 1);
+do_strtosz("0", 0, 0, 1);

 /* Leading 0 gives decimal results, not octal */
-str = "08";
-endptr = str;
-res = 0xbaadf00d;
-err = qemu_strtosz(str, , );
-g_assert_cmpint(err, ==, 0);
-g_assert_cmpuint(res, ==, 8);
-g_assert_true(endptr == str + 2);
+do_strtosz("08", 0, 8, 2);

 /* Leading space is ignored */
-str = " 12345";
-endptr = str;
-res = 0xbaadf00d;
-err = qemu_strtosz(str, , );
-g_assert_cmpint(err, ==, 0);
-g_assert_cmpuint(res, ==, 12345);
-g_assert_true(endptr == str + 6);
+do_strtosz(" 12345", 0, 12345, 6);

-res = 0xbaadf00d;
-err = qemu_strtosz(str, NULL, );
-g_assert_cmpint(err, ==, 0);
-g_assert_cmpuint(res, ==, 12345);
+/* 2^53-1 */
+do_strtosz("9007199254740991", 0, 0x1fULL, 16);

-str = "9007199254740991"; /* 2^53-1 */
-endptr = str;
-res = 0xbaadf00d;
-err = qemu_strtosz(str, , );
-g_assert_cmpint(err, ==, 0);
-g_assert_cmphex(res, ==, 0x1fULL);
-g_assert_true(endptr == str + 16);
+/* 2^53 */
+do_strtosz("9007199254740992", 0, 0x20ULL, 16);

-str = "9007199254740992"; /* 2^53 */
-endptr = str;
-res = 0xbaadf00d;
-err = qemu_strtosz(str, , );
-g_assert_cmpint(err, ==, 0);
-g_assert_cmphex(res, ==, 0x20ULL);
-g_assert_true(endptr == str + 16);
+/* 2^53+1 */
+do_strtosz("9007199254740993", 0, 0x21ULL, 16);

-str = "9007199254740993"; /* 2^53+1 */
-endptr = str;
-res = 0xbaadf00d;
-err = qemu_strtosz(str, , );
-g_assert_cmpint(err, ==, 0);
-g_assert_cmphex(res, ==, 0x21ULL);
-g_assert_true(endptr == str + 16);
+/* 0xf800 (53 msbs set) */
+do_strtosz("18446744073709549568", 0, 0xf800ULL, 20);

-str = "18446744073709549568"; /* 0xf800 (53

[PATCH v2 10/19] test-cutils: Prepare for upcoming semantic change in qemu_strtosz

2023-05-11 Thread Eric Blake

A quick search for 'qemu_strtosz' in the code base shows that outside
of the testsuite, the ONLY place that passes a non-NULL pointer to
@endptr of any variant of a size parser is in hmp.c (the 'o' parser of
monitor_parse_arguments), and that particular caller warns of
"extraneous characters at the end of line" unless the trailing bytes
are purely whitespace.  Thus, it makes no semantic difference at the
high level whether we parse "1.5e1k" as "1" + ".5e1" + "k" (an attempt
to use scientific notation in strtod with a scaling suffix of 'k' with
no trailing junk, but which qemu_strtosz says should fail with
EINVAL), or as "1.5e" + "1k" (a valid size with scaling suffix of 'e'
for exabytes, followed by two junk bytes) - either way, any user
passing such a string will get an error message about a parse failure.

However, an upcoming patch to qemu_strtosz will fix other corner case
bugs in handling the fractional portion of a size, and in doing so, it
is easier to declare that qemu_strtosz() itself stops parsing at the
first 'e' rather than blindly consuming whatever strtod() will
recognize.  Once that is fixed, the difference will be visible at the
low level (getting a valid parse with trailing garbage when @endptr is
non-NULL, while continuing to get -EINVAL when @endptr is NULL); this
is easier to demonstrate by moving the affected strings from
test_qemu_strtosz_invalid() (which declares them as always -EINVAL) to
test_qemu_strtosz_trailing() (where @endptr affects behavior, for now
with FIXME comments).

Note that a similar argument could be made for having "0x1.5" or
"0x1M" parse as 0x1 with ".5" or "M" as trailing junk, instead of
blindly treating it as -EINVAL; however, as these cases do not suffer
from the same problems as floating point, they are not worth changing
at this time.

Signed-off-by: Eric Blake 
Reviewed-by: Hanna Czenczek 
---
 tests/unit/test-cutils.c | 42 ++--
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 1763839a157..fe9be690faf 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -3440,21 +3440,6 @@ static void test_qemu_strtosz_invalid(void)
 g_assert_cmphex(res, ==, 0xbaadf00d);
 g_assert_true(endptr == str);

-/* No floating point exponents */
-str = "1.5e1k";
-endptr = NULL;
-err = qemu_strtosz(str, , );
-g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmphex(res, ==, 0xbaadf00d);
-g_assert_true(endptr == str);
-
-str = "1.5E+0k";
-endptr = NULL;
-err = qemu_strtosz(str, , );
-g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmphex(res, ==, 0xbaadf00d);
-g_assert_true(endptr == str);
-
 /* No hex fractions */
 str = "0x1.8k";
 endptr = NULL;
@@ -3558,6 +3543,33 @@ static void test_qemu_strtosz_trailing(void)
 err = qemu_strtosz(str, NULL, );
 g_assert_cmpint(err, ==, -EINVAL);
 g_assert_cmphex(res, ==, 0xbaadf00d);
+
+/* FIXME should stop parse after 'e'. No floating point exponents */
+str = "1.5e1k";
+endptr = NULL;
+res = 0xbaadf00d;
+err = qemu_strtosz(str, , );
+g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */);
+g_assert_cmphex(res, ==, 0xbaadf00d /* FIXME EiB * 1.5 */);
+g_assert_true(endptr == str /* FIXME + 4 */);
+
+res = 0xbaadf00d;
+err = qemu_strtosz(str, NULL, );
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 0xbaadf00d);
+
+str = "1.5E+0k";
+endptr = NULL;
+res = 0xbaadf00d;
+err = qemu_strtosz(str, , );
+g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */);
+g_assert_cmphex(res, ==, 0xbaadf00d /* FIXME EiB * 1.5 */);
+g_assert_true(endptr == str /* FIXME + 4 */);
+
+res = 0xbaadf00d;
+err = qemu_strtosz(str, NULL, );
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmphex(res, ==, 0xbaadf00d);
 }

 static void test_qemu_strtosz_erange(void)
-- 
2.40.1

[PATCH v2 03/19] test-cutils: Test integral qemu_strto* value on failures

2023-05-11 Thread Eric Blake

We are inconsistent on the contents of *value after a strto* parse
failure.  I found the following behaviors:

- parse_uint() and parse_uint_full(), which document that *value is
  slammed to 0 on all EINVAL failures and 0 or UINT_MAX on ERANGE
  failures, and has unit tests for that (note that parse_uint requires
  non-NULL endptr, and does not fail with EINVAL for trailing junk)

- qemu_strtosz(), which leaves *value untouched on all failures (both
  EINVAL and ERANGE), and has unit tests but not documentation for
  that

- qemu_strtoi() and other integral friends, which document *value on
  ERANGE failures but is unspecified on EINVAL (other than implicitly
  by comparison to libc strto*); there, *value is untouched for NULL
  string, slammed to 0 on no conversion, and left at the prefix value
  on NULL endptr; unit tests do not consistently check the value

- qemu_strtod(), which documents *value on ERANGE failures but is
  unspecified on EINVAL; there, *value is untouched for NULL string,
  slammed to 0.0 for no conversion, and left at the prefix value on
  NULL endptr; there are no unit tests (other than indirectly through
  qemu_strtosz)

- qemu_strtod_finite(), which documents *value on ERANGE failures but
  is unspecified on EINVAL; there, *value is left at the prefix for
  'inf' or 'nan' and untouched in all other cases; there are no unit
  tests (other than indirectly through qemu_strtosz)

Upcoming patches will change behaviors for consistency, but it's best
to first have more unit test coverage to see the impact of those
changes.

Signed-off-by: Eric Blake 
Reviewed-by: Hanna Czenczek 
---
 tests/unit/test-cutils.c | 58 +++-
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 38bd3990207..1eeaf21ae22 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -248,6 +248,7 @@ static void test_qemu_strtoi_null(void)
 err = qemu_strtoi(NULL, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 999);
 g_assert_null(endptr);
 }

@@ -262,6 +263,7 @@ static void test_qemu_strtoi_empty(void)
 err = qemu_strtoi(str, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -276,6 +278,7 @@ static void test_qemu_strtoi_whitespace(void)
 err = qemu_strtoi(str, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -290,6 +293,7 @@ static void test_qemu_strtoi_invalid(void)
 err = qemu_strtoi(str, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -473,6 +477,7 @@ static void test_qemu_strtoi_full_null(void)
 err = qemu_strtoi(NULL, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 999);
 g_assert_null(endptr);
 }

@@ -485,6 +490,7 @@ static void test_qemu_strtoi_full_empty(void)
 err = qemu_strtoi(str, NULL, 0, );

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 0);
 }

 static void test_qemu_strtoi_full_negative(void)
@@ -502,18 +508,19 @@ static void test_qemu_strtoi_full_negative(void)
 static void test_qemu_strtoi_full_trailing(void)
 {
 const char *str = "123xxx";
-int res;
+int res = 999;
 int err;

 err = qemu_strtoi(str, NULL, 0, );

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 123);
 }

 static void test_qemu_strtoi_full_max(void)
 {
 char *str = g_strdup_printf("%d", INT_MAX);
-int res;
+int res = 999;
 int err;

 err = qemu_strtoi(str, NULL, 0, );
@@ -548,6 +555,7 @@ static void test_qemu_strtoui_null(void)
 err = qemu_strtoui(NULL, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpuint(res, ==, 999);
 g_assert_null(endptr);
 }

@@ -562,6 +570,7 @@ static void test_qemu_strtoui_empty(void)
 err = qemu_strtoui(str, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpuint(res, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -576,6 +585,7 @@ static void test_qemu_strtoui_whitespace(void)
 err = qemu_strtoui(str, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpuint(res, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -590,6 +600,7 @@ static void test_qemu_strtoui_invalid(void)
 err = qemu_strtoui(str, , 0, );

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpuint(res, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -771,6 +782,7 @@ static void test_qemu_strtoui_full_null(void)
 err = qemu_strtoui(NULL, NULL, 0, );

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpuint(res, ==, 999);
 }

 static void test_qemu_strtoui_full_empty(void)
@@ -782,7 +794,9 @@ static void test_qemu_strtoui_full_empty(void)
 err = qemu_strtoui(str, NULL, 0, );

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpuint(res, ==,

[PATCH v2 05/19] cutils: Fix wraparound parsing in qemu_strtoui

2023-05-11 Thread Eric Blake

While we were matching 32-bit strtol in qemu_strtoi, our use of a
64-bit parse was leaking through for some inaccurate answers in
qemu_strtoui in comparison to a 32-bit strtoul.  Fix those, and update
the testsuite now that our bounds checks are correct.

Our int wrappers would be a lot easier to write if libc had a
guaranteed 32-bit parser even on platforms with 64-bit long.

Fixes: 473a2a331e ("cutils: add qemu_strtoi & qemu_strtoui parsers for 
int/unsigned int types", v2.12.0)
Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 11 +--
 util/cutils.c| 14 ++
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 89c10f5307a..08989d1d3ac 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -858,7 +858,7 @@ static void test_qemu_strtoui_hex(void)

 static void test_qemu_strtoui_wrap(void)
 {
-/* FIXME - wraparound should be consistent with 32-bit strtoul */
+/* wraparound is consistent with 32-bit strtoul */
 const char *str = "-4294967295"; /* 1 mod 2^32 */
 char f = 'X';
 const char *endptr = 
@@ -867,8 +867,8 @@ static void test_qemu_strtoui_wrap(void)

 err = qemu_strtoui(str, , 0, );

-g_assert_cmpint(err, ==, -ERANGE /* FIXME 0 */);
-g_assert_cmphex(res, ==, UINT_MAX /* FIXME 1 */);
+g_assert_cmpint(err, ==, 0);
+g_assert_cmphex(res, ==, 1);
 g_assert_true(endptr == str + strlen(str));
 }

@@ -935,13 +935,12 @@ static void test_qemu_strtoui_underflow(void)
 g_assert_cmpint(res, ==, UINT_MAX);
 g_assert_true(endptr == str + strlen(str));

-/* FIXME - overflow should be consistent with 32-bit strtoul */
 str = "-18446744073709551615"; /* -UINT64_MAX (not 1) */
 endptr = "somewhere";
 res = 999;
 err = qemu_strtoui(str, , 0, );
-g_assert_cmpint(err, ==, 0 /* FIXME -ERANGE */);
-g_assert_cmpint(res, ==, 1 /* FIXME UINT_MAX */);
+g_assert_cmpint(err, ==, -ERANGE);
+g_assert_cmpint(res, ==, UINT_MAX);
 g_assert_true(endptr == str + strlen(str));

 str = "-0x1"; /* 65 bits, 32-bit sign bit clear */
diff --git a/util/cutils.c b/util/cutils.c
index 5887e744140..997ddcd09e5 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -466,10 +466,16 @@ int qemu_strtoui(const char *nptr, const char **endptr, 
int base,
 if (errno == ERANGE) {
 *result = -1;
 } else {
-if (lresult > UINT_MAX) {
-*result = UINT_MAX;
-errno = ERANGE;
-} else if (lresult < INT_MIN) {
+/*
+ * Note that platforms with 32-bit strtoul accept input in the
+ * range [-4294967295, 4294967295]; but we used 64-bit
+ * strtoull which wraps -18446744073709551615 to 1.  Reject
+ * positive values that contain '-', and wrap all valid
+ * negative values.
+ */
+if (lresult > UINT_MAX ||
+lresult < -(long long)UINT_MAX ||
+(lresult > 0 && memchr(nptr, '-', ep - nptr))) {
 *result = UINT_MAX;
 errno = ERANGE;
 } else {
-- 
2.40.1

[PATCH v2 00/19] Fix qemu_strtosz() read-out-of-bounds

2023-05-11 Thread Eric Blake

v1 was here:
https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg01988.html

since then:
- make parse_uint easier to use, then use it in qemu_strtosz
- add even more unit tests
- fix a bug in qemu_strtoui
- avoid dereferencing randome memory during unit tests [Hanna]
- other cleanups as I found them
- compress the strtosz unit tests (the major cause of the large
  interdiff statistics)

backport-diff looks like:

001/19:[] [--] 'test-cutils: Avoid g_assert in unit tests'
002/19:[] [--] 'test-cutils: Use g_assert_cmpuint where appropriate'
003/19:[] [--] 'test-cutils: Test integral qemu_strto* value on failures'
004/19:[down] 'test-cutils: Test more integer corner cases'
005/19:[down] 'cutils: Fix wraparound parsing in qemu_strtoui'
006/19:[down] 'cutils: Document differences between parse_uint and 
qemu_strtou64'
007/19:[down] 'cutils: Adjust signature of parse_uint[_full]'
008/19:[down] 'cutils: Allow NULL endptr in parse_uint()'
009/19:[0147] [FC] 'test-cutils: Add coverage of qemu_strtod'
010/19:[] [--] 'test-cutils: Prepare for upcoming semantic change in 
qemu_strtosz'
011/19:[down] 'test-cutils: Refactor qemu_strtosz tests for less boilerplate'
012/19:[down] 'cutils: Allow NULL str in qemu_strtosz'
013/19:[] [--] 'numa: Check for qemu_strtosz_MiB error'
014/19:[down] 'test-cutils: Add more coverage to 
qemu_strtosz11;rgb:1e1e/1e1e/1e1e'
015/19:[0178] [FC] 'cutils: Set value in all qemu_strtosz* error paths'
016/19:[] [--] 'cutils: Set value in all integral qemu_strto* error paths'
017/19:[down] 'cutils: Use parse_uint in qemu_strtosz for negative rejection'
018/19:[0018] [FC] 'cutils: Improve qemu_strtod* error paths'
019/19:[0107] [FC] 'cutils: Improve qemu_strtosz handling of fractions'


Eric Blake (19):
  test-cutils: Avoid g_assert in unit tests
  test-cutils: Use g_assert_cmpuint where appropriate
  test-cutils: Test integral qemu_strto* value on failures
  test-cutils: Test more integer corner cases
  cutils: Fix wraparound parsing in qemu_strtoui
  cutils: Document differences between parse_uint and qemu_strtou64
  cutils: Adjust signature of parse_uint[_full]
  cutils: Allow NULL endptr in parse_uint()
  test-cutils: Add coverage of qemu_strtod
  test-cutils: Prepare for upcoming semantic change in qemu_strtosz
  test-cutils: Refactor qemu_strtosz tests for less boilerplate
  cutils: Allow NULL str in qemu_strtosz
  numa: Check for qemu_strtosz_MiB error
  test-cutils: Add more coverage to qemu_strtosz11;rgb:1e1e/1e1e/1e1e
  cutils: Set value in all qemu_strtosz* error paths
  cutils: Set value in all integral qemu_strto* error paths
  cutils: Use parse_uint in qemu_strtosz for negative rejection
  cutils: Improve qemu_strtod* error paths
  cutils: Improve qemu_strtosz handling of fractions

 include/qemu/cutils.h |5 +-
 audio/audio_legacy.c  |4 +-
 block/gluster.c   |4 +-
 block/nfs.c   |4 +-
 blockdev.c|4 +-
 contrib/ivshmem-server/main.c |4 +-
 hw/core/numa.c|   11 +-
 qapi/opts-visitor.c   |   10 +-
 tests/unit/test-cutils.c  | 2340 -
 ui/vnc.c  |4 +-
 util/cutils.c |  251 ++--
 util/guest-random.c   |4 +-
 util/qemu-sockets.c   |   10 +-
 13 files changed, 1891 insertions(+), 764 deletions(-)


base-commit: 278238505d28d292927bff7683f39fb4fbca7fd1
-- 
2.40.1

[PATCH v2 02/19] test-cutils: Use g_assert_cmpuint where appropriate

2023-05-11 Thread Eric Blake

When debugging test failures, seeing unsigned values as large positive
values rather than negative values matters (assuming glib 2.78+; given
that I just fixed a bug in glib 2.76 [1] where g_assert_cmpuint
displays signed instead of unsigned values).  No impact when the test
is passing, but using a consistent style will matter more in upcoming
test additions.  Also, some tests are better with cmphex.

While at it, fix some spacing and minor typing issues spotted nearby.

[1] https://gitlab.gnome.org/GNOME/glib/-/issues/2997

Signed-off-by: Eric Blake 
Reviewed-by: Hanna Czenczek 
---
 tests/unit/test-cutils.c | 148 +++
 1 file changed, 74 insertions(+), 74 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 0202ac0d5b3..38bd3990207 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -39,7 +39,7 @@ static void test_parse_uint_null(void)
 r = parse_uint(NULL, , , 0);

 g_assert_cmpint(r, ==, -EINVAL);
-g_assert_cmpint(i, ==, 0);
+g_assert_cmpuint(i, ==, 0);
 g_assert_null(endptr);
 }

@@ -54,7 +54,7 @@ static void test_parse_uint_empty(void)
 r = parse_uint(str, , , 0);

 g_assert_cmpint(r, ==, -EINVAL);
-g_assert_cmpint(i, ==, 0);
+g_assert_cmpuint(i, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -69,7 +69,7 @@ static void test_parse_uint_whitespace(void)
 r = parse_uint(str, , , 0);

 g_assert_cmpint(r, ==, -EINVAL);
-g_assert_cmpint(i, ==, 0);
+g_assert_cmpuint(i, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -85,7 +85,7 @@ static void test_parse_uint_invalid(void)
 r = parse_uint(str, , , 0);

 g_assert_cmpint(r, ==, -EINVAL);
-g_assert_cmpint(i, ==, 0);
+g_assert_cmpuint(i, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -101,7 +101,7 @@ static void test_parse_uint_trailing(void)
 r = parse_uint(str, , , 0);

 g_assert_cmpint(r, ==, 0);
-g_assert_cmpint(i, ==, 123);
+g_assert_cmpuint(i, ==, 123);
 g_assert_true(endptr == str + 3);
 }

@@ -116,7 +116,7 @@ static void test_parse_uint_correct(void)
 r = parse_uint(str, , , 0);

 g_assert_cmpint(r, ==, 0);
-g_assert_cmpint(i, ==, 123);
+g_assert_cmpuint(i, ==, 123);
 g_assert_true(endptr == str + strlen(str));
 }

@@ -131,7 +131,7 @@ static void test_parse_uint_octal(void)
 r = parse_uint(str, , , 0);

 g_assert_cmpint(r, ==, 0);
-g_assert_cmpint(i, ==, 0123);
+g_assert_cmpuint(i, ==, 0123);
 g_assert_true(endptr == str + strlen(str));
 }

@@ -146,7 +146,7 @@ static void test_parse_uint_decimal(void)
 r = parse_uint(str, , , 10);

 g_assert_cmpint(r, ==, 0);
-g_assert_cmpint(i, ==, 123);
+g_assert_cmpuint(i, ==, 123);
 g_assert_true(endptr == str + strlen(str));
 }

@@ -162,7 +162,7 @@ static void test_parse_uint_llong_max(void)
 r = parse_uint(str, , , 0);

 g_assert_cmpint(r, ==, 0);
-g_assert_cmpint(i, ==, (unsigned long long)LLONG_MAX + 1);
+g_assert_cmpuint(i, ==, (unsigned long long)LLONG_MAX + 1);
 g_assert_true(endptr == str + strlen(str));

 g_free(str);
@@ -179,7 +179,7 @@ static void test_parse_uint_overflow(void)
 r = parse_uint(str, , , 0);

 g_assert_cmpint(r, ==, -ERANGE);
-g_assert_cmpint(i, ==, ULLONG_MAX);
+g_assert_cmpuint(i, ==, ULLONG_MAX);
 g_assert_true(endptr == str + strlen(str));
 }

@@ -194,7 +194,7 @@ static void test_parse_uint_negative(void)
 r = parse_uint(str, , , 0);

 g_assert_cmpint(r, ==, -ERANGE);
-g_assert_cmpint(i, ==, 0);
+g_assert_cmpuint(i, ==, 0);
 g_assert_true(endptr == str + strlen(str));
 }

@@ -208,7 +208,7 @@ static void test_parse_uint_full_trailing(void)
 r = parse_uint_full(str, , 0);

 g_assert_cmpint(r, ==, -EINVAL);
-g_assert_cmpint(i, ==, 0);
+g_assert_cmpuint(i, ==, 0);
 }

 static void test_parse_uint_full_correct(void)
@@ -220,7 +220,7 @@ static void test_parse_uint_full_correct(void)
 r = parse_uint_full(str, , 0);

 g_assert_cmpint(r, ==, 0);
-g_assert_cmpint(i, ==, 123);
+g_assert_cmpuint(i, ==, 123);
 }

 static void test_qemu_strtoi_correct(void)
@@ -428,7 +428,7 @@ static void test_qemu_strtoi_underflow(void)
 int res = 999;
 int err;

-err  = qemu_strtoi(str, , 0, );
+err = qemu_strtoi(str, , 0, );

 g_assert_cmpint(err, ==, -ERANGE);
 g_assert_cmpint(res, ==, INT_MIN);
@@ -479,10 +479,10 @@ static void test_qemu_strtoi_full_null(void)
 static void test_qemu_strtoi_full_empty(void)
 {
 const char *str = "";
-int res = 999L;
+int res = 999;
 int err;

-err =  qemu_strtoi(str, NULL, 0, );
+err = qemu_strtoi(str, NULL, 0, );

 g_assert_cmpint(err, ==, -EINVAL);
 }
@@ -728,7 +728,7 @@ static void test_qemu_strtoui_underflow(void)
 unsigned int res = 999;
 int err;

-err  = qemu_strtoui(str, , 0, );
+err = qemu_strtoui(str, , 0, );

 g_assert_cmpint(err, ==, -ERANGE);

Re: css_clear_io_interrupt() error handling

2023-05-11 Thread Halil Pasic

On Thu, 11 May 2023 14:20:51 +0200
Markus Armbruster  wrote:
[..]
> >
> > In my opinion the best way to deal with such situations would be to
> > abort() in test/development and log a warning in production. Of course  
> 
> Understand, but...
> 
> > assert() wouldn't give me that, and it wouldn't be locally consistent at
> > all.  
> 
> ... nothing behaves like that so far.
> 

I understand. And I agree with all statements from your previous mail. 

> Let's try to come to a conclusion.  We can either keep the current
> behavior, i.e. abort().  Or we change it to just print something.
> 
> If we want the latter: fprintf() to stderr, warn_report(), or trace
> point?
> 
> You are the maintainer, so the decision is yours.
> 
> I could stick a patch into a series of error-related cleanup patches I'm
> working on.

I would gladly take that offer. Given that we didn't see any crashes and
thus violations of assumptions up till now, and that both the kvm and the
qemu implementations are from my perspective stable, I think not forcing
a crash is a good option. From the options you offered, warn_report()
looks the most compelling to me, but I would trust your expertise to pick
the actually best one.

Thank you very much.

> 
> 
> [*] I'm rather fond of the trick to have oopsie() fork & crash.

I never thought of this, but I do actually find it very compelling
to get a dump while keeping the workload alive. Especially if
it was oopsie_once() so one does not get buried in dumps. But we don't
do things like this in QEMU, or do we?

Regards,
Halil

[PATCH] contrib/ivshmem-server: allow 2G ivshmem region

2023-05-11 Thread Erika Hunhoff

The ivshmem-server failed when configured with the following:
./ivshmem-server -F -S ivshmem-file -l 2G
This is because the ivshmem_server_ftruncate fails without calling
ftruncate.
This commit allows the ivshmem-server to create a region of size 2G.

Signed-off-by: Erika Hunhoff 
---
 contrib/ivshmem-server/ivshmem-server.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/contrib/ivshmem-server/ivshmem-server.c 
b/contrib/ivshmem-server/ivshmem-server.c
index 2f3c7320a6..ef31f07914 100644
--- a/contrib/ivshmem-server/ivshmem-server.c
+++ b/contrib/ivshmem-server/ivshmem-server.c
@@ -239,13 +239,17 @@ ivshmem_server_ftruncate(int fd, unsigned shmsize)
 return 0;
 }
 
-while (shmsize <= IVSHMEM_SERVER_MAX_HUGEPAGE_SIZE) {
+/*
+ * This is a do-while loop in case
+ * shmsize > IVSHMEM_SERVER_MAX_HUGEPAGE_SIZE
+ */
+do {
 ret = ftruncate(fd, shmsize);
 if (ret == 0) {
 return ret;
 }
 shmsize *= 2;
-}
+} while (shmsize <= IVSHMEM_SERVER_MAX_HUGEPAGE_SIZE);
 
 return -1;
 }
-- 
2.25.1

Re: [RFC 2/7] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support

2023-05-11 Thread Nathan Fontenot

On 5/11/23 12:56, Fan Ni wrote:
> From: Fan Ni 
> 
> Per cxl spec 3.0, add dynamic capacity region representative based on
> Table 8-126 and extend the cxl type3 device definition to include dc region
> information. Also, based on info in 8.2.9.8.9.1, add 'Get Dynamic Capacity
> Configuration' mailbox support.
> 
> Signed-off-by: Fan Ni 
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 68 +
>  include/hw/cxl/cxl_device.h | 16 +
>  2 files changed, 84 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 7ff4fbdf22..61c77e52d8 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -81,6 +81,8 @@ enum {
>  #define GET_POISON_LIST0x0
>  #define INJECT_POISON  0x1
>  #define CLEAR_POISON   0x2
> + DCD_CONFIG = 0x48, /*8.2.9.8.9*/
> + #define GET_DC_REGION_CONFIG   0x0
>  PHYSICAL_SWITCH = 0x51
>  #define IDENTIFY_SWITCH_DEVICE  0x0
>  };
> @@ -935,6 +937,70 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd 
> *cmd,
>  return CXL_MBOX_SUCCESS;
>  }
>  
> +/*
> + * cxl spec 3.0: 8.2.9.8.9.2
> + * Get Dynamic Capacity Configuration
> + **/
> +static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
> + CXLDeviceState *cxl_dstate,
> + uint16_t *len)
> +{
> + struct get_dyn_cap_config_in_pl {
> + uint8_t region_cnt;
> + uint8_t start_region_id;
> + } QEMU_PACKED;
> +
> +struct get_dyn_cap_config_out_pl {
> + uint8_t num_regions;
> + uint8_t rsvd1[7];
> + struct {
> + uint64_t base;
> + uint64_t decode_len;
> + uint64_t region_len;
> + uint64_t block_size;
> + uint32_t dsmadhandle;
> + uint8_t flags;
> + uint8_t rsvd2[3];
> + } QEMU_PACKED records[];

Could you declare CXLDCD_Region as QEMU_PACKED and use it here instead of
re-defining the region structure?

> + } QEMU_PACKED;
> +
> + struct get_dyn_cap_config_in_pl *in = (void *)cmd->payload;
> + struct get_dyn_cap_config_out_pl *out = (void *)cmd->payload;
> + struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, 
> cxl_dstate);
> + uint16_t record_count = 0, i = 0;
> + uint16_t out_pl_len;
> +
> + if (in->start_region_id >= ct3d->dc.num_regions)
> + record_count = 0;
> + else if (ct3d->dc.num_regions - in->start_region_id < in->region_cnt)
> + record_count = ct3d->dc.num_regions - in->start_region_id;
> + else
> + record_count = in->region_cnt;
> +
> + out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> + assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> +
> + memset(out, 0, out_pl_len);
> + out->num_regions = record_count;
> + for (; i < record_count; i++) {
> + stq_le_p(>records[i].base,
> + ct3d->dc.regions[in->start_region_id+i].base);
> + stq_le_p(>records[i].decode_len,
> + ct3d->dc.regions[in->start_region_id+i].decode_len);
> + stq_le_p(>records[i].region_len,
> + ct3d->dc.regions[in->start_region_id+i].len);
> + stq_le_p(>records[i].block_size,
> + ct3d->dc.regions[in->start_region_id+i].block_size);
> + stl_le_p(>records[i].dsmadhandle,
> + ct3d->dc.regions[in->start_region_id+i].dsmadhandle);
> + out->records[i].flags
> + = ct3d->dc.regions[in->start_region_id+i].flags;

In this loop your reading from 'in' and writing to 'out' where in and out both
point to the same payload buffer. It works because of the structure layouts but
feels like a bug waiting to happen. Perhaps saving start_region to a local 
variable
and using that for the loop?

-Nathan

> + }
> +
> + *len = out_pl_len;
> + return CXL_MBOX_SUCCESS;
> +}
> +
>  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>  #define IMMEDIATE_DATA_CHANGE (1 << 2)
>  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> @@ -973,6 +1039,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
>  cmd_media_inject_poison, 8, 0 },
>  [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
>  cmd_media_clear_poison, 72, 0 },
> + [DCD_CONFIG][GET_DC_REGION_CONFIG] = { "DCD_GET_DC_REGION_CONFIG",
> + cmd_dcd_get_dyn_cap_config, 2, 0 },
>  };
>  
>  static struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index e285369693..8a04e53e90 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -383,6 +383,17 @@ typedef struct CXLPoison {
>  typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
>  #define CXL_POISON_LIST_LIMIT 256
>  
>

Re: [RFC PATCH 03/18] hw/pci: use PCIDevice gpio for device IRQ

2023-05-11 Thread Bernhard Beschow




Am 11. Mai 2023 08:57:16 UTC schrieb Mark Cave-Ayland 
:
>Change pci_set_irq() to call qemu_set_irq() on the PCI device IRQ rather than
>calling PCI bus IRQ handler function directly. In order to preserve the
>existing behaviour update pci_qdev_realize() so that it automatically connects
>the PCI device IRQ to the PCI bus IRQ handler.
>
>Finally add a "QEMU interface" description documenting the new PCI device IRQ
>gpio next to the declaration of TYPE_PCI_DEVICE.
>
>Signed-off-by: Mark Cave-Ayland 
>---
> hw/pci/pci.c | 12 ++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
>diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>index 9471f996a7..3da1481eb5 100644
>--- a/hw/pci/pci.c
>+++ b/hw/pci/pci.c
>@@ -1680,8 +1680,7 @@ qemu_irq pci_allocate_irq(PCIDevice *pci_dev)
> 
> void pci_set_irq(PCIDevice *pci_dev, int level)
> {
>-int intx = pci_intx(pci_dev);
>-pci_irq_handler(pci_dev, intx, level);
>+qemu_set_irq(pci_dev->irq, level);
> }
> 
> /* Special hooks used by device assignment */
>@@ -2193,6 +2192,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
>**errp)
> pci_set_power(pci_dev, true);
> 
> pci_dev->msi_trigger = pci_msi_trigger;
>+
>+/* Connect device IRQ to bus */
>+qdev_connect_gpio_out(DEVICE(pci_dev), 0,
>+  pci_get_bus(pci_dev)->irq_in[pci_dev->devfn]);

I think this is confusing a few things. In my understanding -- unlike ISA -- 
PCI considers interrupt lanes only for PCI slots but not for buses. So for 
example each PCI slot could have its own direct connections (up to four, 
intA..intD) to the interrupt controller. IOW interrupt lanes and PCI buses are 
unrelated, thus PCIBus shouldn't really have IRQs.

Moreover, in case the interrupt lines are shared between multiple PCI slots, a 
usual pattern is to swizzle these lines such that the intAs from the slots 
don't all occupy just one IRQ line. That means that depending on the slot the 
device is plugged into a different lane is triggered. Above code, however, 
would always trigger the same line and wouldn't even allow for modeling the 
swizzeling.

Also, above code would cause out of bounds array accesses if a PCI device had 
more functions than there are on "the bus":
For example, consider PIIX which has four PIRQs, so ARRAY_SIZE(irq_fn) == 4, 
right? devfn can be up to 8 according to the PCI spec which would cause an out 
if bounds array access above.

I think that this commit does actually re-define how PCI buses work in QEMU 
although the cover letter claims to save this for another day. We should 
probably not apply the series in its current form.

Best regards,
Bernhard

> }
> 
> static void pci_device_init(Object *obj)
>@@ -2850,6 +2853,11 @@ void pci_set_power(PCIDevice *d, bool state)
> }
> }
> 
>+/*
>+ * QEMU interface:
>+ * + Unnamed GPIO output: set to 1 if the PCI Device has asserted its irq
>+ */
>+
> static const TypeInfo pci_device_type_info = {
> .name = TYPE_PCI_DEVICE,
> .parent = TYPE_DEVICE,

Re: [PATCH v4 20/20] aio: remove aio_disable_external() API

2023-05-11 Thread Stefan Hajnoczi

On Thu, May 04, 2023 at 11:34:17PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > All callers now pass is_external=false to aio_set_fd_handler() and
> > aio_set_event_notifier(). The aio_disable_external() API that
> > temporarily disables fd handlers that were registered is_external=true
> > is therefore dead code.
> > 
> > Remove aio_disable_external(), aio_enable_external(), and the
> > is_external arguments to aio_set_fd_handler() and
> > aio_set_event_notifier().
> > 
> > The entire test-fdmon-epoll test is removed because its sole purpose was
> > testing aio_disable_external().
> > 
> > Parts of this patch were generated using the following coccinelle
> > (https://coccinelle.lip6.fr/) semantic patch:
> > 
> >   @@
> >   expression ctx, fd, is_external, io_read, io_write, io_poll, 
> > io_poll_ready, opaque;
> >   @@
> >   - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, 
> > io_poll_ready, opaque)
> >   + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, 
> > opaque)
> > 
> >   @@
> >   expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
> >   @@
> >   - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, 
> > io_poll_ready)
> >   + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)
> > 
> > Reviewed-by: Juan Quintela 
> > Reviewed-by: Philippe Mathieu-Daudé 
> > Signed-off-by: Stefan Hajnoczi 
> 
> > diff --git a/util/fdmon-epoll.c b/util/fdmon-epoll.c
> > index 1683aa1105..6b6a1a91f8 100644
> > --- a/util/fdmon-epoll.c
> > +++ b/util/fdmon-epoll.c
> > @@ -133,13 +128,12 @@ bool fdmon_epoll_try_upgrade(AioContext *ctx, 
> > unsigned npfd)
> >  return false;
> >  }
> >  
> > -/* Do not upgrade while external clients are disabled */
> > -if (qatomic_read(>external_disable_cnt)) {
> > -return false;
> > -}
> > -
> > -if (npfd < EPOLL_ENABLE_THRESHOLD) {
> > -return false;
> > +if (npfd >= EPOLL_ENABLE_THRESHOLD) {
> > +if (fdmon_epoll_try_enable(ctx)) {
> > +return true;
> > +} else {
> > +fdmon_epoll_disable(ctx);
> > +}
> >  }
> >  
> >  /* The list must not change while we add fds to epoll */
> 
> I don't understand this hunk. Why are you changing more than just
> deleting the external_disable_cnt check?
> 
> Is this a mismerge with your own commit e62da985?

Yes, it's a mismerge. Thanks for catching that!

Stefan


signature.asc
Description: PGP signature

Re: [PATCH v4 17/20] virtio-blk: implement BlockDevOps->drained_begin()

2023-05-11 Thread Stefan Hajnoczi

On Thu, May 04, 2023 at 11:13:42PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > Detach ioeventfds during drained sections to stop I/O submission from
> > the guest. virtio-blk is no longer reliant on aio_disable_external()
> > after this patch. This will allow us to remove the
> > aio_disable_external() API once all other code that relies on it is
> > converted.
> > 
> > Take extra care to avoid attaching/detaching ioeventfds if the data
> > plane is started/stopped during a drained section. This should be rare,
> > but maybe the mirror block job can trigger it.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  hw/block/dataplane/virtio-blk.c | 17 +--
> >  hw/block/virtio-blk.c   | 38 -
> >  2 files changed, 48 insertions(+), 7 deletions(-)
> > 
> > diff --git a/hw/block/dataplane/virtio-blk.c 
> > b/hw/block/dataplane/virtio-blk.c
> > index bd7cc6e76b..d77fc6028c 100644
> > --- a/hw/block/dataplane/virtio-blk.c
> > +++ b/hw/block/dataplane/virtio-blk.c
> > @@ -245,13 +245,15 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> >  }
> >  
> >  /* Get this show started by hooking up our callbacks */
> > -aio_context_acquire(s->ctx);
> > -for (i = 0; i < nvqs; i++) {
> > -VirtQueue *vq = virtio_get_queue(s->vdev, i);
> > +if (!blk_in_drain(s->conf->conf.blk)) {
> > +aio_context_acquire(s->ctx);
> > +for (i = 0; i < nvqs; i++) {
> > +VirtQueue *vq = virtio_get_queue(s->vdev, i);
> >  
> > -virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> > +virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> > +}
> > +aio_context_release(s->ctx);
> >  }
> > -aio_context_release(s->ctx);
> >  return 0;
> >  
> >fail_aio_context:
> > @@ -317,7 +319,10 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
> >  trace_virtio_blk_data_plane_stop(s);
> >  
> >  aio_context_acquire(s->ctx);
> > -aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
> > +
> > +if (!blk_in_drain(s->conf->conf.blk)) {
> > +aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
> > +}
> 
> So here we actually get a semantic change: What you described as the
> second part in the previous patch, processing the virtqueue one last
> time, isn't done any more if the device is drained.
> 
> If it's okay to just skip this during drain, why do we need to do it
> outside of drain?

I forgot to answer why we need to process requests one last time outside
drain.

This approach comes from how vhost uses ioeventfd. When switching from
vhost back to QEMU emulation, there's a chance that a final virtqueue
kick snuck in while ioeventfd was being disabled.

This is not the case with dataplane nowadays (it may have in the past).
The only state dataplane transitions are device reset and 'stop'/'cont'.
Neither of these require QEMU to process new requests in while stopping
dataplane.

My confidence is not 100%, but still pretty high that the
virtio_queue_host_notifier_read() call could be dropped from dataplane
code. Since I'm not 100% sure I didn't attempt that.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH v4 17/20] virtio-blk: implement BlockDevOps->drained_begin()

2023-05-11 Thread Stefan Hajnoczi

On Thu, May 04, 2023 at 11:13:42PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > Detach ioeventfds during drained sections to stop I/O submission from
> > the guest. virtio-blk is no longer reliant on aio_disable_external()
> > after this patch. This will allow us to remove the
> > aio_disable_external() API once all other code that relies on it is
> > converted.
> > 
> > Take extra care to avoid attaching/detaching ioeventfds if the data
> > plane is started/stopped during a drained section. This should be rare,
> > but maybe the mirror block job can trigger it.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  hw/block/dataplane/virtio-blk.c | 17 +--
> >  hw/block/virtio-blk.c   | 38 -
> >  2 files changed, 48 insertions(+), 7 deletions(-)
> > 
> > diff --git a/hw/block/dataplane/virtio-blk.c 
> > b/hw/block/dataplane/virtio-blk.c
> > index bd7cc6e76b..d77fc6028c 100644
> > --- a/hw/block/dataplane/virtio-blk.c
> > +++ b/hw/block/dataplane/virtio-blk.c
> > @@ -245,13 +245,15 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> >  }
> >  
> >  /* Get this show started by hooking up our callbacks */
> > -aio_context_acquire(s->ctx);
> > -for (i = 0; i < nvqs; i++) {
> > -VirtQueue *vq = virtio_get_queue(s->vdev, i);
> > +if (!blk_in_drain(s->conf->conf.blk)) {
> > +aio_context_acquire(s->ctx);
> > +for (i = 0; i < nvqs; i++) {
> > +VirtQueue *vq = virtio_get_queue(s->vdev, i);
> >  
> > -virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> > +virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> > +}
> > +aio_context_release(s->ctx);
> >  }
> > -aio_context_release(s->ctx);
> >  return 0;
> >  
> >fail_aio_context:
> > @@ -317,7 +319,10 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
> >  trace_virtio_blk_data_plane_stop(s);
> >  
> >  aio_context_acquire(s->ctx);
> > -aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
> > +
> > +if (!blk_in_drain(s->conf->conf.blk)) {
> > +aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
> > +}
> 
> So here we actually get a semantic change: What you described as the
> second part in the previous patch, processing the virtqueue one last
> time, isn't done any more if the device is drained.
> 
> If it's okay to just skip this during drain, why do we need to do it
> outside of drain?

Yes, it's safe because virtio_blk_data_plane_stop() has two cases:
1. The device is being reset. It is not necessary to process new
   requests.
2. 'stop'/'cont'. 'cont' will call virtio_blk_data_plane_start() ->
   event_notifier_set() so new requests will be processed when the guest
   resumes exection.

That's why I think this is safe and the right thing to do.

However, your question led me to a pre-existing drain bug when a vCPU
resets the device during a drained section (e.g. when a mirror block job
has started a drained section and the main loop runs until the block job
exits). New requests must not be processed by
virtio_blk_data_plane_stop() because that would violate drain semantics.

It turns out requests are still processed because of
virtio_blk_data_plane_stop() -> virtio_bus_cleanup_host_notifier() ->
virtio_queue_host_notifier_read().

I think that should be handled in a separate patch series. It's not
related to aio_disable_external().

Stefan


signature.asc
Description: PGP signature

Re: [PATCH v4 16/20] virtio: make it possible to detach host notifier from any thread

2023-05-11 Thread Stefan Hajnoczi

On Thu, May 04, 2023 at 11:00:35PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > virtio_queue_aio_detach_host_notifier() does two things:
> > 1. It removes the fd handler from the event loop.
> > 2. It processes the virtqueue one last time.
> > 
> > The first step can be peformed by any thread and without taking the
> > AioContext lock.
> > 
> > The second step may need the AioContext lock (depending on the device
> > implementation) and runs in the thread where request processing takes
> > place. virtio-blk and virtio-scsi therefore call
> > virtio_queue_aio_detach_host_notifier() from a BH that is scheduled in
> > AioContext
> > 
> > Scheduling a BH is undesirable for .drained_begin() functions. The next
> > patch will introduce a .drained_begin() function that needs to call
> > virtio_queue_aio_detach_host_notifier().
> 
> Why is it undesirable? In my mental model, .drained_begin() is still
> free to start as many asynchronous things as it likes. The only
> important thing to take care of is that .drained_poll() returns true as
> long as the BH (or other asynchronous operation) is still pending.
> 
> Of course, your way of doing things still seems to result in simpler
> code because you don't have to deal with a BH at all if you only really
> want the first part and not the second.

I have clarified this in the commit description. We can't wait
synchronously, but we could wait asynchronously as you described. It's
simpler to split the function instead of implementing async wait using
.drained_poll().

> 
> > Move the virtqueue processing out to the callers of
> > virtio_queue_aio_detach_host_notifier() so that the function can be
> > called from any thread. This is in preparation for the next patch.
> 
> Did you forget to remove it in virtio_queue_aio_detach_host_notifier()?
> If it's unchanged, I don't think the AioContext requirement is lifted.

Yes! Thank you :)

> 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  hw/block/dataplane/virtio-blk.c | 2 ++
> >  hw/scsi/virtio-scsi-dataplane.c | 9 +
> >  2 files changed, 11 insertions(+)
> > diff --git a/hw/block/dataplane/virtio-blk.c 
> > b/hw/block/dataplane/virtio-blk.c
> > index b28d81737e..bd7cc6e76b 100644
> > --- a/hw/block/dataplane/virtio-blk.c
> > +++ b/hw/block/dataplane/virtio-blk.c
> > @@ -286,8 +286,10 @@ static void virtio_blk_data_plane_stop_bh(void *opaque)
> >  
> >  for (i = 0; i < s->conf->num_queues; i++) {
> >  VirtQueue *vq = virtio_get_queue(s->vdev, i);
> > +EventNotifier *host_notifier = virtio_queue_get_host_notifier(vq);
> >  
> >  virtio_queue_aio_detach_host_notifier(vq, s->ctx);
> > +virtio_queue_host_notifier_read(host_notifier);
> >  }
> >  }
> 
> The existing code in virtio_queue_aio_detach_host_notifier() has a
> comment before the read:
> 
> /* Test and clear notifier before after disabling event,
>  * in case poll callback didn't have time to run. */
> 
> Do we want to keep it around in the new places? (And also fix the
> "before after", I suppose, or replace it with a similar, but better
> comment that explains why we're reading here.)

I will add the comment.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH] scsi: check inquiry buffer length to prevent crash

2023-05-11 Thread Théo Maillart

>From 31fd9e07df62663e6fb427ce3e7e767e07cf7aeb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Th=C3=A9o=20Maillart?= 
Date: Wed, 26 Apr 2023 13:57:44 +0200
Subject: [PATCH] scsi: check inquiry buffer length to prevent crash
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Using linux 6.x guest, at boot time, an inquiry makes qemu crash.

More precisely, linux version containing v5.18-rc1-157-gc92a6b5d6335.

Here is a trace of the scsi inquiry in question:

scsi_req_parsed target 1 lun 0 tag 0x2cffb48 command 18 dir 1 length 4
scsi_req_parsed_lba target 1 lun 0 tag 0x2cffb48 command 18 lba 110592
scsi_req_alloc target 1 lun 0 tag 0x2cffb48
scsi_inquiry target 1 lun 0 tag 0x2cffb48 page 0x01/0xb0
scsi_generic_send_command Command: data= 0x12 0x01 0xb0 0x00 0x04 0x00
scsi_req_continue target 1 lun 0 tag 0x2cffb48
scsi_generic_read_data scsi_read_data tag=0x2cffb48
scsi_generic_aio_sgio_command generic aio sgio: tag=0x2cffb48 cmd=0x12
timeout=3
scsi_generic_read_complete Data ready tag=0x2cffb48 len=4
scsi_req_data target 1 lun 0 tag 0x2cffb48 len 4
scsi_req_continue target 1 lun 0 tag 0x2cffb48
scsi_generic_read_data scsi_read_data tag=0x2cffb48
scsi_generic_command_complete_noio Command complete 0x7fb0870b80
tag=0x2cffb48 status=0
scsi_req_dequeue target 1 lun 0 tag 0x2cffb48

And here is a backtrace from the crash:

 #0  0x007face68580 in a_crash () at ./src/internal/atomic.h:250
 #1  get_nominal_size (end=0x7f6758f92c "", p=0x7f6758f920 "") at
src/malloc/mallocng/meta.h:168
 #2  __libc_free (p=0x7f6758f920) at src/malloc/mallocng/free.c:110
 #3  0x005592f93ed8 in scsi_free_request (req=0x7fac2c6b50) at
../hw/scsi/scsi-generic.c:70
 #4  0x005592f869b8 in scsi_req_unref (req=0x7fac2c6b50) at
../hw/scsi/scsi-bus.c:1382
 #5  0x005592f94b7c in scsi_read_complete (opaque=0x7fac2c6b50, ret=0)
at ../hw/scsi/scsi-generic.c:354
 #6  0x005593659b90 in blk_aio_complete (acb=0x7f66c206a0) at
../block/block-backend.c:1527
 #7  0x00559365a3c8 in blk_aio_ioctl_entry (opaque=0x7f66c206a0) at
../block/block-backend.c:1735
 #8  0x0055937dee64 in coroutine_bootstrap (self=0x7f672f77e0,
co=0x7f672f77e0) at ../util/coroutine-sigaltstack.c:104
 #9  0x0055937deed8 in coroutine_trampoline (signal=12) at
../util/coroutine-sigaltstack.c:145
 #10 
 #11 __cp_end () at src/thread/aarch64/syscall_cp.s:30
 #12 0x007facea8214 in __syscall_cp_c (nr=133, u=,
v=, w=, x=,
 y=, z=) at src/thread/pthread_cancel.c:33
 #13 0x007facefa020 in ?? ()

Signed-off-by: Théo Maillart 
---
 hw/scsi/scsi-generic.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index ac9fa662b4..13f01e311d 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -191,12 +191,24 @@ static int scsi_handle_inquiry_reply(SCSIGenericReq
*r, SCSIDevice *s, int len)
 if ((s->type == TYPE_DISK || s->type == TYPE_ZBC) &&
 (r->req.cmd.buf[1] & 0x01)) {
 page = r->req.cmd.buf[2];
-if (page == 0xb0) {
+if (page == 0xb0 && r->buflen > 8) {
 uint64_t max_transfer = calculate_max_transfer(s);
-stl_be_p(>buf[8], max_transfer);
+uint8_t buf[4];
+
+stl_be_p(buf, max_transfer);
+if (r->buflen <= 12) {
+memcpy(>buf[8], buf, r->buflen - 8);
+return len;
+}
+memcpy(>buf[8], buf, sizeof(uint32_t));
+
 /* Also take care of the opt xfer len. */
-stl_be_p(>buf[12],
-MIN_NON_ZERO(max_transfer, ldl_be_p(>buf[12])));
+stl_be_p(buf, MIN_NON_ZERO(max_transfer,
ldl_be_p(>buf[12])));
+if (r->buflen <= 16) {
+memcpy(>buf[12], buf, r->buflen - 12);
+return len;
+}
+memcpy(>buf[12], buf, sizeof(uint32_t));
 } else if (s->needs_vpd_bl_emulation && page == 0x00 && r->buflen
>= 4) {
 /*
  * Now we're capable of supplying the VPD Block Limits
-- 
2.40.0

Re: [PATCH 3/3] softmmu/ioport.c: make MemoryRegionPortioList owner of portio_list MemoryRegions

2023-05-11 Thread Philippe Mathieu-Daudé


On 19/4/23 17:16, Mark Cave-Ayland wrote:

Currently when portio_list MemoryRegions are freed using portio_list_destroy() 
the RCU
thread segfaults generating a backtrace similar to that below:

 #0 0x599a34b6 in phys_section_destroy ../softmmu/physmem.c:996
 #1 0x599a37a3 in phys_sections_free ../softmmu/physmem.c:1011
 #2 0x599b24aa in address_space_dispatch_free ../softmmu/physmem.c:2430
 #3 0x5996a283 in flatview_destroy ../softmmu/memory.c:292
 #4 0x5a2cb9fb in call_rcu_thread ../util/rcu.c:284
 #5 0x5a29b71d in qemu_thread_start ../util/qemu-thread-posix.c:541
 #6 0x74a0cea6 in start_thread nptl/pthread_create.c:477
 #7 0x7492ca2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfca2e)

The problem here is that portio_list_destroy() unparents the portio_list 
MemoryRegions
causing them to be freed immediately, however the flatview still has a 
reference to the
MemoryRegion and so causes a use-after-free segfault when the RCU thread next 
updates
the flatview.

Solve the lifetime issue by making MemoryRegionPortioList the owner of the 
portio_list
MemoryRegions, and then reparenting them to the portio_list owner. This ensures 
that they
can be accessed as QOM childen via the portio_list owner, yet the 
MemoryRegionPortioList


"children"


owns the refcount.

Update portio_list_destroy() to unparent the MemoryRegion from the portio_list 
owner and
then add a finalize() method to MemoryRegionPortioList, so that the portio_list
MemoryRegions remain allocated until flatview_destroy() removes the final 
refcount upon
the next flatview update.

Signed-off-by: Mark Cave-Ayland 
---
  softmmu/ioport.c | 34 +++---
  1 file changed, 31 insertions(+), 3 deletions(-)




@@ -230,6 +230,8 @@ static void portio_list_add_1(PortioList *piolist,
unsigned off_low, unsigned off_high)
  {
  MemoryRegionPortioList *mrpio;
+Object *owner;
+char *name;
  unsigned i;
  
  /* Copy the sub-list and null-terminate it.  */

@@ -246,8 +248,25 @@ static void portio_list_add_1(PortioList *piolist,
  mrpio->ports[i].base = start + off_low;
  }
  
-memory_region_init_io(>mr, piolist->owner, _ops, mrpio,

+/*
+ * The MemoryRegion owner is the MemoryRegionPortioList since that manages
+ * the lifecycle via the refcount
+ */
+memory_region_init_io(>mr, OBJECT(mrpio), _ops, mrpio,
piolist->name, off_high - off_low);
+
+/* Reparent the MemoryRegion to the piolist owner */
+object_ref(>mr);
+object_unparent(OBJECT(>mr));


Out of this patch scope, but could this part <...


+if (!piolist->owner) {
+owner = container_get(qdev_get_machine(), "/unattached");
+} else {
+owner = piolist->owner;
+}
+name = g_strdup_printf("%s[*]", piolist->name);
+object_property_add_child(owner, name, OBJECT(>mr));
+g_free(name);


...> be extracted as qdev_add_child()? It seems to duplicate
code from device_set_realized().


  if (piolist->flush_coalesced_mmio) {
  memory_region_set_flush_coalesced(>mr);
  }
@@ -305,10 +324,19 @@ void portio_list_del(PortioList *piolist)
  }
  }


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 3/5] migration: Teach dirtyrate about qemu_target_page_size()

2023-05-11 Thread Juan Quintela

Philippe Mathieu-Daudé  wrote:
> On 11/5/23 16:12, Juan Quintela wrote:
>> Signed-off-by: Juan Quintela 
>> ---
>>   migration/dirtyrate.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
>> index 38ea95af59..6706e3fe66 100644
>> --- a/migration/dirtyrate.c
>> +++ b/migration/dirtyrate.c
>> @@ -314,10 +314,10 @@ static void update_dirtyrate(uint64_t msec)
>>   static uint32_t get_ramblock_vfn_hash(struct RamblockDirtyInfo *info,
>> uint64_t vfn)
>>   {
>> +int page_size = qemu_target_page_size();
>
> size_t? Otherwise,

Nice catch.  Will change it.

Not that it matters in real life. The bigger value that
qemu_target_page_size() can give with current hardware is 64K, so
anything bigger than a short should work O:-)

>
> Reviewed-by: Philippe Mathieu-Daudé 
>
>>   uint32_t crc;
>>   -crc = crc32(0, (info->ramblock_addr +
>> -vfn * TARGET_PAGE_SIZE), TARGET_PAGE_SIZE);
>> +crc = crc32(0, info->ramblock_addr + vfn * page_size, page_size);
>> trace_get_ramblock_vfn_hash(info->idstr, vfn, crc);
>>   return crc;

Re: [PATCH 1/3] hw/loongarch/virt: Modify ipi as percpu device

2023-05-11 Thread Philippe Mathieu-Daudé


On 6/4/23 12:00, Song Gao wrote:

ipi is used to communicate between cpus, this patch modified
loongarch ipi device as percpu deivce, so that there are
2 MemoryRegions with ipi device, rather than 2*cpus
MemoryRegions, which may be large than QDEV_MAX_MMIO if
more cpus are added on loongarch virt machine.

Signed-off-by: Song Gao 
---
  hw/intc/loongarch_ipi.c | 32 ++--
  hw/loongarch/virt.c | 12 ++--
  include/hw/intc/loongarch_ipi.h | 10 --
  include/hw/loongarch/virt.h |  1 -
  4 files changed, 20 insertions(+), 35 deletions(-)




  static const VMStateDescription vmstate_ipi_core = {
@@ -233,7 +223,7 @@ static const VMStateDescription vmstate_ipi_core = {
  VMSTATE_UINT32(en, IPICore),
  VMSTATE_UINT32(set, IPICore),
  VMSTATE_UINT32(clear, IPICore),
-VMSTATE_UINT32_ARRAY(buf, IPICore, MAX_IPI_MBX_NUM * 2),
+VMSTATE_UINT32_ARRAY(buf, IPICore, 2),


Since this break the migration stream, you should update the version_id.


  VMSTATE_END_OF_LIST()
  }
  };
@@ -243,9 +233,7 @@ static const VMStateDescription vmstate_loongarch_ipi = {
  .version_id = 0,


Ditto.


  .minimum_version_id = 0,
  .fields = (VMStateField[]) {
-VMSTATE_STRUCT_ARRAY(ipi_core, LoongArchMachineState,
- MAX_IPI_CORE_NUM, 0,
- vmstate_ipi_core, IPICore),
+VMSTATE_STRUCT(ipi_core, LoongArchIPI, 0, vmstate_ipi_core, IPICore),
  VMSTATE_END_OF_LIST()
  }
  };


The rest LGTM.

Re: [PATCH 3/3] hw/loongarch/virt: Set max 256 cpus support on loongarch virt machine

2023-05-11 Thread Philippe Mathieu-Daudé


On 6/4/23 12:00, Song Gao wrote:

Add separate macro EXTIOI_CPUS for extioi interrupt controller, extioi
only supports 4 cpu. And set macro LOONGARCH_MAX_CPUS as 256 so that
loongarch virt machine supports more cpus.

Interrupts from external devices can only be routed cpu 0-3 because
of extioi limits, cpu internal interrupt such as timer/ipi can be
triggered on all cpus.

Signed-off-by: Song Gao 
---
  hw/intc/loongarch_extioi.c |  4 ++--
  hw/loongarch/virt.c| 21 ++---
  include/hw/intc/loongarch_extioi.h | 10 ++
  include/hw/loongarch/virt.h|  2 +-
  4 files changed, 23 insertions(+), 14 deletions(-)




@@ -618,10 +623,12 @@ static void loongarch_irq_init(LoongArchMachineState 
*lams)
   * cpu_pin[9:2] <= intc_pin[7:0]
   */
  for (cpu = 0; cpu < ms->smp.cpus; cpu++) {
-cpudev = DEVICE(qemu_get_cpu(cpu));
-for (pin = 0; pin < LS3A_INTC_IP; pin++) {
-qdev_connect_gpio_out(extioi, (cpu * 8 + pin),
-  qdev_get_gpio_in(cpudev, pin + 2));
+if (cpu < EXTIOI_CPUS) {


Alternatively:

   for (cpu = 0; cpu < MIN(ms->smp.cpus, EXTIOI_CPUS); cpu++) {


+cpudev = DEVICE(qemu_get_cpu(cpu));
+for (pin = 0; pin < LS3A_INTC_IP; pin++) {
+qdev_connect_gpio_out(extioi, (cpu * 8 + pin),
+  qdev_get_gpio_in(cpudev, pin + 2));
+   }
  }
  }


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 2/3] hw/intc: Add NULL pointer check on LoongArch ipi device

2023-05-11 Thread Philippe Mathieu-Daudé


On 6/4/23 12:00, Song Gao wrote:

When ipi mailbox is used, cpu index is decoded from iocsr register.
cpu maybe does not exist. This patch adss NULL pointer check on
ipi device.


How can that happens from a guest vcpu context?


Signed-off-by: Song Gao 
---
  hw/intc/loongarch_ipi.c | 31 +++
  1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/hw/intc/loongarch_ipi.c b/hw/intc/loongarch_ipi.c
index 0563d83a35..39e899df46 100644
--- a/hw/intc/loongarch_ipi.c
+++ b/hw/intc/loongarch_ipi.c
@@ -86,11 +86,12 @@ static void ipi_send(uint64_t val)
  /* IPI status vector */
  data = 1 << (val & 0x1f);
  cs = qemu_get_cpu(cpuid);
-cpu = LOONGARCH_CPU(cs);
-env = >env;
-address_space_stl(>address_space_iocsr, 0x1008,
-  data, MEMTXATTRS_UNSPECIFIED, NULL);
-
+if (cs) {
+cpu = LOONGARCH_CPU(cs);
+env = >env;
+address_space_stl(>address_space_iocsr, 0x1008,
+  data, MEMTXATTRS_UNSPECIFIED, NULL);
+}


Is that the hardware behavior?

Could logging the invalid cpuid request be useful?

   else {

   //log or trace event here
   }

  }

Re: [PATCH v2 5/5] migration: Make dirtyrate.c target independent

2023-05-11 Thread Philippe Mathieu-Daudé


On 11/5/23 16:12, Juan Quintela wrote:

After the previous two patches, there is nothing else that is target
specific.

Signed-off-by: Juan Quintela 
Reviewed-by: Richard Henderson 

---

- Remove check for CONFIG_SOFTMMU for dirtyrate.c, not needed (thanks
   Richard)
---
  migration/dirtyrate.c | 2 --
  migration/meson.build | 4 ++--
  2 files changed, 2 insertions(+), 4 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 3/5] migration: Teach dirtyrate about qemu_target_page_size()

2023-05-11 Thread Philippe Mathieu-Daudé


On 11/5/23 16:12, Juan Quintela wrote:

Signed-off-by: Juan Quintela 
---
  migration/dirtyrate.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 38ea95af59..6706e3fe66 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -314,10 +314,10 @@ static void update_dirtyrate(uint64_t msec)
  static uint32_t get_ramblock_vfn_hash(struct RamblockDirtyInfo *info,
uint64_t vfn)
  {
+int page_size = qemu_target_page_size();


size_t? Otherwise,

Reviewed-by: Philippe Mathieu-Daudé 


  uint32_t crc;
  
-crc = crc32(0, (info->ramblock_addr +

-vfn * TARGET_PAGE_SIZE), TARGET_PAGE_SIZE);
+crc = crc32(0, info->ramblock_addr + vfn * page_size, page_size);
  
  trace_get_ramblock_vfn_hash(info->idstr, vfn, crc);

  return crc;

[Qemu RFC 0/7] Early enabling of DCD emulation in Qemu

2023-05-11 Thread Fan Ni

Since the early draft of DCD support in kernel is out
(https://lore.kernel.org/linux-cxl/20230417164126.GA1904906@bgt-140510-bm03/T/#t),
this patch series provide dcd emulation in qemu so people who are interested
can have an early try. It is noted that the patch series may need to be updated
accordingly if the kernel side implementation changes.

To support DCD emulation, the patch series add DCD related mailbox command
support (CXL Spec 3.0: 8.2.9.8.9), and extend the cxl type3 memory device
with dynamic capacity extent and region representative.
To support read/write to the dynamic capacity of the device, a host backend
is provided and necessary check mechnism is added to ensure the dynamic
capacity accessed is backed with active dc extents.
Currently FM related mailbox commands (cxl spec 3.0: 7.6.7.6) is not supported
, but we add two qmp interfaces for adding/releasing dynamic capacity extents.
Also, the support for multiple hosts sharing the same DCD case is missing.

Things we can try with the patch series together with kernel dcd code:
1. Create DC regions to cover the address range of the dynamic capacity
regions.
2. Add/release dynamic capacity extents to the device and notify the
kernel.
3. Test kernel side code to accept added dc extents and create dax devices,
and release dc extents and notify the device
4. Online the memory range backed with dc extents and let application use
them.

The patch series is based on Jonathan's local qemu branch:
https://gitlab.com/jic23/qemu/-/tree/cxl-2023-02-28

Simple tests peformed with the patch series:
1 Install cxl modules:

modprobe -a cxl_acpi cxl_core cxl_pci cxl_port cxl_mem

2 Create dc regions:

region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
echo "dc" >/sys/bus/cxl/devices/decoder2.0/mode
echo 0x1000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
echo 0x1000 > /sys/bus/cxl/devices/$region/size
echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
echo 1 > /sys/bus/cxl/devices/$region/commit
echo $region > /sys/bus/cxl/drivers/cxl_region/bind

/home/fan/cxl/tools-and-scripts# cxl list
[
  {
"memdevs":[
  {
"memdev":"mem0",
"pmem_size":536870912,
"ram_size":0,
"serial":0,
"host":":0d:00.0"
  }
]
  },
  {
"regions":[
  {
"region":"region0",
"resource":45365592064,
"size":268435456,
"interleave_ways":1,
"interleave_granularity":256,
"decode_state":"commit"
  }
]
  }
]

3 Add two dc extents (128MB each) through qmp interface

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity-event",
"arguments": {
 "path": "/machine/peripheral/cxl-pmem0",
"region-id" : 0,
 "num-extent": 2,
"dpa":0,
"extent-len": 128
}
}

/home/fan/cxl/tools-and-scripts# lsmem
RANGE  SIZE   STATE REMOVABLE   BLOCK
0x-0x7fff2G  online   yes0-15
0x0001-0x00027fff6G  online   yes   32-79
0x000a9000-0x000a9fff  256M offline   338-339

Memory block size:   128M
Total online memory:   8G
Total offline memory:256M


4.Online the momory with 'daxctl online-memory dax0.0' to online the memory

/home/fan/cxl/ndctl# ./build/daxctl/daxctl online-memory dax0.0
[  230.730553] Fallback order for Node 0: 0 1
[  230.730825] Fallback order for Node 1: 1 0
[  230.730953] Built 2 zonelists, mobility grouping on.  Total pages: 2042541
[  230.731110] Policy zone: Normal
onlined memory for 1 device

root@bgt-140510-bm03:/home/fan/cxl/ndctl# lsmem
RANGE  SIZE   STATE REMOVABLE BLOCK
0x-0x7fff2G  online   yes  0-15
0x0001-0x00027fff6G  online   yes 32-79
0x000a9000-0x000a97ff  128M  online   yes   338
0x000a9800-0x000a9fff  128M offline 339

Memory block size:   128M
Total online memory: 8.1G
Total offline memory:128M

5 using dc extents as regular memory

/home/fan/cxl/ndctl# numactl --membind=1 ls
CONTRIBUTING.md  README.md  clean_config.sh  cscope.out   git-version-gen
ndctl  scripts  test.h  version.h.in COPYING acpi.h
config.h.meson   cxl  make-git-snapshot.sh  ndctl.spec.in  sles tools
Documentationbuild  contrib  daxctl   meson.build   
rhel
tagstopology.png LICENSESccan   cscope.files
git-version  meson_options.txt  rpmbuild.shtest util


QEMU command line cxl configuration:

RP1="-object 
memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=512M \
-object

[RFC 2/7] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support

2023-05-11 Thread Fan Ni

From: Fan Ni 

Per cxl spec 3.0, add dynamic capacity region representative based on
Table 8-126 and extend the cxl type3 device definition to include dc region
information. Also, based on info in 8.2.9.8.9.1, add 'Get Dynamic Capacity
Configuration' mailbox support.

Signed-off-by: Fan Ni 
---
 hw/cxl/cxl-mailbox-utils.c  | 68 +
 include/hw/cxl/cxl_device.h | 16 +
 2 files changed, 84 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 7ff4fbdf22..61c77e52d8 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -81,6 +81,8 @@ enum {
 #define GET_POISON_LIST0x0
 #define INJECT_POISON  0x1
 #define CLEAR_POISON   0x2
+   DCD_CONFIG = 0x48, /*8.2.9.8.9*/
+   #define GET_DC_REGION_CONFIG   0x0
 PHYSICAL_SWITCH = 0x51
 #define IDENTIFY_SWITCH_DEVICE  0x0
 };
@@ -935,6 +937,70 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd 
*cmd,
 return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * cxl spec 3.0: 8.2.9.8.9.2
+ * Get Dynamic Capacity Configuration
+ **/
+static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
+   CXLDeviceState *cxl_dstate,
+   uint16_t *len)
+{
+   struct get_dyn_cap_config_in_pl {
+   uint8_t region_cnt;
+   uint8_t start_region_id;
+   } QEMU_PACKED;
+
+struct get_dyn_cap_config_out_pl {
+   uint8_t num_regions;
+   uint8_t rsvd1[7];
+   struct {
+   uint64_t base;
+   uint64_t decode_len;
+   uint64_t region_len;
+   uint64_t block_size;
+   uint32_t dsmadhandle;
+   uint8_t flags;
+   uint8_t rsvd2[3];
+   } QEMU_PACKED records[];
+   } QEMU_PACKED;
+
+   struct get_dyn_cap_config_in_pl *in = (void *)cmd->payload;
+   struct get_dyn_cap_config_out_pl *out = (void *)cmd->payload;
+   struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, 
cxl_dstate);
+   uint16_t record_count = 0, i = 0;
+   uint16_t out_pl_len;
+
+   if (in->start_region_id >= ct3d->dc.num_regions)
+   record_count = 0;
+   else if (ct3d->dc.num_regions - in->start_region_id < in->region_cnt)
+   record_count = ct3d->dc.num_regions - in->start_region_id;
+   else
+   record_count = in->region_cnt;
+
+   out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+   assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
+
+   memset(out, 0, out_pl_len);
+   out->num_regions = record_count;
+   for (; i < record_count; i++) {
+   stq_le_p(>records[i].base,
+   ct3d->dc.regions[in->start_region_id+i].base);
+   stq_le_p(>records[i].decode_len,
+   ct3d->dc.regions[in->start_region_id+i].decode_len);
+   stq_le_p(>records[i].region_len,
+   ct3d->dc.regions[in->start_region_id+i].len);
+   stq_le_p(>records[i].block_size,
+   ct3d->dc.regions[in->start_region_id+i].block_size);
+   stl_le_p(>records[i].dsmadhandle,
+   ct3d->dc.regions[in->start_region_id+i].dsmadhandle);
+   out->records[i].flags
+   = ct3d->dc.regions[in->start_region_id+i].flags;
+   }
+
+   *len = out_pl_len;
+   return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -973,6 +1039,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_media_inject_poison, 8, 0 },
 [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
 cmd_media_clear_poison, 72, 0 },
+   [DCD_CONFIG][GET_DC_REGION_CONFIG] = { "DCD_GET_DC_REGION_CONFIG",
+   cmd_dcd_get_dyn_cap_config, 2, 0 },
 };
 
 static struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index e285369693..8a04e53e90 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -383,6 +383,17 @@ typedef struct CXLPoison {
 typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
 #define CXL_POISON_LIST_LIMIT 256
 
+#define DCD_MAX_REGION_NUM 8
+
+typedef struct CXLDCD_Region {
+   uint64_t base;
+   uint64_t decode_len; /* in multiples of 256MB */
+   uint64_t len;
+   uint64_t block_size;
+   uint32_t dsmadhandle;
+   uint8_t flags;
+} CXLDCD_Region;
+
 struct CXLType3Dev {
 /* Private */
 PCIDevice parent_obj;
@@ -414,6 +425,11 @@ struct CXLType3Dev {
 unsigned int poison_list_cnt;
 bool poison_list_overflowed;
 uint64_t poison_list_overflow_ts;
+
+   struct dynamic_capacity {
+   uint8_t

[RFC 3/7] hw/mem/cxl_type3: Add a parameter to pass number of DC regions the device supports in qemu command line

2023-05-11 Thread Fan Ni

From: Fan Ni 

Add a property 'num-dc-regions' to ct3_props to allow users to create DC
regions.
With the change, users can control the number of DC regions the device
supports.
To make it easier, other parameters of the region like region base, length,
and block size are hard coded. If desired, these parameters
can be added easily.

Signed-off-by: Fan Ni 
---
 hw/mem/cxl_type3.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 2b483d3d8e..b9c375d9b4 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -684,6 +684,34 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, 
uint64_t value,
 }
 }
 
+/*
+ * Create a dc region to test "Get Dynamic Capacity Configuration" command.
+ */
+static int cxl_create_toy_regions(CXLType3Dev *ct3d)
+{
+   int i;
+   uint64_t region_base = ct3d->hostvmem?ct3d->hostvmem->size
+   + ct3d->hostpmem->size:ct3d->hostpmem->size;
+   uint64_t region_len = 1024*1024*1024;
+   uint64_t decode_len = 4; /* 4*256MB */
+   uint64_t blk_size = 2*1024*1024;
+   struct CXLDCD_Region *region;
+
+   for (i = 0; i < ct3d->dc.num_regions; i++) {
+   region = >dc.regions[i];
+   region->base = region_base;
+   region->decode_len = decode_len;
+   region->len = region_len;
+   region->block_size = blk_size;
+   /* dsmad_handle is set when creating cdat table entries */
+   region->flags = 0;
+
+   region_base += region->len;
+   }
+
+   return 0;
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
 DeviceState *ds = DEVICE(ct3d);
@@ -752,6 +780,9 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error 
**errp)
 g_free(p_name);
 }
 
+   if (cxl_create_toy_regions(ct3d))
+   return false;
+
 return true;
 }
 
@@ -1036,6 +1067,7 @@ static Property ct3_props[] = {
 DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
 DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
 DEFINE_PROP_UINT16("spdm", CXLType3Dev, spdm_port, 0),
+   DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.25.1

[RFC 4/7] hw/mem/cxl_type3: Add DC extent representative to cxl type3 device

2023-05-11 Thread Fan Ni

From: Fan Ni 

Add dynamic capacity extent information to the definition of
CXLType3Dev and add get DC extent list mailbox command based on
CXL.spec.3.0:.8.2.9.8.9.2.

With this command, we can create dc regions as below:

region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
echo 1 > /sys/bus/cxl/devices/$region/interleave_ways

echo "dc" >/sys/bus/cxl/devices/decoder2.0/mode
echo 0x3000 >/sys/bus/cxl/devices/decoder2.0/dpa_size

echo 0x3000 > /sys/bus/cxl/devices/$region/size
echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
echo 1 > /sys/bus/cxl/devices/$region/commit
echo $region > /sys/bus/cxl/drivers/cxl_region/bind

Signed-off-by: Fan Ni 
---
 hw/cxl/cxl-mailbox-utils.c  | 73 -
 hw/mem/cxl_type3.c  |  1 +
 include/hw/cxl/cxl_device.h | 23 
 3 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 61c77e52d8..ed2ac154cb 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -83,6 +83,7 @@ enum {
 #define CLEAR_POISON   0x2
DCD_CONFIG = 0x48, /*8.2.9.8.9*/
#define GET_DC_REGION_CONFIG   0x0
+   #define GET_DYN_CAP_EXT_LIST   0x1
 PHYSICAL_SWITCH = 0x51
 #define IDENTIFY_SWITCH_DEVICE  0x0
 };
@@ -938,7 +939,7 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd 
*cmd,
 }
 
 /*
- * cxl spec 3.0: 8.2.9.8.9.2
+ * cxl spec 3.0: 8.2.9.8.9.1
  * Get Dynamic Capacity Configuration
  **/
 static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
@@ -1001,6 +1002,73 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(struct 
cxl_cmd *cmd,
return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * cxl spec 3.0: 8.2.9.8.9.2
+ * Get Dynamic Capacity Extent List (Opcode 4810h)
+ **/
+static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(struct cxl_cmd *cmd,
+   CXLDeviceState *cxl_dstate,
+   uint16_t *len)
+{
+   struct get_dyn_cap_ext_list_in_pl {
+   uint32_t extent_cnt;
+   uint32_t start_extent_id;
+   } QEMU_PACKED;
+
+   struct get_dyn_cap_ext_list_out_pl {
+   uint32_t count;
+   uint32_t total_extents;
+   uint32_t generation_num;
+   uint8_t rsvd[4];
+   struct {
+   uint64_t start_dpa;
+   uint64_t len;
+   uint8_t tag[0x10];
+   uint16_t shared_seq;
+   uint8_t rsvd[6];
+   } QEMU_PACKED records[];
+   } QEMU_PACKED;
+
+   struct get_dyn_cap_ext_list_in_pl *in = (void *)cmd->payload;
+   struct get_dyn_cap_ext_list_out_pl *out = (void *)cmd->payload;
+   struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, 
cxl_dstate);
+   uint16_t record_count = 0, i = 0, record_done = 0;
+   CXLDCDExtentList *extent_list = >dc.extents;
+   CXLDCD_Extent *ent;
+   uint16_t out_pl_len;
+
+   if (in->start_extent_id > ct3d->dc.total_extent_count)
+   return CXL_MBOX_INVALID_INPUT;
+
+   if (ct3d->dc.total_extent_count - in->start_extent_id < in->extent_cnt)
+   record_count = ct3d->dc.total_extent_count - 
in->start_extent_id;
+   else
+   record_count = in->extent_cnt;
+
+   out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+   assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
+
+   memset(out, 0, out_pl_len);
+   stl_le_p(>count, record_count);
+   stl_le_p(>total_extents, ct3d->dc.total_extent_count);
+   stl_le_p(>generation_num, ct3d->dc.ext_list_gen_seq);
+
+   QTAILQ_FOREACH(ent, extent_list, node) {
+   if (i++ < in->start_extent_id)
+   continue;
+   stq_le_p(>records[i].start_dpa, ent->start_dpa);
+   stq_le_p(>records[i].len, ent->len);
+   memcpy(>records[i].tag, ent->tag, 0x10);
+   stw_le_p(>records[i].shared_seq, ent->shared_seq);
+   record_done++;
+   if (record_done == record_count)
+   break;
+   }
+
+   *len = out_pl_len;
+   return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1041,6 +1109,9 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_media_clear_poison, 72, 0 },
[DCD_CONFIG][GET_DC_REGION_CONFIG] = { "DCD_GET_DC_REGION_CONFIG",
cmd_dcd_get_dyn_cap_config, 2, 0 },
+   [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
+   "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", 
cmd_dcd_get_dyn_cap_ext_list,
+   8, 0 },
 };
 
 static struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git

[RFC 1/7] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command

2023-05-11 Thread Fan Ni

From: Fan Ni 

Based on CXL spec 3.0 Table 8-94 (Identify Memory Device Output
Payload), dynamic capacity event log size should be part of
output of the Identify command.
Add dc_event_log_size to the output payload for the host to get the info.

Signed-off-by: Fan Ni 
---
 hw/cxl/cxl-mailbox-utils.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 9f8e6722d7..7ff4fbdf22 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -21,6 +21,8 @@
 #include "sysemu/hostmem.h"
 
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
+/* Experimental value: dynamic capacity event log size */
+#define CXL_DC_EVENT_LOG_SIZE 8
 
 /*
  * How to add a new command, example. The command set FOO, with cmd BAR.
@@ -519,8 +521,9 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd 
*cmd,
 uint16_t inject_poison_limit;
 uint8_t poison_caps;
 uint8_t qos_telemetry_caps;
+   uint16_t dc_event_log_size;
 } QEMU_PACKED *id;
-QEMU_BUILD_BUG_ON(sizeof(*id) != 0x43);
+QEMU_BUILD_BUG_ON(sizeof(*id) != 0x45);
 
 CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
 CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
@@ -543,6 +546,7 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd 
*cmd,
 st24_le_p(id->poison_list_max_mer, 256);
 /* No limit - so limited by main poison record limit */
 stw_le_p(>inject_poison_limit, 0);
+   stw_le_p(>dc_event_log_size, CXL_DC_EVENT_LOG_SIZE);
 
 *len = sizeof(*id);
 return CXL_MBOX_SUCCESS;
-- 
2.25.1

[RFC 7/7] hw/mem/cxl_type3: add read/write support to dynamic capacity

2023-05-11 Thread Fan Ni

From: Fan Ni 

Before the change, read from or write to dynamic capacity of the memory
device is not support as 1) no host backed file/memory is provided for
it; 2) no address space is created for the dynamic capacity.

With the change, add code to support following:
1. add a new property to type3 device "dc-memdev" to point to host
   memory backend for dynamic capacity;
2. add a bitmap for each region to track whether a block is host backed,
which will be used for address check when read/write dynamic capacity;
3. add namespace for dynamic capacity for read/write support;
4. create cdat entries for each dynamic capacity region;

Signed-off-by: Fan Ni 
---
 hw/cxl/cxl-mailbox-utils.c  |  21 ++-
 hw/mem/cxl_type3.c  | 336 +---
 include/hw/cxl/cxl_device.h |   8 +-
 3 files changed, 298 insertions(+), 67 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 7212934627..efe61e67fb 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -391,9 +391,11 @@ static CXLRetCode cmd_firmware_update_get_info(struct 
cxl_cmd *cmd,
 char fw_rev4[0x10];
 } QEMU_PACKED *fw_info;
 QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
+   CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
 
 if ((cxl_dstate->vmem_size < CXL_CAPACITY_MULTIPLIER) ||
-(cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER)) {
+   (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) ||
+   (ct3d->dc.total_dynamic_capicity < CXL_CAPACITY_MULTIPLIER)) {
 return CXL_MBOX_INTERNAL_ERROR;
 }
 
@@ -534,7 +536,9 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd 
*cmd,
 CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
 
 if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
-(!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+   (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, 
CXL_CAPACITY_MULTIPLIER)) ||
+   (!QEMU_IS_ALIGNED(ct3d->dc.total_dynamic_capicity,
+   CXL_CAPACITY_MULTIPLIER))) {
 return CXL_MBOX_INTERNAL_ERROR;
 }
 
@@ -543,7 +547,8 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd 
*cmd,
 
 snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
 
-stq_le_p(>total_capacity, cxl_dstate->mem_size / 
CXL_CAPACITY_MULTIPLIER);
+   stq_le_p(>total_capacity,
+   cxl_dstate->static_mem_size / CXL_CAPACITY_MULTIPLIER);
 stq_le_p(>persistent_capacity, cxl_dstate->pmem_size / 
CXL_CAPACITY_MULTIPLIER);
 stq_le_p(>volatile_capacity, cxl_dstate->vmem_size / 
CXL_CAPACITY_MULTIPLIER);
 stl_le_p(>lsa_size, cvc->get_lsa_size(ct3d));
@@ -568,9 +573,12 @@ static CXLRetCode cmd_ccls_get_partition_info(struct 
cxl_cmd *cmd,
 uint64_t next_pmem;
 } QEMU_PACKED *part_info = (void *)cmd->payload;
 QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
+   CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
 
 if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
-(!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+   (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, 
CXL_CAPACITY_MULTIPLIER)) ||
+   (!QEMU_IS_ALIGNED(ct3d->dc.total_dynamic_capicity,
+   CXL_CAPACITY_MULTIPLIER))) {
 return CXL_MBOX_INTERNAL_ERROR;
 }
 
@@ -881,9 +889,8 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd 
*cmd,
 struct clear_poison_pl *in = (void *)cmd->payload;
 
 dpa = ldq_le_p(>dpa);
-if (dpa + 64 > cxl_dstate->mem_size) {
-return CXL_MBOX_INVALID_PA;
-}
+   if (dpa + 64 > cxl_dstate->static_mem_size && ct3d->dc.num_regions == 0)
+   return CXL_MBOX_INVALID_PA;
 
 QLIST_FOREACH(ent, poison_list, node) {
 /*
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 70d47d43b9..334660bd0f 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -33,8 +33,8 @@ enum {
 };
 
 static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
- int dsmad_handle, MemoryRegion *mr,
- bool is_pmem, uint64_t dpa_base)
+   int dsmad_handle, uint8_t flags,
+   uint64_t dpa_base, uint64_t size)
 {
 g_autofree CDATDsmas *dsmas = NULL;
 g_autofree CDATDslbis *dslbis0 = NULL;
@@ -53,9 +53,9 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader 
**cdat_table,
 .length = sizeof(*dsmas),
 },
 .DSMADhandle = dsmad_handle,
-.flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
+   .flags = flags,
 .DPA_base = dpa_base,
-.DPA_length = int128_get64(mr->size),
+   .DPA_length = size,
 };
 
 /* For now, no memory side cache,

[RFC 5/7] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response

2023-05-11 Thread Fan Ni

From: Fan Ni 

Per CXL spec 3.0, we implemented the two mailbox commands:
Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.8.9.3, and
Release Dynamic Capacity Response (Opcode 4803h) 8.2.9.8.9.4.

Signed-off-by: Fan Ni 
---
 hw/cxl/cxl-mailbox-utils.c  | 223 
 include/hw/cxl/cxl_device.h |   3 +-
 2 files changed, 225 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index ed2ac154cb..7212934627 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -84,6 +84,8 @@ enum {
DCD_CONFIG = 0x48, /*8.2.9.8.9*/
#define GET_DC_REGION_CONFIG   0x0
#define GET_DYN_CAP_EXT_LIST   0x1
+   #define ADD_DYN_CAP_RSP0x2
+   #define RELEASE_DYN_CAP0x3
 PHYSICAL_SWITCH = 0x51
 #define IDENTIFY_SWITCH_DEVICE  0x0
 };
@@ -1069,6 +1071,221 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(struct 
cxl_cmd *cmd,
return CXL_MBOX_SUCCESS;
 }
 
+static inline int test_bits(const unsigned long *addr, int nr, int size)
+{
+   unsigned long res = find_next_zero_bit(addr, size + nr, nr);
+
+   if (res >= nr + size)
+   return 1;
+   else
+   return 0;
+}
+
+static uint8_t find_region_id(struct CXLType3Dev *dev, uint64_t dpa
+   , uint64_t len)
+{
+   int8_t i = dev->dc.num_regions-1;
+
+   while (i > 0 && dpa < dev->dc.regions[i].base)
+   i--;
+
+   if (dpa < dev->dc.regions[i].base
+   || dpa + len > dev->dc.regions[i].base + 
dev->dc.regions[i].len)
+   return dev->dc.num_regions;
+
+   return i;
+}
+
+static CXLRetCode detect_malformed_extent_list(CXLType3Dev *dev, void *data)
+{
+   struct updated_dc_extent_list_in_pl {
+   uint32_t num_entries_updated;
+   uint8_t rsvd[4];
+   struct {
+   uint64_t start_dpa;
+   uint64_t len;
+   uint8_t rsvd[8];
+   } QEMU_PACKED updated_entries[];
+   } QEMU_PACKED;
+
+   struct updated_dc_extent_list_in_pl *in = data;
+   unsigned long *blk_bitmap;
+   uint64_t min_block_size = dev->dc.regions[0].block_size;
+   struct CXLDCD_Region *region = >dc.regions[0];
+   uint32_t i;
+   uint64_t dpa, len;
+   uint8_t rid;
+
+   for (i = 1; i < dev->dc.num_regions; i++) {
+   region = >dc.regions[i];
+   if (min_block_size > region->block_size)
+   min_block_size = region->block_size;
+   }
+   blk_bitmap = bitmap_new((region->len + region->base
+   - dev->dc.regions[0].base) / min_block_size);
+   g_assert(blk_bitmap);
+   bitmap_zero(blk_bitmap, (region->len + region->base
+   - dev->dc.regions[0].base) / min_block_size);
+
+   for (i = 0; i < in->num_entries_updated; i++) {
+   dpa = in->updated_entries[i].start_dpa;
+   len = in->updated_entries[i].len;
+
+   rid = find_region_id(dev, dpa, len);
+   if (rid == dev->dc.num_regions) {
+   g_free(blk_bitmap);
+   return CXL_MBOX_INVALID_PA;
+   }
+   region = >dc.regions[rid];
+   if (dpa % region->block_size || len % region->block_size) {
+   g_free(blk_bitmap);
+   return CXL_MBOX_INVALID_EXTENT_LIST;
+   }
+   /* the dpa range already covered by some other extents in the 
list */
+   if (test_bits(blk_bitmap, dpa/min_block_size, 
len/min_block_size)) {
+   g_free(blk_bitmap);
+   return CXL_MBOX_INVALID_EXTENT_LIST;
+   }
+   bitmap_set(blk_bitmap, dpa/min_block_size, len/min_block_size);
+   }
+
+   g_free(blk_bitmap);
+   return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * cxl spec 3.0: 8.2.9.8.9.3
+ * Add Dynamic Capacity Response (opcode 4802h)
+ * Assuming extent list is updated when a extent is added, when receiving
+ * the response, verify and ensure the extent is utilized by the host, and
+ * update extent list  as needed.
+ **/
+static CXLRetCode cmd_dcd_add_dyn_cap_rsp(struct cxl_cmd *cmd,
+   CXLDeviceState *cxl_dstate,
+   uint16_t *len_unused)
+{
+   struct add_dyn_cap_ext_list_in_pl {
+   uint32_t num_entries_updated;
+   uint8_t rsvd[4];
+   struct {
+   uint64_t start_dpa;
+   uint64_t len;
+   uint8_t rsvd[8];
+   } QEMU_PACKED updated_entries[];
+   } QEMU_PACKED;
+
+   struct add_dyn_cap_ext_list_in_pl *in = (void *)cmd->payload;
+   struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, 
cxl_dstate);
+   CXLDCDExtentList *extent_list =

[RFC 6/7] Add qmp interfaces to add/release dynamic capacity extents

2023-05-11 Thread Fan Ni

From: Fan Ni 

Since fabric manager emulation is not supported yet, the change implements
the functions to add/release dynamic capacity extents as QMP interfaces.

1. Add dynamic capacity extents:

For example, the command to add two continuous extents (each is 128MB
long) to region 0 (starting at dpa offset 0) looks like below:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity-event",
"arguments": {
"path": "/machine/peripheral/cxl-pmem0",
"region-id" : 0,
"num-extent": 2,
"dpa":0,
"extent-len": 128
}
}

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MB from region
0 (starting at dpa offset 0) look like below:

{ "execute": "cxl-release-dynamic-capacity-event",
"arguments": {
 "path": "/machine/peripheral/cxl-pmem0",
"region-id" : 0,
 "num-extent": 1 ,
"dpa":0,
"extent-len": 128
}
}

Signed-off-by: Fan Ni 
---
 hw/mem/cxl_type3.c  | 127 
 include/hw/cxl/cxl_events.h |  16 +
 qapi/cxl.json   |  44 +
 3 files changed, 187 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 23954711b5..70d47d43b9 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1651,6 +1651,133 @@ void qmp_cxl_inject_memory_module_event(const char 
*path, CxlEventLog log,
 }
 }
 
+static const QemuUUID dynamic_capacity_uuid = {
+   .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+   0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+static void qmp_cxl_process_dynamic_capacity_event(const char *path, 
CxlEventLog log,
+   uint8_t flags, uint8_t type, uint16_t hid, uint8_t rid, 
uint32_t extent_cnt,
+   CXLDCExtent_raw *extents, Error **errp)
+{
+   Object *obj = object_resolve_path(path, NULL);
+   CXLEventDynamicCapacity dCap;
+   CXLEventRecordHdr *hdr = 
+   CXLDeviceState *cxlds;
+   CXLType3Dev *dcd;
+   int i;
+
+   if (!obj) {
+   error_setg(errp, "Unable to resolve path");
+   return;
+   }
+   if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
+   error_setg(errp, "Path not point to a valid CXL type3 device");
+   return;
+   }
+
+   dcd = CXL_TYPE3(obj);
+   cxlds = >cxl_dstate;
+   memset(, 0, sizeof(dCap));
+
+   if (!dcd->dc.num_regions) {
+   error_setg(errp, "No dynamic capacity support from the device");
+   return;
+   }
+
+   /*
+* 8.2.9.1.5
+* All Dynamic Capacity event records shall set the Event Record
+* Severity field in the Common Event Record Format to Informational
+* Event. All Dynamic Capacity related events shall be logged in the
+* Dynamic Capacity Event Log.
+*/
+   assert(flags & (1

Re: [PATCH v2 2/3] cpu, qapi, target/arm, i386, s390x: Generalize query-cpu-model-expansion

2023-05-11 Thread Markus Armbruster

Dinah Baum  writes:

> This patch enables 'query-cpu-model-expansion' on all
> architectures. Only architectures that implement
> the command will return results, others will return an
> error message as before.
>
> This patch lays the groundwork for parsing a
> -cpu cpu,help option as specified in
> https://gitlab.com/qemu-project/qemu/-/issues/1480

Yes, but why does the change make sense for QMP?

Any idea how hard implementing the thing for more targets would be?
Question, not a demand!

> Signed-off-by: Dinah Baum 
> ---
>  cpu.c| 20 
>  include/exec/cpu-common.h|  8 +
>  qapi/machine-target-common.json  | 51 +
>  qapi/machine-target.json | 56 
>  target/arm/arm-qmp-cmds.c|  7 ++--
>  target/arm/cpu.h |  7 +++-
>  target/i386/cpu-sysemu.c |  7 ++--
>  target/i386/cpu.h|  6 
>  target/s390x/cpu.h   |  7 
>  target/s390x/cpu_models_sysemu.c |  6 ++--
>  10 files changed, 108 insertions(+), 67 deletions(-)
>
> diff --git a/cpu.c b/cpu.c
> index 849bac062c..daf4e1ff0d 100644
> --- a/cpu.c
> +++ b/cpu.c
> @@ -292,6 +292,26 @@ void list_cpus(const char *optarg)
>  #endif
>  }
>  
> +CpuModelExpansionInfo *get_cpu_model_expansion_info(CpuModelExpansionType 
> type,
> +CpuModelInfo *model,
> +Error **errp)
> +{
> +/* XXX: implement cpu_model_expansion for targets that still miss it */
> +#if defined(cpu_model_expansion)
> +return cpu_model_expansion(type, model, errp);
> +#else
> +error_setg(errp, "Could not query cpu model information");

This is vague enough to leave the user wondering what could be done to
avoid this error and by whom.

Before the patch, it's clear enough: "The command
query-cpu-model-expansion has not been found".

You could go with something like "command not supported for this
target".

The error class changes from CommandNotFound to GenericError.  Please
verify libvirt is fine with that.

> +return NULL;
> +#endif
> +}
> +
> +CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType 
> type,
> + CpuModelInfo *model,
> + Error **errp)
> +{
> +return get_cpu_model_expansion_info(type, model, errp);
> +}
> +
>  #if defined(CONFIG_USER_ONLY)
>  void tb_invalidate_phys_addr(target_ulong addr)
>  {
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 6feaa40ca7..ec6024dfde 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -7,6 +7,8 @@
>  #include "exec/hwaddr.h"
>  #endif
>  
> +#include "qapi/qapi-commands-machine-target-common.h"
> +
>  /**
>   * vaddr:
>   * Type wide enough to contain any #target_ulong virtual address.
> @@ -166,5 +168,11 @@ int cpu_memory_rw_debug(CPUState *cpu, vaddr addr,
>  extern int singlestep;
>  
>  void list_cpus(const char *optarg);
> +typedef void (*cpu_model_expansion_func)(CpuModelExpansionType type,
> + CpuModelInfo *model,
> + Error **errp);
> +CpuModelExpansionInfo *get_cpu_model_expansion_info(CpuModelExpansionType 
> type,
> +CpuModelInfo *model,
> +Error **errp);
>  
>  #endif /* CPU_COMMON_H */
> diff --git a/qapi/machine-target-common.json b/qapi/machine-target-common.json
> index 1e6da3177d..44713e9935 100644
> --- a/qapi/machine-target-common.json
> +++ b/qapi/machine-target-common.json
> @@ -77,3 +77,54 @@
>  ##
>  { 'enum': 'CpuModelCompareResult',
>'data': [ 'incompatible', 'identical', 'superset', 'subset' ] }
> +
> +##
> +# @CpuModelExpansionInfo:
> +#
> +# The result of a cpu model expansion.
> +#
> +# @model: the expanded CpuModelInfo.
> +#
> +# Since: 2.8
> +##
> +{ 'struct': 'CpuModelExpansionInfo',
> +  'data': { 'model': 'CpuModelInfo' } }
> +
> +##
> +# @query-cpu-model-expansion:
> +#
> +# Expands a given CPU model (or a combination of CPU model + additional 
> options)
> +# to different granularities, allowing tooling to get an understanding what a
> +# specific CPU model looks like in QEMU under a certain configuration.
> +#
> +# This interface can be used to query the "host" CPU model.
> +#
> +# The data returned by this command may be affected by:
> +#
> +# * QEMU version: CPU models may look different depending on the QEMU 
> version.
> +#   (Except for CPU models reported as "static" in query-cpu-definitions.)
> +# * machine-type: CPU model  may look different depending on the 
> machine-type.
> +#   (Except for CPU models reported as "static" in query-cpu-definitions.)
> +# * machine options (including accelerator): in some architectures, CPU 
> models
> +#   may

Re: [PATCH v2 0/5] migration: Make dirtyrate.c target independent

2023-05-11 Thread Richard Henderson


On 5/11/23 15:12, Juan Quintela wrote:

Juan Quintela (5):
   softmmu: Create qemu_target_pages_to_MiB()
   Use new created qemu_target_pages_to_MiB()
   migration: Teach dirtyrate about qemu_target_page_size()
   migration: Teach dirtyrate about qemu_target_page_bits()
   migration: Make dirtyrate.c target independent


Series:
Reviewed-by: Richard Henderson 

r~

Re: [RFC PATCH] Makefile: include gtags in UNCHECKED_GOALS

2023-05-11 Thread Alex Bennée



Paolo Bonzini  writes:

> Il gio 11 mag 2023, 18:56 Alex Bennée  ha scritto:
>
>  This is the mechanism we use to avoid defaulting to a build dir when
>  we don't need to.
>
>  Signed-off-by: Alex Bennée 
>
> I had already squashed this into Steven Sistare's recently posted patch, 
> which otherwise would have broken the
> "pages" job in CI.

\o/

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 24/84] tcg/mips: Remove TARGET_LONG_BITS, TCG_TYPE_TL

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> All uses replaced with TCGContext.addr_type.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [RFC PATCH] Makefile: include gtags in UNCHECKED_GOALS

2023-05-11 Thread Paolo Bonzini

Il gio 11 mag 2023, 18:56 Alex Bennée  ha scritto:

> This is the mechanism we use to avoid defaulting to a build dir when
> we don't need to.
>
> Signed-off-by: Alex Bennée 
>

I had already squashed this into Steven Sistare's recently posted patch,
which otherwise would have broken the "pages" job in CI.

Paolo

---
>  Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Makefile b/Makefile
> index e421f8a1f4..c566aeb418 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -26,7 +26,7 @@ quiet-command-run = $(if $(V),,$(if $2,printf "  %-7s
> %s\n" $2 $3 && ))$1
>  quiet-@ = $(if $(V),,@)
>  quiet-command = $(quiet-@)$(call quiet-command-run,$1,$2,$3)
>
> -UNCHECKED_GOALS := %clean TAGS cscope ctags dist \
> +UNCHECKED_GOALS := %clean TAGS cscope ctags gtags dist \
>  help check-help print-% \
>  docker docker-% vm-help vm-test vm-build-%
>
> --
> 2.39.2
>
>

Re: [PATCH 23/84] tcg/loongarch64: Remove TARGET_LONG_BITS, TCG_TYPE_TL

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> All uses replaced with TCGContext.addr_type.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 22/84] tcg/aarch64: Remove TARGET_LONG_BITS, TCG_TYPE_TL

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> All uses replaced with TCGContext.addr_type.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

[RFC PATCH] Makefile: include gtags in UNCHECKED_GOALS

2023-05-11 Thread Alex Bennée

This is the mechanism we use to avoid defaulting to a build dir when
we don't need to.

Signed-off-by: Alex Bennée 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index e421f8a1f4..c566aeb418 100644
--- a/Makefile
+++ b/Makefile
@@ -26,7 +26,7 @@ quiet-command-run = $(if $(V),,$(if $2,printf "  %-7s %s\n" 
$2 $3 && ))$1
 quiet-@ = $(if $(V),,@)
 quiet-command = $(quiet-@)$(call quiet-command-run,$1,$2,$3)
 
-UNCHECKED_GOALS := %clean TAGS cscope ctags dist \
+UNCHECKED_GOALS := %clean TAGS cscope ctags gtags dist \
 help check-help print-% \
 docker docker-% vm-help vm-test vm-build-%
 
-- 
2.39.2

Re: [PATCH 21/84] tcg/aarch64: Remove USE_GUEST_BASE

2023-05-11 Thread Alex Bennée

Richard Henderson  writes:

> Eliminate the test vs TARGET_LONG_BITS by considering this
> predicate to be always true, and simplify accordingly.
>
> Signed-off-by: Richard Henderson 

Without having an idea if that extra register would make a difference
(tb-stats could record maybe) so:

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 20/84] tcg/arm: Remove TARGET_LONG_BITS

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> All uses can be infered from the INDEX_op_qemu_*_a{32,64}_*
> opcode being used.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [RESEND PATCH 00/84] tcg: Build once for system, once for user

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> Based-on: 20230503070656.1746170-1-richard.hender...@linaro.org
> ("[PATCH v4 00/57] tcg: Improve atomicity support")
>
> and also
>
> Based-on: 20230502160846.1289975-1-richard.hender...@linaro.org
> ("[PATCH 00/16] tcg: Remove TARGET_ALIGNED_ONLY")
>
> The goal here is only tcg/, leaving accel/tcg/ for future work.

On clang-user:

  TESTcdsg on s390x
/builds/stsquad/qemu/tcg/i386/tcg-target.c.inc:2176:17: runtime error: 
execution reached an unreachable program point
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior 
/builds/stsquad/qemu/tcg/i386/tcg-target.c.inc:2176:17 in 
make[1]: *** [Makefile:174: run-cdsg] Error 1
make: *** [/builds/stsquad/qemu/tests/Makefile.include:56: 
run-tcg-tests-s390x-linux-user] Error 2
make: *** Waiting for unfinished jobs

which is:

case MO_128:
{
TCGLabel *l1 = NULL, *l2 = NULL;
bool use_pair = h.align < MO_128;

tcg_debug_assert(TCG_TARGET_REG_BITS == 64);

if (!use_pair) {
tcg_debug_assert(!use_movbe);
/*
 * Atomicity requires that we use use VMOVDQA.
 * If we've already checked for 16-byte alignment, that's all
 * we need.  If we arrive here with lesser alignment, then we
 * have determined that less than 16-byte alignment can be
 * satisfied with two 8-byte loads.
 */

So possibly the atomic prereq in the tree?

>
>
> r~
>
>
> Richard Henderson (84):
>   tcg: Split out memory ops to tcg-op-ldst.c
>   tcg: Widen gen_insn_data to uint64_t
>   accel/tcg: Widen tcg-ldst.h addresses to uint64_t
>   tcg: Widen helper_{ld,st}_i128 addresses to uint64_t
>   tcg: Widen helper_atomic_* addresses to uint64_t
>   tcg: Widen tcg_gen_code pc_start argument to uint64_t
>   accel/tcg: Merge gen_mem_wrapped with plugin_gen_empty_mem_callback
>   accel/tcg: Merge do_gen_mem_cb into caller
>   tcg: Reduce copies for plugin_gen_mem_callbacks
>   accel/tcg: Widen plugin_gen_empty_mem_callback to i64
>   tcg: Add addr_type to TCGContext
>   tcg: Remove TCGv from tcg_gen_qemu_{ld,st}_*
>   tcg: Remove TCGv from tcg_gen_atomic_*
>   tcg: Split INDEX_op_qemu_{ld,st}* for guest address size
>   tcg/tci: Elimnate TARGET_LONG_BITS, target_ulong
>   tcg/i386: Always enable TCG_TARGET_HAS_extr[lh]_i64_i32
>   tcg/i386: Conditionalize tcg_out_extu_i32_i64
>   tcg/i386: Adjust type of tlb_mask
>   tcg/i386: Remove TARGET_LONG_BITS, TCG_TYPE_TL
>   tcg/arm: Remove TARGET_LONG_BITS
>   tcg/aarch64: Remove USE_GUEST_BASE
>   tcg/aarch64: Remove TARGET_LONG_BITS, TCG_TYPE_TL
>   tcg/loongarch64: Remove TARGET_LONG_BITS, TCG_TYPE_TL
>   tcg/mips: Remove TARGET_LONG_BITS, TCG_TYPE_TL
>   tcg/ppc: Remove TARGET_LONG_BITS, TCG_TYPE_TL
>   tcg/riscv: Remove TARGET_LONG_BITS, TCG_TYPE_TL
>   tcg/s390x: Remove TARGET_LONG_BITS, TCG_TYPE_TL
>   tcg/sparc64: Remove TARGET_LONG_BITS, TCG_TYPE_TL
>   tcg: Remove TARGET_LONG_BITS, TCG_TYPE_TL
>   tcg: Move TCG_TYPE_TL from tcg.h to tcg-op.h
>   tcg: Add page_bits and page_mask to TCGContext
>   tcg: Add tlb_dyn_max_bits to TCGContext
>   tcg: Widen CPUTLBEntry comparators to 64-bits
>   tcg: Add tlb_fast_offset to TCGContext
>   tcg: Remove TCG_TARGET_TLB_DISPLACEMENT_BITS
>   tcg: Split out tcg/debug-assert.h
>   *: Add missing includes of qemu/error-report.h
>   *: Add missing includes of tcg/debug-assert.h
>   *: Add missing includes of tcg/tcg.h
>   tcg: Split out tcg-target-reg-bits.h
>   target/arm: Fix test of TCG_OVERSIZED_GUEST
>   tcg: Split out tcg/oversized-guest.h
>   tcg: Move TCGv, dup_const_tl definitions to tcg-op.h
>   tcg: Split tcg/tcg-op-common.h from tcg/tcg-op.h
>   target/arm: Include helper-gen.h in translator.h
>   target/hexagon: Include helper-gen.h where needed
>   tcg: Remove outdated comments in helper-head.h
>   tcg: Move TCGHelperInfo and dependencies to tcg/helper-info.h
>   tcg: Pass TCGHelperInfo to tcg_gen_callN
>   tcg: Move temp_idx and tcgv_i32_temp debug out of line
>   tcg: Split tcg_gen_callN
>   tcg: Split helper-gen.h
>   tcg: Split helper-proto.h
>   tcg: Add insn_start_words to TCGContext
>   tcg: Add guest_mo to TCGContext
>   tcg: Move TLB_FLAGS_MASK check out of get_alignment_bits
>   tcg: Split tcg/tcg-op-gvec.h
>   tcg: Remove NO_CPU_IO_DEFS
>   exec-all: Widen tb_page_addr_t for user-only
>   exec-all: Widen TranslationBlock pc and cs_base to 64-bits
>   tcg: Remove DEBUG_DISAS
>   tcg: Remove USE_TCG_OPTIMIZATIONS
>   tcg: Spit out exec/translation-block.h
>   include/exec: Remove CODE_GEN_AVG_BLOCK_SIZE
>   accel/tcg: Move most of gen-icount.h into translator.c
>   accel/tcg: Introduce translator_io_start
>   accel/tcg: Move translator_fake_ldb out of line
>   target/arm: Tidy helpers for translation
>   target/mips: Tidy helpers for translation
>   *: Add missing includes of exec/translation-block.h
>   *: Add missing includes of exec/exec-all.h
>   accel/tcg: Tidy includes

Re: [PATCH 19/84] tcg/i386: Remove TARGET_LONG_BITS, TCG_TYPE_TL

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> All uses can be infered from the INDEX_op_qemu_*_a{32,64}_* opcode
> being used.  Add a field into TCGLabelQemuLdst to record the usage.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 18/84] tcg/i386: Adjust type of tlb_mask

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> Because of its use on tgen_arithi, this value must be a signed
> 32-bit quantity, as that is what may be encoded in the insn.
> The truncation of the value to unsigned for 32-bit guests is
> done via the REX bit via 'trexw'.
>
> Removes the only uses of target_ulong from this tcg backend.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 17/84] tcg/i386: Conditionalize tcg_out_extu_i32_i64

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> Since TCG_TYPE_I32 values are kept zero-extended in registers, via
> omission of the REXW bit, we need not extend if the register matches.
> This is already relied upon by qemu_{ld,st}.
>
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 16/84] tcg/i386: Always enable TCG_TARGET_HAS_extr[lh]_i64_i32

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> Keep all 32-bit values zero extended in the register, not solely when
> addresses are 32 bits.  This eliminates a dependency on TARGET_LONG_BITS.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 15/84] tcg/tci: Elimnate TARGET_LONG_BITS, target_ulong

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> We now have the address size as part of the opcode, so
> we no longer need to test TARGET_LONG_BITS.  We can use
> uint64_t for target_ulong, as passed into load/store helpers.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 14/84] tcg: Split INDEX_op_qemu_{ld, st}* for guest address size

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> For 32-bit hosts, we cannot simply rely on TCGContext.addr_bits,
> as we need one or two host registers to represent the guest address.
>
> Create the new opcodes and update all users.  Since we have not
> yet eliminated TARGET_LONG_BITS, only one of the two opcodes will
> ever be used, so we can get away with treating them the same in
> the backends.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 27/27] mkvenv.py: experiment; use distlib to generate script entry points

2023-05-11 Thread John Snow

On Thu, May 11, 2023, 12:14 PM Paolo Bonzini  wrote:

>
>
> Il gio 11 mag 2023, 17:58 John Snow  ha scritto:
>
>> I'll make that simplifying change, which will also allow me to just put
>>> the import in the global scope instead of trying to do it JIT to work
>>> around ensurepip shenanigans. Should be a few less "I know this is bad"
>>> comments for the linters, too.
>>
>>
> I don't think you can do that because, until you are running in the new
> venv, you aren't guaranteed to have either distlib or pip. Once in the venv
> you'll get the latter via ensurepip, if it wasn't already present in the
> system site-packages.
>
> Paolo
>

Yeah, not without a *little* trickery. It still needs a try/except but it
can be moved up.

Or I could leave well enough alone and worry about cleaning up imports when
we move to 3.8

Re: [RFC PATCH 0/3] QEMU ACPI generic port support

2023-05-11 Thread Dave Jiang





On 5/3/23 3:42 AM, Jonathan Cameron wrote:

On Tue, 18 Apr 2023 15:21:36 -0700
Dave Jiang  wrote:


s small RFC patch series is really a hack on what I need from qemu rather
than a proper implementation. I'm hoping to get some guidance from the list on
how to implement this correctly for qemu upstream. Thank you!

The patch series provides support for the ACPI Generic Port support that's
defined by ACPI spec 6.5 5.2.16.7 (Generic Port Affinity Structure). The
series also adds a genport object that allows locality data to be injected via
qemu commandline to the HMAT tables. The generic port support is to allow a hot
plugged CXL memory device to calculate the locality data from the CPU to
the CXL device. The generic port related data provides the locality data from
the CPU to the CXL host bridge (latency and bandwidth). These data in
addition to the PCIe link data, CDAT from device, and CXL switch CDAT if switch
exist, provides the locality data for the entire path.

Patch1: Adds Generic Port Affinity Structure sub-tables to the SRAT. For
each CXL Host Bridge (HB) a GPAS entry is created with a unique proximity
domain. For example, if the system is created with 4 proximity domains (PXM) for
system memory, then the next GPAS will get PXM 4 and so on.


I may be going crazy but I'm not seeing an increment on the numa node. So I 
think
they all get 4 at the moment. Found it increment in patch 3.


Sorry about that. There are some changes for 1/3 strayed into 3/3.





Patch2: Add the json support for generic port. Split out because
clang-format really clobbers the json files.

Patch3: Add a generic port object. The intention here is to allow setup of
numa nodes, add hmat-lb data and node distance for the generic targets. I had to
add a hack in qemu_create_cli_devices() to realize the genport objects. I need
guidance on where and how to do this properly so the genport objects
realize at the correct place and time.

Example of genport setup:
-object genport,id=$X -numa node,genport=genport$X,nodeid=$Y,initiator=$Z
-numa 
hmat-lb,initiator=$Z,target=$X,hierarchy=memory,data-type=access-latency,latency=$latency
-numa 
hmat-lb,initiator=$Z,target=$X,hierarchy=memory,data-type=access-bandwidth,bandwidth=$bandwidthM


I think we should be using some links to the host bridges in here.
So I don't think there should be an explicit genport object at all.
Instead we should be able to point at the pxb itself.  Perhaps also
allowing other port types in future.

Something like

-device pxb-cxl,id=cxl1.1
-numa node,genport=cxl1.1,nodeid=$X


Ok I think that makes sense. So now there's a relation between genport 
being constructed and the passed in numa node. When we are building the 
SRAT, I assume there's a way to get hold of the parsed numa nodes 
attributes and iterate through to attempt a match?



-numa hmat-lb,initiator=$Z,target=$X,...
-numa hmat-lb,initiator=$X,target=$Y,...
//as generic port goes both ways.

As we are currently using bus_nr for UID (which is admittedly a somewhat dirty 
hack that
just happened to be convenient) the ACPI building code can use that to fill in 
the SRAT
entry at appropriate point.

I haven't tried implementing it so there may well be some ordering issues that
require some late binding etc, but it should be possible to make it work.


for ((i = 0; i < total_nodes; i++)); do
 for ((j = 0; j < cxl_hbs; j++ )); do# 2 CXL HBs
 -numa dist,src=$i,dst=$X,val=$dist
 done
done
Linux kernel support:
https://lore.kernel.org/linux-cxl/168088732996.1441063.10107817505475386072.stgit@djiang5-mobl3/T/#t

---

Dave Jiang (3):
   hw/acpi: Add support for Generic Port Affinity Structure to SRAT
   genport: Add json support for generic port
   acpi: add generic port device object


  hw/acpi/aml-build.c | 21 +
  hw/acpi/genport.c   | 61 +
  hw/acpi/meson.build |  1 +
  hw/i386/acpi-build.c| 45 +++
  include/hw/acpi/aml-build.h | 27 
  qapi/machine.json   |  3 +-
  qapi/qom.json   | 12 
  softmmu/vl.c| 26 
  8 files changed, 195 insertions(+), 1 deletion(-)
  create mode 100644 hw/acpi/genport.c

--

Re: [PATCH 27/27] mkvenv.py: experiment; use distlib to generate script entry points

2023-05-11 Thread Paolo Bonzini

Il gio 11 mag 2023, 17:58 John Snow  ha scritto:

> I'll make that simplifying change, which will also allow me to just put
>> the import in the global scope instead of trying to do it JIT to work
>> around ensurepip shenanigans. Should be a few less "I know this is bad"
>> comments for the linters, too.
>
>
I don't think you can do that because, until you are running in the new
venv, you aren't guaranteed to have either distlib or pip. Once in the venv
you'll get the latter via ensurepip, if it wasn't already present in the
system site-packages.

Paolo

--js
>
>>

Re: [PATCH 13/84] tcg: Remove TCGv from tcg_gen_atomic_*

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> Expand from TCGv to TCGTemp inline in the translators,
> and validate that the size matches tcg_ctx->addr_type.
>
> Signed-off-by: Richard Henderson 
> ---
>  include/tcg/tcg-op.h | 184 ++--
>  tcg/tcg-op-ldst.c| 198 ---
>  2 files changed, 267 insertions(+), 115 deletions(-)
>

> diff --git a/tcg/tcg-op-ldst.c b/tcg/tcg-op-ldst.c
> index a94a70e8c4..4624b0a25b 100644
> --- a/tcg/tcg-op-ldst.c
> +++ b/tcg/tcg-op-ldst.c

>  
> -static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
> +static void do_atomic_op_i64(TCGv_i64 ret, TCGTemp *addr, TCGv_i64 val,
>   TCGArg idx, MemOp memop, void * const table[])
>  {
>  memop = tcg_canonicalize_memop(memop, 1, 0);
>  
>  if ((memop & MO_SIZE) == MO_64) {
> -#ifdef CONFIG_ATOMIC64

The commit message could briefly mention the table expansion is
controlled by CONFIG_ATOMIC64 so why we don't check it here.

> -gen_atomic_op_i64 gen;
> -TCGv_i64 a64;
> -MemOpIdx oi;
> +gen_atomic_op_i64 gen = table[memop & (MO_SIZE | MO_BSWAP)];
>  
> -gen = table[memop & (MO_SIZE | MO_BSWAP)];
> -tcg_debug_assert(gen != NULL);
> +if (gen) {
> +MemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
> +TCGv_i64 a64 = maybe_extend_addr64(addr);
> +gen(ret, cpu_env, a64, val, tcg_constant_i32(oi));
> +maybe_free_addr64(a64);
> +return;
> +}

personal preference nit, we don't need an early return, you could just
hoist into the else leg.


Otherwise:

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v3 1/1] block/blkio: use qemu_open() to support fd passing for virtio-blk

2023-05-11 Thread Jonathon Jongsma


On 5/11/23 4:15 AM, Stefano Garzarella wrote:

The virtio-blk-vhost-vdpa driver in libblkio 1.3.0 supports the new
'fd' property. Let's expose this to the user, so the management layer
can pass the file descriptor of an already opened vhost-vdpa character
device. This is useful especially when the device can only be accessed
with certain privileges.

If the libblkio virtio-blk driver supports fd passing, let's always
use qemu_open() to open the `path`, so we can handle fd passing
from the management layer through the "/dev/fdset/N" special path.

Signed-off-by: Stefano Garzarella 
---

Notes:
 v3:
 - use qemu_open() on `path` to simplify libvirt code [Jonathon]



Thanks

The one drawback now is that it doesn't seem possible for libvirt to 
introspect whether or not qemu supports passing an fd to the driver or 
not. When I was writing my initial patch (before I realized that it was 
missing fd-passing), I just checked for the existence of the 
virtio-blk-vhost-vdpa device. But we actually need to know both that 
this device exists and supports fd passing. As far as I can tell, 
versions 7.2.0 and 8.0.0 include this device but won't accept fds.


Jonathon




  block/blkio.c | 53 ++-
  1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 0cdc99a729..6a6f20f923 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -672,25 +672,60 @@ static int blkio_virtio_blk_common_open(BlockDriverState 
*bs,
  {
  const char *path = qdict_get_try_str(options, "path");
  BDRVBlkioState *s = bs->opaque;
-int ret;
+bool fd_supported = false;
+int fd, ret;
  
  if (!path) {

  error_setg(errp, "missing 'path' option");
  return -EINVAL;
  }
  
-ret = blkio_set_str(s->blkio, "path", path);

-qdict_del(options, "path");
-if (ret < 0) {
-error_setg_errno(errp, -ret, "failed to set path: %s",
- blkio_get_error_msg());
-return ret;
-}
-
  if (!(flags & BDRV_O_NOCACHE)) {
  error_setg(errp, "cache.direct=off is not supported");
  return -EINVAL;
  }
+
+if (blkio_get_int(s->blkio, "fd", ) == 0) {
+fd_supported = true;
+}
+
+/*
+ * If the libblkio driver supports fd passing, let's always use qemu_open()
+ * to open the `path`, so we can handle fd passing from the management
+ * layer through the "/dev/fdset/N" special path.
+ */
+if (fd_supported) {
+int open_flags;
+
+if (flags & BDRV_O_RDWR) {
+open_flags = O_RDWR;
+} else {
+open_flags = O_RDONLY;
+}
+
+fd = qemu_open(path, open_flags, errp);
+if (fd < 0) {
+return -EINVAL;
+}
+
+ret = blkio_set_int(s->blkio, "fd", fd);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "failed to set fd: %s",
+ blkio_get_error_msg());
+qemu_close(fd);
+return ret;
+}
+} else {
+ret = blkio_set_str(s->blkio, "path", path);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "failed to set path: %s",
+ blkio_get_error_msg());
+return ret;
+}
+}
+
+qdict_del(options, "path");
+
  return 0;
  }

Re: [PATCH 12/84] tcg: Remove TCGv from tcg_gen_qemu_{ld,st}_*

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> Expand from TCGv to TCGTemp inline in the translators,
> and validate that the size matches tcg_ctx->addr_type.
> These inlines will eventually be seen only by target-specific code.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 07/27] mkvenv: add diagnose() method for ensure() failures

2023-05-11 Thread John Snow

On Thu, May 11, 2023, 11:57 AM Paolo Bonzini  wrote:

>
>
> Il gio 11 mag 2023, 17:53 John Snow  ha scritto:
>
>>
>> You're right, in the "optional" case for sphinx the error isn't really
>> *that* bad or serious. I'll try to work this or something very similar to
>> it in.
>>
>> I was thinking it could be up to the caller to discard the input, but I
>> suppose we can also route the semantics down into the tool, too.
>>
>> I'll play with it.
>>
>
> If you think that what I posted is okay or at least a start, I can squash
> it in.
>
> Paolo
>

I didn't want to bet that there wouldn't be more feedback 

but, you can squash it in pre-emptively and I'll do the same and we'll just
see what happens.

--js

>

Re: [PATCH 27/27] mkvenv.py: experiment; use distlib to generate script entry points

2023-05-11 Thread John Snow

On Thu, May 11, 2023, 3:02 AM Paolo Bonzini  wrote:

> On 5/11/23 05:54, John Snow wrote:
> > +if checkpip():
> > +# We ran ensurepip. We need to re-run post_init...!
> > +args = [sys.executable, __file__, "post_init"]
> > +subprocess.run(args, check=True)
> > +return
> > +
> >   # Finally, generate a 'pip' script so the venv is usable in a
> normal
> >   # way from the CLI. This only happens when we inherited pip from a
> >   # parent/system-site and haven't run ensurepip in some way.
>
> Can't this just be:
>
> +if not checkpip():
> +# Finally, generate a 'pip' script so the venv is usable in a
> normal
> +# way from the CLI. This only happens when we inherited pip from a
> +# parent/system-site and haven't run ensurepip in some way.
> +generate_console_scripts(["pip"])
>
> even squashed in the original

Debian 10 patch?
>
> You don't need to generate the pip shims if ensurepip has been run.
>
> Paolo
>

*cough*

You’re completely right. I was so preoccupied on diagnosing why it was
failing I didn't even recognize that it wasn't even needed...

*facepalm*

I'll make that simplifying change, which will also allow me to just put the
import in the global scope instead of trying to do it JIT to work around
ensurepip shenanigans. Should be a few less "I know this is bad" comments
for the linters, too.

--js

>

Re: [PATCH 07/27] mkvenv: add diagnose() method for ensure() failures

2023-05-11 Thread Paolo Bonzini

Il gio 11 mag 2023, 17:53 John Snow  ha scritto:

>
> You're right, in the "optional" case for sphinx the error isn't really
> *that* bad or serious. I'll try to work this or something very similar to
> it in.
>
> I was thinking it could be up to the caller to discard the input, but I
> suppose we can also route the semantics down into the tool, too.
>
> I'll play with it.
>

If you think that what I posted is okay or at least a start, I can squash
it in.

Paolo


> --js
>
>>

Re: [PATCH 07/27] mkvenv: add diagnose() method for ensure() failures

2023-05-11 Thread John Snow

On Thu, May 11, 2023, 2:53 AM Paolo Bonzini  wrote:

> On 5/11/23 05:54, John Snow wrote:
> > This is a routine that is designed to print some usable info for human
> > beings back out to the terminal if/when "mkvenv ensure" fails to locate
> > or install a package during configure time, such as meson or sphinx.
> >
> > Since we are requiring that "meson" and "sphinx" are installed to the
> > same Python environment as QEMU is configured to build with, this can
> > produce some surprising failures when things are mismatched. This method
> > is here to try and ease that sting by offering some actionable
> > diagnosis.
>
> I think this is a bit too verbose/scary, especially the "Ouch" for
> what was a totally normal occurrence before (no "--enable-docs" and sphinx
> absent or too old) and the "ERROR" from "pip install --no-index".
>
> Here is an attempt to tone it down:
>
> diff --git a/python/scripts/mkvenv.py b/python/scripts/mkvenv.py
> index 8e097e4759e3..5d30174a9aff 100644
> --- a/python/scripts/mkvenv.py
> +++ b/python/scripts/mkvenv.py
> @@ -74,6 +74,7 @@
>   Iterator,
>   Optional,
>   Sequence,
> +Tuple,
>   Union,
>   )
>   import venv
> @@ -594,7 +595,7 @@ def diagnose(
>   online: bool,
>   wheels_dir: Optional[Union[str, Path]],
>   prog: Optional[str],
> -) -> str:
> +) -> Tuple[str, bool]:
>   """
>   Offer a summary to the user as to why a package failed to be
> installed.
>
> @@ -610,6 +611,9 @@ def diagnose(
>   """
>   # pylint: disable=too-many-branches
>
> +# Some errors are not particularly serious
> +bad = False
> +
>   pkg_name = pkgname_from_depspec(dep_spec)
>   pkg_version = None
>
> @@ -654,11 +658,11 @@ def diagnose(
>   "No suitable version found in, or failed to install from"
>   f" '{wheels_dir}'."
>   )
> -else:
> -lines.append("No local package directory was searched.")
> +bad = True
>
>   if online:
>   lines.append("A suitable version could not be obtained from
> PyPI.")
> +bad = True
>   else:
>   lines.append(
>   "mkvenv was configured to operate offline and did not check
> PyPI."
> @@ -675,10 +679,14 @@ def diagnose(
>   f"Typically this means that '{prog}' has been installed "
>   "against a different Python interpreter on your system."
>   )
> +bad = True
>
>   lines = [f" • {line}" for line in lines]
> -lines.insert(0, f"Could not ensure availability of '{dep_spec}':")
> -return os.linesep.join(lines)
> +if bad:
> +lines.insert(0, f"Could not ensure availability of '{dep_spec}':")
> +else:
> +lines.insert(0, f"'{dep_spec}' not found:")
> +return os.linesep.join(lines), bad
>
>
>   def pip_install(
> @@ -731,7 +739,7 @@ def _do_ensure(
>   dep_specs,
>   online=False,
>   wheels_dir=wheels_dir,
> -devnull=online and not wheels_dir,
> +devnull=not wheels_dir,
>   )
>   # (A) or (B) happened. Success.
>   except subprocess.CalledProcessError:
> @@ -778,7 +786,10 @@ def ensure(
>   _do_ensure(dep_specs, online, wheels_dir)
>   except subprocess.CalledProcessError as exc:
>   # Well, that's not good.
> -raise Ouch(diagnose(dep_specs[0], online, wheels_dir, prog)) from
> exc
> +msg, bad = diagnose(dep_specs[0], online, wheels_dir, prog)
> +if bad:
> +raise Ouch(msg) from exc
> +print("", msg, "\n", sep="\n", file=sys.stderr)
>
>
>   def post_venv_setup() -> None:
>
>
> Paolo
>

You're right, in the "optional" case for sphinx the error isn't really
*that* bad or serious. I'll try to work this or something very similar to
it in.

I was thinking it could be up to the caller to discard the input, but I
suppose we can also route the semantics down into the tool, too.

I'll play with it.

--js

>

Re: [PATCH] configure: make clear that VirtFS is 9p

2023-05-11 Thread Thomas Huth


On 11/05/2023 16.12, Christian Schoenebeck wrote:

Add '9P' to the summary output section of 'VirtFS' to avoid being
confused with virtiofs.

Based-on: <20230503130757.863824-1-pefo...@google.com>
Signed-off-by: Christian Schoenebeck 
---
  meson.build | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 5d8373b608..5d65f53fec 100644
--- a/meson.build
+++ b/meson.build
@@ -3922,8 +3922,8 @@ if have_block
summary_info += {'Block whitelist (rw)': 
get_option('block_drv_rw_whitelist')}
summary_info += {'Block whitelist (ro)': 
get_option('block_drv_ro_whitelist')}
summary_info += {'Use block whitelist in tools': 
get_option('block_drv_whitelist_in_tools')}
-  summary_info += {'VirtFS support':have_virtfs}
-  summary_info += {'VirtFS Proxy Helper support': have_virtfs_proxy_helper}
+  summary_info += {'VirtFS (9P) support':have_virtfs}
+  summary_info += {'VirtFS (9P) Proxy Helper support': 
have_virtfs_proxy_helper}
summary_info += {'Live block migration': 
config_host_data.get('CONFIG_LIVE_BLOCK_MIGRATION')}
summary_info += {'replication support': 
config_host_data.get('CONFIG_REPLICATION')}
summary_info += {'bochs support': get_option('bochs').allowed()}


Reviewed-by: Thomas Huth

Re: [PATCH 11/84] tcg: Add addr_type to TCGContext

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> This will enable replacement of TARGET_LONG_BITS within tcg/.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 10/84] accel/tcg: Widen plugin_gen_empty_mem_callback to i64

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> Since we do this inside gen_empty_mem_cb anyway, let's
> do this earlier inside tcg expansion.
>
> Signed-off-by: Richard Henderson 
> ---
>  include/exec/plugin-gen.h |  4 ++--
>  accel/tcg/plugin-gen.c|  9 +++--
>  tcg/tcg-op-ldst.c | 28 
>  3 files changed, 25 insertions(+), 16 deletions(-)
>
> diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
> index 5f5506f1cc..3af0168e65 100644
> --- a/include/exec/plugin-gen.h
> +++ b/include/exec/plugin-gen.h
> @@ -27,7 +27,7 @@ void plugin_gen_insn_start(CPUState *cpu, const struct 
> DisasContextBase *db);
>  void plugin_gen_insn_end(void);
>  
>  void plugin_gen_disable_mem_helpers(void);
> -void plugin_gen_empty_mem_callback(TCGv addr, uint32_t info);
> +void plugin_gen_empty_mem_callback(TCGv_i64 addr, uint32_t info);
>  
>  static inline void plugin_insn_append(abi_ptr pc, const void *from, size_t 
> size)
>  {
> @@ -69,7 +69,7 @@ static inline void plugin_gen_tb_end(CPUState *cpu)
>  static inline void plugin_gen_disable_mem_helpers(void)
>  { }
>  
> -static inline void plugin_gen_empty_mem_callback(TCGv addr, uint32_t info)
> +static inline void plugin_gen_empty_mem_callback(TCGv_i64 addr, uint32_t 
> info)
>  { }
>  
>  static inline void plugin_insn_append(abi_ptr pc, const void *from, size_t 
> size)
> diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
> index 55e892b684..34be1b940c 100644
> --- a/accel/tcg/plugin-gen.c
> +++ b/accel/tcg/plugin-gen.c
> @@ -124,23 +124,20 @@ static void gen_empty_inline_cb(void)
>  tcg_temp_free_i64(val);
>  }
>  
> -static void gen_empty_mem_cb(TCGv vaddr, uint32_t info)
> +static void gen_empty_mem_cb(TCGv_i64 addr, uint32_t info)

You bounced the name a few times in this series:

  445a4a2f14 accel/tcg: Widen plugin_gen_empty_mem_callback to i64
  modified   accel/tcg/plugin-gen.c
  @@ -127,3 +127,3 @@
  -static void gen_empty_mem_cb(TCGv vaddr, uint32_t info)
  +static void gen_empty_mem_cb(TCGv_i64 addr, uint32_t info)
   {
   TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();

  8b99baa592 accel/tcg: Merge do_gen_mem_cb into caller
  modified   accel/tcg/plugin-gen.c
  @@ -148,3 +127,3 @@
  -static void gen_empty_mem_cb(TCGv addr, uint32_t info)
  +static void gen_empty_mem_cb(TCGv vaddr, uint32_t info)
   {
  -do_gen_mem_cb(addr, info);
  +TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();

  38b47b19ec plugin-gen: add module for TCG-related code
  modified   accel/tcg/plugin-gen.c
  @@ -0,0 +145,3 @@
  +static void gen_empty_mem_cb(TCGv addr, uint32_t info)
  +{
  +do_gen_mem_cb(addr, info);

Otherwise:

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v2] migration: Add documentation for backwards compatiblity

2023-05-11 Thread Juan Quintela

Vladimir Sementsov-Ogievskiy  wrote:
> On 11.05.23 15:00, Juan Quintela wrote:
 +Now we start with the more interesting cases.  Let start with the
 +same qemu but not the same machine type.
>>> sounds like "different machine type on source and target" for me..
>>>
>>> Maybe, "not latest machine type" ?
>> Now we start with the more interesting cases.  Let start with a the
>> same QEMU process and a different QEMU version machine type.
>> Better?
>
> No)
>
> Neither I have good wording in mind. That doesn't really matter I
> think, so don't worry, meaning is obvious from the context anyway.
>
> I just mean, that for me:
>
> "same" here: source.qemu.version == target.qemu.version
>
> "different" here: source.qemu.machine_type != target.qemu.machine_type
> -- but you don't mean this and this case doesn't work anyway

Ah, I see what you mean know.

>
> What you mean by "different" that machine type is not equal to qemu
> version.. But formally, it's never "equal", actually, latest machine
> type of the qemu version "corresponds" to that qemu version.
>
> Maybe:
>
> "Consider the case with same QEMU version (5.2) but not latest (not 5.2) 
> machine type:"

Now we start with the more interesting cases.  Consider the case where
we have the same QEMU version in both sides (qemu-5.2) but we are using
the latest machine type for that version (pc-5.2) but one of an older
QEMU version, in this case pc-5.1.

Better?

Later, Juan.

Re: [PATCH 09/84] tcg: Reduce copies for plugin_gen_mem_callbacks

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> We only need to make copies for loads, when the destination
> overlaps the address.  For now, only eliminate the copy for
> stores and 128-bit loads.
>
> Rename plugin_prep_mem_callbacks to plugin_maybe_preserve_addr,
> returning NULL if no copy is made.
>
> Signed-off-by: Richard Henderson 
> ---
>  tcg/tcg-op-ldst.c | 38 --
>  1 file changed, 20 insertions(+), 18 deletions(-)
>
> diff --git a/tcg/tcg-op-ldst.c b/tcg/tcg-op-ldst.c
> index 17fe35b93c..cbd85f793c 100644
> --- a/tcg/tcg-op-ldst.c
> +++ b/tcg/tcg-op-ldst.c
> @@ -114,7 +114,8 @@ static void tcg_gen_req_mo(TCGBar type)
>  }
>  }
>  
> -static inline TCGv plugin_prep_mem_callbacks(TCGv vaddr)
> +/* Only required for loads, where value might overlap addr. */

maybe mention we don't bother checking if the temps actually overlap
because we will need to futz around with the copy of addr for plugins.

Otherwise:

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 08/84] accel/tcg: Merge do_gen_mem_cb into caller

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> As do_gen_mem_cb is called once, merge it into gen_empty_mem_cb.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 07/84] accel/tcg: Merge gen_mem_wrapped with plugin_gen_empty_mem_callback

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> As gen_mem_wrapped is only used in plugin_gen_empty_mem_callback,
> we can avoid the curiosity of union mem_gen_fn by inlining it.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 06/84] tcg: Widen tcg_gen_code pc_start argument to uint64_t

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

RE: [PATCH 1/2] Hexagon (iclass): update J4_hintjumpr slot constraints

2023-05-11 Thread Taylor Simpson




> -Original Message-
> From: Matheus Tavares Bernardino 
> Sent: Friday, January 13, 2023 7:39 AM
> To: qemu-devel@nongnu.org
> Cc: Taylor Simpson ; Brian Cain
> ; richard.hender...@linaro.org
> Subject: [PATCH 1/2] Hexagon (iclass): update J4_hintjumpr slot constraints
> 
> The Hexagon PRM says that "The assembler automatically encodes
> instructions in the packet in the proper order. In the binary encoding of a
> packet, the instructions must be ordered from Slot 3 down to Slot 0."
> 
> Prior to the architecture version v73, the slot constraints from instruction
> "hintjr" only allowed it to be executed at slot 2.
> With that in mind, consider the packet:
> 
> {
> hintjr(r0)
> nop
> nop
> if (!p0) memd(r1+#0) = r1:0
> }
> 
> To satisfy the ordering rule quoted from the PRM, the assembler would,
> thus, move one of the nops to the first position, so that it can be assigned 
> to
> slot 3 and the subsequent hintjr to slot 2.
> 
> However, since v73, hintjr can be executed at either slot 2 or 3. So there is 
> no
> need to reorder that packet and the assembler will encode it as is. When
> QEMU tries to execute it, however, we end up hitting a "misaliged store"
> exception because both the store and the hintjr will be assigned to store 0,
> and some functions like `slot_is_predicated()` expect the decode machinery
> to assign only one instruction per slot. In particular, the mentioned function
> will traverse the packet until it finds the first instruction at the desired 
> slot
> which, for slot 0, will be hintjr. Since hintjr is not predicated, the result 
> is that
> we try to execute the store regardless of the predicate. And because the
> predicate is false, we had not previously loaded hex_store_addr[0] or
> hex_store_width[0]. As a result, the store will decide de width based on
> trash memory, causing it to be misaligned.
> 
> Update the slot constraints for hintjr so that QEMU can properly handle such
> encodings.
> 
> Note: to avoid similar-but-not-identical issues in the future, we should look
> for multiple instructions at the same slot during decoding time and throw an
> invalid packet exception. That will be done in the subsequent commit.
> 
> Signed-off-by: Matheus Tavares Bernardino 
> ---
>  target/hexagon/iclass.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/target/hexagon/iclass.c b/target/hexagon/iclass.c index
> 6091286993..081116fc4d 100644
> --- a/target/hexagon/iclass.c
> +++ b/target/hexagon/iclass.c
> @@ -51,8 +51,10 @@ SlotMask find_iclass_slots(Opcode opcode, int itype)
>  return SLOTS_0;
>  } else if ((opcode == J2_trap0) ||
> (opcode == Y2_isync) ||
> -   (opcode == J2_pause) || (opcode == J4_hintjumpr)) {
> +   (opcode == J2_pause)) {
>  return SLOTS_2;
> +} else if (opcode == J4_hintjumpr) {
> +return SLOTS_23;
>  } else if (GET_ATTRIB(opcode, A_CRSLOT23)) {
>  return SLOTS_23;
>  } else if (GET_ATTRIB(opcode, A_RESTRICT_PREFERSLOT0)) {

Reviewed-by: Taylor Simpson

Re: [PATCH 05/84] tcg: Widen helper_atomic_* addresses to uint64_t

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> Always pass the target address as uint64_t.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PULL 09/28] block: bdrv/blk_co_unref() for calls in coroutine context

2023-05-11 Thread Michael Tokarev


10.05.2023 15:20, Kevin Wolf wrote:

These functions must not be called in coroutine context, because they
need write access to the graph.


How important for this and 2 surrounding changes to be for 7.2-stable
(if we'll ever release one)? It smells like real bugs are being fixed
here, is it ever possible to hit those in 7.2?

Provided that whole no_coroutine_fn  infrastructure is missing there,
including the no_co_wrapper parts?  It's not difficult to back-port some
of that stuff to 7.2.

Thanks,

/mjt

Re: [PATCH 04/84] tcg: Widen helper_{ld, st}_i128 addresses to uint64_t

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> Always pass the target address as uint64_t.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 03/84] accel/tcg: Widen tcg-ldst.h addresses to uint64_t

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> Always pass the target address as uint64_t.
> Adjust tcg_out_{ld,st}_helper_args to match.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 02/84] tcg: Widen gen_insn_data to uint64_t

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> We already pass uint64_t to restore_state_to_opc; this changes all
> of the other uses from insn_start through the encoding to decoding.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 01/84] tcg: Split out memory ops to tcg-op-ldst.c

2023-05-11 Thread Alex Bennée



Richard Henderson  writes:

> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

[PATCH v3 3/3] target/openrisc: Setup FPU for detecting tininess before rounding

2023-05-11 Thread Stafford Horne

OpenRISC defines tininess to be detected before rounding.  Setup qemu to
obey this.

Signed-off-by: Stafford Horne 
Reviewed-by: Richard Henderson 
---
Since v2:
 - Add reviewed-by
Since v1:
 - Remove setting default NaN behavior.

 target/openrisc/cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/openrisc/cpu.c b/target/openrisc/cpu.c
index 0ce4f796fa..61d748cfdc 100644
--- a/target/openrisc/cpu.c
+++ b/target/openrisc/cpu.c
@@ -22,6 +22,7 @@
 #include "qemu/qemu-print.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
+#include "fpu/softfloat-helpers.h"
 #include "tcg/tcg.h"
 
 static void openrisc_cpu_set_pc(CPUState *cs, vaddr value)
@@ -90,6 +91,9 @@ static void openrisc_cpu_reset_hold(Object *obj)
 s->exception_index = -1;
 cpu_set_fpcsr(>env, 0);
 
+set_float_detect_tininess(float_tininess_before_rounding,
+  >env.fp_status);
+
 #ifndef CONFIG_USER_ONLY
 cpu->env.picmr = 0x;
 cpu->env.picsr = 0x;
-- 
2.39.1

[PATCH v3 0/3] OpenRISC updates for user space FPU

2023-05-11 Thread Stafford Horne

Hello,

Since v2:
 - Add reviewed-by's from Richard
 - Pull cpu definition out of ifdef in helper_mfspr
Since v1:
 - Fixups suggested by Richard Henderson

This series adds support for the FPU related architecture changes defined in
architecture spec revision v1.4.

 - https://openrisc.io/revisions/r1.4

In summary the architecture changes are:

 - Change FPCSR SPR permissions to allow for reading and writing from user
   space.
 - Clarify that FPU underflow detection is done by detecting tininess before
   rounding.

Previous to this series FPCSR reads and writes from user-mode in QEMU would
throw an illegal argument exception.  The proper behavior should have been to
treat these operations as no-ops as the cpu implementations do.  As mentioned
series changes FPCSR read/write to follow the spec.

The series has been tested with the FPU support added in glibc test suite and
all math tests are passing.


Stafford Horne (3):
  target/openrisc: Allow fpcsr access in user mode
  target/openrisc: Set PC to cpu state on FPU exception
  target/openrisc: Setup FPU for detecting tininess before rounding

 target/openrisc/cpu.c|  4 ++
 target/openrisc/fpu_helper.c | 13 ++-
 target/openrisc/sys_helper.c | 45 --
 target/openrisc/translate.c  | 72 
 4 files changed, 81 insertions(+), 53 deletions(-)

-- 
2.39.1

[PATCH v3 1/3] target/openrisc: Allow fpcsr access in user mode

2023-05-11 Thread Stafford Horne

As per OpenRISC spec 1.4 FPCSR can be read and written in user mode.

Update mtspr and mfspr helpers to support this by moving the is_user
check into the helper.

Link: 
https://raw.githubusercontent.com/openrisc/doc/master/openrisc-arch-1.4-rev0.pdf
Signed-off-by: Stafford Horne 
Reviewed-by: Richard Henderson 
---

Since v2:
 - Add reviewed-by
 - In helper_mfspr bring cpu out of ifdef to avoid replicatig the definition.
   Originally I left it in the ifdef to avoid having to mix having pointers and
   the data array defined on the stack.  But that's overthinking.
Since v1:
 - Update commit message to remove text about no-existant logic change.

 target/openrisc/sys_helper.c | 45 --
 target/openrisc/translate.c  | 72 
 2 files changed, 66 insertions(+), 51 deletions(-)

diff --git a/target/openrisc/sys_helper.c b/target/openrisc/sys_helper.c
index ec145960e3..ccdee3b8be 100644
--- a/target/openrisc/sys_helper.c
+++ b/target/openrisc/sys_helper.c
@@ -29,17 +29,37 @@
 
 #define TO_SPR(group, number) (((group) << 11) + (number))
 
+static inline bool is_user(CPUOpenRISCState *env)
+{
+#ifdef CONFIG_USER_ONLY
+return true;
+#else
+return (env->sr & SR_SM) == 0;
+#endif
+}
+
 void HELPER(mtspr)(CPUOpenRISCState *env, target_ulong spr, target_ulong rb)
 {
-#ifndef CONFIG_USER_ONLY
 OpenRISCCPU *cpu = env_archcpu(env);
+#ifndef CONFIG_USER_ONLY
 CPUState *cs = env_cpu(env);
 target_ulong mr;
 int idx;
 #endif
 
+/* Handle user accessible SPRs first.  */
 switch (spr) {
+case TO_SPR(0, 20): /* FPCSR */
+cpu_set_fpcsr(env, rb);
+return;
+}
+
+if (is_user(env)) {
+raise_exception(cpu, EXCP_ILLEGAL);
+}
+
 #ifndef CONFIG_USER_ONLY
+switch (spr) {
 case TO_SPR(0, 11): /* EVBAR */
 env->evbar = rb;
 break;
@@ -187,27 +207,33 @@ void HELPER(mtspr)(CPUOpenRISCState *env, target_ulong 
spr, target_ulong rb)
 cpu_openrisc_timer_update(cpu);
 qemu_mutex_unlock_iothread();
 break;
-#endif
-
-case TO_SPR(0, 20): /* FPCSR */
-cpu_set_fpcsr(env, rb);
-break;
 }
+#endif
 }
 
 target_ulong HELPER(mfspr)(CPUOpenRISCState *env, target_ulong rd,
target_ulong spr)
 {
+OpenRISCCPU *cpu = env_archcpu(env);
 #ifndef CONFIG_USER_ONLY
 uint64_t data[TARGET_INSN_START_WORDS];
 MachineState *ms = MACHINE(qdev_get_machine());
-OpenRISCCPU *cpu = env_archcpu(env);
 CPUState *cs = env_cpu(env);
 int idx;
 #endif
 
+/* Handle user accessible SPRs first.  */
 switch (spr) {
+case TO_SPR(0, 20): /* FPCSR */
+return env->fpcsr;
+}
+
+if (is_user(env)) {
+raise_exception(cpu, EXCP_ILLEGAL);
+}
+
 #ifndef CONFIG_USER_ONLY
+switch (spr) {
 case TO_SPR(0, 0): /* VR */
 return env->vr;
 
@@ -324,11 +350,8 @@ target_ulong HELPER(mfspr)(CPUOpenRISCState *env, 
target_ulong rd,
 cpu_openrisc_count_update(cpu);
 qemu_mutex_unlock_iothread();
 return cpu_openrisc_count_get(cpu);
-#endif
-
-case TO_SPR(0, 20): /* FPCSR */
-return env->fpcsr;
 }
+#endif
 
 /* for rd is passed in, if rd unchanged, just keep it back.  */
 return rd;
diff --git a/target/openrisc/translate.c b/target/openrisc/translate.c
index 76e53c78d4..43ba0cc1ad 100644
--- a/target/openrisc/translate.c
+++ b/target/openrisc/translate.c
@@ -819,45 +819,12 @@ static bool trans_l_xori(DisasContext *dc, arg_rri *a)
 
 static bool trans_l_mfspr(DisasContext *dc, arg_l_mfspr *a)
 {
-check_r0_write(dc, a->d);
-
-if (is_user(dc)) {
-gen_illegal_exception(dc);
-} else {
-TCGv spr = tcg_temp_new();
-
-if (tb_cflags(dc->base.tb) & CF_USE_ICOUNT) {
-gen_io_start();
-if (dc->delayed_branch) {
-tcg_gen_mov_tl(cpu_pc, jmp_pc);
-tcg_gen_discard_tl(jmp_pc);
-} else {
-tcg_gen_movi_tl(cpu_pc, dc->base.pc_next + 4);
-}
-dc->base.is_jmp = DISAS_EXIT;
-}
+TCGv spr = tcg_temp_new();
 
-tcg_gen_ori_tl(spr, cpu_R(dc, a->a), a->k);
-gen_helper_mfspr(cpu_R(dc, a->d), cpu_env, cpu_R(dc, a->d), spr);
-}
-return true;
-}
-
-static bool trans_l_mtspr(DisasContext *dc, arg_l_mtspr *a)
-{
-if (is_user(dc)) {
-gen_illegal_exception(dc);
-} else {
-TCGv spr;
+check_r0_write(dc, a->d);
 
-if (tb_cflags(dc->base.tb) & CF_USE_ICOUNT) {
-gen_io_start();
-}
-/* For SR, we will need to exit the TB to recognize the new
- * exception state.  For NPC, in theory this counts as a branch
- * (although the SPR only exists for use by an ICE).  Save all
- * of the cpu state first, allowing it to be overwritten.
- */
+if (tb_cflags(dc->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
 if

[PATCH v3 2/3] target/openrisc: Set PC to cpu state on FPU exception

2023-05-11 Thread Stafford Horne

Store the PC to ensure the correct value can be read in the exception
handler.

Signed-off-by: Stafford Horne 
Reviewed-by: Richard Henderson 
---
Since v2:
 - Add reviewed-by
Since v1:
 - Use function do_fpe (similar to do_range) to raise exception.

 target/openrisc/fpu_helper.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/target/openrisc/fpu_helper.c b/target/openrisc/fpu_helper.c
index f9e34fa2cc..8b81d2f62f 100644
--- a/target/openrisc/fpu_helper.c
+++ b/target/openrisc/fpu_helper.c
@@ -20,8 +20,8 @@
 
 #include "qemu/osdep.h"
 #include "cpu.h"
+#include "exec/exec-all.h"
 #include "exec/helper-proto.h"
-#include "exception.h"
 #include "fpu/softfloat.h"
 
 static int ieee_ex_to_openrisc(int fexcp)
@@ -45,6 +45,15 @@ static int ieee_ex_to_openrisc(int fexcp)
 return ret;
 }
 
+static G_NORETURN
+void do_fpe(CPUOpenRISCState *env, uintptr_t pc)
+{
+CPUState *cs = env_cpu(env);
+
+cs->exception_index = EXCP_FPE;
+cpu_loop_exit_restore(cs, pc);
+}
+
 void HELPER(update_fpcsr)(CPUOpenRISCState *env)
 {
 int tmp = get_float_exception_flags(>fp_status);
@@ -55,7 +64,7 @@ void HELPER(update_fpcsr)(CPUOpenRISCState *env)
 if (tmp) {
 env->fpcsr |= tmp;
 if (env->fpcsr & FPCSR_FPEE) {
-helper_exception(env, EXCP_FPE);
+do_fpe(env, GETPC());
 }
 }
 }
-- 
2.39.1

Re: [PATCH v5 6/6] igb: packet-split descriptors support

2023-05-11 Thread Akihiko Odaki


Hi,

Thank you for continuously working on this series. I have some comments, 
but I guess this series will be ready after one or two more rounds.


On 2023/05/10 17:22, Tomasz Dzieciol wrote:

Packet-split descriptors are used by Linux VF driver for MTU values from 2048

Signed-off-by: Tomasz Dzieciol 
---
  hw/net/e1000e_core.c |  10 +-
  hw/net/igb_core.c| 403 ++-
  hw/net/igb_regs.h|   7 +
  hw/net/trace-events  |   2 +-
  4 files changed, 373 insertions(+), 49 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index f9ff31fd70..be0cf2f941 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c


Please extract changes for e1000e into another patch.


@@ -1397,15 +1397,15 @@ e1000e_pci_dma_write_rx_desc(E1000ECore *core, 
dma_addr_t addr,
  }
  }
  
-typedef struct e1000e_ba_state_st {

+typedef struct E1000EBAState {
  uint16_t written[MAX_PS_BUFFERS];
  uint8_t cur_idx;
-} e1000e_ba_state;
+} E1000EBAState;
  
  static inline void

  e1000e_write_hdr_to_rx_buffers(E1000ECore *core,
 hwaddr ba[MAX_PS_BUFFERS],
-   e1000e_ba_state *bastate,
+   E1000EBAState *bastate,
 const char *data,
 dma_addr_t data_len)
  {
@@ -1420,7 +1420,7 @@ e1000e_write_hdr_to_rx_buffers(E1000ECore *core,
  static void
  e1000e_write_payload_frag_to_rx_buffers(E1000ECore *core,
  hwaddr ba[MAX_PS_BUFFERS],
-e1000e_ba_state *bastate,
+E1000EBAState *bastate,
  const char *data,
  dma_addr_t data_len)
  {
@@ -1530,7 +1530,7 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct 
NetRxPkt *pkt,
  
  do {

  hwaddr ba[MAX_PS_BUFFERS];
-e1000e_ba_state bastate = { { 0 } };
+E1000EBAState bastate = { { 0 } };
  bool is_last = false;
  
  desc_size = total_size - desc_offset;

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 6d95cccea3..9a08f2e7d3 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -267,6 +267,29 @@ igb_rx_use_legacy_descriptor(IGBCore *core)
  return false;
  }
  
+typedef struct E1000ERingInfo {

+int dbah;
+int dbal;
+int dlen;
+int dh;
+int dt;
+int idx;
+} E1000ERingInfo;
+
+static uint32_t
+igb_rx_queue_desctyp_get(IGBCore *core, const E1000ERingInfo *r)
+{
+return core->mac[E1000_SRRCTL(r->idx) >> 2] & E1000_SRRCTL_DESCTYPE_MASK;
+}
+
+static bool
+igb_rx_use_ps_descriptor(IGBCore *core, const E1000ERingInfo *r)
+{
+uint32_t desctyp = igb_rx_queue_desctyp_get(core, r);
+return desctyp == E1000_SRRCTL_DESCTYPE_HDR_SPLIT ||
+   desctyp == E1000_SRRCTL_DESCTYPE_HDR_SPLIT_ALWAYS;
+}
+
  static inline bool
  igb_rss_enabled(IGBCore *core)
  {
@@ -694,15 +717,6 @@ static uint32_t igb_rx_wb_eic(IGBCore *core, int queue_idx)
  return (ent & E1000_IVAR_VALID) ? BIT(ent & 0x1f) : 0;
  }
  
-typedef struct E1000ERingInfo {

-int dbah;
-int dbal;
-int dlen;
-int dh;
-int dt;
-int idx;
-} E1000ERingInfo;
-
  static inline bool
  igb_ring_empty(IGBCore *core, const E1000ERingInfo *r)
  {
@@ -1233,12 +1247,54 @@ igb_read_lgcy_rx_descr(IGBCore *core, struct 
e1000_rx_desc *desc,
  }
  
  static inline void

-igb_read_adv_rx_descr(IGBCore *core, union e1000_adv_rx_desc *desc,
-  hwaddr *buff_addr)
+igb_read_adv_rx_single_buf_descr(IGBCore *core, union e1000_adv_rx_desc *desc,
+ hwaddr *buff_addr)
  {
  *buff_addr = le64_to_cpu(desc->read.pkt_addr);
  }
  
+typedef struct IGBRxDescBufAddrs {

+hwaddr hdr_buf_addr;
+hwaddr pkt_buf_addr;
+} IGBRxDescBufAddrs;
+
+typedef enum IGBBastateCurrentBuffer {
+IgbHdrBuf = 0,
+IgbPktBuf = 1
+} IGBBastateCurrentBuffer;
+
+static hwaddr *
+igb_buf_addr(IGBRxDescBufAddrs *buf_addrs, IGBBastateCurrentBuffer cur_buf)
+{
+return (cur_buf == IgbPktBuf) ? _addrs->pkt_buf_addr :
+_addrs->hdr_buf_addr;
+}
+
+static inline void
+igb_read_adv_rx_split_buf_descr(IGBCore *core, union e1000_adv_rx_desc *desc,
+IGBRxDescBufAddrs *buf_addr)
+{
+buf_addr->hdr_buf_addr = le64_to_cpu(desc->read.hdr_addr);
+buf_addr->pkt_buf_addr = le64_to_cpu(desc->read.pkt_addr);
+}
+
+typedef struct IGBRxDescBuffWritten {
+uint16_t hdr_buf_written;
+uint16_t pkt_buf_written;
+} IGBRxDescBuffWritten;
+
+static uint16_t *
+igb_buf_written(IGBRxDescBuffWritten *written, IGBBastateCurrentBuffer cur_buf)
+{
+return (cur_buf == IgbPktBuf) ? >pkt_buf_written :
+>hdr_buf_written;
+}
+
+typedef struct IGBBAState {
+IGBRxDescBuffWritten written;
+

Re: [PATCH 0/8] migration: Add precopy initial data capability and VFIO precopy support

2023-05-11 Thread Avihai Horon




On 11/05/2023 16:09, Juan Quintela wrote:

External email: Use caution opening links or attachments


Avihai Horon  wrote:

On 10/05/2023 19:41, Juan Quintela wrote:

Does this makes sense?

Yes, thanks a lot for the full and detailed explanation!

Thank you.


This indeed solves the problem in the scenario I mentioned above.

However, this relies on the fact that a device support for this
feature depends only on the QEMU version.
This is not the case for VFIO devices.

What a surprise :-)

Yes, I couldn't resist.


To support explicit-switchover, a VFIO device also needs host kernel
support for VFIO precopy, i.e., it needs to have the
VFIO_MIGRATION_PRE_COPY flag set.
So, theoretically we could have the following:
- Source and destination QEMU are the same version.
- We migrate two different VFIO devices (i.e., they don't share the
   same kernel driver), device X and device Y.
- Host kernel in source supports VIFO precopy for device X but not for
   device Y.
- Host kernel in destination supports VFIO precopy for both device X
   and device Y.
Without explicit-switchover, migration should work.
But if we enable explicit-switchover and do migration, we would end up
in the same situation where switchover_pending=2 in destination and it
never reaches zero so migration is stuck.

I think this is too much for qemu.  You need to work at the
libvirt/management level.


This could be solved by moving the switchover_pending counter to the
source and sending multiple MIG_RP explicit-switchover ACK messages.
However, I also raised a concern about this in my last mail to Peter
[1], where this is not guaranteed to work, depending on the device
implementation for explicit-switchover feature.

I will not try to be extra clever here.  We have removed qemu support of
the question, as it is the same qemu in both sides.

So what we have is this configuration:

Host A
--
device X explicit_switchoever=on
device Y explicit_switchoever=off

Host B
--
device X explicit_switchoever=on
device Y explicit_switchoever=on

The configuration is different.  That is something that qemu protocol
don't know how to handle, and it is up to stack.

You need to configure explicitely in qemu command line on host B:
device=Y,explicit_switchover=off

Or whatever is that configured off.


I understand.



It is exactly the same problem than:

Host A
--

Intel CPU genX

Host B
--

intel CPU genX-1

i.e. there are features that Host A has but host B don't have.  The only
way to make this work is that you need to configure qemu when launched
in Host A with a cpu type that host B is able to run (i.e. one that
don't have any features that Host B is missing).

What is the difference between this and yours?


Hmm, yes, I see your point.





Not sure though if I'm digging too deep in some improbable future
corner cases.

Oh, you are just starting.  The compat layers that CPU have had to do
over the years.  At some point even migration between AMD and Intel
CPU's worked.


Let's go back to the basic question, which is whether we need to send
an "advise" message for each device that supports explicit-switchover.
I think it gives us more flexibility and although not needed at the
moment, might be useful in the future.

I think that is not a good idea, see my previous comment.  We have two
cases:
- both devices have the same features in both places
- they have different features in any of the places

First case, we don't care.  It always work.
Second case, we need to configure it correctly, and that means disable
features that are not on the other side.


Yep, I understand.




If you want I can send a v2 that addresses the comments and simplifies
the code in other areas and we'll continue discussing the necessity of
the "advise" message then.

Yeap.  I think is the best course of action.


OK, so let me digest all the new info of this discussion and get back 
with v2 / conclusions / questions.


Thanks for all the help!

Re: [PATCH 2/3] softmmu/ioport.c: QOMify MemoryRegionPortioList

2023-05-11 Thread Mark Cave-Ayland


On 11/05/2023 14:50, Philippe Mathieu-Daudé wrote:


On 19/4/23 17:16, Mark Cave-Ayland wrote:

The aim of QOMification is so that the lifetime of the MemoryRegionPortioList
structure can be managed using QOM's in-built refcounting instead of having to
handle this manually.

Due to the use of an opaque pointer it isn't possible to model the new
TYPE_MEMORY_REGION_PORTIO_LIST directly using QOM properties, however since
use of the new object is restricted to the portio API we can simply set the
opaque pointer (and the heap-allocated port list) internally.

Signed-off-by: Mark Cave-Ayland 
---
  softmmu/ioport.c | 25 ++---
  1 file changed, 22 insertions(+), 3 deletions(-)




  static uint64_t unassigned_io_read(void *opaque, hwaddr addr, unsigned size)
  {
@@ -228,7 +233,8 @@ static void portio_list_add_1(PortioList *piolist,
  unsigned i;
  /* Copy the sub-list and null-terminate it.  */
-    mrpio = g_malloc0(sizeof(MemoryRegionPortioList));
+    mrpio = MEMORY_REGION_PORTIO_LIST(
+    object_new(TYPE_MEMORY_REGION_PORTIO_LIST));


Shouldn't we need to replace the g_free() call by object_unref()
in portio_list_destroy()?


Both the existing g_free() call and replacing it with object_unref() cause QEMU to 
segfault if you call portio_list_destroy() before the final patch in this series. So 
in effect we'd end up causing more code churn just to replace one crash with another ;)


The real fix is in patch 3 which alters the refcounting/ownership in order to solve 
the underlying issue, which I hope I have described in enough detail in that commit 
message.



  mrpio->portio_opaque = piolist->opaque;
  mrpio->ports = g_malloc0(sizeof(MemoryRegionPortio) * (count + 1));
  memcpy(mrpio->ports, pio_init, sizeof(MemoryRegionPortio) * count);
@@ -298,3 +304,16 @@ void portio_list_del(PortioList *piolist)
  memory_region_del_subregion(piolist->address_space, >mr);
  }
  }



ATB,

Mark.

1 2 3 4 >

1 - 100 of 310 matches

Mail list logo