date:20240229

Re: [PATCH 5/8] tests/unit/test-smp-parse.c: Test "drawers" parameter in -smp

2024-02-29 Thread Thomas Huth


On 18/01/2024 15.48, Zhao Liu wrote:

From: Zhao Liu 

Although drawer was introduced to -smp along with book by s390 machine,
as a general topology level in QEMU that may be reused by other arches
in the future, it is desirable to cover this parameter's parsing in a
separate case.

Signed-off-by: Zhao Liu 
---
  tests/unit/test-smp-parse.c | 89 +
  1 file changed, 89 insertions(+)



Reviewed-by: Thomas Huth

Re: [PATCH 4/8] tests/unit/test-smp-parse.c: Test "books" parameter in -smp

2024-02-29 Thread Thomas Huth


On 18/01/2024 15.48, Zhao Liu wrote:

From: Zhao Liu 

Although book was introduced to -smp along with drawer by s390 machine,
as a general topology level in QEMU that may be reused by other arches
in the future, it is desirable to cover this parameter's parsing in a
separate case.

Signed-off-by: Zhao Liu 
---
  tests/unit/test-smp-parse.c | 105 
  1 file changed, 105 insertions(+)


Reviewed-by: Thomas Huth

Re: [PATCH 3/8] tests/unit/test-smp-parse.c: Make test cases aware of the book/drawer

2024-02-29 Thread Thomas Huth


On 18/01/2024 15.48, Zhao Liu wrote:

From: Zhao Liu 

Currently, -smp supports 2 more new levels: book and drawer.

It is necessary to consider the effects of book and drawer in the test
cases to ensure that the calculations are correct. This is also the
preparation to add new book and drawer test cases.

Signed-off-by: Zhao Liu 
---
  tests/unit/test-smp-parse.c | 25 ++---
  1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
index 55ba13bf7d15..a8eb3bbb35ed 100644
--- a/tests/unit/test-smp-parse.c
+++ b/tests/unit/test-smp-parse.c
@@ -384,6 +384,8 @@ static char *smp_config_to_string(const SMPConfiguration 
*config)
  return g_strdup_printf(
  "(SMPConfiguration) {\n"
  ".has_cpus = %5s, cpus = %" PRId64 ",\n"
+".has_drawers  = %5s, drawers  = %" PRId64 ",\n"
+".has_books= %5s, books= %" PRId64 ",\n"
  ".has_sockets  = %5s, sockets  = %" PRId64 ",\n"
  ".has_dies = %5s, dies = %" PRId64 ",\n"
  ".has_clusters = %5s, clusters = %" PRId64 ",\n"
@@ -392,6 +394,8 @@ static char *smp_config_to_string(const SMPConfiguration 
*config)
  ".has_maxcpus  = %5s, maxcpus  = %" PRId64 ",\n"
  "}",
  config->has_cpus ? "true" : "false", config->cpus,
+config->has_drawers ? "true" : "false", config->drawers,
+config->has_books ? "true" : "false", config->books,
  config->has_sockets ? "true" : "false", config->sockets,
  config->has_dies ? "true" : "false", config->dies,
  config->has_clusters ? "true" : "false", config->clusters,
@@ -404,10 +408,10 @@ static char *smp_config_to_string(const SMPConfiguration 
*config)
  static unsigned int cpu_topology_get_threads_per_socket(const CpuTopology 
*topo)
  {
  /* Check the divisor to avoid invalid topology examples causing SIGFPE. */
-if (!topo->sockets) {
+if (!topo->drawers || !topo->books || !topo->sockets) {
  return 0;
  } else {
-return topo->max_cpus / topo->sockets;
+return topo->max_cpus / topo->drawers / topo->books / topo->sockets;
  }
  }
  
@@ -429,6 +433,8 @@ static char *cpu_topology_to_string(const CpuTopology *topo,

  return g_strdup_printf(
  "(CpuTopology) {\n"
  ".cpus   = %u,\n"
+".drawers= %u,\n"
+".books  = %u,\n"
  ".sockets= %u,\n"
  ".dies   = %u,\n"
  ".clusters   = %u,\n"
@@ -438,7 +444,8 @@ static char *cpu_topology_to_string(const CpuTopology *topo,
  ".threads_per_socket = %u,\n"
  ".cores_per_socket   = %u,\n"
  "}",
-topo->cpus, topo->sockets, topo->dies, topo->clusters,
+topo->cpus, topo->drawers, topo->books,
+topo->sockets, topo->dies, topo->clusters,
  topo->cores, topo->threads, topo->max_cpus,
  threads_per_socket, cores_per_socket);
  }
@@ -473,6 +480,8 @@ static void check_parse(MachineState *ms, const 
SMPConfiguration *config,
  if (is_valid) {
  if ((err == NULL) &&
  (ms->smp.cpus == expect_topo->cpus) &&
+(ms->smp.drawers == expect_topo->drawers) &&
+(ms->smp.books == expect_topo->books) &&
  (ms->smp.sockets == expect_topo->sockets) &&
  (ms->smp.dies == expect_topo->dies) &&
  (ms->smp.clusters == expect_topo->clusters) &&
@@ -564,6 +573,16 @@ static void unsupported_params_init(const MachineClass 
*mc, SMPTestData *data)
  data->expect_prefer_sockets.clusters = 1;
  data->expect_prefer_cores.clusters = 1;
  }
+
+if (!mc->smp_props.books_supported) {
+data->expect_prefer_sockets.books = 1;
+data->expect_prefer_cores.books = 1;
+}
+
+if (!mc->smp_props.drawers_supported) {
+data->expect_prefer_sockets.drawers = 1;
+data->expect_prefer_cores.drawers = 1;
+}
  }
  


Reviewed-by: Thomas Huth

Re: [PATCH V2] migration: export fewer options

2024-02-29 Thread Peter Xu

On Thu, Feb 29, 2024 at 07:03:36AM +0100, Markus Armbruster wrote:
> Steven Sistare  writes:
> 
> > Just a reminder, after our further discussion in the V1 thread, 
> > this patch is still what I propose, no updates needed.
> >
> > Markus, I think Peter is looking for your blessing on the new
> > file name: include/migration/client-options.h.
> 
> Not my preference, but no objection.

There's yet one alternative, which is to put these exported option
functions into misc.h directly.  After all that's not so much, and misc.h
already hold random stuff from elsewhere.

Steve, would you repost this patch (with/without my above comment taken)
along with your other series with a rebase to migration-next?  It doesn't
apply there.  Re the other series: one nitpick comment on the last patch,
where you may consider splitting the removal of the unused 2 functions into
a standalone patch.  Other than that it looks good to me.

https://gitlab.com/peterx/qemu/-/tree/migration-next

Thanks,

-- 
Peter Xu

Re: [PATCH 2/8] tests/unit/test-smp-parse.c: Bump max_cpus to 4096

2024-02-29 Thread Zhao Liu

Hi Thomas,

On Fri, Mar 01, 2024 at 08:40:54AM +0100, Thomas Huth wrote:
> Date: Fri, 1 Mar 2024 08:40:54 +0100
> From: Thomas Huth 
> Subject: Re: [PATCH 2/8] tests/unit/test-smp-parse.c: Bump max_cpus to 4096
> 
> On 18/01/2024 15.48, Zhao Liu wrote:
> > From: Zhao Liu 
> > 
> > PC mahine is trying to support up to 4096 vCPUs [1], so it's necessary
> 
> s/mahine/machine/
> 
> > to bump max_cpus in test-smp-parse to 4096 to cover the topological
> > needs of future machines.
> > 
> > [1]: 
> > https://lore.kernel.org/qemu-devel/20231208122611.32311-1-anisi...@redhat.com/
> 
> Is it ok for this test patch here to get included without that patch
> already, or shall this wait for the patch from Ani first?
>

This patch doesn't rely on Ani's, but given my typo and outdated links
(Ani already has v5), I can respin the series soon!

Thanks,
Zhao

Re: Intention to work on GSoC project

2024-02-29 Thread Eugenio Perez Martin

On Thu, Feb 29, 2024 at 9:32 AM Stefano Garzarella  wrote:
>
> Hi Sahil,
>
> On Sun, Feb 25, 2024 at 10:38 PM Sahil  wrote:
> >
> > Hi,
> >
> > My name is Sahil and I go by the pseudonym 'valdaarhun' on Github. I have
> > never contributed to QEMU before but I have used it a few times as an end
> > user. I developed an interest in virtualization during my internship at
> > VMware and would like to dive deeper in this subfield.
> >
> > My current full-time job does not allow me to take part in external programs
> > that are paid. I would like to work on one of the proposed projects outside
> > of GSoC.
>
> Sure, not a problem at all, also because for this year QEMU was not
> accepted in GSoC, so anybody can work on those projects if they have
> time
>
> > I have gone through QEMU's list of GSoC '24 projects [1] and am
> > interested in two of them:
> >
> > 1. Add packed virtqueue to Shadow Virtqueue
> > 2. vhost-user memory isolation
> >
> > Based on what I have understood, they are somewhat related and are part
> > of the migration subsystem. I feel the learning curve of the first project
> > will be less steep and will make me better prepared to tackle the second
> > project as well.
>
> The first project is for sure related with migration. While vhost-user
> memory isolation is not really related to migration, but both are
> related to virtio devices.
> Anyway, your plan looks good to me!
>

I agree with all Stefano's answers above.

> >
> > I have read the "Getting Started for Developers" [2] wiki page. I have also
> > built QEMU from source.
>
> Great!
>
> >
> > I think my next step should be to read the documentation on the migration
> > subsystem [3], the blog posts attached in the first project's description
> > and virtqueue's implementation.

You can add the recently published
"virtio-live-migration-technical-deep-dive" :) [1].

[1] 
https://developers.redhat.com/articles/2024/02/21/virtio-live-migration-technical-deep-dive

> > Would you also recommend that I work on a
> > QEMU issue that is open on Gitlab and related to virtqueues/virtio to
> > familiarize
> > myself with the codebase? I went through the issues tagged as 
> > "device:virtio"
> > [4]
> > but can't really tell if any of them are good for beginners. One of them has
> > the
> > "bite-size" tag [5]. It also has a patch attached but hasn't been merged.
> > Shall I
> > work on getting that merged?
>
> Yeah, "bite-size" issues should be better to understand how to
> contribute to QEMU.
> Feel free to work on any issue, doing the work or helping to complete
> old patches.
>

Right,

There are few other tasks pending for QEMU in general and SVQ in
particular. I'll add them as gitlab issues by today.

Thanks!

> >
> > I have worked on a few smaller systems programming issues in other
> > organizations (eg: strace [6], htop [7]) in the past.
> >
> > I look forward to hearing from you.
>
> Feel free to reach us if you have more questions on the projects.
>
> Thanks,
> Stefano
>

Re: [PATCH v4 6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface.

2024-02-29 Thread Markus Armbruster

Hao Xiang  writes:

> This change extends the MigrationStatus interface to track zero pages
> and zero bytes counter.
>
> Signed-off-by: Hao Xiang 

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index ca9561fbf1..03b850bab7 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -63,6 +63,10 @@
>  # between 0 and @dirty-sync-count * @multifd-channels.  (since
>  # 7.1)
>  #
> +# @zero-pages: number of zero pages (since 9.0)
> +#
> +# @zero-bytes: number of zero bytes sent (since 9.0)
> +#

Discussion of v3 has led me to believe:

1. A page is either migrated as a normal page or as a zero page.

2. The following equations hold:

@normal-bytes = @normal * @page-size

@zero-bytes = @zero-pages * @page-size

3. @zero-pages is the same as @duplicate, with a better name.  We intend
   to drop @duplicate eventually.

If this is correct, I'd like you to

A. Name it @zero for consistency with @normal.  Disregard my advice to
   name it @zero-pages; two consistent bad names are better than one bad
   name, one good name, and inconsistency.

B. Add @zero and @zero-bytes next to @normal and @normal-bytes.

C. Deprecate @duplicate (item 3).  Separate patch, please.

D. Consider documenting more clearly what normal and zero pages are
   (item 1), and how @FOO, @FOO-pages and @page-size are related (item
   2).  Could be done in a followup patch.

>  # Features:
>  #
>  # @deprecated: Member @skipped is always zero since 1.5.3
> @@ -81,7 +85,8 @@
> 'multifd-bytes': 'uint64', 'pages-per-second': 'uint64',
> 'precopy-bytes': 'uint64', 'downtime-bytes': 'uint64',
> 'postcopy-bytes': 'uint64',
> -   'dirty-sync-missed-zero-copy': 'uint64' } }
> +   'dirty-sync-missed-zero-copy': 'uint64',
> +   'zero-pages': 'int', 'zero-bytes': 'size' } }
>  

[...]

Re: [PATCH 2/8] tests/unit/test-smp-parse.c: Bump max_cpus to 4096

2024-02-29 Thread Thomas Huth


On 18/01/2024 15.48, Zhao Liu wrote:

From: Zhao Liu 

PC mahine is trying to support up to 4096 vCPUs [1], so it's necessary


s/mahine/machine/


to bump max_cpus in test-smp-parse to 4096 to cover the topological
needs of future machines.

[1]: 
https://lore.kernel.org/qemu-devel/20231208122611.32311-1-anisi...@redhat.com/


Is it ok for this test patch here to get included without that patch 
already, or shall this wait for the patch from Ani first?


 Thomas



Signed-off-by: Zhao Liu 
---
  tests/unit/test-smp-parse.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
index 20c663a006b3..55ba13bf7d15 100644
--- a/tests/unit/test-smp-parse.c
+++ b/tests/unit/test-smp-parse.c
@@ -20,8 +20,8 @@
  #define T true
  #define F false
  
-#define MIN_CPUS 1   /* set the min CPUs supported by the machine as 1 */

-#define MAX_CPUS 512 /* set the max CPUs supported by the machine as 512 */
+#define MIN_CPUS 1/* set the min CPUs supported by the machine as 1 */
+#define MAX_CPUS 4096 /* set the max CPUs supported by the machine as 4096 */
  
  #define SMP_MACHINE_NAME "TEST-SMP"
  
@@ -333,13 +333,13 @@ static const struct SMPTestData data_generic_invalid[] = {

  "by machine '" SMP_MACHINE_NAME "' is 2",
  }, {
  /*
- * config: -smp 512
+ * config: -smp 4096
   * The test machine should tweak the supported max CPUs to
- * 511 (MAX_CPUS - 1) for testing.
+ * 4095 (MAX_CPUS - 1) for testing.
   */
-.config = SMP_CONFIG_GENERIC(T, MAX_CPUS, F, 0, F, 0, F, 0, F, 0),
-.expect_error = "Invalid SMP CPUs 512. The max CPUs supported "
-"by machine '" SMP_MACHINE_NAME "' is 511",
+.config = SMP_CONFIG_GENERIC(T, 4096, F, 0, F, 0, F, 0, F, 0),
+.expect_error = "Invalid SMP CPUs 4096. The max CPUs supported "
+"by machine '" SMP_MACHINE_NAME "' is 4095",
  },
  };

Re: [PATCH 1/8] tests/unit/test-smp-parse.c: Use CPU number macros in invalid topology case

2024-02-29 Thread Thomas Huth


On 18/01/2024 15.48, Zhao Liu wrote:

From: Zhao Liu 

Use MAX_CPUS/MIN_CPUS micros in invalid topology case. This gives us the


s/micros/macros/

Apart from that typo, FWIW:
Reviewed-by: Thomas Huth 


flexibility to change the maximum and minimum CPU limits.

Signed-off-by: Zhao Liu 
---
  tests/unit/test-smp-parse.c | 22 ++
  1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
index 24972666a74d..20c663a006b3 100644
--- a/tests/unit/test-smp-parse.c
+++ b/tests/unit/test-smp-parse.c
@@ -323,15 +323,21 @@ static const struct SMPTestData data_generic_invalid[] = {
  "sockets (2) * cores (4) * threads (2) "
  "== maxcpus (16) < smp_cpus (18)",
  }, {
-/* config: -smp 1
- * should tweak the supported min CPUs to 2 for testing */
-.config = SMP_CONFIG_GENERIC(T, 1, F, 0, F, 0, F, 0, F, 0),
+/*
+ * config: -smp 1
+ * The test machine should tweak the supported min CPUs to
+ * 2 (MIN_CPUS + 1) for testing.
+ */
+.config = SMP_CONFIG_GENERIC(T, MIN_CPUS, F, 0, F, 0, F, 0, F, 0),
  .expect_error = "Invalid SMP CPUs 1. The min CPUs supported "
  "by machine '" SMP_MACHINE_NAME "' is 2",
  }, {
-/* config: -smp 512
- * should tweak the supported max CPUs to 511 for testing */
-.config = SMP_CONFIG_GENERIC(T, 512, F, 0, F, 0, F, 0, F, 0),
+/*
+ * config: -smp 512
+ * The test machine should tweak the supported max CPUs to
+ * 511 (MAX_CPUS - 1) for testing.
+ */
+.config = SMP_CONFIG_GENERIC(T, MAX_CPUS, F, 0, F, 0, F, 0, F, 0),
  .expect_error = "Invalid SMP CPUs 512. The max CPUs supported "
  "by machine '" SMP_MACHINE_NAME "' is 511",
  },
@@ -575,8 +581,8 @@ static void machine_generic_invalid_class_init(ObjectClass 
*oc, void *data)
  MachineClass *mc = MACHINE_CLASS(oc);
  
  /* Force invalid min CPUs and max CPUs */

-mc->min_cpus = 2;
-mc->max_cpus = 511;
+mc->min_cpus = MIN_CPUS + 1;
+mc->max_cpus = MAX_CPUS - 1;
  }
  
  static void machine_with_dies_class_init(ObjectClass *oc, void *data)

[PATCH] hw: gpio: introduce pcf8574 driver

2024-02-29 Thread Dmitriy Sharikhin

NXP PCF8574 and compatible ICs are simple I2C GPIO expanders.
PCF8574 incorporates quasi-bidirectional IO, and simple
communication protocol, when IO read is I2C byte read, and
IO write is I2C byte write. User can think of it as
open-drain port, when line high state is input and line low
state is output.

This patch allow to instantiate virtual I2C device called
"pcf8574" in machine init code via generic mechanism.

Signed-off-by: Dmitrii Sharikhin 
---
 hw/gpio/Kconfig   |   4 ++
 hw/gpio/meson.build   |   1 +
 hw/gpio/pcf8574.c | 139 ++
 include/hw/gpio/pcf8574.h |  15 
 4 files changed, 159 insertions(+)
 create mode 100644 hw/gpio/pcf8574.c
 create mode 100644 include/hw/gpio/pcf8574.h

diff --git a/hw/gpio/Kconfig b/hw/gpio/Kconfig
index d2cf3accc8..bb731ff4ce 100644
--- a/hw/gpio/Kconfig
+++ b/hw/gpio/Kconfig
@@ -16,3 +16,7 @@ config GPIO_PWR
 
 config SIFIVE_GPIO
 bool
+
+config PCF8574
+bool
+depends on I2C
diff --git a/hw/gpio/meson.build b/hw/gpio/meson.build
index 8a8d03d885..c0d9a3c757 100644
--- a/hw/gpio/meson.build
+++ b/hw/gpio/meson.build
@@ -15,3 +15,4 @@ system_ss.add(when: 'CONFIG_RASPI', if_true: files(
 ))
 system_ss.add(when: 'CONFIG_ASPEED_SOC', if_true: files('aspeed_gpio.c'))
 system_ss.add(when: 'CONFIG_SIFIVE_GPIO', if_true: files('sifive_gpio.c'))
+system_ss.add(when: 'CONFIG_PCF8574', if_true: files('pcf8574.c'))
diff --git a/hw/gpio/pcf8574.c b/hw/gpio/pcf8574.c
new file mode 100644
index 00..a6c6bd36fa
--- /dev/null
+++ b/hw/gpio/pcf8574.c
@@ -0,0 +1,139 @@
+/*
+ * NXP PCF8574 8-port I2C GPIO expansion chip.
+ *
+ * Copyright (c) 2024 KNS Group (YADRO).
+ * Written by Dmitrii Sharikhin 
+ *
+ * This file is licensed under GNU GPL.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/i2c/i2c.h"
+#include "hw/gpio/pcf8574.h"
+#include "hw/irq.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qom/object.h"
+
+/**
+ * PCF8574 and compatible chips incorporate quasi-bidirectional
+ * IO. Electrically it means that device sustain pull-up to line
+ * unless IO port is configured as output _and_ driven low.
+ * 
+ * IO access is implemented as simple I2C single-byte read
+ * or write operation. So, to configure line to input user write 1
+ * to corresponding bit. To configure line to output and drive it low
+ * user write 0 to corresponding bit.
+ * 
+ * In essence, user can think of quasi-bidirectional IO as
+ * open-drain line, except presence of builtin rising edge acceleration
+ * embedded in PCF8574 IC
+ **/
+
+OBJECT_DECLARE_SIMPLE_TYPE(PCF8574State, PCF8574)
+
+struct PCF8574State {
+I2CSlave parent_obj;
+uint8_t  input;  /* external electrical line state */
+uint8_t  output; /* Pull-up (1) or drive low (0) on bit */
+qemu_irq handler[8];
+qemu_irq *gpio_in;
+};
+
+static void pcf8574_reset(DeviceState *dev)
+{
+PCF8574State *s = PCF8574(dev);
+s->input  = 0xFF;
+s->output = 0xFF;
+}
+
+static inline uint8_t pcf8574_line_state(PCF8574State *s)
+{
+// we driving line low or external circuit does that
+return s->input & s->output; 
+}
+
+static uint8_t pcf8574_rx(I2CSlave *i2c)
+{
+return pcf8574_line_state(PCF8574(i2c));
+}
+
+static int pcf8574_tx(I2CSlave *i2c, uint8_t data)
+{
+PCF8574State *s = PCF8574(i2c);
+uint8_t prev;
+uint8_t diff;
+uint8_t actual;
+int line = 0;
+
+prev = pcf8574_line_state(s);
+s->output = data;
+actual = pcf8574_line_state(s);
+
+for (diff = (actual ^ prev); diff; diff &= ~(1 << line))
+{
+line = ctz32(diff);
+if (s->handler[line])
+qemu_set_irq(s->handler[line], (actual >> line) & 1);
+}
+
+return 0;
+}
+
+static const VMStateDescription vmstate_pcf8574 = {
+.name = "pcf8574",
+.version_id = 0,
+.minimum_version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_UINT8(input,  PCF8574State),
+VMSTATE_UINT8(output, PCF8574State),
+VMSTATE_I2C_SLAVE(parent_obj, PCF8574State),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void pcf8574_gpio_set(void *opaque, int line, int level)
+{
+PCF8574State *s = (PCF8574State *) opaque;
+assert(line >= 0 && line < ARRAY_SIZE(s->handler)); 
+
+if (level)
+s->input |=  (1 << line);
+else
+s->input &= ~(1 << line);
+}
+
+static void pcf8574_realize(DeviceState *dev, Error **errp)
+{
+PCF8574State *s = PCF8574(dev);
+
+qdev_init_gpio_in(dev, pcf8574_gpio_set, ARRAY_SIZE(s->handler));
+qdev_init_gpio_out(dev, s->handler, ARRAY_SIZE(s->handler));
+}
+
+static void pcf8574_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass   *dc = DEVICE_CLASS(klass);
+I2CSlaveClass *k  = I2C_SLAVE_CLASS(klass);
+
+k->recv = pcf8574_rx;
+k->send = pcf8574_tx;
+dc->realize = pcf8574_realize;
+dc->reset   = pcf8574_reset;
+dc->vmsd= &vmstate_pcf8574;
+}
+
+static con

Re: [PATCH v4 2/7] migration/multifd: Implement zero page transmission on the multifd thread.

2024-02-29 Thread Markus Armbruster

Hao Xiang  writes:

> 1. Add zero_pages field in MultiFDPacket_t.
> 2. Implements the zero page detection and handling on the multifd
> threads for non-compression, zlib and zstd compression backends.
> 3. Added a new value 'multifd' in ZeroPageDetection enumeration.
> 4. Handle migration QEMU9.0 -> QEMU8.2 compatibility.
> 5. Adds zero page counters and updates multifd send/receive tracing
> format to track the newly added counters.
>
> Signed-off-by: Hao Xiang 

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8da05dba47..846d0411d5 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -671,10 +671,15 @@
>  #
>  # @legacy: Perform zero page checking in main migration thread.
>  #
> +# @multifd: Perform zero page checking in multifd sender thread.
> +# This option only takes effect if migration capability multifd
> +# is set.  Otherwise, it will have the same effect as legacy.

Suggest

   # @multifd: Perform zero page checking in multifd sender thread if
   # multifd migration is enabled, else in the main migration
   # thread as for @legacy.

Thoughts?

> +#
>  # Since: 9.0
> +#
>  ##
>  { 'enum': 'ZeroPageDetection',
> -  'data': [ 'none', 'legacy' ] }
> +  'data': [ 'none', 'legacy', 'multifd' ] }
>  
>  ##
>  # @BitmapMigrationBitmapAliasTransform:

QAPI schema
Acked-by: Markus Armbruster

Re: [PATCH v4 1/7] migration/multifd: Add new migration option zero-page-detection.

2024-02-29 Thread Markus Armbruster

Hao Xiang  writes:

> This new parameter controls where the zero page checking is running.
> 1. If this parameter is set to 'legacy', zero page checking is
> done in the migration main thread.
> 2. If this parameter is set to 'none', zero page checking is disabled.
>
> Signed-off-by: Hao Xiang 

QAPI schema
Acked-by: Markus Armbruster

Re: [PATCH v6 00/23] migration: File based migration with multifd and mapped-ram

2024-02-29 Thread Markus Armbruster

Peter Xu  writes:

> On Thu, Feb 29, 2024 at 12:29:54PM -0300, Fabiano Rosas wrote:
>> Based-on: 74aa0fb297 (migration: options incompatible with cpr) # 
>> peterx/migration-next
>> 
>> Hi,
>> 
>> In this v6:
>> 
>> - Minor fixes to 17/23 and 19/23
>
> The whole set looks good to me now.  I plan to queue it before the
> direct-io stuff.  Any other comments / concerns from anyone?

No.  My remaining review comments all apply to the direct-io part, which
got split off this series..

> Dan, would it be fine I queue the IO patches together?
>
> Thanks,

Re: [PULL 26/29] contrib/plugins: extend execlog to track register changes

2024-02-29 Thread Zhao Liu

Hi Alex,

I hit the following warnings (with "./configure --enable-werror"):

/qemu/contrib/plugins/execlog.c: In function ‘registers_init’:
/qemu/contrib/plugins/execlog.c:330:17: warning: ‘g_pattern_match_string’ is 
deprecated: Use 'g_pattern_spec_match_string' instead 
[-Wdeprecated-declarations]
  330 | if (g_pattern_match_string(pat, rd->name) ||
  | ^~
In file included from /usr/include/glib-2.0/glib.h:65,
 from /qemu/contrib/plugins/execlog.c:9:
/usr/include/glib-2.0/glib/gpattern.h:55:15: note: declared here
   55 | gboolean  g_pattern_match_string   (GPatternSpec *pspec,
  |   ^~
/qemu/contrib/plugins/execlog.c:331:21: warning: ‘g_pattern_match_string’ is 
deprecated: Use 'g_pattern_spec_match_string' instead 
[-Wdeprecated-declarations]
  331 | g_pattern_match_string(pat, rd_lower)) {
  | ^~
In file included from /usr/include/glib-2.0/glib.h:65,
 from /qemu/contrib/plugins/execlog.c:9:
/usr/include/glib-2.0/glib/gpattern.h:55:15: note: declared here
   55 | gboolean  g_pattern_match_string   (GPatternSpec *pspec,
  |   ^~
/qemu/contrib/plugins/execlog.c:339:63: warning: passing argument 2 of 
‘g_ptr_array_add’ discards ‘const’ qualifier from pointer target type 
[-Wdiscarded-qualifiers]
  339 | g_ptr_array_add(all_reg_names, reg->name);
  |~~~^~
In file included from /usr/include/glib-2.0/glib.h:31,
 from /qemu/contrib/plugins/execlog.c:9:
/usr/include/glib-2.0/glib/garray.h:192:62: note: expected ‘gpointer’ {aka 
‘void *’} but argument is of type ‘const char *’
  192 |gpointer  data);
  |~~^~~~

In addition, I checked my glic version:

$ldd --version
ldd (Ubuntu GLIBC 2.35-0ubuntu3.5) 2.35

I think it's v2.35. Are these three warning reports valid?

I also noticed in target/arm/helper.c, there's another
g_pattern_match_string() but I haven't met the warning.

Regards,
Zhao

On Wed, Feb 28, 2024 at 11:56:58AM +, Alex Bennée wrote:
> Date: Wed, 28 Feb 2024 11:56:58 +
> From: Alex Bennée 
> Subject: [PULL 26/29] contrib/plugins: extend execlog to track register
>  changes
> X-Mailer: git-send-email 2.39.2
> 
> With the new plugin register API we can now track changes to register
> values. Currently the implementation is fairly dumb which will slow
> down if a large number of register values are being tracked. This
> could be improved by only instrumenting instructions which mention
> registers we are interested in tracking.
> 
> Example usage:
> 
>   ./qemu-aarch64 -D plugin.log -d plugin \
>  -cpu max,sve256=on \
>  -plugin contrib/plugins/libexeclog.so,reg=sp,reg=z\* \
>  ./tests/tcg/aarch64-linux-user/sha512-sve
> 
> will display in the execlog any changes to the stack pointer (sp) and
> the SVE Z registers.
> 
> As testing registers every instruction will be quite a heavy operation
> there is an additional flag which attempts to optimise the register
> tracking by only instrumenting instructions which are likely to change
> its value. This relies on the QEMU disassembler showing up the register
> names in disassembly so is an explicit opt-in.
> 
> Reviewed-by: Pierrick Bouvier 
> Cc: Akihiko Odaki 
> Based-On:  <20231025093128.33116-19-akihiko.od...@daynix.com>
> Signed-off-by: Alex Bennée 
> Message-Id: <20240227144335.1196131-27-alex.ben...@linaro.org>
> 
> diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
> index 81dcd43a612..fa7421279f5 100644
> --- a/docs/devel/tcg-plugins.rst
> +++ b/docs/devel/tcg-plugins.rst
> @@ -497,6 +497,22 @@ arguments if required::
>$ qemu-system-arm $(QEMU_ARGS) \
>  -plugin ./contrib/plugins/libexeclog.so,ifilter=st1w,afilter=0x40001808 
> -d plugin
>  
> +This plugin can also dump registers when they change value. Specify the name 
> of the
> +registers with multiple ``reg`` options. You can also use glob style 
> matching if you wish::
> +
> +  $ qemu-system-arm $(QEMU_ARGS) \
> +-plugin ./contrib/plugins/libexeclog.so,reg=\*_el2,reg=sp -d plugin
> +
> +Be aware that each additional register to check will slow down
> +execution quite considerably. You can optimise the number of register
> +checks done by using the rdisas option. This will only instrument
> +instructions that mention the registers in question in disassembly.
> +This is not foolproof as some instructions implicitly change
> +instructions. You can use the ifilter to catch these cases:
> +
> +  $ qemu-system-arm $(QEMU_ARGS) \
> +-plugin 
> ./contrib/plugins/libexeclog.so,ifilter=msr,ifilter=blr,reg=x30,reg=\*_el1,rdisas=on
> +
>  - contrib/plugins/cache.c
>  
>  Cache modelling plugin that measures the perform

Re: [PATCH 17/17] target/s390x/cpu_models: Fix missing ERRP_GUARD() for error_prepend()

2024-02-29 Thread Zhao Liu

> > diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
> > index a63d990e4e8e..1a1c09612271 100644
> > --- a/target/s390x/cpu_models.c
> > +++ b/target/s390x/cpu_models.c
> > @@ -503,6 +503,7 @@ static void error_prepend_missing_feat(const char 
> > *name, void *opaque)
> >   static void check_compatibility(const S390CPUModel *max_model,
> >   const S390CPUModel *model, Error **errp)
> >   {
> > +ERRP_GUARD();
> >   S390FeatBitmap missing;
> >   if (model->def->gen > max_model->def->gen) {
> > @@ -566,6 +567,7 @@ S390CPUModel *get_max_cpu_model(Error **errp)
> >   void s390_realize_cpu_model(CPUState *cs, Error **errp)
> >   {
> > +ERRP_GUARD();
> >   Error *err = NULL;
> 
> I think that function could use an additional clean-up now: Remove the local
> "err" variable and use "errp" only instead.
>

Thanks! It's cleaner. Will do this.

Regards,
Zhao

Re: [PATCH 04/17] hw/scsi/vhost-scsi: Fix missing ERRP_GUARD() for error_prepend()

2024-02-29 Thread Zhao Liu

> > @@ -220,6 +220,7 @@ static int vhost_scsi_set_workers(VHostSCSICommon *vsc, 
> > bool per_virtqueue)
> >   static void vhost_scsi_realize(DeviceState *dev, Error **errp)
> >   {
> > +ERRP_GUARD();
> >   VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> >   VHostSCSICommon *vsc = VHOST_SCSI_COMMON(dev);
> >   Error *err = NULL;
> 
> I think you could remove the "err" variable and the error_propagate stuff in
> here now.
>

OK, I'll. Thanks.

-Zhao

Re: [PATCH 01/17] hw/misc/ivshmem: Fix missing ERRP_GUARD() for error_prepend()

2024-02-29 Thread Zhao Liu

> > @@ -832,6 +832,7 @@ static void ivshmem_write_config(PCIDevice *pdev, 
> > uint32_t address,
> >   static void ivshmem_common_realize(PCIDevice *dev, Error **errp)
> >   {
> > +ERRP_GUARD();
> >   IVShmemState *s = IVSHMEM_COMMON(dev);
> >   Error *err = NULL;
> 
> Please remove "err" and the error_propagate in here now.

OK, I hadn't thought of that. Thanks.

Regards,
Zhao

Re: [PATCH V2] migration: massage cpr-reboot documentation

2024-02-29 Thread Peter Xu

On Fri, Mar 01, 2024 at 06:57:32AM +0100, Markus Armbruster wrote:
> Upper case "QEMU", please.  More of the same below.

Fixed five occurances in my queue, attached R-b.

https://gitlab.com/peterx/qemu/-/commit/87a2848715f5fc4fa114574dbbf7a5564cb4cdd7

Thanks!

-- 
Peter Xu

Re: [PATCH v9 18/21] hw/i386/pc: Support smp.modules for x86 PC machine

2024-02-29 Thread Zhao Liu

> >> You have added drawers, books here. Were they missing before?
> >>
> > 
> > ...so yes, I think those 2 parameters are missed at this place.
> 
> ok. If there is another revision then add a line about this change in the
> commit message. Otherwise it is fine.
> 
> Reviewed-by: Babu Moger 
>

Sure! Thank you.

-Zhao

Re: [PATCH V2] migration: massage cpr-reboot documentation

2024-02-29 Thread Markus Armbruster

Steve Sistare  writes:

> Re-wrap the cpr-reboot documentation to 70 columns, use '@' for
> cpr-reboot references, capitalize COLO and VFIO, and tweak the
> wording.
>
> Suggested-by: Markus Armbruster 
> Signed-off-by: Steve Sistare 
> ---
>  qapi/migration.json | 36 +++-
>  1 file changed, 19 insertions(+), 17 deletions(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index c6bfe2e..c86113b 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -636,28 +636,30 @@
>  #
>  # @normal: the original form of migration. (since 8.2)
>  #
> -# @cpr-reboot: The migrate command stops the VM and saves state to the URI.
> -# After quitting qemu, the user resumes by running qemu -incoming.
> +# @cpr-reboot: The migrate command stops the VM and saves state to
> +# the URI.  After quitting qemu, the user resumes by running
> +# qemu -incoming.
>  #
> -# This mode allows the user to quit qemu, and restart an updated version
> -# of qemu.  The user may even update and reboot the OS before restarting,
> -# as long as the URI persists across a reboot.
> +# This mode allows the user to quit qemu, optionally update and

Upper case "QEMU", please.  More of the same below.

> +# reboot the OS, and restart qemu.  If the user reboots, the URI
> +# must persist across the reboot, such as by using a file.
>  #
> -# Unlike normal mode, the use of certain local storage options does not
> -# block the migration, but the user must not modify guest block devices
> -# between the quit and restart.
> +# Unlike normal mode, the use of certain local storage options
> +# does not block the migration, but the user must not modify the
> +# contents of guest block devices between the quit and restart.
>  #
> -# This mode supports vfio devices provided the user first puts the guest
> -# in the suspended runstate, such as by issuing guest-suspend-ram to the
> -# qemu guest agent.
> +# This mode supports VFIO devices provided the user first puts
> +# the guest in the suspended runstate, such as by issuing
> +# guest-suspend-ram to the qemu guest agent.
>  #
> -# Best performance is achieved when the memory backend is shared and the
> -# @x-ignore-shared migration capability is set, but this is not required.
> -# Further, if the user reboots before restarting such a configuration, 
> the
> -# shared backend must be be non-volatile across reboot, such as by 
> backing
> -# it with a dax device.
> +# Best performance is achieved when the memory backend is shared
> +# and the @x-ignore-shared migration capability is set, but this
> +# is not required.  Further, if the user reboots before restarting
> +# such a configuration, the shared memory must persist across the
> +# reboot, such as by backing it with a dax device.
>  #
> -# cpr-reboot may not be used with postcopy, colo, or background-snapshot.
> +# @cpr-reboot may not be used with postcopy, background-snapshot,
> +# or COLO.
>  #
>  # (since 8.2)
>  ##

With that
Reviewed-by: Markus Armbruster 

Thanks!

Re: [PATCH] hw/scsi/lsi53c895a: add timer to scripts processing

2024-02-29 Thread Sven Schnelle

BALATON Zoltan  writes:

> On Thu, 29 Feb 2024, Sven Schnelle wrote:
>> BALATON Zoltan  writes:
>> Yes, i changed it to describe the new method:
>>
>>/*
>> * Some windows drivers make the device spin waiting for a memory location
>> * to change. If we have executed more than LSI_MAX_INSN instructions then
>> * assume this is the case and start a timer. Until the timer fires, the
>> * host CPU has a chance to run and change the memory location.
>
> Is that the host or guest CPU? I thought the guest vcpu needs to get a
> chance to do something about this but maybe I did not get the problem
> at all.

Of course it's the guest CPU. I'll wait for other review comments and
fix it in the next version.

Thanks!
Sven

Re: [External] Re: [PATCH v3 6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface.

2024-02-29 Thread Markus Armbruster

Hao Xiang  writes:

> On Wed, Feb 28, 2024 at 10:01 PM Markus Armbruster  wrote:
>>
>> Hao Xiang  writes:
>>
>> > On Wed, Feb 28, 2024 at 1:52 AM Markus Armbruster  
>> > wrote:
>> >>
>> >> Hao Xiang  writes:
>> >>
>> >> > This change extends the MigrationStatus interface to track zero pages
>> >> > and zero bytes counter.
>> >> >
>> >> > Signed-off-by: Hao Xiang 
>> >>
>> >> [...]
>> >>
>> >> > diff --git a/qapi/migration.json b/qapi/migration.json
>> >> > index a0a85a0312..171734c07e 100644
>> >> > --- a/qapi/migration.json
>> >> > +++ b/qapi/migration.json
>> >> > @@ -63,6 +63,10 @@
>> >> >  # between 0 and @dirty-sync-count * @multifd-channels.  (since
>> >> >  # 7.1)
>> >> >  #
>> >> > +# @zero-pages: number of zero pages (since 9.0)
>> >> > +#
>> >> > +# @zero-bytes: number of zero bytes sent (since 9.0)
>> >> > +#
>> >>
>> >> Awfully terse.  How are these two related?
>> >
>> > Sorry I forgot to address the same feedback from the last version.
>>
>> Happens :)
>>
>> > zero-pages are the number of pages being detected as all "zero" and
>> > hence the payload isn't sent over the network. zero-bytes is basically
>> > zero-pages * page_size. It's the number of bytes migrated (but not
>> > actually sent through the network) because they are all "zero". These
>> > two are related to the existing interface below. normal and
>> > normal-bytes are the same representation of pages who are not all
>> > "zero" and are actually sent through the network.
>> >
>> > # @normal: number of normal pages (since 1.2)
>> > #
>> > # @normal-bytes: number of normal bytes sent (since 1.2)
>>
>> We also have
>>
>>   # @duplicate: number of duplicate (zero) pages (since 1.2)
>>   #
>>   # @skipped: number of skipped zero pages. Always zero, only provided for
>>   # compatibility (since 1.5)
>>
>> Page skipping was introduced in 1.5, and withdrawn in 1.5.3 and 1.6.
>> @skipped was formally deprecated in 8.1.  It'll soon be gone, no need to
>> worry about it now.
>>
>> That leaves three values related to pages sent: @normal (and
>> @normal-bytes), @duplicate (but no @duplicate-bytes), and @zero-pages
>> (and @zero-bytes).
>>
>> I unwittingly created a naming inconsistency between @normal,
>> @duplicate, and @zero-pages when I asked you to rename @zero to
>> @zero-pages.
>>
>> The meaning of the three values is not obvious, and the doc comments
>> don't explain them.  Can you, or anybody familiar with migration,
>> explain them to me?
>>
>> MigrationStats return some values as bytes, some as pages, and some as
>> both.  I hate that.  Can we standardize on bytes?
>
> I added zero/zero-bytes because I thought they were not there. But it
> turns out "duplicate" is for that purpose. "zero/zero-bytes" is really
> additional information to "normal/normal-bytes". Peter suggested that
> if we add "zero/zero-bytes" we can slowly retire "duplicate" at a
> later point.

"zero" is a better name than "duplicate".  Identical non-zero pages are
possible, and they are duplicates, too.

If you add @zero with the intent to replace @duplicate, you should
immediately deprecate @duplicate.  If you need assistance with that,
just ask.

> I don't know the historical reason why pages/bytes are used the way it
> is today. The way I understand migration, the granularity of ram
> migration is "page". There are only two types of pages 1) normal 2)
> zero. Zero pages' playload are not sent through the network because we
> already know what it looks like. Only the page offset is sent. Normal
> pages are pages that are not zero. The entire page is sent through the
> network to the target host.

This is not at all clear from the documentation of MigrationStats.  I
think the documentation needs improvement there.

> if a user knows the zero/normal count,
> they can already calculate the zero-bytes/normal-bytes (zero/normal *
> page size)

Yes, because member @page-size tells them the multiplier.

>but it's just convenient to see both. During development, I
> check on these counters a lot and they are useful.

QMP is for machines.  Machines don't need or want the same quantity in
two units.  Providing them both bytes and pages is a design mistake.
Whether it's worth correcting now is of course debatable.

Regardless, the fact @normal-bytes = @normal * @page-size needs to be
documented.  We have

# @page-size: The number of bytes per page for the various page-based
# statistics (since 2.10)

The fact that I inquired how zero-pages and zero-bytes are related might
indicate that this isn't quite clear enough.

[...]

Re: [PATCH 3/3] physmem: Fix wrong MR in large address_space_read/write_cached_slow()

2024-02-29 Thread Peter Xu

On Thu, Feb 15, 2024 at 02:28:17PM +, Jonathan Cameron wrote:

Can we rename the subject?

  physmem: Fix wrong MR in large address_space_read/write_cached_slow()

IMHO "wrong MR" is misleading, as the MR was wrong only because the address
passed over is wrong at the first place.  Perhaps s/MR/addr/?

> If the access is bigger than the MemoryRegion supports,
> flatview_read/write_continue() will attempt to update the Memory Region.
> but the address passed to flatview_translate() is relative to the cache, not
> to the FlatView.
> 
> On arm/virt with interleaved CXL memory emulation and virtio-blk-pci this
> lead to the first part of descriptor being read from the CXL memory and the
> second part from PA 0x8 which happens to be a blank region
> of a flash chip and all ffs on this particular configuration.
> Note this test requires the out of tree ARM support for CXL, but
> the problem is more general.
> 
> Avoid this by adding new address_space_read_continue_cached()
> and address_space_write_continue_cached() which share all the logic
> with the flatview versions except for the MemoryRegion lookup.
> 
> Signed-off-by: Jonathan Cameron 
> ---
>  system/physmem.c | 78 
>  1 file changed, 72 insertions(+), 6 deletions(-)
> 

[...]

> +/* Called within RCU critical section.  */
> +static MemTxResult address_space_read_continue_cached(MemoryRegionCache 
> *cache,
> +  hwaddr addr,
> +  MemTxAttrs attrs,
> +  void *ptr, hwaddr len,
> +  hwaddr addr1, hwaddr l,
> +  MemoryRegion *mr)

It looks like "addr" (of flatview AS) is not needed for a cached RW (see
below), because we should have a stable (cached) MR to operate anyway?

How about we also use "mr_addr" as the single addr of the MR, then drop
addr1?

> +{
> +MemTxResult result = MEMTX_OK;
> +uint8_t *buf = ptr;
> +
> +fuzz_dma_read_cb(addr, len, mr);
> +for (;;) {
> +

Remove empty line?

> +result |= flatview_read_continue_step(addr, attrs, buf, len, addr1,
> +  &l, mr);
> +len -= l;
> +buf += l;
> +addr += l;
> +
> +if (!len) {
> +break;
> +}
> +l = len;
> +
> +mr = address_space_translate_cached(cache, addr, &addr1, &l, false,
> +attrs);

Here IIUC the mr will always be the same as before?  If so, maybe "mr_addr
+= l" should be enough?

(similar comment applies to the writer side too)

> +}
> +
> +return result;
> +}
> +
>  /* Called from RCU critical section. address_space_read_cached uses this
>   * out of line function when the target is an MMIO or IOMMU region.
>   */
> @@ -3390,9 +3456,9 @@ address_space_read_cached_slow(MemoryRegionCache 
> *cache, hwaddr addr,
>  l = len;
>  mr = address_space_translate_cached(cache, addr, &addr1, &l, false,
>  MEMTXATTRS_UNSPECIFIED);
> -return flatview_read_continue(cache->fv,
> -  addr, MEMTXATTRS_UNSPECIFIED, buf, len,
> -  addr1, l, mr);
> +return address_space_read_continue_cached(cache, addr,
> +  MEMTXATTRS_UNSPECIFIED, buf, 
> len,
> +  addr1, l, mr);
>  }
>  
>  /* Called from RCU critical section. address_space_write_cached uses this
> @@ -3408,9 +3474,9 @@ address_space_write_cached_slow(MemoryRegionCache 
> *cache, hwaddr addr,
>  l = len;
>  mr = address_space_translate_cached(cache, addr, &addr1, &l, true,
>  MEMTXATTRS_UNSPECIFIED);
> -return flatview_write_continue(cache->fv,
> -   addr, MEMTXATTRS_UNSPECIFIED, buf, len,
> -   addr1, l, mr);
> +return address_space_write_continue_cached(cache, addr,
> +   MEMTXATTRS_UNSPECIFIED,
> +   buf, len, addr1, l, mr);
>  }
>  
>  #define ARG1_DECLMemoryRegionCache *cache
> -- 
> 2.39.2
> 

-- 
Peter Xu

Re: [PATCH 0/2] Revert "hw/i386/pc: Confine system flash handling to pc_sysfw"

2024-02-29 Thread Michael S. Tsirkin

On Thu, Feb 29, 2024 at 07:38:06PM +, Bernhard Beschow wrote:
> 
> 
> Am 26. Februar 2024 21:59:07 UTC schrieb Bernhard Beschow :
> >As reported by Volker [1], commit 6f6ad2b24582 "hw/i386/pc: Confine system
> >
> >flash handling to pc_sysfw" causes a regression when specifying the property
> >
> >`-M pflash0` in the PCI PC machines:
> >
> >  qemu-system-x86_64: Property 'pc-q35-9.0-machine.pflash0' not found
> >
> >Revert the commit for now until a solution is found.
> >
> 
> Ping

tagged, thanks!

> >
> >
> >Best regards,
> >
> >Bernhard
> >
> >
> >
> >[1] 
> >https://lore.kernel.org/qemu-devel/9e82a04b-f2c1-4e34-b4b6-46a0581b5...@t-online.de/
> >
> >
> >
> >Bernhard Beschow (2):
> >
> >  Revert "hw/i386/pc_sysfw: Inline pc_system_flash_create() and remove
> >
> >it"
> >
> >  Revert "hw/i386/pc: Confine system flash handling to pc_sysfw"
> >
> >
> >
> > include/hw/i386/pc.h |  2 ++
> >
> > hw/i386/pc.c |  1 +
> >
> > hw/i386/pc_piix.c|  1 +
> >
> > hw/i386/pc_sysfw.c   | 17 +
> >
> > 4 files changed, 17 insertions(+), 4 deletions(-)
> >
> >
> >
> >-- >
> >2.44.0
> >
> >
> >

Re: [PATCH 2/3] physmem: Factor out body of flatview_read/write_continue() loop

2024-02-29 Thread Peter Xu

On Fri, Mar 01, 2024 at 01:29:04PM +0800, Peter Xu wrote:
> On Thu, Feb 15, 2024 at 02:28:16PM +, Jonathan Cameron wrote:
> > This code will be reused for the address_space_cached accessors
> > shortly.
> > 
> > Also reduce scope of result variable now we aren't directly
> > calling this in the loop.
> > 
> > Signed-off-by: Jonathan Cameron 
> > ---
> >  system/physmem.c | 165 ---
> >  1 file changed, 98 insertions(+), 67 deletions(-)
> > 
> > diff --git a/system/physmem.c b/system/physmem.c
> > index 39b5ac751e..74f92bb3b8 100644
> > --- a/system/physmem.c
> > +++ b/system/physmem.c
> > @@ -2677,6 +2677,54 @@ static bool flatview_access_allowed(MemoryRegion 
> > *mr, MemTxAttrs attrs,
> >  return false;
> >  }
> >  
> > +static MemTxResult flatview_write_continue_step(hwaddr addr,

One more thing: this addr var is not used, afaict.  We could drop addr1
below and use this to represent the MR offset.

I'm wondering whether we should start to use some better namings already
for memory API functions to show obviously what AS it is describing.  From
that POV, perhaps rename it to "mr_addr"?

> > +MemTxAttrs attrs,
> > +const uint8_t *buf,
> > +hwaddr len, hwaddr addr1,
> > +hwaddr *l, MemoryRegion 
> > *mr)
> > +{
> > +if (!flatview_access_allowed(mr, attrs, addr1, *l)) {
> > +return MEMTX_ACCESS_ERROR;
> > +}
> > +
> > +if (!memory_access_is_direct(mr, true)) {
> > +uint64_t val;
> > +MemTxResult result;
> > +bool release_lock = prepare_mmio_access(mr);
> > +
> > +*l = memory_access_size(mr, *l, addr1);
> > +/* XXX: could force current_cpu to NULL to avoid
> > +   potential bugs */
> > +
> > +/*
> > + * Assure Coverity (and ourselves) that we are not going to OVERRUN
> > + * the buffer by following ldn_he_p().
> > + */
> > +#ifdef QEMU_STATIC_ANALYSIS
> > +assert((*l == 1 && len >= 1) ||
> > +   (*l == 2 && len >= 2) ||
> > +   (*l == 4 && len >= 4) ||
> > +   (*l == 8 && len >= 8));
> > +#endif
> > +val = ldn_he_p(buf, *l);
> > +result = memory_region_dispatch_write(mr, addr1, val,
> > +  size_memop(*l), attrs);
> > +if (release_lock) {
> > +bql_unlock();
> > +}
> > +
> > +return result;
> > +} else {
> > +/* RAM case */
> > +uint8_t *ram_ptr = qemu_ram_ptr_length(mr->ram_block, addr1, l, 
> > false);
> > +
> > +memmove(ram_ptr, buf, *l);
> > +invalidate_and_set_dirty(mr, addr1, *l);
> > +
> > +return MEMTX_OK;
> > +}
> > +}
> > +
> >  /* Called within RCU critical section.  */
> >  static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
> > MemTxAttrs attrs,
> > @@ -2688,42 +2736,9 @@ static MemTxResult flatview_write_continue(FlatView 
> > *fv, hwaddr addr,
> >  const uint8_t *buf = ptr;
> >  
> >  for (;;) {
> > -if (!flatview_access_allowed(mr, attrs, addr1, l)) {
> > -result |= MEMTX_ACCESS_ERROR;
> > -/* Keep going. */
> > -} else if (!memory_access_is_direct(mr, true)) {
> > -uint64_t val;
> > -bool release_lock = prepare_mmio_access(mr);
> > -
> > -l = memory_access_size(mr, l, addr1);
> > -/* XXX: could force current_cpu to NULL to avoid
> > -   potential bugs */
> > -
> > -/*
> > - * Assure Coverity (and ourselves) that we are not going to 
> > OVERRUN
> > - * the buffer by following ldn_he_p().
> > - */
> > -#ifdef QEMU_STATIC_ANALYSIS
> > -assert((l == 1 && len >= 1) ||
> > -   (l == 2 && len >= 2) ||
> > -   (l == 4 && len >= 4) ||
> > -   (l == 8 && len >= 8));
> > -#endif
> > -val = ldn_he_p(buf, l);
> > -result |= memory_region_dispatch_write(mr, addr1, val,
> > -   size_memop(l), attrs);
> > -if (release_lock) {
> > -bql_unlock();
> > -}
> > -
> >  
> > -} else {
> > -/* RAM case */
> > -uint8_t *ram_ptr = qemu_ram_ptr_length(mr->ram_block, addr1, 
> > &l,
> > -   false);
> > -memmove(ram_ptr, buf, l);
> > -invalidate_and_set_dirty(mr, addr1, l);
> > -}
> > +result |= flatview_write_continue_step(addr, attrs, buf, len, 
> > addr1, &l,
> > +   mr);
> >  
> >  len -= l;
> >  buf += l;
> > @@ -2757,6 +2772,52 @@ stati

Re: [PATCH 2/3] physmem: Factor out body of flatview_read/write_continue() loop

2024-02-29 Thread Peter Xu

On Thu, Feb 15, 2024 at 02:28:16PM +, Jonathan Cameron wrote:
> This code will be reused for the address_space_cached accessors
> shortly.
> 
> Also reduce scope of result variable now we aren't directly
> calling this in the loop.
> 
> Signed-off-by: Jonathan Cameron 
> ---
>  system/physmem.c | 165 ---
>  1 file changed, 98 insertions(+), 67 deletions(-)
> 
> diff --git a/system/physmem.c b/system/physmem.c
> index 39b5ac751e..74f92bb3b8 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -2677,6 +2677,54 @@ static bool flatview_access_allowed(MemoryRegion *mr, 
> MemTxAttrs attrs,
>  return false;
>  }
>  
> +static MemTxResult flatview_write_continue_step(hwaddr addr,
> +MemTxAttrs attrs,
> +const uint8_t *buf,
> +hwaddr len, hwaddr addr1,
> +hwaddr *l, MemoryRegion *mr)
> +{
> +if (!flatview_access_allowed(mr, attrs, addr1, *l)) {
> +return MEMTX_ACCESS_ERROR;
> +}
> +
> +if (!memory_access_is_direct(mr, true)) {
> +uint64_t val;
> +MemTxResult result;
> +bool release_lock = prepare_mmio_access(mr);
> +
> +*l = memory_access_size(mr, *l, addr1);
> +/* XXX: could force current_cpu to NULL to avoid
> +   potential bugs */
> +
> +/*
> + * Assure Coverity (and ourselves) that we are not going to OVERRUN
> + * the buffer by following ldn_he_p().
> + */
> +#ifdef QEMU_STATIC_ANALYSIS
> +assert((*l == 1 && len >= 1) ||
> +   (*l == 2 && len >= 2) ||
> +   (*l == 4 && len >= 4) ||
> +   (*l == 8 && len >= 8));
> +#endif
> +val = ldn_he_p(buf, *l);
> +result = memory_region_dispatch_write(mr, addr1, val,
> +  size_memop(*l), attrs);
> +if (release_lock) {
> +bql_unlock();
> +}
> +
> +return result;
> +} else {
> +/* RAM case */
> +uint8_t *ram_ptr = qemu_ram_ptr_length(mr->ram_block, addr1, l, 
> false);
> +
> +memmove(ram_ptr, buf, *l);
> +invalidate_and_set_dirty(mr, addr1, *l);
> +
> +return MEMTX_OK;
> +}
> +}
> +
>  /* Called within RCU critical section.  */
>  static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
> MemTxAttrs attrs,
> @@ -2688,42 +2736,9 @@ static MemTxResult flatview_write_continue(FlatView 
> *fv, hwaddr addr,
>  const uint8_t *buf = ptr;
>  
>  for (;;) {
> -if (!flatview_access_allowed(mr, attrs, addr1, l)) {
> -result |= MEMTX_ACCESS_ERROR;
> -/* Keep going. */
> -} else if (!memory_access_is_direct(mr, true)) {
> -uint64_t val;
> -bool release_lock = prepare_mmio_access(mr);
> -
> -l = memory_access_size(mr, l, addr1);
> -/* XXX: could force current_cpu to NULL to avoid
> -   potential bugs */
> -
> -/*
> - * Assure Coverity (and ourselves) that we are not going to 
> OVERRUN
> - * the buffer by following ldn_he_p().
> - */
> -#ifdef QEMU_STATIC_ANALYSIS
> -assert((l == 1 && len >= 1) ||
> -   (l == 2 && len >= 2) ||
> -   (l == 4 && len >= 4) ||
> -   (l == 8 && len >= 8));
> -#endif
> -val = ldn_he_p(buf, l);
> -result |= memory_region_dispatch_write(mr, addr1, val,
> -   size_memop(l), attrs);
> -if (release_lock) {
> -bql_unlock();
> -}
> -
>  
> -} else {
> -/* RAM case */
> -uint8_t *ram_ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l,
> -   false);
> -memmove(ram_ptr, buf, l);
> -invalidate_and_set_dirty(mr, addr1, l);
> -}
> +result |= flatview_write_continue_step(addr, attrs, buf, len, addr1, 
> &l,
> +   mr);
>  
>  len -= l;
>  buf += l;
> @@ -2757,6 +2772,52 @@ static MemTxResult flatview_write(FlatView *fv, hwaddr 
> addr, MemTxAttrs attrs,
> addr1, l, mr);
>  }
>  
> +static MemTxResult flatview_read_continue_step(hwaddr addr,
> +   MemTxAttrs attrs, uint8_t 
> *buf,
> +   hwaddr len, hwaddr addr1,
> +   hwaddr *l,
> +   MemoryRegion *mr)
> +{
> +if (!flatview_access_allowed(mr, attrs, addr1, *l)) {
> +return  MEMTX_ACCESS_ERROR;
  |

Re: [PATCH 1/3] physmem: Reduce local variable scope in flatview_read/write_continue()

2024-02-29 Thread Peter Xu

On Thu, Feb 15, 2024 at 02:28:15PM +, Jonathan Cameron wrote:
> Precursor to factoring out the inner loops for reuse.
> 
> Signed-off-by: Jonathan Cameron 

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [PATCH 0/2] *** Properly apply migration compression level

2024-02-29 Thread Peter Xu

On Fri, Mar 01, 2024 at 03:58:59AM +, Bryan Zhang wrote:
> From: Bryan Zhang 
> 
> There is some glue code missing, such that the
> `qmp_migrate_set_parameters` function does not properly update the
> `multifd_zstd_level` and `multifd_zlib_level` parameters. This patch
> adds the glue code and also adds some function calls to the existing
> migration tests to make sure the set_parameters function gets tested.
> 
> Bryan Zhang (2):
>   migration: Properly apply migration compression level parameters
>   tests/migration: Set compression level in migration tests

queued, thanks!

-- 
Peter Xu

[PATCH 1/2] migration: Properly apply migration compression level parameters

2024-02-29 Thread Bryan Zhang

From: Bryan Zhang 

Some glue code was missing, so that using `qmp_migrate_set_parameters`
to set `multifd-zstd-level` or `multifd-zlib-level` did not work. This
commit adds the glue code to fix that.

Signed-off-by: Bryan Zhang 
---
 migration/options.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/migration/options.c b/migration/options.c
index 3e3e0b93b4..1cd3cc7c33 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -1312,6 +1312,12 @@ static void 
migrate_params_test_apply(MigrateSetParameters *params,
 if (params->has_multifd_compression) {
 dest->multifd_compression = params->multifd_compression;
 }
+if (params->has_multifd_zlib_level) {
+dest->multifd_zlib_level = params->multifd_zlib_level;
+}
+if (params->has_multifd_zstd_level) {
+dest->multifd_zstd_level = params->multifd_zstd_level;
+}
 if (params->has_xbzrle_cache_size) {
 dest->xbzrle_cache_size = params->xbzrle_cache_size;
 }
@@ -1447,6 +1453,12 @@ static void migrate_params_apply(MigrateSetParameters 
*params, Error **errp)
 if (params->has_multifd_compression) {
 s->parameters.multifd_compression = params->multifd_compression;
 }
+if (params->has_multifd_zlib_level) {
+s->parameters.multifd_zlib_level = params->multifd_zlib_level;
+}
+if (params->has_multifd_zstd_level) {
+s->parameters.multifd_zstd_level = params->multifd_zstd_level;
+}
 if (params->has_xbzrle_cache_size) {
 s->parameters.xbzrle_cache_size = params->xbzrle_cache_size;
 xbzrle_cache_resize(params->xbzrle_cache_size, errp);
-- 
2.30.2

[PATCH 2/2] tests/migration: Set compression level in migration tests

2024-02-29 Thread Bryan Zhang

From: Bryan Zhang 

Adds calls to set compression level for `zstd` and `zlib` migration
tests, just to make sure that the calls work.

Signed-off-by: Bryan Zhang 
---
 tests/qtest/migration-test.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 8a5bb1752e..f23e860547 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2625,6 +2625,13 @@ static void *
 test_migrate_precopy_tcp_multifd_zlib_start(QTestState *from,
 QTestState *to)
 {
+/*
+ * Overloading this test to also check that set_parameter does not error.
+ * This is also done in the tests for the other compression methods.
+ */
+migrate_set_parameter_int(from, "multifd-zlib-level", 2);
+migrate_set_parameter_int(to, "multifd-zlib-level", 2);
+
 return test_migrate_precopy_tcp_multifd_start_common(from, to, "zlib");
 }
 
@@ -2633,6 +2640,9 @@ static void *
 test_migrate_precopy_tcp_multifd_zstd_start(QTestState *from,
 QTestState *to)
 {
+migrate_set_parameter_int(from, "multifd-zstd-level", 2);
+migrate_set_parameter_int(to, "multifd-zstd-level", 2);
+
 return test_migrate_precopy_tcp_multifd_start_common(from, to, "zstd");
 }
 #endif /* CONFIG_ZSTD */
-- 
2.30.2

[PATCH 0/2] *** Properly apply migration compression level

2024-02-29 Thread Bryan Zhang

From: Bryan Zhang 

There is some glue code missing, such that the
`qmp_migrate_set_parameters` function does not properly update the
`multifd_zstd_level` and `multifd_zlib_level` parameters. This patch
adds the glue code and also adds some function calls to the existing
migration tests to make sure the set_parameters function gets tested.

Bryan Zhang (2):
  migration: Properly apply migration compression level parameters
  tests/migration: Set compression level in migration tests

 migration/options.c  | 12 
 tests/qtest/migration-test.c | 10 ++
 2 files changed, 22 insertions(+)

-- 
2.30.2

Re: [PATCH v4 08/10] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response

2024-02-29 Thread fan

On Tue, Feb 27, 2024 at 10:39:09AM +, Jonathan Cameron wrote:
> On Mon, 26 Feb 2024 17:01:22 -0800
> fan  wrote:
> 
> > On Mon, Feb 26, 2024 at 06:04:17PM +, Jonathan Cameron wrote:
> > > On Wed, 21 Feb 2024 10:16:01 -0800
> > > nifan@gmail.com wrote:
> > >   
> > > > From: Fan Ni 
> > > > 
> > > > Per CXL spec 3.1, two mailbox commands are implemented:
> > > > Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
> > > > Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
> > > > 
> > > > Signed-off-by: Fan Ni   
> > > 
> > > Hi Fan, 
> > > 
> > > Comments on this are all about corner cases. If we can I think we need
> > > to cover a few more.  Linux won't hit them (I think) so it will be
> > > a bit of a pain to test but maybe raw commands enabled and some
> > > userspace code will let us exercise the corner cases?
> > > 
> > > Jonathan
> > > 
> > > 
> > >   
> > > > +
> > > > +/*
> > > > + * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (opcode 
> > > > 4803h)
> > > > + */
> > > > +static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
> > > > +  uint8_t *payload_in,
> > > > +  size_t len_in,
> > > > +  uint8_t *payload_out,
> > > > +  size_t *len_out,
> > > > +  CXLCCI *cci)
> > > > +{
> > > > +CXLUpdateDCExtentListInPl *in = (void *)payload_in;
> > > > +CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
> > > > +CXLDCExtentList *extent_list = &ct3d->dc.extents;
> > > > +CXLDCExtent *ent;
> > > > +uint32_t i;
> > > > +uint64_t dpa, len;
> > > > +CXLRetCode ret;
> > > > +
> > > > +if (in->num_entries_updated == 0) {
> > > > +return CXL_MBOX_INVALID_INPUT;
> > > > +}
> > > > +
> > > > +ret = cxl_detect_malformed_extent_list(ct3d, in);
> > > > +if (ret != CXL_MBOX_SUCCESS) {
> > > > +return ret;
> > > > +}
> > > > +
> > > > +for (i = 0; i < in->num_entries_updated; i++) {
> > > > +bool found = false;
> > > > +
> > > > +dpa = in->updated_entries[i].start_dpa;
> > > > +len = in->updated_entries[i].len;
> > > > +
> > > > +QTAILQ_FOREACH(ent, extent_list, node) {
> > > > +if (ent->start_dpa <= dpa &&
> > > > +dpa + len <= ent->start_dpa + ent->len) {
> > > > +/*
> > > > + * If an incoming extent covers a portion of an extent
> > > > + * in the device extent list, remove only the 
> > > > overlapping
> > > > + * portion, meaning
> > > > + * 1. the portions that are not covered by the incoming
> > > > + *extent at both end of the original extent will 
> > > > become
> > > > + *new extents and inserted to the extent list; and
> > > > + * 2. the original extent is removed from the extent 
> > > > list;
> > > > + * 3. dc extent count is updated accordingly.
> > > > + */
> > > > +uint64_t ent_start_dpa = ent->start_dpa;
> > > > +uint64_t ent_len = ent->len;
> > > > +uint64_t len1 = dpa - ent_start_dpa;
> > > > +uint64_t len2 = ent_start_dpa + ent_len - dpa - len;
> > > > +
> > > > +found = true;
> > > > +cxl_remove_extent_from_extent_list(extent_list, ent);
> > > > +ct3d->dc.total_extent_count -= 1;
> > > > +
> > > > +if (len1) {
> > > > +cxl_insert_extent_to_extent_list(extent_list,
> > > > + ent_start_dpa, 
> > > > len1,
> > > > + NULL, 0);
> > > > +ct3d->dc.total_extent_count += 1;
> > > > +}
> > > > +if (len2) {
> > > > +cxl_insert_extent_to_extent_list(extent_list, dpa 
> > > > + len,
> > > > + len2, NULL, 0);
> > > > +ct3d->dc.total_extent_count += 1;  
> > > 
> > > There is a non zero chance that we'll overflow however many extents we 
> > > claim
> > > to support. So we need to check that and fail the remove if it happens.
> > > Could ignore this for now though as that value is (I think!) conservative
> > > to allow for complex extent list tracking implementations.  Succeeding
> > > when a naive solution would fail due to running out of extents that it can
> > > manage is not (I think) a bug.  
> > 
> > Yeah. spec r3.1 mentioned about the overflow issue that adding/releasing
> > extent requests can raise. We should fail the operation if running out of
> > extents and report resource exhausted.
> > 
> > >   
> > > > +}
> > > > +break;
> > > > +/*Currently

Re: [RFC PATCH v5 12/22] target/arm: Handle NMI in arm_cpu_do_interrupt_aarch64()

2024-02-29 Thread Jinjie Ruan via




On 2024/3/1 7:09, Richard Henderson wrote:
> On 2/29/24 03:10, Jinjie Ruan via wrote:
>> According to Arm GIC section 4.6.3 Interrupt superpriority, the interrupt
>> with superpriority is always IRQ, never FIQ, so the NMI exception trap
>> entry
>> behave like IRQ. However, VNMI can be IRQ or FIQ, FIQ can only come from
>> hcrx_el2.HCRX_VFNMI bit, IRQ can be raised from the GIC or come from the
>> hcrx_el2.HCRX_VINMI bit.
>>
>> Signed-off-by: Jinjie Ruan 
>> ---
>> v4:
>> - Also handle VNMI in arm_cpu_do_interrupt_aarch64().
>> v3:
>> - Remove the FIQ NMI handle.
>> ---
>>   target/arm/helper.c | 9 +
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/target/arm/helper.c b/target/arm/helper.c
>> index b796dbdf21..bd34b3506a 100644
>> --- a/target/arm/helper.c
>> +++ b/target/arm/helper.c
>> @@ -11459,12 +11459,21 @@ static void
>> arm_cpu_do_interrupt_aarch64(CPUState *cs)
>>   break;
>>   case EXCP_IRQ:
>>   case EXCP_VIRQ:
>> +    case EXCP_NMI:
>>   addr += 0x80;
>>   break;
>>   case EXCP_FIQ:
>>   case EXCP_VFIQ:
>>   addr += 0x100;
>>   break;
>> +    case EXCP_VNMI:
>> +    if (env->irq_line_state & CPU_INTERRUPT_VNMI ||
>> +    env->cp15.hcrx_el2 & HCRX_VINMI) {
>> +    addr += 0x80;
>> +    } else if (env->cp15.hcrx_el2 & HCRX_VFNMI) {
>> +    addr += 0x100;
>> +    }
>> +    break;
> 
> By not combining VFNMI with CPU_INTERRUPT_VNMI, you don't need this
> complication.
> Just
> 
>  case EXCP_IRQ:
>  case EXCP_VIRQ:
> +    case EXCP_NMI:

Not understand it. both VIRQ and VFIQ will set CPU_INTERRUPT_VNMI and
cause EXCP_VNMI if they have Superpriority, the distinction jump here is
necessary.

> 
> 
> r~

[PATCH v4 3/7] migration/multifd: Implement ram_save_target_page_multifd to handle multifd version of MigrationOps::ram_save_target_page.

2024-02-29 Thread Hao Xiang

1. Add a dedicated handler for MigrationOps::ram_save_target_page in
multifd live migration.
2. Refactor ram_save_target_page_legacy so that the legacy and multifd
handlers don't have internal functions calling into each other.

Signed-off-by: Hao Xiang 
Reviewed-by: Fabiano Rosas 
Message-Id: <20240226195654.934709-4-hao.xi...@bytedance.com>
---
 migration/ram.c | 43 ++-
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index e1fa229acf..f9d6ea65cc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1122,10 +1122,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus 
*pss,
 QEMUFile *file = pss->pss_channel;
 int len = 0;
 
-if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_NONE) {
-return 0;
-}
-
 if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
 return 0;
 }
@@ -2045,7 +2041,6 @@ static bool save_compress_page(RAMState *rs, 
PageSearchStatus *pss,
  */
 static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
 {
-RAMBlock *block = pss->block;
 ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
 int res;
 
@@ -2061,17 +2056,34 @@ static int ram_save_target_page_legacy(RAMState *rs, 
PageSearchStatus *pss)
 return 1;
 }
 
+return ram_save_page(rs, pss);
+}
+
+/**
+ * ram_save_target_page_multifd: send one target page to multifd workers
+ *
+ * Returns 1 if the page was queued, -1 otherwise.
+ *
+ * @rs: current RAM state
+ * @pss: data about the page we want to send
+ */
+static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
+{
+RAMBlock *block = pss->block;
+ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
+
 /*
- * Do not use multifd in postcopy as one whole host page should be
- * placed.  Meanwhile postcopy requires atomic update of pages, so even
- * if host page size == guest page size the dest guest during run may
- * still see partially copied pages which is data corruption.
+ * Backward compatibility support. While using multifd live
+ * migration, we still need to handle zero page checking on the
+ * migration main thread.
  */
-if (migrate_multifd() && !migration_in_postcopy()) {
-return ram_save_multifd_page(block, offset);
+if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
+if (save_zero_page(rs, pss, offset)) {
+return 1;
+}
 }
 
-return ram_save_page(rs, pss);
+return ram_save_multifd_page(block, offset);
 }
 
 /* Should be called before sending a host page */
@@ -2983,7 +2995,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 
 migration_ops = g_malloc0(sizeof(MigrationOps));
-migration_ops->ram_save_target_page = ram_save_target_page_legacy;
+
+if (migrate_multifd()) {
+migration_ops->ram_save_target_page = ram_save_target_page_multifd;
+} else {
+migration_ops->ram_save_target_page = ram_save_target_page_legacy;
+}
 
 bql_unlock();
 ret = multifd_send_sync_main();
-- 
2.30.2

[PATCH v4 5/7] migration/multifd: Add new migration test cases for legacy zero page checking.

2024-02-29 Thread Hao Xiang

Now that zero page checking is done on the multifd sender threads by
default, we still provide an option for backward compatibility. This
change adds a qtest migration test case to set the zero-page-detection
option to "legacy" and run multifd migration with zero page checking on the
migration main thread.

Signed-off-by: Hao Xiang 
---
 tests/qtest/migration-test.c | 52 
 1 file changed, 52 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 83512bce85..8a966364b5 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2660,6 +2660,24 @@ test_migrate_precopy_tcp_multifd_start(QTestState *from,
 return test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
 }
 
+static void *
+test_migrate_precopy_tcp_multifd_start_zero_page_legacy(QTestState *from,
+QTestState *to)
+{
+test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
+migrate_set_parameter_str(from, "zero-page-detection", "legacy");
+return NULL;
+}
+
+static void *
+test_migration_precopy_tcp_multifd_start_no_zero_page(QTestState *from,
+  QTestState *to)
+{
+test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
+migrate_set_parameter_str(from, "zero-page-detection", "none");
+return NULL;
+}
+
 static void *
 test_migrate_precopy_tcp_multifd_zlib_start(QTestState *from,
 QTestState *to)
@@ -2691,6 +2709,36 @@ static void test_multifd_tcp_none(void)
 test_precopy_common(&args);
 }
 
+static void test_multifd_tcp_zero_page_legacy(void)
+{
+MigrateCommon args = {
+.listen_uri = "defer",
+.start_hook = test_migrate_precopy_tcp_multifd_start_zero_page_legacy,
+/*
+ * Multifd is more complicated than most of the features, it
+ * directly takes guest page buffers when sending, make sure
+ * everything will work alright even if guest page is changing.
+ */
+.live = true,
+};
+test_precopy_common(&args);
+}
+
+static void test_multifd_tcp_no_zero_page(void)
+{
+MigrateCommon args = {
+.listen_uri = "defer",
+.start_hook = test_migration_precopy_tcp_multifd_start_no_zero_page,
+/*
+ * Multifd is more complicated than most of the features, it
+ * directly takes guest page buffers when sending, make sure
+ * everything will work alright even if guest page is changing.
+ */
+.live = true,
+};
+test_precopy_common(&args);
+}
+
 static void test_multifd_tcp_zlib(void)
 {
 MigrateCommon args = {
@@ -3592,6 +3640,10 @@ int main(int argc, char **argv)
 }
 migration_test_add("/migration/multifd/tcp/plain/none",
test_multifd_tcp_none);
+migration_test_add("/migration/multifd/tcp/plain/zero-page/legacy",
+   test_multifd_tcp_zero_page_legacy);
+migration_test_add("/migration/multifd/tcp/plain/zero-page/none",
+   test_multifd_tcp_no_zero_page);
 migration_test_add("/migration/multifd/tcp/plain/cancel",
test_multifd_tcp_cancel);
 migration_test_add("/migration/multifd/tcp/plain/zlib",
-- 
2.30.2

[PATCH v4 0/7] Introduce multifd zero page checking.

2024-02-29 Thread Hao Xiang

v4 update:
* Fix documentation for interface ZeroPageDetection.
* Fix implementation in multifd_send_zero_page_check.
* Rebase on top of c0c6a0e3528b88aaad0b9d333e295707a195587b.

v3 update:
* Change "zero" to "zero-pages" and use type size for "zero-bytes".
* Fixed ZeroPageDetection interface description.
* Move zero page unit tests to its own path.
* Removed some asserts.
* Added backward compatibility support for migration 9.0 -> 8.2.
* Removed fields "zero" and "normal" page address arrays from v2. Now
multifd_zero_page_check_send sorts normal/zero pages in the "offset" array.

v2 update:
* Implement zero-page-detection switch with enumeration "legacy",
"none" and "multifd".
* Move normal/zero pages from MultiFDSendParams to MultiFDPages_t.
* Add zeros and zero_bytes accounting.

This patchset is based on Juan Quintela's old series here
https://lore.kernel.org/all/20220802063907.18882-1-quint...@redhat.com/

In the multifd live migration model, there is a single migration main
thread scanning the page map, queuing the pages to multiple multifd
sender threads. The migration main thread runs zero page checking on
every page before queuing the page to the sender threads. Zero page
checking is a CPU intensive task and hence having a single thread doing
all that doesn't scale well. This change introduces a new function
to run the zero page checking on the multifd sender threads. This
patchset also lays the ground work for future changes to offload zero
page checking task to accelerator hardwares.

Use two Intel 4th generation Xeon servers for testing.

Architecture:x86_64
CPU(s):  192
Thread(s) per core:  2
Core(s) per socket:  48
Socket(s):   2
NUMA node(s):2
Vendor ID:   GenuineIntel
CPU family:  6
Model:   143
Model name:  Intel(R) Xeon(R) Platinum 8457C
Stepping:8
CPU MHz: 2538.624
CPU max MHz: 3800.
CPU min MHz: 800.

Perform multifd live migration with below setup:
1. VM has 100GB memory. All pages in the VM are zero pages.
2. Use tcp socket for live migration.
3. Use 4 multifd channels and zero page checking on migration main thread.
4. Use 1/2/4 multifd channels and zero page checking on multifd sender
threads.
5. Record migration total time from sender QEMU console's "info migrate"
command.

++
|zero-page-checking | total-time(ms) |
++
|main-thread| 9629   |
++
|multifd-1-threads  | 6182   |
++
|multifd-2-threads  | 4643   |
++
|multifd-4-threads  | 4143   |
++

Apply this patchset on top of commit
c0c6a0e3528b88aaad0b9d333e295707a195587b

Hao Xiang (7):
  migration/multifd: Add new migration option zero-page-detection.
  migration/multifd: Implement zero page transmission on the multifd
thread.
  migration/multifd: Implement ram_save_target_page_multifd to handle
multifd version of MigrationOps::ram_save_target_page.
  migration/multifd: Enable multifd zero page checking by default.
  migration/multifd: Add new migration test cases for legacy zero page
checking.
  migration/multifd: Add zero pages and zero bytes counter to migration
status interface.
  Update maintainer contact for migration multifd zero page checking
acceleration.

 MAINTAINERS |  5 ++
 hw/core/machine.c   |  4 +-
 hw/core/qdev-properties-system.c| 10 
 include/hw/qdev-properties-system.h |  4 ++
 migration/meson.build   |  1 +
 migration/migration-hmp-cmds.c  | 13 
 migration/migration.c   |  2 +
 migration/multifd-zero-page.c   | 92 +
 migration/multifd-zlib.c| 21 +--
 migration/multifd-zstd.c| 20 +--
 migration/multifd.c | 83 ++
 migration/multifd.h | 24 +++-
 migration/options.c | 21 +++
 migration/options.h |  1 +
 migration/ram.c | 40 +
 migration/trace-events  |  8 +--
 qapi/migration.json | 53 +++--
 tests/migration/guestperf/engine.py |  2 +
 tests/qtest/migration-test.c| 52 
 19 files changed, 412 insertions(+), 44 deletions(-)
 create mode 100644 migration/multifd-zero-page.c

-- 
2.30.2

[PATCH v4 1/7] migration/multifd: Add new migration option zero-page-detection.

2024-02-29 Thread Hao Xiang

This new parameter controls where the zero page checking is running.
1. If this parameter is set to 'legacy', zero page checking is
done in the migration main thread.
2. If this parameter is set to 'none', zero page checking is disabled.

Signed-off-by: Hao Xiang 
---
 hw/core/qdev-properties-system.c| 10 +
 include/hw/qdev-properties-system.h |  4 
 migration/migration-hmp-cmds.c  |  9 
 migration/options.c | 21 ++
 migration/options.h |  1 +
 migration/ram.c |  4 
 qapi/migration.json | 33 ++---
 7 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 1a396521d5..228e685f52 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -679,6 +679,16 @@ const PropertyInfo qdev_prop_mig_mode = {
 .set_default_value = qdev_propinfo_set_default_value_enum,
 };
 
+const PropertyInfo qdev_prop_zero_page_detection = {
+.name = "ZeroPageDetection",
+.description = "zero_page_detection values, "
+   "none,legacy",
+.enum_table = &ZeroPageDetection_lookup,
+.get = qdev_propinfo_get_enum,
+.set = qdev_propinfo_set_enum,
+.set_default_value = qdev_propinfo_set_default_value_enum,
+};
+
 /* --- Reserved Region --- */
 
 /*
diff --git a/include/hw/qdev-properties-system.h 
b/include/hw/qdev-properties-system.h
index 06c359c190..839b170235 100644
--- a/include/hw/qdev-properties-system.h
+++ b/include/hw/qdev-properties-system.h
@@ -8,6 +8,7 @@ extern const PropertyInfo qdev_prop_macaddr;
 extern const PropertyInfo qdev_prop_reserved_region;
 extern const PropertyInfo qdev_prop_multifd_compression;
 extern const PropertyInfo qdev_prop_mig_mode;
+extern const PropertyInfo qdev_prop_zero_page_detection;
 extern const PropertyInfo qdev_prop_losttickpolicy;
 extern const PropertyInfo qdev_prop_blockdev_on_error;
 extern const PropertyInfo qdev_prop_bios_chs_trans;
@@ -47,6 +48,9 @@ extern const PropertyInfo qdev_prop_iothread_vq_mapping_list;
 #define DEFINE_PROP_MIG_MODE(_n, _s, _f, _d) \
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_mig_mode, \
MigMode)
+#define DEFINE_PROP_ZERO_PAGE_DETECTION(_n, _s, _f, _d) \
+DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_zero_page_detection, \
+   ZeroPageDetection)
 #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_losttickpolicy, \
 LostTickPolicy)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 99b49df5dd..7e96ae6ffd 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -344,6 +344,11 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "%s: %s\n",
 MigrationParameter_str(MIGRATION_PARAMETER_MULTIFD_COMPRESSION),
 MultiFDCompression_str(params->multifd_compression));
+assert(params->has_zero_page_detection);
+monitor_printf(mon, "%s: %s\n",
+MigrationParameter_str(MIGRATION_PARAMETER_ZERO_PAGE_DETECTION),
+qapi_enum_lookup(&ZeroPageDetection_lookup,
+params->zero_page_detection));
 monitor_printf(mon, "%s: %" PRIu64 " bytes\n",
 MigrationParameter_str(MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE),
 params->xbzrle_cache_size);
@@ -634,6 +639,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 p->has_multifd_zstd_level = true;
 visit_type_uint8(v, param, &p->multifd_zstd_level, &err);
 break;
+case MIGRATION_PARAMETER_ZERO_PAGE_DETECTION:
+p->has_zero_page_detection = true;
+visit_type_ZeroPageDetection(v, param, &p->zero_page_detection, &err);
+break;
 case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
 p->has_xbzrle_cache_size = true;
 if (!visit_type_size(v, param, &cache_size, &err)) {
diff --git a/migration/options.c b/migration/options.c
index 3e3e0b93b4..3c603391b0 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -179,6 +179,9 @@ Property migration_properties[] = {
 DEFINE_PROP_MIG_MODE("mode", MigrationState,
   parameters.mode,
   MIG_MODE_NORMAL),
+DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
+   parameters.zero_page_detection,
+   ZERO_PAGE_DETECTION_LEGACY),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
@@ -903,6 +906,13 @@ uint64_t migrate_xbzrle_cache_size(void)
 return s->parameters.xbzrle_cache_size;
 }
 
+ZeroPageDetection migrate_zero_page_detection(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->parameters.zero_page_detection;
+}
+
 /* parameter

[PATCH v4 2/7] migration/multifd: Implement zero page transmission on the multifd thread.

2024-02-29 Thread Hao Xiang

1. Add zero_pages field in MultiFDPacket_t.
2. Implements the zero page detection and handling on the multifd
threads for non-compression, zlib and zstd compression backends.
3. Added a new value 'multifd' in ZeroPageDetection enumeration.
4. Handle migration QEMU9.0 -> QEMU8.2 compatibility.
5. Adds zero page counters and updates multifd send/receive tracing
format to track the newly added counters.

Signed-off-by: Hao Xiang 
---
 hw/core/machine.c|  4 +-
 hw/core/qdev-properties-system.c |  2 +-
 migration/meson.build|  1 +
 migration/multifd-zero-page.c| 92 
 migration/multifd-zlib.c | 21 ++--
 migration/multifd-zstd.c | 20 +--
 migration/multifd.c  | 83 +++-
 migration/multifd.h  | 24 -
 migration/ram.c  |  1 -
 migration/trace-events   |  8 +--
 qapi/migration.json  |  7 ++-
 11 files changed, 230 insertions(+), 33 deletions(-)
 create mode 100644 migration/multifd-zero-page.c

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 9ac5d5389a..0e9d646b61 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -32,7 +32,9 @@
 #include "hw/virtio/virtio-net.h"
 #include "audio/audio.h"
 
-GlobalProperty hw_compat_8_2[] = {};
+GlobalProperty hw_compat_8_2[] = {
+{ "migration", "zero-page-detection", "legacy"},
+};
 const size_t hw_compat_8_2_len = G_N_ELEMENTS(hw_compat_8_2);
 
 GlobalProperty hw_compat_8_1[] = {
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 228e685f52..6e6f68ae1b 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -682,7 +682,7 @@ const PropertyInfo qdev_prop_mig_mode = {
 const PropertyInfo qdev_prop_zero_page_detection = {
 .name = "ZeroPageDetection",
 .description = "zero_page_detection values, "
-   "none,legacy",
+   "none,legacy,multifd",
 .enum_table = &ZeroPageDetection_lookup,
 .get = qdev_propinfo_get_enum,
 .set = qdev_propinfo_set_enum,
diff --git a/migration/meson.build b/migration/meson.build
index 92b1cc4297..1eeb915ff6 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -22,6 +22,7 @@ system_ss.add(files(
   'migration.c',
   'multifd.c',
   'multifd-zlib.c',
+  'multifd-zero-page.c',
   'ram-compress.c',
   'options.c',
   'postcopy-ram.c',
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
new file mode 100644
index 00..e695f6ff7d
--- /dev/null
+++ b/migration/multifd-zero-page.c
@@ -0,0 +1,92 @@
+/*
+ * Multifd zero page detection implementation.
+ *
+ * Copyright (c) 2024 Bytedance Inc
+ *
+ * Authors:
+ *  Hao Xiang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "exec/ramblock.h"
+#include "migration.h"
+#include "multifd.h"
+#include "options.h"
+#include "ram.h"
+
+static bool multifd_zero_page(void)
+{
+return migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD;
+}
+
+static void swap_page_offset(ram_addr_t *pages_offset, int a, int b)
+{
+ram_addr_t temp;
+
+if (a == b) {
+return;
+}
+
+temp = pages_offset[a];
+pages_offset[a] = pages_offset[b];
+pages_offset[b] = temp;
+}
+
+/**
+ * multifd_send_zero_page_check: Perform zero page detection on all pages.
+ *
+ * Sorts normal pages before zero pages in p->pages->offset and updates
+ * p->pages->normal_num.
+ *
+ * @param p A pointer to the send params.
+ */
+void multifd_send_zero_page_check(MultiFDSendParams *p)
+{
+MultiFDPages_t *pages = p->pages;
+RAMBlock *rb = pages->block;
+int i = 0;
+int j = pages->num - 1;
+
+/*
+ * QEMU older than 9.0 don't understand zero page
+ * on multifd channel. This switch is required to
+ * maintain backward compatibility.
+ */
+if (multifd_zero_page()) {
+pages->normal_num = pages->num;
+return;
+}
+
+/*
+ * Sort the page offset array by moving all normal pages to
+ * the left and all zero pages to the right of the array.
+ */
+while (i <= j) {
+uint64_t offset = pages->offset[i];
+
+if (!buffer_is_zero(rb->host + offset, p->page_size)) {
+i++;
+continue;
+}
+
+swap_page_offset(pages->offset, i, j);
+ram_release_page(rb->idstr, offset);
+j--;
+}
+
+pages->normal_num = i;
+}
+
+void multifd_recv_zero_page_check(MultiFDRecvParams *p)
+{
+for (int i = 0; i < p->zero_num; i++) {
+void *page = p->host + p->zero[i];
+if (!buffer_is_zero(page, p->page_size)) {
+memset(page, 0, p->page_size);
+}
+}
+}
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 012e3bdea1..f749e703a9 100644
--- a/migration/multifd-zlib.

[PATCH v4 6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface.

2024-02-29 Thread Hao Xiang

This change extends the MigrationStatus interface to track zero pages
and zero bytes counter.

Signed-off-by: Hao Xiang 
---
 migration/migration-hmp-cmds.c  |  4 
 migration/migration.c   |  2 ++
 qapi/migration.json | 15 ++-
 tests/migration/guestperf/engine.py |  2 ++
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 7e96ae6ffd..a38ad0255d 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -111,6 +111,10 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
info->ram->normal);
 monitor_printf(mon, "normal bytes: %" PRIu64 " kbytes\n",
info->ram->normal_bytes >> 10);
+monitor_printf(mon, "zero pages: %" PRIu64 " pages\n",
+   info->ram->zero_pages);
+monitor_printf(mon, "zero bytes: %" PRIu64 " kbytes\n",
+   info->ram->zero_bytes >> 10);
 monitor_printf(mon, "dirty sync count: %" PRIu64 "\n",
info->ram->dirty_sync_count);
 monitor_printf(mon, "page size: %" PRIu64 " kbytes\n",
diff --git a/migration/migration.c b/migration/migration.c
index bab68bcbef..fc4e3ef52d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1126,6 +1126,8 @@ static void populate_ram_info(MigrationInfo *info, 
MigrationState *s)
 info->ram->skipped = 0;
 info->ram->normal = stat64_get(&mig_stats.normal_pages);
 info->ram->normal_bytes = info->ram->normal * page_size;
+info->ram->zero_pages = stat64_get(&mig_stats.zero_pages);
+info->ram->zero_bytes = info->ram->zero_pages * page_size;
 info->ram->mbps = s->mbps;
 info->ram->dirty_sync_count =
 stat64_get(&mig_stats.dirty_sync_count);
diff --git a/qapi/migration.json b/qapi/migration.json
index ca9561fbf1..03b850bab7 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -63,6 +63,10 @@
 # between 0 and @dirty-sync-count * @multifd-channels.  (since
 # 7.1)
 #
+# @zero-pages: number of zero pages (since 9.0)
+#
+# @zero-bytes: number of zero bytes sent (since 9.0)
+#
 # Features:
 #
 # @deprecated: Member @skipped is always zero since 1.5.3
@@ -81,7 +85,8 @@
'multifd-bytes': 'uint64', 'pages-per-second': 'uint64',
'precopy-bytes': 'uint64', 'downtime-bytes': 'uint64',
'postcopy-bytes': 'uint64',
-   'dirty-sync-missed-zero-copy': 'uint64' } }
+   'dirty-sync-missed-zero-copy': 'uint64',
+   'zero-pages': 'int', 'zero-bytes': 'size' } }
 
 ##
 # @XBZRLECacheStats:
@@ -332,6 +337,8 @@
 #   "duplicate":123,
 #   "normal":123,
 #   "normal-bytes":123456,
+#   "zero-pages":123,
+#   "zero-bytes":123456,
 #   "dirty-sync-count":15
 # }
 #  }
@@ -358,6 +365,8 @@
 # "duplicate":123,
 # "normal":123,
 # "normal-bytes":123456,
+# "zero-pages":123,
+# "zero-bytes":123456,
 # "dirty-sync-count":15
 #  }
 #   }
@@ -379,6 +388,8 @@
 # "duplicate":123,
 # "normal":123,
 # "normal-bytes":123456,
+# "zero-pages":123,
+# "zero-bytes":123456,
 # "dirty-sync-count":15
 #  },
 #  "disk":{
@@ -405,6 +416,8 @@
 # "duplicate":10,
 # "normal":,
 # "normal-bytes":3412992,
+# "zero-pages":,
+# "zero-bytes":3412992,
 # "dirty-sync-count":15
 #  },
 #  "xbzrle-cache":{
diff --git a/tests/migration/guestperf/engine.py 
b/tests/migration/guestperf/engine.py
index 608d7270f6..693e07c227 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -92,6 +92,8 @@ def _migrate_progress(self, vm):
 info["ram"].get("skipped", 0),
 info["ram"].get("normal", 0),
 info["ram"].get("normal-bytes", 0),
+info["ram"].get("zero-pages", 0);
+info["ram"].get("zero-bytes", 0);
 info["ram"].get("dirty-pages-rate", 0),
 info["ram"].get("mbps", 0),
 info["ram"].get("dirty-sync-count", 0)
-- 
2.30.2

[PATCH v4 7/7] Update maintainer contact for migration multifd zero page checking acceleration.

2024-02-29 Thread Hao Xiang

Add myself to maintain multifd zero page checking acceleration function.

Signed-off-by: Hao Xiang 
---
 MAINTAINERS | 5 +
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 65dfdc9677..b547918e4d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3414,6 +3414,11 @@ F: tests/migration/
 F: util/userfaultfd.c
 X: migration/rdma*
 
+Migration multifd zero page checking acceleration
+M: Hao Xiang 
+S: Maintained
+F: migration/multifd-zero-page.c
+
 RDMA Migration
 R: Li Zhijian 
 R: Peter Xu 
-- 
2.30.2

[PATCH v4 4/7] migration/multifd: Enable multifd zero page checking by default.

2024-02-29 Thread Hao Xiang

Set default "zero-page-detection" option to "multifd". Now zero page
checking can be done in the multifd threads and this becomes the
default configuration. We still provide backward compatibility
where zero page checking is done from the migration main thread.

Signed-off-by: Hao Xiang 
---
 migration/options.c | 2 +-
 qapi/migration.json | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/options.c b/migration/options.c
index 3c603391b0..3c79b6ccd4 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -181,7 +181,7 @@ Property migration_properties[] = {
   MIG_MODE_NORMAL),
 DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
parameters.zero_page_detection,
-   ZERO_PAGE_DETECTION_LEGACY),
+   ZERO_PAGE_DETECTION_MULTIFD),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
diff --git a/qapi/migration.json b/qapi/migration.json
index 846d0411d5..ca9561fbf1 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -903,7 +903,7 @@
 #(Since 8.2)
 #
 # @zero-page-detection: Whether and how to detect zero pages.
-# See description in @ZeroPageDetection.  Default is 'legacy'.
+# See description in @ZeroPageDetection.  Default is 'multifd'.
 # (since 9.0)
 #
 # Features:
@@ -1100,7 +1100,7 @@
 #(Since 8.2)
 #
 # @zero-page-detection: Whether and how to detect zero pages.
-# See description in @ZeroPageDetection.  Default is 'legacy'.
+# See description in @ZeroPageDetection.  Default is 'multifd'.
 # (since 9.0)
 #
 # Features:
@@ -1333,7 +1333,7 @@
 #(Since 8.2)
 #
 # @zero-page-detection: Whether and how to detect zero pages.
-# See description in @ZeroPageDetection.  Default is 'legacy'.
+# See description in @ZeroPageDetection.  Default is 'multifd'.
 # (since 9.0)
 #
 # Features:
-- 
2.30.2

Re: [PATCH V2] migration: massage cpr-reboot documentation

2024-02-29 Thread Peter Xu

On Thu, Feb 29, 2024 at 06:54:22AM -0800, Steve Sistare wrote:
> Re-wrap the cpr-reboot documentation to 70 columns, use '@' for
> cpr-reboot references, capitalize COLO and VFIO, and tweak the
> wording.
> 
> Suggested-by: Markus Armbruster 
> Signed-off-by: Steve Sistare 

queued.

-- 
Peter Xu

Re: [External] Re: [PATCH v3 2/7] migration/multifd: Implement zero page transmission on the multifd thread.

2024-02-29 Thread Hao Xiang

On Wed, Feb 28, 2024 at 11:46 AM Fabiano Rosas  wrote:
>
> Hao Xiang  writes:
>
> > 1. Add zero_pages field in MultiFDPacket_t.
> > 2. Implements the zero page detection and handling on the multifd
> > threads for non-compression, zlib and zstd compression backends.
> > 3. Added a new value 'multifd' in ZeroPageDetection enumeration.
> > 4. Handle migration QEMU9.0 -> QEMU8.2 compatibility.
> > 5. Adds zero page counters and updates multifd send/receive tracing
> > format to track the newly added counters.
> >
> > Signed-off-by: Hao Xiang 
> > ---
> >  hw/core/machine.c|  4 +-
> >  hw/core/qdev-properties-system.c |  2 +-
> >  migration/meson.build|  1 +
> >  migration/multifd-zero-page.c| 78 ++
> >  migration/multifd-zlib.c | 21 ++--
> >  migration/multifd-zstd.c | 20 ++--
> >  migration/multifd.c  | 83 +++-
> >  migration/multifd.h  | 24 -
> >  migration/ram.c  |  1 -
> >  migration/trace-events   |  8 +--
> >  qapi/migration.json  |  5 +-
> >  11 files changed, 214 insertions(+), 33 deletions(-)
> >  create mode 100644 migration/multifd-zero-page.c
> >
> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index fb5afdcae4..746da219a4 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -32,7 +32,9 @@
> >  #include "hw/virtio/virtio-net.h"
> >  #include "audio/audio.h"
> >
> > -GlobalProperty hw_compat_8_2[] = {};
> > +GlobalProperty hw_compat_8_2[] = {
> > +{ "migration", "zero-page-detection", "legacy"},
> > +};
> >  const size_t hw_compat_8_2_len = G_N_ELEMENTS(hw_compat_8_2);
> >
> >  GlobalProperty hw_compat_8_1[] = {
> > diff --git a/hw/core/qdev-properties-system.c 
> > b/hw/core/qdev-properties-system.c
> > index 228e685f52..6e6f68ae1b 100644
> > --- a/hw/core/qdev-properties-system.c
> > +++ b/hw/core/qdev-properties-system.c
> > @@ -682,7 +682,7 @@ const PropertyInfo qdev_prop_mig_mode = {
> >  const PropertyInfo qdev_prop_zero_page_detection = {
> >  .name = "ZeroPageDetection",
> >  .description = "zero_page_detection values, "
> > -   "none,legacy",
> > +   "none,legacy,multifd",
> >  .enum_table = &ZeroPageDetection_lookup,
> >  .get = qdev_propinfo_get_enum,
> >  .set = qdev_propinfo_set_enum,
> > diff --git a/migration/meson.build b/migration/meson.build
> > index 92b1cc4297..1eeb915ff6 100644
> > --- a/migration/meson.build
> > +++ b/migration/meson.build
> > @@ -22,6 +22,7 @@ system_ss.add(files(
> >'migration.c',
> >'multifd.c',
> >'multifd-zlib.c',
> > +  'multifd-zero-page.c',
> >'ram-compress.c',
> >'options.c',
> >'postcopy-ram.c',
> > diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> > new file mode 100644
> > index 00..1650c41b26
> > --- /dev/null
> > +++ b/migration/multifd-zero-page.c
> > @@ -0,0 +1,78 @@
> > +/*
> > + * Multifd zero page detection implementation.
> > + *
> > + * Copyright (c) 2024 Bytedance Inc
> > + *
> > + * Authors:
> > + *  Hao Xiang 
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or 
> > later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/cutils.h"
> > +#include "exec/ramblock.h"
> > +#include "migration.h"
> > +#include "multifd.h"
> > +#include "options.h"
> > +#include "ram.h"
> > +
> > +static void swap_page_offset(ram_addr_t *pages_offset, int a, int b)
> > +{
> > +ram_addr_t temp;
> > +
> > +if (a == b) {
> > +return;
> > +}
> > +
> > +temp = pages_offset[a];
> > +pages_offset[a] = pages_offset[b];
> > +pages_offset[b] = temp;
> > +}
> > +
> > +/**
> > + * multifd_zero_page_check_send: Perform zero page detection on all pages.
> > + *
> > + * Sort the page offset array by moving all normal pages to
> > + * the left and all zero pages to the right of the array.
>
> Let's move this to the loop as a comment. Here it's better to just
> inform about the side effects:
>
> Sorts normal pages before zero pages in p->pages->offset and updates
> p->pages->normal_num.
>
> > + *
> > + * @param p A pointer to the send params.
> > + */
> > +void multifd_zero_page_check_send(MultiFDSendParams *p)
>
> Use multifd_send_zero_page_check for consistency with the rest of the code.
>
> > +{
> > +/*
> > + * QEMU older than 9.0 don't understand zero page
> > + * on multifd channel. This switch is required to
> > + * maintain backward compatibility.
> > + */
> > +bool use_multifd_zero_page =
> > +(migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD);
>
> Could introduce a helper for this.
>
> static bool multifd_zero_page(void)
> {
> return migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD;
> }
>
> > +MultiFDPages_t *pages = p->pages;
> > +RAMBlock *rb = pages->

Re: [PATCH v6 00/23] migration: File based migration with multifd and mapped-ram

2024-02-29 Thread Peter Xu

On Thu, Feb 29, 2024 at 12:29:54PM -0300, Fabiano Rosas wrote:
> Based-on: 74aa0fb297 (migration: options incompatible with cpr) # 
> peterx/migration-next
> 
> Hi,
> 
> In this v6:
> 
> - Minor fixes to 17/23 and 19/23

The whole set looks good to me now.  I plan to queue it before the
direct-io stuff.  Any other comments / concerns from anyone?

Dan, would it be fine I queue the IO patches together?

Thanks,

-- 
Peter Xu

Re: [PATCH v6 22/23] migration/multifd: Add mapped-ram support to fd: URI

2024-02-29 Thread Peter Xu

On Thu, Feb 29, 2024 at 12:30:16PM -0300, Fabiano Rosas wrote:
> If we receive a file descriptor that points to a regular file, there's
> nothing stopping us from doing multifd migration with mapped-ram to
> that file.
> 
> Enable the fd: URI to work with multifd + mapped-ram.
> 
> Note that the fds passed into multifd are duplicated because we want
> to avoid cross-thread effects when doing cleanup (i.e. close(fd)). The
> original fd doesn't need to be duplicated because monitor_get_fd()
> transfers ownership to the caller.
> 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [PATCH v6 19/23] migration/multifd: Prepare multifd sync for mapped-ram migration

2024-02-29 Thread Peter Xu

On Thu, Feb 29, 2024 at 12:30:13PM -0300, Fabiano Rosas wrote:
> The mapped-ram migration can be performed live or non-live, but it is
> always asynchronous, i.e. the source machine and the destination
> machine are not migrating at the same time. We only need some pieces
> of the multifd sync operations.
> 
> multifd_send_sync_main()
> 
>   Issued by the ram migration code on the migration thread, causes the
>   multifd send channels to synchronize with the migration thread and
>   makes the sending side emit a packet with the MULTIFD_FLUSH flag.
> 
>   With mapped-ram we want to maintain the sync on the sending side
>   because that provides ordering between the rounds of dirty pages when
>   migrating live.
> 
> MULTIFD_FLUSH
> -
>   On the receiving side, the presence of the MULTIFD_FLUSH flag on a
>   packet causes the receiving channels to start synchronizing with the
>   main thread.
> 
>   We're not using packets with mapped-ram, so there's no MULTIFD_FLUSH
>   flag and therefore no channel sync on the receiving side.
> 
> multifd_recv_sync_main()
> 
>   Issued by the migration thread when the ram migration flag
>   RAM_SAVE_FLAG_MULTIFD_FLUSH is received, causes the migration thread
>   on the receiving side to start synchronizing with the recv
>   channels. Due to compatibility, this is also issued when
>   RAM_SAVE_FLAG_EOS is received.
> 
>   For mapped-ram we only need to synchronize the channels at the end of
>   migration to avoid doing cleanup before the channels have finished
>   their IO.
> 
> Make sure the multifd syncs are only issued at the appropriate times.
> 
> Note that due to pre-existing backward compatibility issues, we have
> the multifd_flush_after_each_section property that can cause a sync to
> happen at EOS. Since the EOS flag is needed on the stream, allow
> mapped-ram to just ignore it.
> 
> Also emit an error if any other unexpected flags are found on the
> stream.
> 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [PATCH v6 17/23] migration/multifd: Add outgoing QIOChannelFile support

2024-02-29 Thread Peter Xu

On Thu, Feb 29, 2024 at 12:30:11PM -0300, Fabiano Rosas wrote:
> Allow multifd to open file-backed channels. This will be used when
> enabling the mapped-ram migration stream format which expects a
> seekable transport.
> 
> The QIOChannel read and write methods will use the preadv/pwritev
> versions which don't update the file offset at each call so we can
> reuse the fd without re-opening for every channel.
> 
> Contrary to the socket migration, the file migration doesn't need an
> asynchronous channel creation process, so expose
> multifd_channel_connect() and call it directly.
> 
> Note that this is just setup code and multifd cannot yet make use of
> the file channels.
> 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [PATCH 14/17] migration/option: Fix missing ERRP_GUARD() for error_prepend()

2024-02-29 Thread Peter Xu

On Thu, Feb 29, 2024 at 10:39:11PM +0800, Zhao Liu wrote:
> From: Zhao Liu 
> 
> As the comment in qapi/error, passing @errp to error_prepend() requires
> ERRP_GUARD():
> 
> * = Why, when and how to use ERRP_GUARD() =
> *
> * Without ERRP_GUARD(), use of the @errp parameter is restricted:
> ...
> * - It should not be passed to error_prepend(), error_vprepend() or
> *   error_append_hint(), because that doesn't work with &error_fatal.
> * ERRP_GUARD() lifts these restrictions.
> *
> * To use ERRP_GUARD(), add it right at the beginning of the function.
> * @errp can then be used without worrying about the argument being
> * NULL or &error_fatal.
> 
> ERRP_GUARD() could avoid the case when @errp is the pointer of
> error_fatal, the user can't see this additional information, because
> exit() happens in error_setg earlier than information is added [1].
> 
> The migrate_params_check() passes @errp to error_prepend() without
> ERRP_GUARD(), and it could be called from migration_object_init(),
> where the passed @errp points to @error_fatal.
> 
> Therefore, the error message echoed in error_prepend() will be lost
> because of the above issue.
> 
> To fix this, add missing ERRP_GUARD() at the beginning of this function.
> 
> [1]: Issue description in the commit message of commit ae7c80a7bd73
>  ("error: New macro ERRP_GUARD()").
> 
> Cc: Peter Xu 
> Cc: Fabiano Rosas 
> Signed-off-by: Zhao Liu 

Acked-by: Peter Xu 

-- 
Peter Xu

RE: [PATCH v1 3/8] aspeed/sdmc: Add AST2700 support

2024-02-29 Thread Jamin Lin

> -Original Message-
> From: Philippe Mathieu-Daudé 
> Sent: Thursday, February 29, 2024 5:17 PM
> To: Jamin Lin ; Cédric Le Goater ;
> Peter Maydell ; Andrew Jeffery
> ; Joel Stanley ; Alistair
> Francis ; open list:ASPEED BMCs
> ; open list:All patches CC here
> 
> Cc: Troy Lee ; Yunlin Tang
> 
> Subject: Re: [PATCH v1 3/8] aspeed/sdmc: Add AST2700 support
> 
> Hi Jamin,
> 
> On 29/2/24 08:23, Jamin Lin via wrote:
> > The SDRAM memory controller(DRAMC) controls the access to external
> > DDR4 and DDR5 SDRAM and power up to DDR4 and DDR5 PHY.
> >
> > The DRAM memory controller of AST2700 is not backward compatible to
> > previous chips such AST2600, AST2500 and AST2400.
> >
> > Max memory is now 8GiB on the AST2700. Introduce new
> aspeed_2700_sdmc
> > and class with read/write operation and reset handlers.
> >
> > Define DRAMC necessary protected registers and unprotected registers
> > for AST2700 and increase the register set to 0x1000.
> >
> > Signed-off-by: Troy Lee 
> > Signed-off-by: Jamin Lin 
> > ---
> >   hw/misc/aspeed_sdmc.c | 215
> ++
> >   include/hw/misc/aspeed_sdmc.h |   4 +-
> >   2 files changed, 198 insertions(+), 21 deletions(-)
> 
> 
> > @@ -231,7 +270,10 @@ static void aspeed_sdmc_realize(DeviceState *dev,
> Error **errp)
> >   AspeedSDMCState *s = ASPEED_SDMC(dev);
> >   AspeedSDMCClass *asc = ASPEED_SDMC_GET_CLASS(s);
> >
> > -assert(asc->max_ram_size < 4 * GiB); /* 32-bit address bus */
> > +if (!asc->is_aarch64) {
> 
> Maybe name it 'bus64bit'? Because this isn't really related to Aarch64.
> 
> > +assert(asc->max_ram_size < 4 * GiB); /* 32-bit address bus */
> > +}
> 
> 
> > +static void aspeed_2700_sdmc_class_init(ObjectClass *klass, void
> > +*data) {
> > +DeviceClass *dc = DEVICE_CLASS(klass);
> > +AspeedSDMCClass *asc = ASPEED_SDMC_CLASS(klass);
> > +
> > +dc->desc = "ASPEED 2700 SDRAM Memory Controller";
> > +dc->reset = aspeed_2700_sdmc_reset;
> > +
> > +asc->is_aarch64 = true;
> > +asc->max_ram_size = 8 * GiB;
> > +asc->compute_conf = aspeed_2700_sdmc_compute_conf;
> > +asc->write = aspeed_2700_sdmc_write;
> > +asc->valid_ram_sizes = aspeed_2700_ram_sizes; }
> 
> 
> > @@ -51,6 +52,7 @@ struct AspeedSDMCClass {
> >   const uint64_t *valid_ram_sizes;
> >   uint32_t (*compute_conf)(AspeedSDMCState *s, uint32_t data);
> >   void (*write)(AspeedSDMCState *s, uint32_t reg, uint32_t data);
> > +uint32_t is_aarch64;
> 
> bool.
> 
> >   };
> >
> >   #endif /* ASPEED_SDMC_H */
Thanks for review and will fix.

Re: [PATCH V4 10/14] migration: stop vm for cpr

2024-02-29 Thread Peter Xu

On Thu, Feb 29, 2024 at 10:21:14AM -0500, Steven Sistare wrote:
> On 2/25/2024 9:08 PM, Peter Xu wrote:
> > On Thu, Feb 22, 2024 at 09:28:36AM -0800, Steve Sistare wrote:
> >> When migration for cpr is initiated, stop the vm and set state
> >> RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
> >> possibility of ram and device state being out of sync, and guarantees
> >> that a guest in the suspended state remains suspended, because qmp_cont
> >> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
> >>
> >> Signed-off-by: Steve Sistare 
> > 
> > Reviewed-by: Peter Xu 
> > 
> > cpr-reboot mode keeps changing behavior.
> > 
> > Could we declare it "experimental" until it's solid?  Maybe a patch to
> > document this?
> > 
> > Normally IMHO we shouldn't merge a feature if it's not complete, however
> > cpr-reboot is so special that the mode itself is already merged in 8.2
> > before I started to merge patches, and it keeps changing things.  I don't
> > know what else we can do here besides declaring it experimental and not
> > declare it a stable feature.
> 
> Hi Peter, the planned/committed functionality for cpr-reboot changed only 
> once, in:
> migration: stop vm for cpr
> 
> Suspension to support vfio is an enhancement which adds to the basic 
> functionality,
> it does not change it.  This was planned all along, but submitted as a 
> separate 

If VFIO used to migrate without suspension and now it won't, it's a
behavior change?

> series to manage complexity, as I outlined in my qemu community presentation,
> which I emailed you at the time.
> 
> Other "changes" that arose during review were just clarifications and 
> explanations.
> 
> So, I don't think cpr-reboot deserves to be condemned to experimental limbo.

IMHO it's not about a feature being condemned, it's about a kindful
heads-up to the users that one needs to take risk on using it until it
becomes stable, it also makes developers easier because of no limitation on
behavior change.  If all the changes are landing, then there's no need for
such a patch.

If so, please propose the planned complete document patch. I had a feeling
that cpr is still not fully understood by even many developers on the list.
It'll be great that such document will contain all the details one needs to
know on the feature, etc. meaning of the name cpr-reboot (what is "cpr"),
when to use it, how to use it, how it differs from "normal" mode
(etc. lifted limitations on some devices that used to block migration?),
what is enforced (vm stop, suspension, etc.) and what is optionally offered
(VFIO, shared-mem, etc.), the expected behaviors, etc.

When send it, please copy relevant people (Alex & Cedric for VFIO, while
Markus could also be a good candidate considering the QMP involvement).

Thanks a lot,

-- 
Peter Xu

Re: [External] Re: [PATCH v3 2/7] migration/multifd: Implement zero page transmission on the multifd thread.

2024-02-29 Thread Hao Xiang

On Thu, Feb 29, 2024 at 7:47 AM Fabiano Rosas  wrote:
>
> Markus Armbruster  writes:
>
> > Hao Xiang  writes:
> >
> >> On Wed, Feb 28, 2024 at 1:50 AM Markus Armbruster  
> >> wrote:
> >>>
> >>> Hao Xiang  writes:
> >>>
> >>> > 1. Add zero_pages field in MultiFDPacket_t.
> >>> > 2. Implements the zero page detection and handling on the multifd
> >>> > threads for non-compression, zlib and zstd compression backends.
> >>> > 3. Added a new value 'multifd' in ZeroPageDetection enumeration.
> >>> > 4. Handle migration QEMU9.0 -> QEMU8.2 compatibility.
> >>> > 5. Adds zero page counters and updates multifd send/receive tracing
> >>> > format to track the newly added counters.
> >>> >
> >>> > Signed-off-by: Hao Xiang 
> >>>
> >>> [...]
> >>>
> >>> > diff --git a/qapi/migration.json b/qapi/migration.json
> >>> > index 1e66272f8f..5a1bb8ad62 100644
> >>> > --- a/qapi/migration.json
> >>> > +++ b/qapi/migration.json
> >>> > @@ -660,10 +660,13 @@
> >>> >  #
> >>> >  # @legacy: Perform zero page checking from main migration thread.
> >>> >  #
> >>> > +# @multifd: Perform zero page checking from multifd sender thread.
> >>> > +#
> >>> >  # Since: 9.0
> >>> > +#
> >>> >  ##
> >>> >  { 'enum': 'ZeroPageDetection',
> >>> > -  'data': [ 'none', 'legacy' ] }
> >>> > +  'data': [ 'none', 'legacy', 'multifd' ] }
> >>> >
> >>> >  ##
> >>> >  # @BitmapMigrationBitmapAliasTransform:
> >>>
> >>> What happens when you set "zero-page-detection": "multifd" *without*
> >>> enabling multifd migration?
> >>
> >> Very good question! Right now the behavior is that if "multifd
> >> migration" is not enabled, it goes through the legacy code path and
> >> the "multifd zero page" option is ignored. The legacy path has its own
> >> zero page checking and will run the same way as before. This is for
> >> backward compatibility.
> >
> > We need one of two improvements then:
> >
> > 1. Make "zero-page-detection" reject value "multifd" when multifd
> >migration is not enabled.  Document this: "Requires migration
> >capability @multifd" or similar.
> >
> > 2. Document that "multifd" means multifd only when multifd is enabled,
> >else same as "legacy".
> >
> > I prefer 1., because it's easier to document.  But migration maintainers
> > may have their own preference.  Peter, Fabiano?
>
> I think we need to go with 2 for consistency with the other multifd_*
> parameters. I don't see any validation at options.c.

Yeah, it's hard to do 1. Someone can also set multifd and then disable
multifd migration. This is an existing problem. I will update the
document for "multifd".

Re: [External] Re: [PATCH v3 6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface.

2024-02-29 Thread Hao Xiang

On Wed, Feb 28, 2024 at 10:01 PM Markus Armbruster  wrote:
>
> Hao Xiang  writes:
>
> > On Wed, Feb 28, 2024 at 1:52 AM Markus Armbruster  wrote:
> >>
> >> Hao Xiang  writes:
> >>
> >> > This change extends the MigrationStatus interface to track zero pages
> >> > and zero bytes counter.
> >> >
> >> > Signed-off-by: Hao Xiang 
> >>
> >> [...]
> >>
> >> > diff --git a/qapi/migration.json b/qapi/migration.json
> >> > index a0a85a0312..171734c07e 100644
> >> > --- a/qapi/migration.json
> >> > +++ b/qapi/migration.json
> >> > @@ -63,6 +63,10 @@
> >> >  # between 0 and @dirty-sync-count * @multifd-channels.  (since
> >> >  # 7.1)
> >> >  #
> >> > +# @zero-pages: number of zero pages (since 9.0)
> >> > +#
> >> > +# @zero-bytes: number of zero bytes sent (since 9.0)
> >> > +#
> >>
> >> Awfully terse.  How are these two related?
> >
> > Sorry I forgot to address the same feedback from the last version.
>
> Happens :)
>
> > zero-pages are the number of pages being detected as all "zero" and
> > hence the payload isn't sent over the network. zero-bytes is basically
> > zero-pages * page_size. It's the number of bytes migrated (but not
> > actually sent through the network) because they are all "zero". These
> > two are related to the existing interface below. normal and
> > normal-bytes are the same representation of pages who are not all
> > "zero" and are actually sent through the network.
> >
> > # @normal: number of normal pages (since 1.2)
> > #
> > # @normal-bytes: number of normal bytes sent (since 1.2)
>
> We also have
>
>   # @duplicate: number of duplicate (zero) pages (since 1.2)
>   #
>   # @skipped: number of skipped zero pages. Always zero, only provided for
>   # compatibility (since 1.5)
>
> Page skipping was introduced in 1.5, and withdrawn in 1.5.3 and 1.6.
> @skipped was formally deprecated in 8.1.  It'll soon be gone, no need to
> worry about it now.
>
> That leaves three values related to pages sent: @normal (and
> @normal-bytes), @duplicate (but no @duplicate-bytes), and @zero-pages
> (and @zero-bytes).
>
> I unwittingly created a naming inconsistency between @normal,
> @duplicate, and @zero-pages when I asked you to rename @zero to
> @zero-pages.
>
> The meaning of the three values is not obvious, and the doc comments
> don't explain them.  Can you, or anybody familiar with migration,
> explain them to me?
>
> MigrationStats return some values as bytes, some as pages, and some as
> both.  I hate that.  Can we standardize on bytes?

I added zero/zero-bytes because I thought they were not there. But it
turns out "duplicate" is for that purpose. "zero/zero-bytes" is really
additional information to "normal/normal-bytes". Peter suggested that
if we add "zero/zero-bytes" we can slowly retire "duplicate" at a
later point.
I don't know the historical reason why pages/bytes are used the way it
is today. The way I understand migration, the granularity of ram
migration is "page". There are only two types of pages 1) normal 2)
zero. Zero pages' playload are not sent through the network because we
already know what it looks like. Only the page offset is sent. Normal
pages are pages that are not zero. The entire page is sent through the
network to the target host. if a user knows the zero/normal count,
they can already calculate the zero-bytes/normal-bytes (zero/normal *
page size) but it's just convenient to see both. During development, I
check on these counters a lot and they are useful.

>
> >>
> >> >  # Features:
> >> >  #
> >> >  # @deprecated: Member @skipped is always zero since 1.5.3
> >>
> >> [...]
> >>
>

Re: [RFC PATCH v5 19/22] hw/intc/arm_gicv3: Report the NMI interrupt in gicv3_cpuif_update()

2024-02-29 Thread Richard Henderson


On 2/29/24 03:10, Jinjie Ruan via wrote:

In CPU Interface, if the IRQ has the superpriority property, report
NMI to the corresponding PE.

Signed-off-by: Jinjie Ruan
---
v4:
- Swap the ordering of the IFs.
v3:
- Remove handling nmi_is_irq flag.
---
  hw/intc/arm_gicv3_cpuif.c | 4 
  1 file changed, 4 insertions(+)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH v5 19/23] migration/multifd: Prepare multifd sync for mapped-ram migration

2024-02-29 Thread Peter Xu

On Thu, Feb 29, 2024 at 10:19:12AM -0300, Fabiano Rosas wrote:
> > IMHO EOS cannot be accounted as "invalid" here because it always exists.
> > Rather than this trick (then explicitly ignore it below... which is even
> > hackier, IMHO), we can avoid setting EOS in invalid_flags, but explicitly
> > ignore EOS in below code to bypass it for mapped-ram:
> >
> > @@ -4301,7 +4302,12 @@ static int ram_load_precopy(QEMUFile *f)
> >  case RAM_SAVE_FLAG_EOS:
> >  /* normal exit */
> >  if (migrate_multifd() &&
> > -migrate_multifd_flush_after_each_section()) {
> > +migrate_multifd_flush_after_each_section() &&
> > +/*
> > + * Mapped-ram migration flushes once and for all after
> > + * parsing ramblocks.  Always ignore EOS for it.
> > + */
> > +!migrate_mapped_ram()) {
> >  multifd_recv_sync_main();
> >  }
> >  break;
> 
> I thought we were already spraying too many migrate_mapped_ram() checks
> all over the code. But wat you said makes sense, I'll change it.

Yep that's not good, but I can't think of anything better yet and
simple. E.g. we could have some flag so ram_save_iterate()/etc. generates
nothing to the stream but only update the pages with the offsets, then we
don't need this at all and EOS can be legally accounted as invalid.  But
that can involve more changes and not helpful on this series to converge.

And it's also the long condition which makes me even more worry.. For the
long run I think we should cleanup most of these "multifd &&
after_flush_each_section && !mapped_ram" at some point..

-- 
Peter Xu

RE: [PATCH v3 1/2] ui/gtk: flush display pipeline before saving vmstate when blob=true

2024-02-29 Thread Kim, Dongwon

Hi Marc-André,

Can you take a look at this new version and also 
https://lists.gnu.org/archive/html/qemu-devel/2023-12/msg01828.html?

Thanks!
DW

> -Original Message-
> From: qemu-devel-bounces+dongwon.kim=intel@nongnu.org  devel-bounces+dongwon.kim=intel@nongnu.org> On Behalf Of
> dongwon@intel.com
> Sent: Wednesday, September 20, 2023 4:24 PM
> To: qemu-devel@nongnu.org
> Subject: [PATCH v3 1/2] ui/gtk: flush display pipeline before saving vmstate
> when blob=true
> 
> From: Dongwon Kim 
> 
> If the guest state is paused before it gets a response for the current scanout
> frame submission (resource-flush), it won't flush new frames after being
> restored as it still waits for the old response, which is accepted as a 
> scanout
> render done signal. So it's needed to unblock the current scanout render 
> pipeline
> before the run state is changed to make sure the guest receives the response 
> for
> the current frame submission.
> 
> v2: Giving some time for the fence to be signaled before flushing
> the pipeline
> 
> v3: Prevent redundant call of gd_hw_gl_flushed by checking dmabuf
> and fence_fd >= 0 in it (e.g. during and after eglClientWaitSync
> in gd_change_runstate).
> 
> Destroy sync object later in gd_hw_fl_flushed since it is needed
> by eglClientWaitSync.
> 
> Cc: Marc-André Lureau 
> Cc: Vivek Kasireddy 
> Signed-off-by: Dongwon Kim 
> ---
>  ui/egl-helpers.c |  2 --
>  ui/gtk.c | 31 +++
>  2 files changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c index 3d19dbe382..a77f9e57d9
> 100644
> --- a/ui/egl-helpers.c
> +++ b/ui/egl-helpers.c
> @@ -385,8 +385,6 @@ void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
>  if (dmabuf->sync) {
>  dmabuf->fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
>dmabuf->sync);
> -eglDestroySyncKHR(qemu_egl_display, dmabuf->sync);
> -dmabuf->sync = NULL;
>  }
>  }
> 
> diff --git a/ui/gtk.c b/ui/gtk.c
> index 810d7fc796..eaca890cba 100644
> --- a/ui/gtk.c
> +++ b/ui/gtk.c
> @@ -597,10 +597,14 @@ void gd_hw_gl_flushed(void *vcon)
>  VirtualConsole *vc = vcon;
>  QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
> 
> -qemu_set_fd_handler(dmabuf->fence_fd, NULL, NULL, NULL);
> -close(dmabuf->fence_fd);
> -dmabuf->fence_fd = -1;
> -graphic_hw_gl_block(vc->gfx.dcl.con, false);
> +if (dmabuf && dmabuf->fence_fd >= 0) {
> +qemu_set_fd_handler(dmabuf->fence_fd, NULL, NULL, NULL);
> +close(dmabuf->fence_fd);
> +dmabuf->fence_fd = -1;
> +eglDestroySyncKHR(qemu_egl_display, dmabuf->sync);
> +dmabuf->sync = NULL;
> +graphic_hw_gl_block(vc->gfx.dcl.con, false);
> +}
>  }
> 
>  /** DisplayState Callbacks (opengl version) **/ @@ -678,6 +682,25 @@ static
> const DisplayGLCtxOps egl_ctx_ops = {  static void gd_change_runstate(void
> *opaque, bool running, RunState state)  {
>  GtkDisplayState *s = opaque;
> +int i;
> +
> +if (state == RUN_STATE_SAVE_VM) {
> +for (i = 0; i < s->nb_vcs; i++) {
> +VirtualConsole *vc = &s->vc[i];
> +
> +if (vc->gfx.guest_fb.dmabuf &&
> +vc->gfx.guest_fb.dmabuf->fence_fd >= 0) {
> +eglClientWaitSync(qemu_egl_display,
> +  vc->gfx.guest_fb.dmabuf->sync,
> +  EGL_SYNC_FLUSH_COMMANDS_BIT_KHR,
> +  1);
> +
> +/* force flushing current scanout blob rendering process
> + * just in case the fence is still not signaled */
> +gd_hw_gl_flushed(vc);
> +}
> +}
> +}
> 
>  gd_update_caption(s);
>  }
> --
> 2.34.1
>

Re: [PATCH v5 17/23] migration/multifd: Add outgoing QIOChannelFile support

2024-02-29 Thread Peter Xu

On Thu, Feb 29, 2024 at 02:34:13PM +, Daniel P. Berrangé wrote:
> On Thu, Feb 29, 2024 at 11:27:44AM -0300, Fabiano Rosas wrote:
> > Peter Xu  writes:
> > 
> > > On Thu, Feb 29, 2024 at 10:44:21AM +0800, Peter Xu wrote:
> > >> On Wed, Feb 28, 2024 at 12:21:21PM -0300, Fabiano Rosas wrote:
> > >> > Allow multifd to open file-backed channels. This will be used when
> > >> > enabling the mapped-ram migration stream format which expects a
> > >> > seekable transport.
> > >> > 
> > >> > The QIOChannel read and write methods will use the preadv/pwritev
> > >> > versions which don't update the file offset at each call so we can
> > >> > reuse the fd without re-opening for every channel.
> > >> > 
> > >> > Contrary to the socket migration, the file migration doesn't need an
> > >> > asynchronous channel creation process, so expose
> > >> > multifd_channel_connect() and call it directly.
> > >> > 
> > >> > Note that this is just setup code and multifd cannot yet make use of
> > >> > the file channels.
> > >> > 
> > >> > Signed-off-by: Fabiano Rosas 
> > >> > ---
> > >> > - moved flags change to another patch
> > >> > - removed channels_created assert
> > >> > ---
> > >> >  migration/file.c| 41 +++--
> > >> >  migration/file.h|  4 
> > >> >  migration/multifd.c | 18 +++---
> > >> >  migration/multifd.h |  1 +
> > >> >  4 files changed, 59 insertions(+), 5 deletions(-)
> > >> > 
> > >> > diff --git a/migration/file.c b/migration/file.c
> > >> > index 22d052a71f..83328a7a1b 100644
> > >> > --- a/migration/file.c
> > >> > +++ b/migration/file.c
> > >> > @@ -12,12 +12,17 @@
> > >> >  #include "channel.h"
> > >> >  #include "file.h"
> > >> >  #include "migration.h"
> > >> > +#include "multifd.h"
> > >> >  #include "io/channel-file.h"
> > >> >  #include "io/channel-util.h"
> > >> >  #include "trace.h"
> > >> >  
> > >> >  #define OFFSET_OPTION ",offset="
> > >> >  
> > >> > +static struct FileOutgoingArgs {
> > >> > +char *fname;
> > >> > +} outgoing_args;
> > >> > +
> > >> >  /* Remove the offset option from @filespec and return it in @offsetp. 
> > >> > */
> > >> >  
> > >> >  int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
> > >> > @@ -37,6 +42,36 @@ int file_parse_offset(char *filespec, uint64_t 
> > >> > *offsetp, Error **errp)
> > >> >  return 0;
> > >> >  }
> > >> >  
> > >> > +void file_cleanup_outgoing_migration(void)
> > >> > +{
> > >> > +g_free(outgoing_args.fname);
> > >> > +outgoing_args.fname = NULL;
> > >> > +}
> > >> > +
> > >> > +bool file_send_channel_create(gpointer opaque, Error **errp)
> > >> > +{
> > >> > +QIOChannelFile *ioc;
> > >> > +int flags = O_WRONLY;
> > >> > +bool ret = true;
> > >> > +
> > >> > +ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, 
> > >> > errp);
> > >> > +if (!ioc) {
> > >> > +ret = false;
> > >> > +goto out;
> > >> > +}
> > >> > +
> > >> > +multifd_channel_connect(opaque, QIO_CHANNEL(ioc));
> > >> > +
> > >> > +out:
> > >> > +/*
> > >> > + * File channel creation is synchronous. However posting this
> > >> > + * semaphore here is simpler than adding a special case.
> > >> > + */
> > >> > +multifd_send_channel_created();
> > >> > +
> > >> > +return ret;
> > >> > +}
> > >> > +
> > >> >  void file_start_outgoing_migration(MigrationState *s,
> > >> > FileMigrationArgs *file_args, 
> > >> > Error **errp)
> > >> >  {
> > >> > @@ -47,12 +82,14 @@ void file_start_outgoing_migration(MigrationState 
> > >> > *s,
> > >> >  
> > >> >  trace_migration_file_outgoing(filename);
> > >> >  
> > >> > -fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | 
> > >> > O_TRUNC,
> > >> > - 0600, errp);
> > >> > +fioc = qio_channel_file_new_path(filename, O_CREAT | O_TRUNC | 
> > >> > O_WRONLY,
> > >> > + 0660, errp);
> > >> 
> > >> It seems this is still leftover?
> > >> 
> > >> >  if (!fioc) {
> > >> >  return;
> > >> >  }
> > >> >  
> > >> > +outgoing_args.fname = g_strdup(filename);
> > >> > +
> > >> >  ioc = QIO_CHANNEL(fioc);
> > >> >  if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 
> > >> > 0) {
> > >> >  return;
> > >> > diff --git a/migration/file.h b/migration/file.h
> > >> > index 37d6a08bfc..4577f9efdd 100644
> > >> > --- a/migration/file.h
> > >> > +++ b/migration/file.h
> > >> > @@ -9,10 +9,14 @@
> > >> >  #define QEMU_MIGRATION_FILE_H
> > >> >  
> > >> >  #include "qapi/qapi-types-migration.h"
> > >> > +#include "io/task.h"
> > >> > +#include "channel.h"
> > >> >  
> > >> >  void file_start_incoming_migration(FileMigrationArgs *file_args, 
> > >> > Error **errp);
> > >> >  
> > >> >  void file_start_outgoing_migration(MigrationState *s,
> > >> > FileMigrationArgs *file_args, 
> > >> > Er

RE: [PATCH 0/3] ui/gtk: introducing vc->visible

2024-02-29 Thread Kim, Dongwon

Hi Marc-André Lureau,

Just a reminder.. I need your help reviewing this patch series. Please take a 
look at my messages at
https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg06636.html and
https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg06637.html

Thanks!!
DW

> -Original Message-
> From: qemu-devel-bounces+dongwon.kim=intel@nongnu.org  devel-bounces+dongwon.kim=intel@nongnu.org> On Behalf Of
> dongwon@intel.com
> Sent: Tuesday, January 30, 2024 3:49 PM
> To: qemu-devel@nongnu.org
> Subject: [PATCH 0/3] ui/gtk: introducing vc->visible
> 
> From: Dongwon Kim 
> 
> Drawing guest display frames can't be completed while the VC is not in visible
> state, which could result in timeout in both the host and the guest especially
> when using blob scanout. Therefore it is needed to update and track the 
> visiblity
> status of the VC and unblock the pipeline in case when VC becomes invisible 
> (e.g.
> windows minimization, switching among tabs) while processing a guest frame.
> 
> First patch (0001-ui-gtk-skip...) is introducing a flag 'visible' to 
> VirtualConsole
> struct then set it only if the VC and its window is visible.
> 
> Second patch (0002-ui-gtk-set-...) sets the ui size to 0 when VC is invisible 
> when
> the tab is closed or deactivated. This notifies the guest that the associated 
> guest
> display is not active anymore.
> 
> Third patch (0003-ui-gtk-reset-visible...) adds a callback for GTK 
> window-state-
> event. The flag, 'visible' is updated based on the minization status of the 
> window.
> 
> Dongwon Kim (3):
>   ui/gtk: skip drawing guest scanout when associated VC is invisible
>   ui/gtk: set the ui size to 0 when invisible
>   ui/gtk: reset visible flag when window is minimized
> 
>  include/ui/gtk.h |  1 +
>  ui/gtk-egl.c |  8 +++
>  ui/gtk-gl-area.c |  8 +++
>  ui/gtk.c | 62 ++--
>  4 files changed, 77 insertions(+), 2 deletions(-)
> 
> --
> 2.34.1
>

Re: [RFC PATCH v5 18/22] hw/intc/arm_gicv3: Implement NMI interrupt prioirty

2024-02-29 Thread Richard Henderson


On 2/29/24 03:10, Jinjie Ruan via wrote:

If GICD_CTLR_DS bit is zero and the NMI is non-secure, the NMI prioirty
is higher than 0x80, otherwise it is higher than 0x0. And save NMI
super prioirty information in hppi.superprio to deliver NMI exception.
Since both GICR and GICD can deliver NMI, it is both necessary to check
whether the pending irq is NMI in gicv3_redist_update_noirqset and
gicv3_update_noirqset. And In irqbetter(), only a non-NMI with the same
priority and a smaller interrupt number can be preempted but not NMI.

Signed-off-by: Jinjie Ruan 
---
v4:
- Replace is_nmi with has_superprio to not a mix NMI and superpriority.
- Update the comment in irqbetter().
- Extract gicv3_get_priority() to avoid code repeat.
---
v3:
- Add missing brace
---
  hw/intc/arm_gicv3.c | 71 -
  1 file changed, 63 insertions(+), 8 deletions(-)

diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
index 0b8f79a122..1d16a53b23 100644
--- a/hw/intc/arm_gicv3.c
+++ b/hw/intc/arm_gicv3.c
@@ -21,7 +21,8 @@
  #include "hw/intc/arm_gicv3.h"
  #include "gicv3_internal.h"
  
-static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t prio)

+static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t prio,
+  bool has_superprio)
  {
  /* Return true if this IRQ at this priority should take
   * precedence over the current recorded highest priority
@@ -33,11 +34,24 @@ static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t 
prio)
  if (prio < cs->hppi.prio) {
  return true;
  }
+
+/*
+ * Current highest prioirity pending interrupt is an IRQ without
+ * superpriority, the new IRQ with superpriority has same priority
+ * should signal to the CPU as it have the priority higher than
+ * the labelled 0x80 or 0x00.
+ */
+if (prio == cs->hppi.prio && !cs->hppi.superprio && has_superprio) {
+return true;
+}
+
  /* If multiple pending interrupts have the same priority then it is an
   * IMPDEF choice which of them to signal to the CPU. We choose to
- * signal the one with the lowest interrupt number.
+ * signal the one with the lowest interrupt number if they don't have
+ * superpriority.
   */
-if (prio == cs->hppi.prio && irq <= cs->hppi.irq) {
+if (prio == cs->hppi.prio && !cs->hppi.superprio &&
+!has_superprio && irq <= cs->hppi.irq) {
  return true;
  }
  return false;
@@ -129,6 +143,35 @@ static uint32_t gicr_int_pending(GICv3CPUState *cs)
  return pend;
  }
  
+static bool gicv3_get_priority(GICv3CPUState *cs, bool is_redist,

+   uint32_t superprio, uint8_t *prio, int irq)
+{
+bool has_superprio = false;
+
+if (superprio) {
+has_superprio = true;
+
+/* DS = 0 & Non-secure NMI */
+if (!(cs->gic->gicd_ctlr & GICD_CTLR_DS) &&
+((is_redist && extract32(cs->gicr_igroupr0, irq, 1)) ||
+ (!is_redist && gicv3_gicd_group_test(cs->gic, irq {
+*prio = 0x80;
+} else {
+*prio = 0x0;
+}
+} else {
+has_superprio = false;
+
+if (is_redist) {
+*prio = cs->gicr_ipriorityr[irq];
+} else {
+*prio = cs->gic->gicd_ipriority[irq];
+}
+}
+
+return has_superprio;
+}


Did you not like the idea to map {priority, !superpriority} into a single value?

It would eliminate the change in irqbetter(), which is a bit more complex than 
it needs to be.


@@ -152,10 +197,13 @@ static void gicv3_redist_update_noirqset(GICv3CPUState 
*cs)
  if (!(pend & (1 << i))) {
  continue;
  }
-prio = cs->gicr_ipriorityr[i];
-if (irqbetter(cs, i, prio)) {
+superprio = extract32(cs->gicr_isuperprio, i, 1);
+has_superprio = gicv3_get_priority(cs, true, superprio, &prio, i);


It would allow moving the read of gicr_isuperprio into gicv3_get_priority(), alongside the 
read of gicr_ipriorityr.


Is there a bug here not handling is_redist for GCIR_INMI*?


@@ -168,7 +216,7 @@ static void gicv3_redist_update_noirqset(GICv3CPUState *cs)
  if ((cs->gicr_ctlr & GICR_CTLR_ENABLE_LPIS) && cs->gic->lpi_enable &&
  (cs->gic->gicd_ctlr & GICD_CTLR_EN_GRP1NS) &&
  (cs->hpplpi.prio != 0xff)) {
-if (irqbetter(cs, cs->hpplpi.irq, cs->hpplpi.prio)) {
+if (irqbetter(cs, cs->hpplpi.irq, cs->hpplpi.prio, false)) {


Always passing false here is incorrect -- again missing the redistributor nmi 
bit?


r~

Re: [RFC PATCH v5 17/22] hw/intc/arm_gicv3: Add NMI handling CPU interface registers

2024-02-29 Thread Richard Henderson


On 2/29/24 03:10, Jinjie Ruan via wrote:

+static uint64_t icv_nmiar1_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+/* todo */
+uint64_t intid = INTID_SPURIOUS;
+return intid;
+}


You're going to need to fill this in.

It's like icv_iar_read() but only affects irqs with superpriority.


r~

Re: [RFC PATCH v5 20/22] hw/intc/arm_gicv3: Report the VNMI interrupt

2024-02-29 Thread Richard Henderson


On 2/29/24 03:10, Jinjie Ruan via wrote:

In vCPU Interface, if the vIRQ has the superpriority property, report
vNMI to the corresponding vPE.

Signed-off-by: Jinjie Ruan
---
  hw/intc/arm_gicv3_cpuif.c | 14 --
  hw/intc/gicv3_internal.h  |  1 +
  2 files changed, 13 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: [QEMU][PATCH v3 6/7] xen: add map and unmap callbacks for grant region

2024-02-29 Thread Stefano Stabellini

On Tue, 27 Feb 2024, Vikram Garhwal wrote:
> From: Juergen Gross 
> 
> Add the callbacks for mapping/unmapping guest memory via grants to the
> special grant memory region.
> 
> Signed-off-by: Juergen Gross 
> Signed-off-by: Vikram Garhwal 

Reviewed-by: Stefano Stabellini 


> ---
>  hw/xen/xen-mapcache.c | 176 +-
>  system/physmem.c  |  11 ++-
>  2 files changed, 182 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
> index 179b7e95b2..2e4c9b4947 100644
> --- a/hw/xen/xen-mapcache.c
> +++ b/hw/xen/xen-mapcache.c
> @@ -9,6 +9,8 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/queue.h"
> +#include "qemu/thread.h"
>  #include "qemu/units.h"
>  #include "qemu/error-report.h"
>  
> @@ -23,6 +25,8 @@
>  #include "sysemu/xen-mapcache.h"
>  #include "trace.h"
>  
> +#include 
> +#include 
>  
>  #if HOST_LONG_BITS == 32
>  #  define MCACHE_BUCKET_SHIFT 16
> @@ -377,7 +381,7 @@ uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size,
>  return p;
>  }
>  
> -ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
> +static ram_addr_t xen_ram_addr_from_mapcache_try(void *ptr)
>  {
>  MapCacheEntry *entry = NULL;
>  MapCacheRev *reventry;
> @@ -588,10 +592,179 @@ uint8_t *xen_replace_cache_entry(hwaddr old_phys_addr,
>  return p;
>  }
>  
> +struct XENMappedGrantRegion {
> +void *addr;
> +unsigned int pages;
> +unsigned int refs;
> +unsigned int prot;
> +uint32_t idx;
> +QLIST_ENTRY(XENMappedGrantRegion) list;
> +};
> +
> +static xengnttab_handle *xen_region_gnttabdev;
> +static QLIST_HEAD(GrantRegionList, XENMappedGrantRegion) xen_grant_mappings =
> +QLIST_HEAD_INITIALIZER(xen_grant_mappings);
> +static QemuMutex xen_map_mutex;
> +
> +static void *xen_map_grant_dyn(MemoryRegion **mr, hwaddr addr, hwaddr *plen,
> +   bool is_write, MemTxAttrs attrs)
> +{
> +unsigned int page_off = addr & (XC_PAGE_SIZE - 1);
> +unsigned int i;
> +unsigned int total_grants = 0;
> +unsigned int nrefs = (page_off + *plen + XC_PAGE_SIZE - 1) >> 
> XC_PAGE_SHIFT;
> +uint32_t ref = (addr - XEN_GRANT_ADDR_OFF) >> XC_PAGE_SHIFT;
> +uint32_t *refs = NULL;
> +unsigned int prot = PROT_READ;
> +struct XENMappedGrantRegion *mgr = NULL;
> +
> +if (is_write) {
> +prot |= PROT_WRITE;
> +}
> +
> +qemu_mutex_lock(&xen_map_mutex);
> +
> +QLIST_FOREACH(mgr, &xen_grant_mappings, list) {
> +if (mgr->idx == ref &&
> +mgr->pages == nrefs &&
> +(mgr->prot & prot) == prot) {
> +break;
> +}
> +
> +total_grants += mgr->pages;
> +}
> +
> +if (!mgr) {
> +if (nrefs + total_grants >= XEN_MAX_VIRTIO_GRANTS) {
> +qemu_mutex_unlock(&xen_map_mutex);
> +return NULL;
> +}
> +
> +mgr = g_new(struct XENMappedGrantRegion, 1);
> +
> +if (nrefs == 1) {
> +refs = &ref;
> +} else {
> +refs = g_new(uint32_t, nrefs);
> +for (i = 0; i < nrefs; i++) {
> +refs[i] = ref + i;
> +}
> +}
> +mgr->addr = xengnttab_map_domain_grant_refs(xen_region_gnttabdev, 
> nrefs,
> +xen_domid, refs, prot);
> +if (mgr->addr) {
> +mgr->pages = nrefs;
> +mgr->refs = 1;
> +mgr->prot = prot;
> +mgr->idx = ref;
> +
> +QLIST_INSERT_HEAD(&xen_grant_mappings, mgr, list);
> +} else {
> +g_free(mgr);
> +mgr = NULL;
> +}
> +} else {
> +mgr->refs++;
> +}
> +
> +qemu_mutex_unlock(&xen_map_mutex);
> +
> +if (nrefs > 1) {
> +g_free(refs);
> +}
> +
> +return mgr ? mgr->addr + page_off : NULL;
> +}
> +
> +static void xen_unmap_grant_dyn(MemoryRegion *mr, void *buffer, ram_addr_t 
> addr,
> +hwaddr len, bool is_write, hwaddr access_len)
> +{
> +unsigned int page_off = (unsigned long)buffer & (XC_PAGE_SIZE - 1);
> +unsigned int nrefs = (page_off + len + XC_PAGE_SIZE - 1) >> 
> XC_PAGE_SHIFT;
> +unsigned int prot = PROT_READ;
> +struct XENMappedGrantRegion *mgr = NULL;
> +
> +if (is_write) {
> +prot |= PROT_WRITE;
> +}
> +
> +qemu_mutex_lock(&xen_map_mutex);
> +
> +QLIST_FOREACH(mgr, &xen_grant_mappings, list) {
> +if (mgr->addr == buffer - page_off &&
> +mgr->pages == nrefs &&
> +(mgr->prot & prot) == prot) {
> +break;
> +}
> +}
> +if (mgr) {
> +mgr->refs--;
> +if (!mgr->refs) {
> +xengnttab_unmap(xen_region_gnttabdev, mgr->addr, nrefs);
> +
> +QLIST_REMOVE(mgr, list);
> +g_free(mgr);
> +}
> +} else {
> +error_report("xen_unmap_grant_dyn() trying to unmap unknown buffer");
> +}
> +
> +qemu_mutex_

Re: [RFC PATCH v5 14/22] hw/intc/arm_gicv3_redist: Implement GICR_INMIR0

2024-02-29 Thread Richard Henderson


On 2/29/24 03:10, Jinjie Ruan via wrote:

Add GICR_INMIR0 register and support access GICR_INMIR0.

Signed-off-by: Jinjie Ruan
---
v4:
- Make the GICR_INMIR0 implementation more clearer.
---
  hw/intc/arm_gicv3_redist.c | 19 +++
  hw/intc/gicv3_internal.h   |  1 +
  2 files changed, 20 insertions(+)


Reviewed-by: Richard Henderson 

r~

Re: [QEMU][PATCH v3 5/7] memory: add MemoryRegion map and unmap callbacks

2024-02-29 Thread Stefano Stabellini

On Tue, 27 Feb 2024, Vikram Garhwal wrote:
> From: Juergen Gross 
> 
> In order to support mapping and unmapping guest memory dynamically to
> and from qemu during address_space_[un]map() operations add the map()
> and unmap() callbacks to MemoryRegionOps.
> 
> Those will be used e.g. for Xen grant mappings when performing guest
> I/Os.
> 
> Signed-off-by: Juergen Gross 
> Signed-off-by: Vikram Garhwal 

Reviewed-by: Stefano Stabellini 


> ---
>  include/exec/memory.h | 21 ++
>  system/physmem.c  | 50 +--
>  2 files changed, 60 insertions(+), 11 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 8626a355b3..9f7dfe59c7 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -282,6 +282,27 @@ struct MemoryRegionOps {
>  unsigned size,
>  MemTxAttrs attrs);
>  
> +/*
> + * Dynamically create mapping. @addr is the guest address to map; @plen
> + * is the pointer to the usable length of the buffer.
> + * @mr contents can be changed in case a new memory region is created for
> + * the mapping.
> + * Returns the buffer address for accessing the data.
> + */
> +void *(*map)(MemoryRegion **mr,
> + hwaddr addr,
> + hwaddr *plen,
> + bool is_write,
> + MemTxAttrs attrs);
> +
> +/* Unmap an area obtained via map() before. */
> +void (*unmap)(MemoryRegion *mr,
> +  void *buffer,
> +  ram_addr_t addr,
> +  hwaddr len,
> +  bool is_write,
> +  hwaddr access_len);
> +
>  enum device_endian endianness;
>  /* Guest-visible constraints: */
>  struct {
> diff --git a/system/physmem.c b/system/physmem.c
> index 949dcb20ba..d989e9fc1f 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -3141,6 +3141,7 @@ void *address_space_map(AddressSpace *as,
>  hwaddr len = *plen;
>  hwaddr l, xlat;
>  MemoryRegion *mr;
> +void *ptr = NULL;
>  FlatView *fv;
>  
>  if (len == 0) {
> @@ -3174,12 +3175,20 @@ void *address_space_map(AddressSpace *as,
>  return bounce.buffer;
>  }
>  
> -
>  memory_region_ref(mr);
> +
> +if (mr->ops && mr->ops->map) {
> +ptr = mr->ops->map(&mr, addr, plen, is_write, attrs);
> +}
> +
>  *plen = flatview_extend_translation(fv, addr, len, mr, xlat,
>  l, is_write, attrs);
>  fuzz_dma_read_cb(addr, *plen, mr);
> -return qemu_ram_ptr_length(mr->ram_block, xlat, plen, true);
> +if (ptr == NULL) {
> +ptr = qemu_ram_ptr_length(mr->ram_block, xlat, plen, true);
> +}
> +
> +return ptr;
>  }
>  
>  /* Unmaps a memory region previously mapped by address_space_map().
> @@ -3195,11 +3204,16 @@ void address_space_unmap(AddressSpace *as, void 
> *buffer, hwaddr len,
>  
>  mr = memory_region_from_host(buffer, &addr1);
>  assert(mr != NULL);
> -if (is_write) {
> -invalidate_and_set_dirty(mr, addr1, access_len);
> -}
> -if (xen_enabled()) {
> -xen_invalidate_map_cache_entry(buffer);
> +
> +if (mr->ops && mr->ops->unmap) {
> +mr->ops->unmap(mr, buffer, addr1, len, is_write, access_len);
> +} else {
> +if (is_write) {
> +invalidate_and_set_dirty(mr, addr1, access_len);
> +}
> +if (xen_enabled()) {
> +xen_invalidate_map_cache_entry(buffer);
> +}
>  }
>  memory_region_unref(mr);
>  return;
> @@ -3272,10 +3286,18 @@ int64_t address_space_cache_init(MemoryRegionCache 
> *cache,
>   * doing this if we found actual RAM, which behaves the same
>   * regardless of attributes; so UNSPECIFIED is fine.
>   */
> +if (mr->ops && mr->ops->map) {
> +cache->ptr = mr->ops->map(&mr, addr, &l, is_write,
> +  MEMTXATTRS_UNSPECIFIED);
> +}
> +
>  l = flatview_extend_translation(cache->fv, addr, len, mr,
>  cache->xlat, l, is_write,
>  MEMTXATTRS_UNSPECIFIED);
> -cache->ptr = qemu_ram_ptr_length(mr->ram_block, cache->xlat, &l, 
> true);
> +if (!cache->ptr) {
> +cache->ptr = qemu_ram_ptr_length(mr->ram_block, cache->xlat, &l,
> + true);
> +}
>  } else {
>  cache->ptr = NULL;
>  }
> @@ -3297,14 +3319,20 @@ void address_space_cache_invalidate(MemoryRegionCache 
> *cache,
>  
>  void address_space_cache_destroy(MemoryRegionCache *cache)
>  {
> -if (!cache->mrs.mr) {
> +MemoryRegion *mr = cache->mrs.mr;
> +
> +if (!mr) {
>  return;
>  }
>  
> -if (xen_enabled()) {

Re: [RFC PATCH v5 12/22] target/arm: Handle NMI in arm_cpu_do_interrupt_aarch64()

2024-02-29 Thread Richard Henderson


On 2/29/24 03:10, Jinjie Ruan via wrote:

According to Arm GIC section 4.6.3 Interrupt superpriority, the interrupt
with superpriority is always IRQ, never FIQ, so the NMI exception trap entry
behave like IRQ. However, VNMI can be IRQ or FIQ, FIQ can only come from
hcrx_el2.HCRX_VFNMI bit, IRQ can be raised from the GIC or come from the
hcrx_el2.HCRX_VINMI bit.

Signed-off-by: Jinjie Ruan 
---
v4:
- Also handle VNMI in arm_cpu_do_interrupt_aarch64().
v3:
- Remove the FIQ NMI handle.
---
  target/arm/helper.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index b796dbdf21..bd34b3506a 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11459,12 +11459,21 @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
  break;
  case EXCP_IRQ:
  case EXCP_VIRQ:
+case EXCP_NMI:
  addr += 0x80;
  break;
  case EXCP_FIQ:
  case EXCP_VFIQ:
  addr += 0x100;
  break;
+case EXCP_VNMI:
+if (env->irq_line_state & CPU_INTERRUPT_VNMI ||
+env->cp15.hcrx_el2 & HCRX_VINMI) {
+addr += 0x80;
+} else if (env->cp15.hcrx_el2 & HCRX_VFNMI) {
+addr += 0x100;
+}
+break;


By not combining VFNMI with CPU_INTERRUPT_VNMI, you don't need this 
complication.
Just

 case EXCP_IRQ:
 case EXCP_VIRQ:
+case EXCP_NMI:


r~

Re: [RFC PATCH v5 08/22] target/arm: Handle IS/FS in ISR_EL1 for NMI

2024-02-29 Thread Richard Henderson


On 2/29/24 03:10, Jinjie Ruan via wrote:

  if (hcr_el2 & HCR_FMO) {
  if (cs->interrupt_request & CPU_INTERRUPT_VFIQ) {
  ret |= CPSR_F;
+
+if (env->cp15.hcrx_el2 & HCRX_VFNMI) {
+ret |= ISR_FS;
+}


VFIO can be raised one of two ways: from the GIC and from HCR_EL2.VF.
But superpriority can only be added with HCRX_EL2.VFNMI.

You need to verify that HCR_EL2.VF is set before checking VFNMI.


r~

Re: [RFC PATCH v5 06/22] target/arm: Add support for Non-maskable Interrupt

2024-02-29 Thread Richard Henderson


On 2/29/24 03:10, Jinjie Ruan via wrote:

+bool new_state = ((env->cp15.hcr_el2 & HCR_VI) &&
+  (env->cp15.hcrx_el2 & HCRX_VINMI)) ||
+ ((env->cp15.hcr_el2 & HCR_VF) &&
+  (env->cp15.hcrx_el2 & HCRX_VFNMI)) ||
+(env->irq_line_state & CPU_INTERRUPT_VNMI);


Because the GIC cannot signal an FIQ with superpriority, I think you should not include VF 
&& VFNMI in CPU_INTERRUPT_VNMI.


See comments for patch 8.


r~

Re: [PATCH] hw/scsi/lsi53c895a: add timer to scripts processing

2024-02-29 Thread BALATON Zoltan


On Thu, 29 Feb 2024, Sven Schnelle wrote:

BALATON Zoltan  writes:


On Thu, 29 Feb 2024, Sven Schnelle wrote:

Some OS's like HP-UX 10.20 are spinn


I guess the above line is left here by accident.


Yes.


HP-UX 10.20 seems to make the lsi53c895a spinning on a memory location
under certain circumstances. As the SCSI controller and CPU are not
running at the same time this loop will never finish. After some
time, the check loop interrupts with a unexpected device disconnect.
This works, but is slow because the kernel resets the scsi controller.
Instead of signaling UDC, start a timer and exit the loop. Until the
timer fires, the CPU can process instructions until the timer fires.
The limit of instructions is also reduced because scripts running
on the SCSI processor are usually very short. This keeps the time
until the loop-exit short.

Suggested-by: Peter Maydell 
Signed-off-by: Sven Schnelle 
---
hw/scsi/lsi53c895a.c | 33 +
1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index d607a5f9fb..0b6f1dc72f 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -188,7 +188,7 @@ static const char *names[] = {
#define LSI_TAG_VALID (1 << 16)

/* Maximum instructions to process. */
-#define LSI_MAX_INSN1
+#define LSI_MAX_INSN100

typedef struct lsi_request {
SCSIRequest *req;
@@ -205,6 +205,7 @@ enum {
LSI_WAIT_RESELECT, /* Wait Reselect instruction has been issued */
LSI_DMA_SCRIPTS, /* processing DMA from lsi_execute_script */
LSI_DMA_IN_PROGRESS, /* DMA operation is in progress */
+LSI_WAIT_SCRIPTS, /* SCRIPTS stopped because of instruction count limit */
};

enum {
@@ -224,6 +225,7 @@ struct LSIState {
MemoryRegion ram_io;
MemoryRegion io_io;
AddressSpace pci_io_as;
+QEMUTimer *scripts_timer;

int carry; /* ??? Should this be an a visible register somewhere?  */
int status;
@@ -415,6 +417,7 @@ static void lsi_soft_reset(LSIState *s)
s->sbr = 0;
assert(QTAILQ_EMPTY(&s->queue));
assert(!s->current);
+timer_del(s->scripts_timer);


Maybe the rimer needs to be deleted in lsi_scsi_exit() too but I'm not
sure.


I added it, thanks.


}

static int lsi_dma_40bit(LSIState *s)
@@ -1127,6 +1130,12 @@ static void lsi_wait_reselect(LSIState *s)
}
}

+static void lsi_scripts_timer_start(LSIState *s)
+{
+trace_lsi_scripts_timer_start();
+timer_mod(s->scripts_timer, qemu_clock_get_us(QEMU_CLOCK_VIRTUAL) + 500);
+}
+
static void lsi_execute_script(LSIState *s)
{
PCIDevice *pci_dev = PCI_DEVICE(s);
@@ -1152,13 +1161,8 @@ again:
 * which should be enough for all valid use cases).
 */


Does tha above comment need updating to say what the code does now?


Yes, i changed it to describe the new method:

   /*
* Some windows drivers make the device spin waiting for a memory location
* to change. If we have executed more than LSI_MAX_INSN instructions then
* assume this is the case and start a timer. Until the timer fires, the
* host CPU has a chance to run and change the memory location.


Is that the host or guest CPU? I thought the guest vcpu needs to get a 
chance to do something about this but maybe I did not get the problem at 
all.


Regards,
BALATON Zoltan


*
* Another issue (CVE-2023-0330) can occur if the script is programmed to
* trigger itself again and again. Avoid this problem by stopping after
* being called multiple times in a reentrant way (8 is an arbitrary value
* which should be enough for all valid use cases).
*/


Regards,
BALATON Zoltan


if (++insn_processed > LSI_MAX_INSN || reentrancy_level > 8) {
-if (!(s->sien0 & LSI_SIST0_UDC)) {
-qemu_log_mask(LOG_GUEST_ERROR,
-  "lsi_scsi: inf. loop with UDC masked");
-}
-lsi_script_scsi_interrupt(s, LSI_SIST0_UDC, 0);
-lsi_disconnect(s);
-trace_lsi_execute_script_stop();
+s->waiting = LSI_WAIT_SCRIPTS;
+lsi_scripts_timer_start(s);
reentrancy_level--;
return;
}
@@ -2197,6 +2201,9 @@ static int lsi_post_load(void *opaque, int version_id)
return -EINVAL;
}

+if (s->waiting == LSI_WAIT_SCRIPTS) {
+lsi_scripts_timer_start(s);
+}
return 0;
}

@@ -2294,6 +2301,15 @@ static const struct SCSIBusInfo lsi_scsi_info = {
.cancel = lsi_request_cancelled
};

+static void scripts_timer_cb(void *opaque)
+{
+LSIState *s = opaque;
+
+trace_lsi_scripts_timer_triggered();
+s->waiting = LSI_NOWAIT;
+lsi_execute_script(s);
+}
+
static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
{
LSIState *s = LSI53C895A(dev);
@@ -2313,6 +2329,7 @@ static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
  "lsi-ram", 0x2000);
memory_region_init_io(&s->io_io, OBJECT(s), &lsi_io_ops, s,
  "lsi-io", 256);
+s->scripts_timer = timer_new_us(

Re: [RFC PATCH v5 06/22] target/arm: Add support for Non-maskable Interrupt

2024-02-29 Thread Richard Henderson


On 2/29/24 03:10, Jinjie Ruan via wrote:

@@ -900,6 +945,31 @@ void arm_cpu_update_vfiq(ARMCPU *cpu)
  }
  }
  
+void arm_cpu_update_vnmi(ARMCPU *cpu)

+{
+/*
+ * Update the interrupt level for VNMI, which is the logical OR of
+ * the HCRX_EL2.VINMI or HCRX_EL2.VFNMI bit and the input line level from
+ * the GIC.
+ */
+CPUARMState *env = &cpu->env;
+CPUState *cs = CPU(cpu);
+
+bool new_state = ((env->cp15.hcr_el2 & HCR_VI) &&
+  (env->cp15.hcrx_el2 & HCRX_VINMI)) ||
+ ((env->cp15.hcr_el2 & HCR_VF) &&
+  (env->cp15.hcrx_el2 & HCRX_VFNMI)) ||


Need to use arm_hcr_el2_eff and arm_hcrx_el2_eff.  I see this is an existing error from 
the other functions too.



+(env->irq_line_state & CPU_INTERRUPT_VNMI);
+
+if (new_state != ((cs->interrupt_request & CPU_INTERRUPT_VNMI) != 0)) {
+if (new_state) {
+cpu_interrupt(cs, CPU_INTERRUPT_VNMI);
+} else {
+cpu_reset_interrupt(cs, CPU_INTERRUPT_VNMI);
+}
+}
+}


This is incomplete, as you need additional changes within the other two functions to not 
raise IRQ when VINMI is set, etc.



r~

Re: [PATCH v2 1/3] qtest: migration: Enhance qtest migration functions to support 'channels' argument

2024-02-29 Thread Fabiano Rosas

Fabiano Rosas  writes:

> Het Gala  writes:
>
>> On 27/02/24 1:04 am, Het Gala wrote:
>>>
>>>
>>> On 26/02/24 6:31 pm, Fabiano Rosas wrote:
 Het Gala  writes:

> On 24/02/24 1:42 am, Fabiano Rosas wrote:
> this was the same first approach that I attempted. It won't work because
>
> The final 'migrate' QAPI with channels string would look like
>
> { "execute": "migrate", "arguments": { "channels": "[ { "channel-type":
> "main", "addr": { "transport": "socket", "type": "inet", "host":
> "10.117.29.84", "port": "4000" }, "multifd-channels": 2 } ]" } }
>
> instead of
>
> { "execute": "migrate", "arguments": { "channels": [ { "channel-type":
> "main", "addr": { "transport": "socket", "type": "inet", "host":
> "10.117.29.84", "port": "4000" }, "multifd-channels": 2 } ] } }
>
> It would complain, that channels should be an *array* and not a string.
>
> So, that's the reason parsing was required in qtest too.
>
> I would be glad to hear if there are any ideas to convert /string ->
> json object -> add it inside qdict along with uri/ ?
>
 Isn't this what the various qobject_from_json do? How does it work with
 the existing tests?

  qtest_qmp_assert_success(to, "{ 'execute': 'migrate-incoming',"
   "  'arguments': { "
   "  'channels': [ { 'channel-type': 
 'main',"
   "  'addr': { 'transport': 'socket',"
   "'type': 'inet',"
   "'host': '127.0.0.1',"
   "'port': '0' } } ] } }");

 We can pass this^ string successfully to QMP somehow...
>>>
>>> I think, here in qtest_qmp_assert_success, we actually can pass the 
>>> whole QMP command, and it just asserts that return key is present in 
>>> the response, though I am not very much familiar with qtest codebase 
>>> to verify how swiftly we can convert string into an actual QObject.
>>>
>>> [...]
>>>
>> I tried with qobject_from_json type of utility functions and the error I 
>> got was this :
>>
>> migration-test: /rpmbuild/SOURCES/qemu/include/qapi/qmp/qobject.h:126: 
>> qobject_type: Assertion `QTYPE_NONE < obj->base.type && obj->base.type < 
>> QTYPE__MAX' failed.
>>
>> And I suppose this was the case because, there are only limited types of 
>> QTYPE available
>>
>> typedefenumQType{
>> QTYPE_NONE,
>> QTYPE_QNULL,
>> QTYPE_QNUM,
>> QTYPE_QSTRING,
>> QTYPE_QDICT,
>> QTYPE_QLIST,
>> QTYPE_QBOOL,
>> QTYPE__MAX,
>> } QType;
>>
>> And 'channels' is a mixture of QDICT and QLIST and hence it is not able 
>> to easily convert from string to json.
>>
>> Thoughts on this ?
>
> I'm not sure what you tried. This works:
>
> g_assert(!qdict_haskey(args, "channels"));
> if (channels) {
> channels_obj = qobject_from_json(channels, errp);
> qdict_put_obj(args, "channels", channels_obj);
> }
>
> And in the test:
>
> .connect_channels = "[ { 'channel-type': 'main',"
> "'addr': { 'transport': 'socket',"
> "  'type': 'inet',"
> "  'host': '127.0.0.1',"
> "  'port': '0' } } ]",
> .listen_uri = "tcp:127.0.0.1:0",
> .result = MIG_TEST_QMP_ERROR
>
> However, the real issue is how to inject the port for the source
> migration. The example above only works for the tests that are expected
> to fail. For a test that should pass, 0 as a port does not work.
>
> I'm thinking it might be better to alter migrate_qmp like this:
>
>   void migrate_qmp(QTestState *from, QTestState *to, const char *channels,
>const char *fmt, ...)
>
> Invocations would be:
>
>   migrate_qmp(from, to, NULL, "{uri: %s}", connect_uri);
>   migrate_qmp(from, to, args->channels, "{}");
>
> In this last case, if the test provided a port, we use it, otherwise we
> resolve it from the 'to' instance and put it in the QDict directly.
>
> I'll play with this a bit more tomorrow, let me know what you think.

Ok, so here's what I think we should do:

1) Add the 'to' object into migrate_qmp(), so we can use
migrate_get_socket_address() to get the port. Leave the other
migrate_qmp* alone because they don't need the port;

2) Move the calls to migrate_get_socket_address() into migrate_qmp() and
get rid of that connect_uri, use args->connect_uri instead everywhere;

3) Add a new function SocketAddress_to_qdict() that does the same as
SocketAddress_to_str(), but fills in a QDict instead;

4) Add args->connect_channels and pass it to migrate_qmp() and
migrate_qmp_fail(). Convert the string to a dict;

if (channels) {
channels_obj = qobject_from_json(channels, &error_abort);
qdict_put_obj(args, "channels", channels_obj);
}

5)

Re: [RFC PATCH v5 04/22] target/arm: Implement ALLINT MSR (immediate)

2024-02-29 Thread Richard Henderson


On 2/29/24 03:10, Jinjie Ruan via wrote:

+static bool trans_MSR_i_ALLINT(DisasContext *s, arg_i *a)
+{
+if (!dc_isar_feature(aa64_nmi, s) || s->current_el == 0) {
+return false;
+}
+
+if (a->imm == 0) {
+clear_pstate_bits(PSTATE_ALLINT);
+} else if (s->current_el > 1) {
+set_pstate_bits(PSTATE_ALLINT);
+} else {
+gen_helper_msr_set_allint_el1(tcg_env);
+}
+
+s->base.is_jmp = DISAS_TOO_MANY;
+return true;
+}


I just noticed one final item: for imm == 0, we need

/* Exit the cpu loop to re-evaluate pending IRQs. */
s->base.is_jmp = DISAS_UPDATE_EXIT;

like trans_MSR_i_DAIFCLEAR.


r~

Re: [PATCH 0/2] Revert "hw/i386/pc: Confine system flash handling to pc_sysfw"

2024-02-29 Thread Bernhard Beschow




Am 26. Februar 2024 21:59:07 UTC schrieb Bernhard Beschow :
>As reported by Volker [1], commit 6f6ad2b24582 "hw/i386/pc: Confine system
>
>flash handling to pc_sysfw" causes a regression when specifying the property
>
>`-M pflash0` in the PCI PC machines:
>
>  qemu-system-x86_64: Property 'pc-q35-9.0-machine.pflash0' not found
>
>Revert the commit for now until a solution is found.
>

Ping

>
>
>Best regards,
>
>Bernhard
>
>
>
>[1] 
>https://lore.kernel.org/qemu-devel/9e82a04b-f2c1-4e34-b4b6-46a0581b5...@t-online.de/
>
>
>
>Bernhard Beschow (2):
>
>  Revert "hw/i386/pc_sysfw: Inline pc_system_flash_create() and remove
>
>it"
>
>  Revert "hw/i386/pc: Confine system flash handling to pc_sysfw"
>
>
>
> include/hw/i386/pc.h |  2 ++
>
> hw/i386/pc.c |  1 +
>
> hw/i386/pc_piix.c|  1 +
>
> hw/i386/pc_sysfw.c   | 17 +
>
> 4 files changed, 17 insertions(+), 4 deletions(-)
>
>
>
>-- >
>2.44.0
>
>
>

Re: [PATCH] tcg/optimize: fix uninitialized variable

2024-02-29 Thread Richard Henderson


On 2/28/24 09:29, Richard Henderson wrote:

On 2/28/24 02:20, Paolo Bonzini wrote:

On Wed, Feb 28, 2024 at 12:19 PM Philippe Mathieu-Daudé
 wrote:


On 28/2/24 12:06, Paolo Bonzini wrote:

The variables uext_opc and sext_opc are used without initialization if
TCG_TARGET_extract_i{32,64}_valid returns false.  The result, depending
on the compiler, might be the generation of extract and sextract opcodes


Shouldn't compilers bark?


I expected that too...


Weird.  Anyhoo,

Reviewed-by: Richard Henderson 


Queued, thanks.

r~

Re: [PATCH] system/qdev-monitor: move drain_call_rcu call under if (!dev) in qmp_device_add()

2024-02-29 Thread Paolo Bonzini

Queued, thanks.

Paolo

Re: What is the correct way to add linker dependency to QEMU build system?

2024-02-29 Thread Paolo Bonzini

On Thu, Feb 29, 2024 at 6:02 PM Alex Bennée  wrote:
>
> Paz Offer  writes:
>
> > Hi,
> >
> > I want to add library 'libdl' to be linked with QEMU build for a particular 
> > target (e.g. - qemu-system-arm).
> > Using meson I would typically do 'compiler.find_library(...)', and later 
> > add the returned dependency to the binary
> > dependencies list.
> > However, in QEMU I understand that these configurations are done in
> > the './configure' file?
>
> No I'm pretty sure all the library finding is done in meson now:

Yes, indeed. The documentation lists the tasks of the configure script:

- detect the host architecture

- list the targets for which to build emulators; the list of targets
also affects which firmware binaries and tests to build

- find the compilers (native and cross) used to build executables,
firmware and tests.  The results are written as either Makefile
fragments (``config-host.mak``) or a Meson machine file
(``config-meson.cross``)

- create a virtual environment in which all Python code runs during
the build, and possibly install packages into it from PyPI

- invoke Meson in the virtual environment, to perform the actual
configuration step for the emulator build

Anything related to 1) building executables and documentation 2)
installing is done in meson.

Generally, non-Meson parts of the bulid system are limited to stuff
that has to be cross compiled. The exception is
contrib/plugins/Makefile (which I mention because you might be
interested in it as well), but that's just for documentation purposes,
so that it can be lifted out of the QEMU tree and used to build custom
plugins.

Paolo

Re: [PATCH] block: Remove special-casing of compound pages

2024-02-29 Thread Greg Edwards

On Thu, Feb 29, 2024 at 07:37:11PM +, Matthew Wilcox wrote:
> On Thu, Feb 29, 2024 at 11:25:13AM -0700, Greg Edwards wrote:
>>> [1/1] block: Remove special-casing of compound pages
>>>   commit: 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55
>>
>> This commit results in a change of behavior for QEMU VMs backed by hugepages
>> that open their VM disk image file with O_DIRECT (QEMU cache=none or
>> cache.direct=on options).  When the VM shuts down and the QEMU process exits,
>> one or two hugepages may fail to free correctly.  It appears to be a race, as
>> it doesn't happen every time.
>
> By sheer coincidence the very next email after this one was:
>
> https://lore.kernel.org/linux-mm/86e592a9-98d4-4cff-a646-0c0084328...@cybernetics.com/T/#u
>
> Can you try Tony's patch and see if it fixes your problem?
> I haven't even begun to analyse either your email or his patch,
> but there's a strong likelihood that they're the same thing.

This does appear to fix it.  Thank you!

I'll do some more testing on it today, then add a Tested-by: tag if it
holds up.

Greg

Re: [PATCH] block: Remove special-casing of compound pages

2024-02-29 Thread Greg Edwards

On Thu, Dec 07, 2023 at 02:04:26PM -0700, Jens Axboe wrote:
> On Mon, 14 Aug 2023 15:41:00 +0100, Matthew Wilcox (Oracle) wrote:
>> The special casing was originally added in pre-git history; reproducing
>> the commit log here:
>>
>>> commit a318a92567d77
>>> Author: Andrew Morton 
>>> Date:   Sun Sep 21 01:42:22 2003 -0700
>>>
>>> [PATCH] Speed up direct-io hugetlbpage handling
>>>
>>> This patch short-circuits all the direct-io page dirtying logic for
>>> higher-order pages.  Without this, we pointlessly bounce BIOs up to
>>> keventd all the time.
>>
>> [...]
>
> Applied, thanks!
>
> [1/1] block: Remove special-casing of compound pages
>   commit: 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55

This commit results in a change of behavior for QEMU VMs backed by hugepages
that open their VM disk image file with O_DIRECT (QEMU cache=none or
cache.direct=on options).  When the VM shuts down and the QEMU process exits,
one or two hugepages may fail to free correctly.  It appears to be a race, as
it doesn't happen every time.

>From debugging on 6.8-rc6, when it occurs, the hugepage that fails to free has
a non-zero refcount when it hits the folio_put_testzero(folio) test in
release_pages().  On a failure test iteration with 1 GiB hugepages, the failing
folio had a mapcount of 0, refcount of 35, and folio_maybe_dma_pinned was true.

The problem only occurs when the VM disk image file is opened with O_DIRECT.
When using QEMU cache=writeback or cache.direct=off options, it does not occur.
We first noticed it on the 6.1.y stable kernel when this commit landed there
(6.1.75).

A very simple reproducer without KVM (just boot VM up, then shut it down):

echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
qemu-system-x86_64 \
-cpu qemu64 \
-m 1024 \
-nographic \
-mem-path /dev/hugepages/vm00 \
-mem-prealloc \
-drive file=test.qcow2,if=none,cache=none,id=drive0 \
-device virtio-blk-pci,drive=drive0,id=disk0,bootindex=1
rm -f /dev/hugepages/vm00

Some testing notes:

  * occurs with 6.1.75, 6.6.14, 6.8-rc6, and linux-next-20240229
  * occurs with 1 GiB and 2 MiB huge pages, with both hugetlbfs and memfd
  * occurs with QEMU 8.0.y, 8.1.y, 8.2.y, and master
  * occurs with (-enable-kvm -cpu host) or without (-cpu qemu64) KVM

Thanks for your time!

Greg

Re: [PATCH 08/16] block/qcow2: Fix missing ERRP_GUARD() for error_prepend()

2024-02-29 Thread Eric Blake

On Thu, Feb 29, 2024 at 12:37:15AM +0800, Zhao Liu wrote:
> From: Zhao Liu 
> 
> As the comment in qapi/error, passing @errp to error_prepend() requires
> ERRP_GUARD():
> 

> 
> In block/qcow2.c, there're 2 functions passing @errp to error_prepend()

s/there're/there are/

> without ERRP_GUARD():
>  - qcow2_co_create()
>  - qcow2_co_truncate()
> 
> Their @errp parameters are so widely sourced that it is necessary to
> protect their @errp with ERRP_GUARD().
> 
> Therefore, to avoid the issue like [1] said, add missing ERRP_GUARD() at
> their beginning.
> 
> [1]: Issue description in the commit message of commit ae7c80a7bd73
>  ("error: New macro ERRP_GUARD()").
> 
> Cc: Kevin Wolf 
> Cc: Hanna Reitz 
> Signed-off-by: Zhao Liu 
> ---
>  block/qcow2.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Eric Blake 


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

[PATCH v2] hw/scsi/lsi53c895a: add timer to scripts processing

2024-02-29 Thread Sven Schnelle

HP-UX 10.20 seems to make the lsi53c895a spinning on a memory location
under certain circumstances. As the SCSI controller and CPU are not
running at the same time this loop will never finish. After some
time, the check loop interrupts with a unexpected device disconnect.
This works, but is slow because the kernel resets the scsi controller.
Instead of signaling UDC, start a timer and exit the loop. Until the
timer fires, the CPU can process instructions which might changes the
memory location.

The limit of instructions is also reduced because scripts running on
the SCSI processor are usually very short. This keeps the time until
the loop is exit short.

Suggested-by: Peter Maydell 
Signed-off-by: Sven Schnelle 
---
Changes in v2:
- update comment in lsi_execute_script()
- reset waiting state and del timer in lsi_execute_script() to
  handle the case where script processing is triggered via
  register write, and not from the pending timer
- delete timer in lsi_scsi_exit()

 hw/scsi/lsi53c895a.c | 43 +--
 hw/scsi/trace-events |  2 ++
 2 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index d607a5f9fb..4ff9470381 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -188,7 +188,7 @@ static const char *names[] = {
 #define LSI_TAG_VALID (1 << 16)
 
 /* Maximum instructions to process. */
-#define LSI_MAX_INSN1
+#define LSI_MAX_INSN100
 
 typedef struct lsi_request {
 SCSIRequest *req;
@@ -205,6 +205,7 @@ enum {
 LSI_WAIT_RESELECT, /* Wait Reselect instruction has been issued */
 LSI_DMA_SCRIPTS, /* processing DMA from lsi_execute_script */
 LSI_DMA_IN_PROGRESS, /* DMA operation is in progress */
+LSI_WAIT_SCRIPTS, /* SCRIPTS stopped because of instruction count limit */
 };
 
 enum {
@@ -224,6 +225,7 @@ struct LSIState {
 MemoryRegion ram_io;
 MemoryRegion io_io;
 AddressSpace pci_io_as;
+QEMUTimer *scripts_timer;
 
 int carry; /* ??? Should this be an a visible register somewhere?  */
 int status;
@@ -415,6 +417,7 @@ static void lsi_soft_reset(LSIState *s)
 s->sbr = 0;
 assert(QTAILQ_EMPTY(&s->queue));
 assert(!s->current);
+timer_del(s->scripts_timer);
 }
 
 static int lsi_dma_40bit(LSIState *s)
@@ -1127,6 +1130,12 @@ static void lsi_wait_reselect(LSIState *s)
 }
 }
 
+static void lsi_scripts_timer_start(LSIState *s)
+{
+trace_lsi_scripts_timer_start();
+timer_mod(s->scripts_timer, qemu_clock_get_us(QEMU_CLOCK_VIRTUAL) + 500);
+}
+
 static void lsi_execute_script(LSIState *s)
 {
 PCIDevice *pci_dev = PCI_DEVICE(s);
@@ -1136,6 +1145,11 @@ static void lsi_execute_script(LSIState *s)
 int insn_processed = 0;
 static int reentrancy_level;
 
+if (s->waiting == LSI_WAIT_SCRIPTS) {
+timer_del(s->scripts_timer);
+s->waiting = LSI_NOWAIT;
+}
+
 reentrancy_level++;
 
 s->istat1 |= LSI_ISTAT1_SRUN;
@@ -1143,8 +1157,8 @@ again:
 /*
  * Some windows drivers make the device spin waiting for a memory location
  * to change. If we have executed more than LSI_MAX_INSN instructions then
- * assume this is the case and force an unexpected device disconnect. This
- * is apparently sufficient to beat the drivers into submission.
+ * assume this is the case and start a timer. Until the timer fires, the
+ * host CPU has a chance to run and change the memory location.
  *
  * Another issue (CVE-2023-0330) can occur if the script is programmed to
  * trigger itself again and again. Avoid this problem by stopping after
@@ -1152,13 +1166,8 @@ again:
  * which should be enough for all valid use cases).
  */
 if (++insn_processed > LSI_MAX_INSN || reentrancy_level > 8) {
-if (!(s->sien0 & LSI_SIST0_UDC)) {
-qemu_log_mask(LOG_GUEST_ERROR,
-  "lsi_scsi: inf. loop with UDC masked");
-}
-lsi_script_scsi_interrupt(s, LSI_SIST0_UDC, 0);
-lsi_disconnect(s);
-trace_lsi_execute_script_stop();
+s->waiting = LSI_WAIT_SCRIPTS;
+lsi_scripts_timer_start(s);
 reentrancy_level--;
 return;
 }
@@ -2197,6 +2206,9 @@ static int lsi_post_load(void *opaque, int version_id)
 return -EINVAL;
 }
 
+if (s->waiting == LSI_WAIT_SCRIPTS) {
+lsi_scripts_timer_start(s);
+}
 return 0;
 }
 
@@ -2294,6 +2306,15 @@ static const struct SCSIBusInfo lsi_scsi_info = {
 .cancel = lsi_request_cancelled
 };
 
+static void scripts_timer_cb(void *opaque)
+{
+LSIState *s = opaque;
+
+trace_lsi_scripts_timer_triggered();
+s->waiting = LSI_NOWAIT;
+lsi_execute_script(s);
+}
+
 static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
 {
 LSIState *s = LSI53C895A(dev);
@@ -2313,6 +2334,7 @@ static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
   "lsi-ram", 0x2000);
 memory_region_init_io(&s->io_

Re: [PATCH] hw/scsi/lsi53c895a: add timer to scripts processing

2024-02-29 Thread Sven Schnelle

BALATON Zoltan  writes:

> On Thu, 29 Feb 2024, Sven Schnelle wrote:
>> Some OS's like HP-UX 10.20 are spinn
>
> I guess the above line is left here by accident.

Yes.

>> HP-UX 10.20 seems to make the lsi53c895a spinning on a memory location
>> under certain circumstances. As the SCSI controller and CPU are not
>> running at the same time this loop will never finish. After some
>> time, the check loop interrupts with a unexpected device disconnect.
>> This works, but is slow because the kernel resets the scsi controller.
>> Instead of signaling UDC, start a timer and exit the loop. Until the
>> timer fires, the CPU can process instructions until the timer fires.
>> The limit of instructions is also reduced because scripts running
>> on the SCSI processor are usually very short. This keeps the time
>> until the loop-exit short.
>>
>> Suggested-by: Peter Maydell 
>> Signed-off-by: Sven Schnelle 
>> ---
>> hw/scsi/lsi53c895a.c | 33 +
>> 1 file changed, 25 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
>> index d607a5f9fb..0b6f1dc72f 100644
>> --- a/hw/scsi/lsi53c895a.c
>> +++ b/hw/scsi/lsi53c895a.c
>> @@ -188,7 +188,7 @@ static const char *names[] = {
>> #define LSI_TAG_VALID (1 << 16)
>>
>> /* Maximum instructions to process. */
>> -#define LSI_MAX_INSN1
>> +#define LSI_MAX_INSN100
>>
>> typedef struct lsi_request {
>> SCSIRequest *req;
>> @@ -205,6 +205,7 @@ enum {
>> LSI_WAIT_RESELECT, /* Wait Reselect instruction has been issued */
>> LSI_DMA_SCRIPTS, /* processing DMA from lsi_execute_script */
>> LSI_DMA_IN_PROGRESS, /* DMA operation is in progress */
>> +LSI_WAIT_SCRIPTS, /* SCRIPTS stopped because of instruction count limit 
>> */
>> };
>>
>> enum {
>> @@ -224,6 +225,7 @@ struct LSIState {
>> MemoryRegion ram_io;
>> MemoryRegion io_io;
>> AddressSpace pci_io_as;
>> +QEMUTimer *scripts_timer;
>>
>> int carry; /* ??? Should this be an a visible register somewhere?  */
>> int status;
>> @@ -415,6 +417,7 @@ static void lsi_soft_reset(LSIState *s)
>> s->sbr = 0;
>> assert(QTAILQ_EMPTY(&s->queue));
>> assert(!s->current);
>> +timer_del(s->scripts_timer);
>
> Maybe the rimer needs to be deleted in lsi_scsi_exit() too but I'm not
> sure.

I added it, thanks.

>> }
>>
>> static int lsi_dma_40bit(LSIState *s)
>> @@ -1127,6 +1130,12 @@ static void lsi_wait_reselect(LSIState *s)
>> }
>> }
>>
>> +static void lsi_scripts_timer_start(LSIState *s)
>> +{
>> +trace_lsi_scripts_timer_start();
>> +timer_mod(s->scripts_timer, qemu_clock_get_us(QEMU_CLOCK_VIRTUAL) + 
>> 500);
>> +}
>> +
>> static void lsi_execute_script(LSIState *s)
>> {
>> PCIDevice *pci_dev = PCI_DEVICE(s);
>> @@ -1152,13 +1161,8 @@ again:
>>  * which should be enough for all valid use cases).
>>  */
>
> Does tha above comment need updating to say what the code does now?

Yes, i changed it to describe the new method:

/*
 * Some windows drivers make the device spin waiting for a memory location
 * to change. If we have executed more than LSI_MAX_INSN instructions then
 * assume this is the case and start a timer. Until the timer fires, the
 * host CPU has a chance to run and change the memory location.
 *
 * Another issue (CVE-2023-0330) can occur if the script is programmed to
 * trigger itself again and again. Avoid this problem by stopping after
 * being called multiple times in a reentrant way (8 is an arbitrary value
 * which should be enough for all valid use cases).
 */

> Regards,
> BALATON Zoltan
>
>> if (++insn_processed > LSI_MAX_INSN || reentrancy_level > 8) {
>> -if (!(s->sien0 & LSI_SIST0_UDC)) {
>> -qemu_log_mask(LOG_GUEST_ERROR,
>> -  "lsi_scsi: inf. loop with UDC masked");
>> -}
>> -lsi_script_scsi_interrupt(s, LSI_SIST0_UDC, 0);
>> -lsi_disconnect(s);
>> -trace_lsi_execute_script_stop();
>> +s->waiting = LSI_WAIT_SCRIPTS;
>> +lsi_scripts_timer_start(s);
>> reentrancy_level--;
>> return;
>> }
>> @@ -2197,6 +2201,9 @@ static int lsi_post_load(void *opaque, int version_id)
>> return -EINVAL;
>> }
>>
>> +if (s->waiting == LSI_WAIT_SCRIPTS) {
>> +lsi_scripts_timer_start(s);
>> +}
>> return 0;
>> }
>>
>> @@ -2294,6 +2301,15 @@ static const struct SCSIBusInfo lsi_scsi_info = {
>> .cancel = lsi_request_cancelled
>> };
>>
>> +static void scripts_timer_cb(void *opaque)
>> +{
>> +LSIState *s = opaque;
>> +
>> +trace_lsi_scripts_timer_triggered();
>> +s->waiting = LSI_NOWAIT;
>> +lsi_execute_script(s);
>> +}
>> +
>> static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
>> {
>> LSIState *s = LSI53C895A(dev);
>> @@ -2313,6 +2329,7 @@ static void lsi_scsi_realize(PCIDevice *dev, Error 
>> **errp)
>>   "lsi-ram", 0x2000);
>> memory_

Re: [PATCH] hw/scsi/lsi53c895a: add timer to scripts processing

2024-02-29 Thread BALATON Zoltan


On Thu, 29 Feb 2024, Sven Schnelle wrote:

Some OS's like HP-UX 10.20 are spinn


I guess the above line is left here by accident.


HP-UX 10.20 seems to make the lsi53c895a spinning on a memory location
under certain circumstances. As the SCSI controller and CPU are not
running at the same time this loop will never finish. After some
time, the check loop interrupts with a unexpected device disconnect.
This works, but is slow because the kernel resets the scsi controller.
Instead of signaling UDC, start a timer and exit the loop. Until the
timer fires, the CPU can process instructions until the timer fires.
The limit of instructions is also reduced because scripts running
on the SCSI processor are usually very short. This keeps the time
until the loop-exit short.

Suggested-by: Peter Maydell 
Signed-off-by: Sven Schnelle 
---
hw/scsi/lsi53c895a.c | 33 +
1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index d607a5f9fb..0b6f1dc72f 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -188,7 +188,7 @@ static const char *names[] = {
#define LSI_TAG_VALID (1 << 16)

/* Maximum instructions to process. */
-#define LSI_MAX_INSN1
+#define LSI_MAX_INSN100

typedef struct lsi_request {
SCSIRequest *req;
@@ -205,6 +205,7 @@ enum {
LSI_WAIT_RESELECT, /* Wait Reselect instruction has been issued */
LSI_DMA_SCRIPTS, /* processing DMA from lsi_execute_script */
LSI_DMA_IN_PROGRESS, /* DMA operation is in progress */
+LSI_WAIT_SCRIPTS, /* SCRIPTS stopped because of instruction count limit */
};

enum {
@@ -224,6 +225,7 @@ struct LSIState {
MemoryRegion ram_io;
MemoryRegion io_io;
AddressSpace pci_io_as;
+QEMUTimer *scripts_timer;

int carry; /* ??? Should this be an a visible register somewhere?  */
int status;
@@ -415,6 +417,7 @@ static void lsi_soft_reset(LSIState *s)
s->sbr = 0;
assert(QTAILQ_EMPTY(&s->queue));
assert(!s->current);
+timer_del(s->scripts_timer);


Maybe the rimer needs to be deleted in lsi_scsi_exit() too but I'm not 
sure.



}

static int lsi_dma_40bit(LSIState *s)
@@ -1127,6 +1130,12 @@ static void lsi_wait_reselect(LSIState *s)
}
}

+static void lsi_scripts_timer_start(LSIState *s)
+{
+trace_lsi_scripts_timer_start();
+timer_mod(s->scripts_timer, qemu_clock_get_us(QEMU_CLOCK_VIRTUAL) + 500);
+}
+
static void lsi_execute_script(LSIState *s)
{
PCIDevice *pci_dev = PCI_DEVICE(s);
@@ -1152,13 +1161,8 @@ again:
 * which should be enough for all valid use cases).
 */


Does tha above comment need updating to say what the code does now?

Regards,
BALATON Zoltan


if (++insn_processed > LSI_MAX_INSN || reentrancy_level > 8) {
-if (!(s->sien0 & LSI_SIST0_UDC)) {
-qemu_log_mask(LOG_GUEST_ERROR,
-  "lsi_scsi: inf. loop with UDC masked");
-}
-lsi_script_scsi_interrupt(s, LSI_SIST0_UDC, 0);
-lsi_disconnect(s);
-trace_lsi_execute_script_stop();
+s->waiting = LSI_WAIT_SCRIPTS;
+lsi_scripts_timer_start(s);
reentrancy_level--;
return;
}
@@ -2197,6 +2201,9 @@ static int lsi_post_load(void *opaque, int version_id)
return -EINVAL;
}

+if (s->waiting == LSI_WAIT_SCRIPTS) {
+lsi_scripts_timer_start(s);
+}
return 0;
}

@@ -2294,6 +2301,15 @@ static const struct SCSIBusInfo lsi_scsi_info = {
.cancel = lsi_request_cancelled
};

+static void scripts_timer_cb(void *opaque)
+{
+LSIState *s = opaque;
+
+trace_lsi_scripts_timer_triggered();
+s->waiting = LSI_NOWAIT;
+lsi_execute_script(s);
+}
+
static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
{
LSIState *s = LSI53C895A(dev);
@@ -2313,6 +2329,7 @@ static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
  "lsi-ram", 0x2000);
memory_region_init_io(&s->io_io, OBJECT(s), &lsi_io_ops, s,
  "lsi-io", 256);
+s->scripts_timer = timer_new_us(QEMU_CLOCK_VIRTUAL, scripts_timer_cb, s);

/*
 * Since we use the address-space API to interact with ram_io, disable the

Re: [PATCH] tests: riscv64: Use 'zfa' instead of 'Zfa'

2024-02-29 Thread Daniel Henrique Barboza





On 2/29/24 15:06, Christoph Müllner wrote:

Running test-fcvtmod triggers the following deprecation warning:
   warning: CPU property 'Zfa' is deprecated. Please use 'zfa' instead
Let's fix that.

Signed-off-by: Christoph Müllner 
---


Reviewed-by: Daniel Henrique Barboza 


  tests/tcg/riscv64/Makefile.target | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/tcg/riscv64/Makefile.target 
b/tests/tcg/riscv64/Makefile.target
index a7e390c384..4da5b9a3b3 100644
--- a/tests/tcg/riscv64/Makefile.target
+++ b/tests/tcg/riscv64/Makefile.target
@@ -17,4 +17,4 @@ run-test-aes: QEMU_OPTS += -cpu rv64,zk=on
  TESTS += test-fcvtmod
  test-fcvtmod: CFLAGS += -march=rv64imafdc
  test-fcvtmod: LDFLAGS += -static
-run-test-fcvtmod: QEMU_OPTS += -cpu rv64,d=true,Zfa=true
+run-test-fcvtmod: QEMU_OPTS += -cpu rv64,d=true,zfa=true

Re: [q&a] Status of IOMMU virtualization for nested virtualization (userspace PCI drivers in VMs)

2024-02-29 Thread Peter Delevoryas




> On Feb 28, 2024, at 11:38 AM, Alex Williamson  
> wrote:
> 
> On Wed, 28 Feb 2024 10:29:32 -0800
> Peter Delevoryas  wrote:
> 
>> Hey guys,
>> 
>> I’m having a little trouble reading between the lines on various
>> docs, mailing list threads, KVM presentations, github forks, etc, so
>> I figured I’d just ask:
>> 
>> What is the status of IOMMU virtualization, like in the case where I
>> want a VM guest to have a virtual IOMMU?
> 
> It works fine for simply nested assignment scenarios, ie. guest
> userspace drivers or nested VMs.
> 
>> I found this great presentation from KVM Forum 2021: [1]
>> 
>> 1. I’m using -device intel-iommu right now. This has performance
>> implications and large DMA transfers hit the vfio_iommu_type1
>> dma_entry_limit on the host because of how the mappings are made.
> 
> Hugepages for the guest and mappings within the guest should help both
> the mapping performance and DMA entry limit.  In general the type1 vfio
> IOMMU backend is not optimized for dynamic mapping, so performance-wise
> your best bet is still to design the userspace driver for static DMA
> buffers.

Yep, huge pages definitely help, will probably switch to allocating them at 
boot for better guarantees.

> 
>> 2. -device virtio-iommu is an improvement, but it doesn’t seem
>> compatible with -device vfio-pci? I was only able to test this with
>> cloud-hypervisor, and it has a better vfio mapping pattern (avoids
>> hitting dma_entry_limit).
> 
> AFAIK it's just growing pains, it should work but it's working through
> bugs.

Oh really?? Ok: I might even be configuring things incorrectly, or
Maybe I need to upgrade from QEMU 7.1 to 8. I was relying on whatever
libvirt does by default, which seems to just be:

-device virtio-iommu -device vfio-pci,host=

But maybe I need some other options?

> 
>> 3. -object iommufd [2] I haven’t tried this quite yet, planning to:
>> if it’s using iommufd, and I have all the right kernel features in
>> the guest and host, I assume it’s implementing the passthrough mode
>> that AMD has described in their talk? Because I imagine that would be
>> the best solution for me, I’m just having trouble understanding if
>> it’s actually related or orthogonal.
> 
> For now iommufd provides a similar DMA mapping interface to type1, but
> it does remove the DMA entry limit and improves locked page accounting.
> 
> To really see a performance improvement relative to dynamic mappings,
> you'll need nesting support in the IOMMU, which is under active
> development.  From this aspect you will want iommufd since similar
> features will not be provided by type1.  Thanks,

I see, thanks! That’s great to hear.

> 
> Alex
>

Re: [PATCH 03/16] block: Fix missing ERRP_GUARD() for error_prepend()

2024-02-29 Thread Eric Blake

On Thu, Feb 29, 2024 at 12:37:10AM +0800, Zhao Liu wrote:
> From: Zhao Liu 
> 
> As the comment in qapi/error, passing @errp to error_prepend() requires
> ERRP_GUARD():
> 

> 
> In block.c, there're 4 functions passing @errp to error_prepend()
> without ERRP_GUARD():
>  - bdrv_co_create_opts_simple()
>  - parse_json_filename()
>  - bdrv_open_backing_file()
>  - bdrv_append_temp_snapshot()
> 
> bdrv_co_create_opts_simple(), as an implementation of
> BolckDriver.bdrv_co_create_opts(), its @errp parameter is so widely

BlockDriver

> sourced that it is necessary to protect it with ERRP_GUARD().
> 
> Though the @errp parameters passed to parse_json_filename(),
> bdrv_open_backing_file() and bdrv_append_temp_snapshot() points to their
> callers' local_err, to follow the requirement of @errp, also add missing
> ERRP_GUARD() at their beginning.
> 
> [1]: Issue description in the commit message of commit ae7c80a7bd73
>  ("error: New macro ERRP_GUARD()").
> 
> Cc: Kevin Wolf 
> Cc: Hanna Reitz 
> Signed-off-by: Zhao Liu 
> ---
>  block.c | 4 
>  1 file changed, 4 insertions(+)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH 0/3] plugins/execlog: add data address match and address range support

2024-02-29 Thread Sven Schnelle

Hi Alex,

Alex Bennée  writes:

> Sven Schnelle  writes:
> I think we've lost a patch in the re-posting. patchew hasn't seen it
> either:
>
>   https://patchew.org/QEMU/20240229150729.1620410-1-sv...@stackframe.org/
>
>>   plugins/execlog: add data address match
>>   plugins/execlog: add address range matching
>>
>>  contrib/plugins/execlog.c | 95 ---
>>  1 file changed, 79 insertions(+), 16 deletions(-)

Yes, i got a 550 mail. But i'll look into the qemu_set_dfilter_ranges()
before resending.

Re: [PATCH] block: Remove special-casing of compound pages

2024-02-29 Thread Matthew Wilcox

On Thu, Feb 29, 2024 at 11:25:13AM -0700, Greg Edwards wrote:
> > [1/1] block: Remove special-casing of compound pages
> >   commit: 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55
> 
> This commit results in a change of behavior for QEMU VMs backed by hugepages
> that open their VM disk image file with O_DIRECT (QEMU cache=none or
> cache.direct=on options).  When the VM shuts down and the QEMU process exits,
> one or two hugepages may fail to free correctly.  It appears to be a race, as
> it doesn't happen every time.

Hi Greg,

By sheer coincidence the very next email after this one was:

https://lore.kernel.org/linux-mm/86e592a9-98d4-4cff-a646-0c0084328...@cybernetics.com/T/#u

Can you try Tony's patch and see if it fixes your problem?
I haven't even begun to analyse either your email or his patch,
but there's a strong likelihood that they're the same thing.

[PATCH] hw/scsi/lsi53c895a: add timer to scripts processing

2024-02-29 Thread Sven Schnelle

Some OS's like HP-UX 10.20 are spinn

HP-UX 10.20 seems to make the lsi53c895a spinning on a memory location
under certain circumstances. As the SCSI controller and CPU are not
running at the same time this loop will never finish. After some
time, the check loop interrupts with a unexpected device disconnect.
This works, but is slow because the kernel resets the scsi controller.
Instead of signaling UDC, start a timer and exit the loop. Until the
timer fires, the CPU can process instructions until the timer fires.
The limit of instructions is also reduced because scripts running
on the SCSI processor are usually very short. This keeps the time
until the loop-exit short.

Suggested-by: Peter Maydell 
Signed-off-by: Sven Schnelle 
---
 hw/scsi/lsi53c895a.c | 33 +
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index d607a5f9fb..0b6f1dc72f 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -188,7 +188,7 @@ static const char *names[] = {
 #define LSI_TAG_VALID (1 << 16)
 
 /* Maximum instructions to process. */
-#define LSI_MAX_INSN1
+#define LSI_MAX_INSN100
 
 typedef struct lsi_request {
 SCSIRequest *req;
@@ -205,6 +205,7 @@ enum {
 LSI_WAIT_RESELECT, /* Wait Reselect instruction has been issued */
 LSI_DMA_SCRIPTS, /* processing DMA from lsi_execute_script */
 LSI_DMA_IN_PROGRESS, /* DMA operation is in progress */
+LSI_WAIT_SCRIPTS, /* SCRIPTS stopped because of instruction count limit */
 };
 
 enum {
@@ -224,6 +225,7 @@ struct LSIState {
 MemoryRegion ram_io;
 MemoryRegion io_io;
 AddressSpace pci_io_as;
+QEMUTimer *scripts_timer;
 
 int carry; /* ??? Should this be an a visible register somewhere?  */
 int status;
@@ -415,6 +417,7 @@ static void lsi_soft_reset(LSIState *s)
 s->sbr = 0;
 assert(QTAILQ_EMPTY(&s->queue));
 assert(!s->current);
+timer_del(s->scripts_timer);
 }
 
 static int lsi_dma_40bit(LSIState *s)
@@ -1127,6 +1130,12 @@ static void lsi_wait_reselect(LSIState *s)
 }
 }
 
+static void lsi_scripts_timer_start(LSIState *s)
+{
+trace_lsi_scripts_timer_start();
+timer_mod(s->scripts_timer, qemu_clock_get_us(QEMU_CLOCK_VIRTUAL) + 500);
+}
+
 static void lsi_execute_script(LSIState *s)
 {
 PCIDevice *pci_dev = PCI_DEVICE(s);
@@ -1152,13 +1161,8 @@ again:
  * which should be enough for all valid use cases).
  */
 if (++insn_processed > LSI_MAX_INSN || reentrancy_level > 8) {
-if (!(s->sien0 & LSI_SIST0_UDC)) {
-qemu_log_mask(LOG_GUEST_ERROR,
-  "lsi_scsi: inf. loop with UDC masked");
-}
-lsi_script_scsi_interrupt(s, LSI_SIST0_UDC, 0);
-lsi_disconnect(s);
-trace_lsi_execute_script_stop();
+s->waiting = LSI_WAIT_SCRIPTS;
+lsi_scripts_timer_start(s);
 reentrancy_level--;
 return;
 }
@@ -2197,6 +2201,9 @@ static int lsi_post_load(void *opaque, int version_id)
 return -EINVAL;
 }
 
+if (s->waiting == LSI_WAIT_SCRIPTS) {
+lsi_scripts_timer_start(s);
+}
 return 0;
 }
 
@@ -2294,6 +2301,15 @@ static const struct SCSIBusInfo lsi_scsi_info = {
 .cancel = lsi_request_cancelled
 };
 
+static void scripts_timer_cb(void *opaque)
+{
+LSIState *s = opaque;
+
+trace_lsi_scripts_timer_triggered();
+s->waiting = LSI_NOWAIT;
+lsi_execute_script(s);
+}
+
 static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
 {
 LSIState *s = LSI53C895A(dev);
@@ -2313,6 +2329,7 @@ static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
   "lsi-ram", 0x2000);
 memory_region_init_io(&s->io_io, OBJECT(s), &lsi_io_ops, s,
   "lsi-io", 256);
+s->scripts_timer = timer_new_us(QEMU_CLOCK_VIRTUAL, scripts_timer_cb, s);
 
 /*
  * Since we use the address-space API to interact with ram_io, disable the
-- 
2.43.2

Re: Intention to work on GSoC project

2024-02-29 Thread Sahil

Hi,

Thank you for your reply.

On Thursday, February 29, 2024 2:02:25 PM IST you wrote:
> [...]
> The first project is for sure related with migration. While vhost-user
> memory isolation is not really related to migration, but both are
> related to virtio devices.
> [...]
> Yeah, "bite-size" issues should be better to understand how to
> contribute to QEMU. Feel free to work on any issue, doing the work
> or helping to complete old patches.

Got it. In that case it'll be best to complete the bite-sized task related to
virtio [1]. A patch [2] had been long ago but it hasn't been completed yet.
I'll work on getting it merged.

I have already started reading the documentation on the migration subsystem
and the hands-on blogs in the project description. That will help me get started
with the first project.

> Feel free to reach us if you have more questions on the projects.

Yes, I'll definitely do that.

Thanks,
Sahil

[1] https://gitlab.com/qemu-project/qemu/-/issues/230
[2] 
https://patchwork.kernel.org/project/qemu-devel/patch/20220414112902.41390-1-codeguy.mot...@gmail.com/

Re: What is the correct way to add linker dependency to QEMU build system?

2024-02-29 Thread Paz Offer

Thanks a lot Peter,
This is working.
Paz

From: Peter Maydell 
Sent: Thursday, February 29, 2024 7:34 PM
To: Paz Offer 
Cc: qemu-devel@nongnu.org 
Subject: Re: What is the correct way to add linker dependency to QEMU build 
system?

External email: Use caution opening links or attachments

On Thu, 29 Feb 2024 at 16:10, Paz Offer  wrote:
> I want to add library 'libdl' to be linked with QEMU build for a particular 
> target (e.g. - qemu-system-arm).
> Using meson I would typically do 'compiler.find_library(...)', and later add 
> the returned dependency to the binary dependencies list.
> However, in QEMU I understand that these configurations are done in the 
> './configure' file?
>
> What would be the correct way to do this?

If you can do the job using the glib g_module_open()/
g_module_symbol() functions (which is how QEMU itself does
loading of plugin and module DLLs, and which on Unix hosts
are pretty much wrappers around dlopen/dlsym) then you
don't need to link against libdl at all.

If this is for something you're planning to upstream
then it might be worth talking at a higher level about
what you're aiming to do. (If it's for something downstream
that you don't plan to ever upstream then you can do
whatever's easiest for you, of course.)

-- PMM

Re: [PATCH 3/3] target/hppa: mask CR_SAR register writes to 5/6 bit in gdbstub

2024-02-29 Thread Richard Henderson


On 2/28/24 10:14, Sven Schnelle wrote:

Signed-off-by: Sven Schnelle 
---
  target/hppa/gdbstub.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/hppa/gdbstub.c b/target/hppa/gdbstub.c
index a5b2c80c07..049b2d6381 100644
--- a/target/hppa/gdbstub.c
+++ b/target/hppa/gdbstub.c
@@ -184,7 +184,7 @@ int hppa_cpu_gdb_write_register(CPUState *cs, uint8_t 
*mem_buf, int n)
  env->gr[n] = val;
  break;
  case 32:
-env->cr[CR_SAR] = val;
+env->cr[CR_SAR] = val & (hppa_is_pa20(env) ? 63 : 31);
  break;
  case 33:
  env->iaoq_f = val;


Reviewed-by: Richard Henderson 


r~

Re: [PATCH 11/17] hw/vfio/platform: Fix missing ERRP_GUARD() for error_prepend()

2024-02-29 Thread Cédric Le Goater


On 2/29/24 15:39, Zhao Liu wrote:

From: Zhao Liu 

As the comment in qapi/error, passing @errp to error_prepend() requires
ERRP_GUARD():

* = Why, when and how to use ERRP_GUARD() =
*
* Without ERRP_GUARD(), use of the @errp parameter is restricted:
...
* - It should not be passed to error_prepend(), error_vprepend() or
*   error_append_hint(), because that doesn't work with &error_fatal.
* ERRP_GUARD() lifts these restrictions.
*
* To use ERRP_GUARD(), add it right at the beginning of the function.
* @errp can then be used without worrying about the argument being
* NULL or &error_fatal.

ERRP_GUARD() could avoid the case when @errp is the pointer of
error_fatal, the user can't see this additional information, because
exit() happens in error_setg earlier than information is added [1].

The vfio_platform_realize() passes @errp to error_prepend(), and as a
DeviceClass.realize method, its @errp is so widely sourced that it is
necessary to protect it with ERRP_GUARD().

To avoid the issue like [1] said, add missing ERRP_GUARD() at the
beginning of this function.

[1]: Issue description in the commit message of commit ae7c80a7bd73
  ("error: New macro ERRP_GUARD()").

Cc: Alex Williamson 
Cc: "Cédric Le Goater" 
Signed-off-by: Zhao Liu 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/vfio/platform.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index a8d9b7da633e..dcd2365fb353 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -576,6 +576,7 @@ static int vfio_base_device_init(VFIODevice *vbasedev, 
Error **errp)
   */
  static void vfio_platform_realize(DeviceState *dev, Error **errp)
  {
+ERRP_GUARD();
  VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
  SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
  VFIODevice *vbasedev = &vdev->vbasedev;

Re: [PATCH 10/17] hw/vfio/pci: Fix missing ERRP_GUARD() for error_prepend()

2024-02-29 Thread Cédric Le Goater


On 2/29/24 15:39, Zhao Liu wrote:

From: Zhao Liu 

As the comment in qapi/error, passing @errp to error_prepend() requires
ERRP_GUARD():

* = Why, when and how to use ERRP_GUARD() =
*
* Without ERRP_GUARD(), use of the @errp parameter is restricted:
...
* - It should not be passed to error_prepend(), error_vprepend() or
*   error_append_hint(), because that doesn't work with &error_fatal.
* ERRP_GUARD() lifts these restrictions.
*
* To use ERRP_GUARD(), add it right at the beginning of the function.
* @errp can then be used without worrying about the argument being
* NULL or &error_fatal.

ERRP_GUARD() could avoid the case when @errp is the pointer of
error_fatal, the user can't see this additional information, because
exit() happens in error_setg earlier than information is added [1].

In hw/vfio/pci.c, there're 2 functions passing @errp to error_prepend()
without ERRP_GUARD():
- vfio_add_std_cap()
- vfio_realize()

The @errp of vfio_add_std_cap() is also from vfio_realize(). And
vfio_realize(), as a PCIDeviceClass.realize method, its @errp is from
DeviceClass.realize so that there is no guarantee that the @errp won't
point to @error_fatal.

To avoid the issue like [1] said, add missing ERRP_GUARD() at their
beginning.

[1]: Issue description in the commit message of commit ae7c80a7bd73
  ("error: New macro ERRP_GUARD()").

Cc: Alex Williamson 
Cc: "Cédric Le Goater" 
Signed-off-by: Zhao Liu 



Reviewed-by: Cédric Le Goater 

Thanks,

C.

Re: [PATCH] tests: riscv64: Use 'zfa' instead of 'Zfa'

2024-02-29 Thread Richard Henderson


On 2/29/24 08:06, Christoph Müllner wrote:

Running test-fcvtmod triggers the following deprecation warning:
   warning: CPU property 'Zfa' is deprecated. Please use 'zfa' instead
Let's fix that.

Signed-off-by: Christoph Müllner 
---
  tests/tcg/riscv64/Makefile.target | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/tcg/riscv64/Makefile.target 
b/tests/tcg/riscv64/Makefile.target
index a7e390c384..4da5b9a3b3 100644
--- a/tests/tcg/riscv64/Makefile.target
+++ b/tests/tcg/riscv64/Makefile.target
@@ -17,4 +17,4 @@ run-test-aes: QEMU_OPTS += -cpu rv64,zk=on
  TESTS += test-fcvtmod
  test-fcvtmod: CFLAGS += -march=rv64imafdc
  test-fcvtmod: LDFLAGS += -static
-run-test-fcvtmod: QEMU_OPTS += -cpu rv64,d=true,Zfa=true
+run-test-fcvtmod: QEMU_OPTS += -cpu rv64,d=true,zfa=true


Reviewed-by: Richard Henderson 

I saw this myself the other day and didn't get around to sending a patch.


r~

Re: [PATCH 09/17] hw/vfio/pci-quirks: Fix missing ERRP_GUARD() for error_prepend()

2024-02-29 Thread Cédric Le Goater


On 2/29/24 15:39, Zhao Liu wrote:

From: Zhao Liu 

As the comment in qapi/error, passing @errp to error_prepend() requires
ERRP_GUARD():

* = Why, when and how to use ERRP_GUARD() =
*
* Without ERRP_GUARD(), use of the @errp parameter is restricted:
...
* - It should not be passed to error_prepend(), error_vprepend() or
*   error_append_hint(), because that doesn't work with &error_fatal.
* ERRP_GUARD() lifts these restrictions.
*
* To use ERRP_GUARD(), add it right at the beginning of the function.
* @errp can then be used without worrying about the argument being
* NULL or &error_fatal.

ERRP_GUARD() could avoid the case when @errp is the pointer of
error_fatal, the user can't see this additional information, because
exit() happens in error_setg earlier than information is added [1].

In hw/vfio/pci-quirks.c, there're 2 functions passing @errp to
error_prepend() without ERRP_GUARD():
- vfio_add_nv_gpudirect_cap()
- vfio_add_vmd_shadow_cap()

Their @errp parameters are so widely sourced that it is necessary to
protect them with ERRP_GUARD().

To avoid the issue like [1] said, add missing ERRP_GUARD() at the
beginning of this function.

[1]: Issue description in the commit message of commit ae7c80a7bd73
  ("error: New macro ERRP_GUARD()").

Cc: Alex Williamson 
Cc: "Cédric Le Goater" 
Signed-off-by: Zhao Liu 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/vfio/pci-quirks.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 84b1a7b9485c..496fd1ee86bd 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1538,6 +1538,7 @@ static bool is_valid_std_cap_offset(uint8_t pos)
  
  static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)

  {
+ERRP_GUARD();
  PCIDevice *pdev = &vdev->pdev;
  int ret, pos;
  bool c8_conflict = false, d4_conflict = false;
@@ -1630,6 +1631,7 @@ static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, 
Error **errp)
  #define VMD_SHADOW_CAP_LEN 24
  static int vfio_add_vmd_shadow_cap(VFIOPCIDevice *vdev, Error **errp)
  {
+ERRP_GUARD();
  uint8_t membar_phys[16];
  int ret, pos = 0xE8;

Re: [PATCH 08/17] hw/vfio/iommufd: Fix missing ERRP_GUARD() for error_prepend()

2024-02-29 Thread Cédric Le Goater


On 2/29/24 15:39, Zhao Liu wrote:

From: Zhao Liu 

As the comment in qapi/error, passing @errp to error_prepend() requires
ERRP_GUARD():

* = Why, when and how to use ERRP_GUARD() =
*
* Without ERRP_GUARD(), use of the @errp parameter is restricted:
...
* - It should not be passed to error_prepend(), error_vprepend() or
*   error_append_hint(), because that doesn't work with &error_fatal.
* ERRP_GUARD() lifts these restrictions.
*
* To use ERRP_GUARD(), add it right at the beginning of the function.
* @errp can then be used without worrying about the argument being
* NULL or &error_fatal.

ERRP_GUARD() could avoid the case when @errp is the pointer of
error_fatal, the user can't see this additional information, because
exit() happens in error_setg earlier than information is added [1].

The iommufd_cdev_getfd() passes @errp to error_prepend(). Its @errp is
from vfio_attach_device(), which @errp parameter is so widely sourced
that it is necessary to protect it with ERRP_GUARD().

To avoid the issue like [1] said, add missing ERRP_GUARD() at the
beginning of this function.

[1]: Issue description in the commit message of commit ae7c80a7bd73
  ("error: New macro ERRP_GUARD()").

Cc: Alex Williamson 
Cc: "Cédric Le Goater" 
Signed-off-by: Zhao Liu 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/vfio/iommufd.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 9bfddc136089..7baf49e6ee9e 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -116,6 +116,7 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice 
*vbasedev)
  
  static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)

  {
+ERRP_GUARD();
  long int ret = -ENOTTY;
  char *path, *vfio_dev_path = NULL, *vfio_path = NULL;
  DIR *dir = NULL;

Re: [PATCH 07/17] hw/vfio/helpers: Fix missing ERRP_GUARD() for error_prepend()

2024-02-29 Thread Cédric Le Goater


On 2/29/24 15:39, Zhao Liu wrote:

From: Zhao Liu 

As the comment in qapi/error, passing @errp to error_prepend() requires
ERRP_GUARD():

* = Why, when and how to use ERRP_GUARD() =
*
* Without ERRP_GUARD(), use of the @errp parameter is restricted:
...
* - It should not be passed to error_prepend(), error_vprepend() or
*   error_append_hint(), because that doesn't work with &error_fatal.
* ERRP_GUARD() lifts these restrictions.
*
* To use ERRP_GUARD(), add it right at the beginning of the function.
* @errp can then be used without worrying about the argument being
* NULL or &error_fatal.

ERRP_GUARD() could avoid the case when @errp is the pointer of
error_fatal, the user can't see this additional information, because
exit() happens in error_setg earlier than information is added [1].

In hw/vfio/helpers.c, there're 3 functions passing @errp to
error_prepend() without ERRP_GUARD():
  - vfio_set_irq_signaling()
  - vfio_device_get_name()
  - vfio_device_set_fd()

As the widely used helpers, their @errp parameters are so widely sourced
that it is necessary to protect them with ERRP_GUARD().

To avoid the issue like [1] said, add missing ERRP_GUARD() at their
beginning.

[1]: Issue description in the commit message of commit ae7c80a7bd73
  ("error: New macro ERRP_GUARD()").

Cc: Alex Williamson 
Cc: "Cédric Le Goater" 
Signed-off-by: Zhao Liu 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/vfio/helpers.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index 678987080228..47b4096c05ee 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -110,6 +110,7 @@ static const char *index_to_str(VFIODevice *vbasedev, int 
index)
  int vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex,
 int action, int fd, Error **errp)
  {
+ERRP_GUARD();
  struct vfio_irq_set *irq_set;
  int argsz, ret = 0;
  const char *name;
@@ -613,6 +614,7 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int region, 
uint16_t cap_type)
  
  int vfio_device_get_name(VFIODevice *vbasedev, Error **errp)

  {
+ERRP_GUARD();
  struct stat st;
  
  if (vbasedev->fd < 0) {

@@ -644,6 +646,7 @@ int vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
  
  void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)

  {
+ERRP_GUARD();
  int fd = monitor_fd_param(monitor_cur(), str, errp);
  
  if (fd < 0) {

Re: [PATCH 06/17] hw/vfio/container: Fix missing ERRP_GUARD() for error_prepend()

2024-02-29 Thread Cédric Le Goater


Hello,

On 2/29/24 15:39, Zhao Liu wrote:

From: Zhao Liu 

As the comment in qapi/error, passing @errp to error_prepend() requires
ERRP_GUARD():

* = Why, when and how to use ERRP_GUARD() =
*
* Without ERRP_GUARD(), use of the @errp parameter is restricted:
...
* - It should not be passed to error_prepend(), error_vprepend() or
*   error_append_hint(), because that doesn't work with &error_fatal.
* ERRP_GUARD() lifts these restrictions.
*
* To use ERRP_GUARD(), add it right at the beginning of the function.
* @errp can then be used without worrying about the argument being
* NULL or &error_fatal.

ERRP_GUARD() could avoid the case when @errp is the pointer of
error_fatal, the user can't see this additional information, because
exit() happens in error_setg earlier than information is added [1].

The vfio_get_group() passes @errp to error_prepend(). Its @errp is
from vfio_attach_device(), which @errp parameter is so widely sourced
that it is necessary to protect it with ERRP_GUARD().

To avoid the issue like [1] said, add missing ERRP_GUARD() at the
beginning of this function.

[1]: Issue description in the commit message of commit ae7c80a7bd73
  ("error: New macro ERRP_GUARD()").

Cc: Alex Williamson 
Cc: "Cédric Le Goater" 
Signed-off-by: Zhao Liu 



Reviewed-by: Cédric Le Goater 

Thanks,

C.

Re: [RFC PATCH] disas/hppa: drop raw opcode dump

2024-02-29 Thread Richard Henderson


On 2/29/24 04:05, Alex Bennée wrote:

The hppa disassembly is different from the others due to leading with
the raw opcode data. This confuses plugins looking for instruction
prefixes to match instructions. For plugins like execlog there is
another mechanism for getting the instruction byte data.

For the sake of consistently just present the instruction assembly
code.

Signed-off-by: Alex Bennée 
---
  disas/hppa.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/disas/hppa.c b/disas/hppa.c
index 22dce9b41bb..dd34cce211b 100644
--- a/disas/hppa.c
+++ b/disas/hppa.c
@@ -1972,10 +1972,6 @@ print_insn_hppa (bfd_vma memaddr, disassemble_info *info)
  
insn = bfd_getb32 (buffer);
  
-  info->fprintf_func(info->stream, " %02x %02x %02x %02x   ",

-(insn >> 24) & 0xff, (insn >> 16) & 0xff,
-(insn >>  8) & 0xff, insn & 0xff);
-


It's hardly the only one doing this.  Our capstone dumper does this, and glancing at some 
others riscv.c does as well.


When you say "the others", I think you mean "everything using capstone", which has uses a 
different print function entirely for plugins just to avoid the dump.



r~

Re: [PATCH v6 06/12] tests/plugin/mem: migrate to new per_vcpu API

2024-02-29 Thread Alex Bennée

Pierrick Bouvier  writes:

> Reviewed-by: Luc Michel 
> Signed-off-by: Pierrick Bouvier 
> ---
>  tests/plugin/mem.c | 46 +++---
>  1 file changed, 31 insertions(+), 15 deletions(-)
>
> diff --git a/tests/plugin/mem.c b/tests/plugin/mem.c
> index 44e91065ba7..b650dddcce1 100644
> --- a/tests/plugin/mem.c
> +++ b/tests/plugin/mem.c
> @@ -16,9 +16,14 @@
>  
>  QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>  
> -static uint64_t inline_mem_count;
> -static uint64_t cb_mem_count;
> -static uint64_t io_count;
> +typedef struct {
> +uint64_t mem_count;
> +uint64_t io_count;
> +} CPUCount;
> +
> +static struct qemu_plugin_scoreboard *counts;
> +static qemu_plugin_u64 mem_count;
> +static qemu_plugin_u64 io_count;
>  static bool do_inline, do_callback;
>  static bool do_haddr;
>  static enum qemu_plugin_mem_rw rw = QEMU_PLUGIN_MEM_RW;
> @@ -27,16 +32,16 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
>  {
>  g_autoptr(GString) out = g_string_new("");
>  
> -if (do_inline) {
> -g_string_printf(out, "inline mem accesses: %" PRIu64 "\n", 
> inline_mem_count);
> -}
> -if (do_callback) {
> -g_string_append_printf(out, "callback mem accesses: %" PRIu64 "\n", 
> cb_mem_count);
> +if (do_inline || do_callback) {
> +g_string_printf(out, "mem accesses: %" PRIu64 "\n",
> +qemu_plugin_u64_sum(mem_count));
>  }
>  if (do_haddr) {
> -g_string_append_printf(out, "io accesses: %" PRIu64 "\n", io_count);
> +g_string_append_printf(out, "io accesses: %" PRIu64 "\n",
> +   qemu_plugin_u64_sum(io_count));
>  }
>  qemu_plugin_outs(out->str);
> +qemu_plugin_scoreboard_free(counts);
>  }
>  
>  static void vcpu_mem(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo,
> @@ -46,12 +51,12 @@ static void vcpu_mem(unsigned int cpu_index, 
> qemu_plugin_meminfo_t meminfo,
>  struct qemu_plugin_hwaddr *hwaddr;
>  hwaddr = qemu_plugin_get_hwaddr(meminfo, vaddr);
>  if (qemu_plugin_hwaddr_is_io(hwaddr)) {
> -io_count++;
> +qemu_plugin_u64_add(io_count, cpu_index, 1);
>  } else {
> -cb_mem_count++;
> +qemu_plugin_u64_add(mem_count, cpu_index, 1);
>  }
>  } else {
> -cb_mem_count++;
> +qemu_plugin_u64_add(mem_count, cpu_index, 1);
>  }
>  }
>  
> @@ -64,9 +69,10 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct 
> qemu_plugin_tb *tb)
>  struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
>  
>  if (do_inline) {
> -qemu_plugin_register_vcpu_mem_inline(insn, rw,
> - QEMU_PLUGIN_INLINE_ADD_U64,
> - &inline_mem_count, 1);
> +qemu_plugin_register_vcpu_mem_inline_per_vcpu(
> +insn, rw,
> +QEMU_PLUGIN_INLINE_ADD_U64,
> +mem_count, 1);
>  }
>  if (do_callback) {
>  qemu_plugin_register_vcpu_mem_cb(insn, vcpu_mem,
> @@ -117,6 +123,16 @@ QEMU_PLUGIN_EXPORT int 
> qemu_plugin_install(qemu_plugin_id_t id,
>  }
>  }
>  
> +if (do_inline && do_callback) {
> +fprintf(stderr,
> +"can't enable inline and callback counting at the same 
> time\n");
> +return -1;
> +}
> +
> +counts = qemu_plugin_scoreboard_new(sizeof(CPUCount));
> +mem_count = qemu_plugin_scoreboard_u64_in_struct(
> +counts, CPUCount, mem_count);
> +io_count = qemu_plugin_scoreboard_u64_in_struct(counts, CPUCount, 
> io_count);
>  qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
>  qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
>  return 0;

You need the following to keep check-tcg working:

modified   tests/tcg/Makefile.target
@@ -168,7 +168,7 @@ RUN_TESTS+=$(EXTRA_RUNS)
 
 # Some plugins need additional arguments above the default to fully
 # exercise things. We can define them on a per-test basis here.
-run-plugin-%-with-libmem.so: 
PLUGIN_ARGS=$(COMMA)inline=true$(COMMA)callback=true
+run-plugin-%-with-libmem.so: PLUGIN_ARGS=$(COMMA)inline=true
 
 ifeq ($(filter %-softmmu, $(TARGET)),)
 run-%: %

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

[PATCH] tests: riscv64: Use 'zfa' instead of 'Zfa'

2024-02-29 Thread Christoph Müllner

Running test-fcvtmod triggers the following deprecation warning:
  warning: CPU property 'Zfa' is deprecated. Please use 'zfa' instead
Let's fix that.

Signed-off-by: Christoph Müllner 
---
 tests/tcg/riscv64/Makefile.target | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/tcg/riscv64/Makefile.target 
b/tests/tcg/riscv64/Makefile.target
index a7e390c384..4da5b9a3b3 100644
--- a/tests/tcg/riscv64/Makefile.target
+++ b/tests/tcg/riscv64/Makefile.target
@@ -17,4 +17,4 @@ run-test-aes: QEMU_OPTS += -cpu rv64,zk=on
 TESTS += test-fcvtmod
 test-fcvtmod: CFLAGS += -march=rv64imafdc
 test-fcvtmod: LDFLAGS += -static
-run-test-fcvtmod: QEMU_OPTS += -cpu rv64,d=true,Zfa=true
+run-test-fcvtmod: QEMU_OPTS += -cpu rv64,d=true,zfa=true
-- 
2.43.2

Re: [PATCH 3/3] plugins/execlog: add address range matching

2024-02-29 Thread Sven Schnelle

Hi Alex,

Alex Bennée  writes:

> Sven Schnelle  writes:
>> +static void parse_vaddr_match(GArray **matches, char *token)
>>  {
>> -uint64_t v = g_ascii_strtoull(match, NULL, 16);
>> +uint64_t low, high;
>> +gchar *endp;
>>  
>> -if (!matches) {
>> -*matches = g_array_new(false, true, sizeof(uint64_t));
>> +low = g_ascii_strtoull(token, &endp, 16);
>> +if (endp == token) {
>> +fprintf(stderr, "Invalid address(range) specified: %s\n", token);
>> +return;
>> +}
>> +
>> +if (*endp != '-') {
>> +high = low;
>> +} else {
>> +high = g_ascii_strtoull(endp + 1, &endp, 16);
>> +if (endp == token) {
>> +fprintf(stderr, "Invalid address(range) specified: %s\n", 
>> token);
>> +return;
>> +}
>> +}
>> +
>> +if (!*matches) {
>> +*matches = g_array_new(false, true, sizeof(struct address_match));
>>  }
>> -g_array_append_val(*matches, v);
>> +struct address_match *match = g_new(struct address_match, 1);
>> +match->low = low;
>> +match->high = high;
>> +g_array_append_val(*matches, match);
>
> This is almost but not quite qemu_set_dfilter_ranges(). I wonder if it
> would be worth a light re-factoring and then exposing the parser as a
> helper function?

Thanks, I'll take a look. I wasn't aware of qemu_set_dfilter_ranges().

1 2 3 4 >

1 - 100 of 329 matches

Mail list logo