Re: [PATCH v15 3/8] net/vmnet: implement shared mode (vmnet-shared)

2022-03-01 Thread Vladislav Yaroshchuk
On Tue, Mar 1, 2022 at 8:52 AM Akihiko Odaki wrote:

> On 2022/02/28 20:59, Vladislav Yaroshchuk wrote:
> >
> >
> > On Sat, Feb 26, 2022 at 3:27 PM Akihiko Odaki wrote:
> >
> > On Sat, Feb 26, 2022 at 8:33 PM Vladislav Yaroshchuk wrote:
> >  >
> >  >
> >  >
> >  > On Sat, Feb 26, 2022 at 12:16 PM Akihiko Odaki <akihiko.od...@gmail.com> wrote:
> >  >>
> >  >> On 2022/02/26 17:37, Vladislav Yaroshchuk wrote:
> >  >> >
> >  >> > Hi Akihiko,
> >  >> >
> >  >> > On Fri, Feb 25, 2022 at 8:46 PM Akihiko Odaki <akihiko.od...@gmail.com> wrote:
> >  >> >
> >  >> > On 2022/02/26 2:13, Vladislav Yaroshchuk wrote:
> >  >> >  > net/vmnet: implement shared mode (vmnet-shared)
> >  >> >  >
> >  >> >  > Interaction with vmnet.framework in different modes
> >  >> >  > differs only on configuration stage, so we can create
> >  >> >  > common `send`, `receive`, etc. procedures and reuse them.
> >  >> >  >
> >  >> >  > vmnet.framework supports iov, but writing more than
> >  >> >  > one iov into vmnet interface fails with
> >  >> >  > 'VMNET_INVALID_ARGUMENT'. Collecting provided iovs into
> >  >> >  > one and passing it to vmnet works fine. That's the
> >  >> >  > reason why receive_iov() is left unimplemented. But it still
> >  >> >  > works with good enough performance having only .receive()
> >  >> >  > implemented.
> >  >> >  >
> >  >> >  > Also, there is no way to unsubscribe from receiving vmnet
> >  >> >  > packets except registering and unregistering the event
> >  >> >  > callback, or simply dropping packets, ignoring and
> >  >> >  > not processing them when the related flag is set. Here we
> >  >> >  > use the second way.
> >  >> >  >
> >  >> >  > Signed-off-by: Phillip Tennen
> >  >> >  > Signed-off-by: Vladislav Yaroshchuk
> >  >> >
> >  >> > Thank you for persistently working on this.
> >  >> >
> >  >> >  > ---
> >  >> >  >   net/vmnet-common.m | 302 +
> >  >> >  >   net/vmnet-shared.c |  94 +-
> >  >> >  >   net/vmnet_int.h    |  39 +-
> >  >> >  >   3 files changed, 430 insertions(+), 5 deletions(-)
> >  >> >  >
> >  >> >  > diff --git a/net/vmnet-common.m b/net/vmnet-common.m
> >  >> >  > index 56612c72ce..2f70921cae 100644
> >  >> >  > --- a/net/vmnet-common.m
> >  >> >  > +++ b/net/vmnet-common.m
> >  >> >  > @@ -10,6 +10,8 @@
> >  >> >  >*/
> >  >> >  >
> >  >> >  >   #include "qemu/osdep.h"
> >  >> >  > +#include "qemu/main-loop.h"
> >  >> >  > +#include "qemu/log.h"
> >  >> >  >   #include "qapi/qapi-types-net.h"
> >  >> >  >   #include "vmnet_int.h"
> >  >> >  >   #include "clients.h"
> >  >> >  > @@ -17,4 +19,304 @@
> >  >> >  >   #include "qapi/error.h"
> >  >> >  >
> >  >> >  >   #include 
> >  >> >  > +#include 
> >  >> >  >
> >  >> >  > +
> >  >> >  > +static inline void vmnet_set_send_bh_scheduled(VmnetCommonState *s,
> >  >> >  > +   bool enable)
> >  >> >  > +{
> >  >> >  > +qatomic_set(&s->send_scheduled, enable);
> >  >> >  > +}
> >  >> >  > +
> >  >> >  > +
> >  >> >  > +static inline bool vmnet_is_send_bh_scheduled(VmnetCommonState *s)
> >  >> >  > +{
> >  >> >  > +return qatomic_load_acquire(&s->send_scheduled);
> >  >> >  > +}

Re: [PATCH v15 3/8] net/vmnet: implement shared mode (vmnet-shared)

2022-03-01 Thread Akihiko Odaki

On 2022/03/01 17:09, Vladislav Yaroshchuk wrote:

 > Not sure that only one field is enough, cause
 > we may have two states on bh execution start:
 > 1. There are packets in vmnet buffer s->packets_buf
 >      that were rejected by qemu_send_async and waiting
 >      to be sent. If this happens, we should complete sending
 >      these waiting packets with qemu_send_async firstly,
 >      and after that we should call vmnet_read to get
 >      new ones and send them to QEMU;
 > 2. There are no packets in s->packets_buf to be sent to
 >      qemu, we only need to get new packets from vmnet
 >      with vmnet_read and send them to QEMU

In case 1, you should just keep calling qemu_send_packet_async. Actually,
qemu_send_packet_async adds the packet to its internal queue and calls
the callback when it is consumed.


I'm not sure we can keep calling qemu_send_packet_async,
because, as the docs in net/queue.c say:

/* [...]
  * If a sent callback is provided to send(), the caller must handle a
  * zero return from the delivery handler by not sending any more packets
  * until we have invoked the callback. Only in that case will we queue
  * the packet.
  *
  * If a sent callback isn't provided, we just drop the packet to avoid
  * unbounded queueing.
  */

So after we did vmnet_read and read N packets
into the temporary s->packets_buf, we begin calling
qemu_send_packet_async. If it returns 0, it says
"no more packets until sent_cb is called, please".
At this moment we have N packets in s->packets_buf
and have already queued K < N of them. But packets K..N
are not queued and keep waiting for sent_cb to be sent
with qemu_send_packet_async.
Thus when sent_cb is called, we should finish
our transfer of packets K..N from s->packets_buf
to qemu by calling qemu_send_packet_async.
That is what I meant.


I missed the comment. The description contradicts the actual 
code; qemu_net_queue_send_iov appends the packet to the queue whenever 
it cannot send one immediately.


Jason Wang, I saw you are in the MAINTAINERS for net/. Can you tell if 
calling qemu_send_packet_async is allowed after it returns 0?


Regards,
Akihiko Odaki



Re: [PATCH] ppc/pnv: fix default PHB4 QOM hierarchy

2022-03-01 Thread Cédric Le Goater

On 2/28/22 14:51, Daniel Henrique Barboza wrote:



On 2/26/22 10:49, Cédric Le Goater wrote:

On 2/18/22 21:28, Daniel Henrique Barboza wrote:

Commit 3f4c369ea63e ("ppc/pnv: make PECs create and realize PHB4s")
changed phb4_pec code to create the default PHB4 objects in
pnv_pec_default_phb_realize(). In this process the stacks[] PEC array was
removed and each PHB4 object is tied together with its PEC via the
phb->pec pointer.

This change also broke the previous QOM hierarchy - the PHB4 objects are
being created and not being parented to their respective chips. This can
be verified by 'info pic' in a powernv9 domain with default settings.
pnv_chip_power9_pic_print_info() will fail to find the PHBs because
object_child_foreach_recursive() won't find any.

The solution is to set the parent chip and the parent bus for all PHB4
devices, in the same way as is done for user-created PHB4 devices.

Fixes: 3f4c369ea63e ("ppc/pnv: make PECs create and realize PHB4s")
Signed-off-by: Daniel Henrique Barboza 



What about the pnv-phb3/4-root-port devices? Should we attach
them to the QOM hierarchy as well?



I guess it wouldn't hurt. I'll see what I can do.


I took it as it is for ppc-7.0. Changes can come after. Nothing critical.

Thanks,

C.




Re: [PATCH v14 0/4] PMU-EBB support for PPC64 TCG

2022-03-01 Thread Cédric Le Goater

On 2/25/22 11:11, Daniel Henrique Barboza wrote:

Hi,

This new version contains a change suggested by Richard in patch 4. No
function change was made.

Changes from v13:
- patch 1:
   * added Richard's r-b
- patch 4:
   * renamed helper_ebb_perfm_excp() to raise_ebb_perfm_exception(). The
 function is no longer declared as a translation helper
- v13 link: https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg05414.html

Daniel Henrique Barboza (4):
   target/ppc: make power8-pmu.c CONFIG_TCG only
   target/ppc: finalize pre-EBB PMU logic
   target/ppc: add PPC_INTERRUPT_EBB and EBB exceptions
   target/ppc: trigger PERFM EBBs from power8-pmu.c

  target/ppc/cpu.h | 10 -
  target/ppc/cpu_init.c| 20 +-
  target/ppc/excp_helper.c | 81 
  target/ppc/machine.c |  6 ++-
  target/ppc/meson.build   |  2 +-
  target/ppc/power8-pmu.c  | 39 +--
  target/ppc/power8-pmu.h  |  4 +-
  7 files changed, 144 insertions(+), 18 deletions(-)




Applied to ppc-7.0.

Thanks,

C.



Re: [PATCH v5 00/49] target/ppc: PowerISA Vector/VSX instruction batch

2022-03-01 Thread Cédric Le Goater

On 2/25/22 22:08, matheus.fe...@eldorado.org.br wrote:

From: Matheus Ferst 

This patch series implements 5 missing instructions from PowerISA v3.0
and 58 new instructions from PowerISA v3.1, moving 87 other instructions
to decodetree along the way.

Patches without review: 4, 24, 26, 27, 34, 35, 38, 40, 44-46


I think we are done.

Applied to ppc-7.0.

Thanks,

C.



Re: [PATCH v2 05/22] hw/ppc/pnv: Determine ns16550's IRQ number from QOM property

2022-03-01 Thread Cédric Le Goater

On 2/27/22 23:17, Philippe Mathieu-Daudé wrote:

On 22/2/22 20:34, Bernhard Beschow wrote:

Determine the IRQ number in the same way as for isa-ipmi-bt. This resolves
the last usage of ISADevice::isairq[] which allows it to be removed.

Signed-off-by: Bernhard Beschow 
---
  hw/ppc/pnv.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 837146a2fb..1e9f6b0690 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -380,9 +380,12 @@ static void pnv_dt_serial(ISADevice *d, void *fdt, int lpc_off)
  cpu_to_be32(io_base),
  cpu_to_be32(8)
  };
+    uint32_t irq;
  char *name;
  int node;
+    irq = object_property_get_int(OBJECT(d), "irq", &error_fatal);


object_property_get_[u]int(), otherwise:


Fixed it.
 

Reviewed-by: Philippe Mathieu-Daudé 


Applied to ppc-7.0.

Thanks,

C.




  name = g_strdup_printf("%s@i%x", qdev_fw_name(DEVICE(d)), io_base);
  node = fdt_add_subnode(fdt, lpc_off, name);
  _FDT(node);
@@ -394,7 +397,7 @@ static void pnv_dt_serial(ISADevice *d, void *fdt, int lpc_off)
  _FDT((fdt_setprop_cell(fdt, node, "clock-frequency", 1843200)));
  _FDT((fdt_setprop_cell(fdt, node, "current-speed", 115200)));
-    _FDT((fdt_setprop_cell(fdt, node, "interrupts", d->isairq[0])));
+    _FDT((fdt_setprop_cell(fdt, node, "interrupts", irq)));
   _FDT((fdt_setprop_cell(fdt, node, "interrupt-parent",
  fdt_get_phandle(fdt, lpc_off))));







[PATCH v2 00/25] migration: Postcopy Preemption

2022-03-01 Thread Peter Xu
This is v2 of the postcopy preempt series.  It can also be found here:

  https://github.com/xzpeter/qemu/tree/postcopy-preempt

RFC: https://lore.kernel.org/qemu-devel/20220119080929.39485-1-pet...@redhat.com
V1:  https://lore.kernel.org/qemu-devel/20220216062809.57179-1-pet...@redhat.com

v1->v2 changelog:
- Picked up more r-bs from Dave
- Rename both fault threads to drop "qemu/" prefix [Dave]
- Further rework on postcopy recovery, to be able to detect qemufile errors
  from either main channel or postcopy one [Dave]
- shutdown() qemufile before close on src postcopy channel when postcopy is
  paused [Dave]
- In postcopy_preempt_new_channel(), explicitly set the new channel in
  blocking state, even if it's the default [Dave]
- Make RAMState.postcopy_channel unsigned int [Dave]
- Added patches:
  - "migration: Create the postcopy preempt channel asynchronously"
  - "migration: Parameter x-postcopy-preempt-break-huge"
  - "migration: Add helpers to detect TLS capability"
  - "migration: Fail postcopy preempt with TLS"
  - "tests: Pass in MigrateStart** into test_migrate_start()"

Abstract
========

This series adds a new migration capability called "postcopy-preempt".  It can
be enabled when postcopy is enabled, and it'll simply (but greatly) speed up
the postcopy page request handling process.

Below are some initial postcopy page request latency measurements with the
new series applied.

For each page size, I measured page request latency for three cases:

  (a) Vanilla:the old postcopy
  (b) Preempt no-break-huge:  preempt enabled, x-postcopy-preempt-break-huge=off
  (c) Preempt full:   preempt enabled, x-postcopy-preempt-break-huge=on
  (this is the default option when preempt enabled)

The x-postcopy-preempt-break-huge parameter is newly added in v2 so as to
conditionally disable the behavior of breaking the send of a precopy huge page,
for debugging purposes.  So when it's off, postcopy will not preempt precopy
sending a huge page, but postcopy will still use its own channel.

I tested it separately to give a rough idea of which part of the change
helped how much.  The overall benefit should be the comparison
between case (a) and (c).

  |-----------+---------+-----------------------+--------------|
  | Page size | Vanilla | Preempt no-break-huge | Preempt full |
  |-----------+---------+-----------------------+--------------|
  | 4K        |   10.68 |               N/A [*] |         0.57 |
  | 2M        |   10.58 |                  5.49 |         5.02 |
  | 1G        | 2046.65 |               933.185 |      649.445 |
  |-----------+---------+-----------------------+--------------|
  [*]: This case is N/A because a 4K page does not involve huge pages at all

[1] 
https://github.com/xzpeter/small-stuffs/blob/master/tools/huge_vm/uffd-latency.bpf

TODO List
=========

TLS support
-----------

I only noticed it was missing very recently.  Since soft freeze is coming, and
obviously I'm still growing this series, I tend to have the existing
material discussed first. Let's see if it can still catch the train for the
QEMU 7.0 release (soft freeze on 2022-03-08)..

Avoid precopy write() blocks postcopy
-------------------------------------

I didn't prove this, but I always think the write() syscalls being blocked
for precopy pages can affect postcopy services.  If we can solve this
problem then my wild guess is we can further reduce the average page
latency.

Two solutions at least in mind: (1) we could have made the write side of
the migration channel NON_BLOCK too, or (2) multi-threads on the send side,
just like multifd, but we may need a lock to protect which page to send, too
(e.g., the core idea is we should _never_ rely on the main thread for anything;
multifd has that dependency of queuing pages only on the main thread).

That can definitely be done and thought about later.

Multi-channel for preemption threads
------------------------------------

Currently the postcopy preempt feature uses only one extra channel and one
extra thread on dest (no new thread on src QEMU).  It should be mostly good
enough for major use cases, but when the postcopy queue is long enough
(e.g. hundreds of vCPUs faulted on different pages) logically we could
still observe more delays on average.  Whether growing threads/channels can
solve it is debatable, but it sounds worth a try.  That's yet another
thing we can think about after this patchset lands.

Logically the design provides space for that - the receiving postcopy
preempt thread can understand the whole ram-layer migration protocol, and for
multi channel and multi threads we could simply grow that into multiple
threads handling the same protocol (with multiple PostcopyTmpPage).  The
source needs more thought on synchronization, though, but it shouldn't
affect the whole protocol layer, so it should be easy to keep compatible.

Patch Layout
============

Patch 1-3: Three leftover patches from patchset "[PATCH v3 0/8] migration:
Postcopy cleanup on ram disg

[PATCH v2 01/25] migration: Dump sub-cmd name in loadvm_process_command tp

2022-03-01 Thread Peter Xu
It'll be easier to read the name rather than the index of the sub-cmd when debugging.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/savevm.c | 3 ++-
 migration/trace-events | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 1599b02fbc..7bb65e1d61 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2273,12 +2273,13 @@ static int loadvm_process_command(QEMUFile *f)
 return qemu_file_get_error(f);
 }
 
-trace_loadvm_process_command(cmd, len);
 if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
 error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
 return -EINVAL;
 }
 
+trace_loadvm_process_command(mig_cmd_args[cmd].name, len);
+
 if (mig_cmd_args[cmd].len != -1 && mig_cmd_args[cmd].len != len) {
 error_report("%s received with bad length - expecting %zu, got %d",
  mig_cmd_args[cmd].name,
diff --git a/migration/trace-events b/migration/trace-events
index 48aa7b10ee..123cfe79d7 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -22,7 +22,7 @@ loadvm_postcopy_handle_resume(void) ""
 loadvm_postcopy_ram_handle_discard(void) ""
 loadvm_postcopy_ram_handle_discard_end(void) ""
 loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
-loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
+loadvm_process_command(const char *s, uint16_t len) "com=%s len=%d"
 loadvm_process_command_ping(uint32_t val) "0x%x"
 postcopy_ram_listen_thread_exit(void) ""
 postcopy_ram_listen_thread_start(void) ""
-- 
2.32.0




[PATCH v2 02/25] migration: Finer grained tracepoints for POSTCOPY_LISTEN

2022-03-01 Thread Peter Xu
Enabling postcopy listening has a few steps; add a few tracepoints to
be ready for some basic measurements of them.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/savevm.c | 9 -
 migration/trace-events | 2 +-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 7bb65e1d61..190cc5fc42 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1948,9 +1948,10 @@ static void *postcopy_ram_listen_thread(void *opaque)
 static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 {
 PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING);
-trace_loadvm_postcopy_handle_listen();
 Error *local_err = NULL;
 
+trace_loadvm_postcopy_handle_listen("enter");
+
 if (ps != POSTCOPY_INCOMING_ADVISE && ps != POSTCOPY_INCOMING_DISCARD) {
 error_report("CMD_POSTCOPY_LISTEN in wrong postcopy state (%d)", ps);
 return -1;
@@ -1965,6 +1966,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 }
 }
 
+trace_loadvm_postcopy_handle_listen("after discard");
+
 /*
  * Sensitise RAM - can now generate requests for blocks that don't exist
  * However, at this point the CPU shouldn't be running, and the IO
@@ -1977,6 +1980,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 }
 }
 
+trace_loadvm_postcopy_handle_listen("after uffd");
+
 if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, &local_err)) {
 error_report_err(local_err);
 return -1;
@@ -1991,6 +1996,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 qemu_sem_wait(&mis->listen_thread_sem);
 qemu_sem_destroy(&mis->listen_thread_sem);
 
+trace_loadvm_postcopy_handle_listen("return");
+
 return 0;
 }
 
diff --git a/migration/trace-events b/migration/trace-events
index 123cfe79d7..92596c00d8 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -14,7 +14,7 @@ loadvm_handle_cmd_packaged_main(int ret) "%d"
 loadvm_handle_cmd_packaged_received(int ret) "%d"
 loadvm_handle_recv_bitmap(char *s) "%s"
 loadvm_postcopy_handle_advise(void) ""
-loadvm_postcopy_handle_listen(void) ""
+loadvm_postcopy_handle_listen(const char *str) "%s"
 loadvm_postcopy_handle_run(void) ""
 loadvm_postcopy_handle_run_cpu_sync(void) ""
 loadvm_postcopy_handle_run_vmstart(void) ""
-- 
2.32.0




[PATCH v2 04/25] migration: Introduce postcopy channels on dest node

2022-03-01 Thread Peter Xu
Postcopy handles huge pages in a special way: currently we can only have
one "channel" to transfer the page.

It's because when we install pages using UFFDIO_COPY, we need to have the whole
huge page ready; it also means we need to have a temp huge page when trying to
receive the whole content of the page.

Currently all maintenance around this tmp page is global: firstly we'll
allocate a temp huge page, then we maintain its status mostly within
ram_load_postcopy().

To enable multiple channels for postcopy, the first thing we need to do is to
prepare N temp huge pages as caches, one for each channel.

Meanwhile we need to maintain the tmp huge page status per-channel too.

To give some example, some local variables maintained in ram_load_postcopy()
are listed; they are responsible for maintaining temp huge page status:

  - all_zero: this keeps whether this huge page contains all zeros
  - target_pages: this counts how many target pages have been copied
  - host_page:this keeps the host ptr for the page to install

Move all these fields to be together with the temp huge pages to form a new
structure called PostcopyTmpPage.  Then for each (future) postcopy channel, we
need one structure to keep the state around.

For vanilla postcopy, obviously there's only one channel.  It contains both
precopy and postcopy pages.

This patch teaches the dest migration node to realize the possible number
of postcopy channels by introducing the "postcopy_channels" variable.  Its
value is calculated when setting up postcopy on the dest node (during the
POSTCOPY_LISTEN phase).

Vanilla postcopy will have channels=1, but when postcopy-preempt capability is
enabled (in the future), we will boost it to 2 because even during partial
sending of a precopy huge page we still want to preempt it and start sending
the postcopy requested page right away (so we start to keep two temp huge
pages; more if we want to enable multifd).  In this patch there's a TODO marked
for that; so far the channels is always set to 1.

We need to send one "host huge page" on one channel only and we cannot split
them, because otherwise the data upon the same huge page can locate on more
than one channel so we need more complicated logic to manage.  One temp host
huge page for each channel will be enough for us for now.

Postcopy will still always use the index=0 huge page even after this patch.
However it prepares for the latter patches where it can start to use multiple
channels (which needs src intervention, because only src knows which channel we
should use).

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/migration.h| 36 +++-
 migration/postcopy-ram.c | 60 ++--
 migration/ram.c  | 43 ++--
 migration/savevm.c   | 12 
 4 files changed, 113 insertions(+), 38 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 8130b703eb..42c7395094 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -45,6 +45,24 @@ struct PostcopyBlocktimeContext;
  */
 #define CLEAR_BITMAP_SHIFT_MAX31
 
+/* This is an abstraction of a "temp huge page" for postcopy's purpose */
+typedef struct {
+/*
+ * This points to a temporary huge page as a buffer for UFFDIO_COPY.  It's
+ * mmap()ed and needs to be freed when cleanup.
+ */
+void *tmp_huge_page;
+/*
+ * This points to the host page we're going to install for this temp page.
+ * It tells us after we've received the whole page, where we should put it.
+ */
+void *host_addr;
+/* Number of small pages copied (in size of TARGET_PAGE_SIZE) */
+unsigned int target_pages;
+/* Whether this page contains all zeros */
+bool all_zero;
+} PostcopyTmpPage;
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
 QEMUFile *from_src_file;
@@ -81,7 +99,22 @@ struct MigrationIncomingState {
 QemuMutex rp_mutex;/* We send replies from multiple threads */
 /* RAMBlock of last request sent to source */
 RAMBlock *last_rb;
-void *postcopy_tmp_page;
+/*
+ * Number of postcopy channels including the default precopy channel, so
+ * vanilla postcopy will only contain one channel which contain both
+ * precopy and postcopy streams.
+ *
+ * This is calculated when the src requests to enable postcopy but before
+ * it starts.  Its value can depend on e.g. whether postcopy preemption is
+ * enabled.
+ */
+unsigned int postcopy_channels;
+/*
+ * An array of temp host huge pages to be used, one for each postcopy
+ * channel.
+ */
+PostcopyTmpPage *postcopy_tmp_pages;
+/* This is shared for all postcopy channels */
 void *postcopy_tmp_zero_page;
 /* PostCopyFD's for external userfaultfds & handlers of shared memory */
 GArray   *postcopy_remote_fds;
@@ -391,5 +424,6 @@ bool migration_rate_limit(voi

[PATCH v2 03/25] migration: Tracepoint change in postcopy-run bottom half

2022-03-01 Thread Peter Xu
Remove the two old tracepoints; they're even near each other:

trace_loadvm_postcopy_handle_run_cpu_sync()
trace_loadvm_postcopy_handle_run_vmstart()

Add trace_loadvm_postcopy_handle_run_bh() with finer-grained tracing.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/savevm.c | 12 +---
 migration/trace-events |  3 +--
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 190cc5fc42..41e3238798 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2006,13 +2006,19 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
 Error *local_err = NULL;
 MigrationIncomingState *mis = opaque;
 
+trace_loadvm_postcopy_handle_run_bh("enter");
+
 /* TODO we should move all of this lot into postcopy_ram.c or a shared code
  * in migration.c
  */
 cpu_synchronize_all_post_init();
 
+trace_loadvm_postcopy_handle_run_bh("after cpu sync");
+
 qemu_announce_self(&mis->announce_timer, migrate_announce_params());
 
+trace_loadvm_postcopy_handle_run_bh("after announce");
+
 /* Make sure all file formats flush their mutable metadata.
  * If we get an error here, just don't restart the VM yet. */
 bdrv_invalidate_cache_all(&local_err);
@@ -2022,9 +2028,7 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
 autostart = false;
 }
 
-trace_loadvm_postcopy_handle_run_cpu_sync();
-
-trace_loadvm_postcopy_handle_run_vmstart();
+trace_loadvm_postcopy_handle_run_bh("after invalidate cache");
 
 dirty_bitmap_mig_before_vm_start();
 
@@ -2037,6 +2041,8 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
 }
 
 qemu_bh_delete(mis->bh);
+
+trace_loadvm_postcopy_handle_run_bh("return");
 }
 
 /* After all discards we can start running and asking for pages */
diff --git a/migration/trace-events b/migration/trace-events
index 92596c00d8..1aec580e92 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -16,8 +16,7 @@ loadvm_handle_recv_bitmap(char *s) "%s"
 loadvm_postcopy_handle_advise(void) ""
 loadvm_postcopy_handle_listen(const char *str) "%s"
 loadvm_postcopy_handle_run(void) ""
-loadvm_postcopy_handle_run_cpu_sync(void) ""
-loadvm_postcopy_handle_run_vmstart(void) ""
+loadvm_postcopy_handle_run_bh(const char *str) "%s"
 loadvm_postcopy_handle_resume(void) ""
 loadvm_postcopy_ram_handle_discard(void) ""
 loadvm_postcopy_ram_handle_discard_end(void) ""
-- 
2.32.0




[PATCH v2 06/25] migration: Add postcopy_thread_create()

2022-03-01 Thread Peter Xu
Postcopy creates threads. A common pattern is to init a sem and use it to sync
with the thread.  Namely, we have fault_thread_sem and listen_thread_sem and
they're only used for this.

Make it a shared infrastructure so it's easier to create yet another thread.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/migration.h|  8 +---
 migration/postcopy-ram.c | 23 +--
 migration/postcopy-ram.h |  4 
 migration/savevm.c   | 12 +++-
 4 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 42c7395094..8445e1d14a 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -70,7 +70,11 @@ struct MigrationIncomingState {
 /* A hook to allow cleanup at the end of incoming migration */
 void *transport_data;
 void (*transport_cleanup)(void *data);
-
+/*
+ * Used to sync thread creations.  Note that we can't create threads in
+ * parallel with this sem.
+ */
+QemuSemaphore  thread_sync_sem;
 /*
  * Free at the start of the main state load, set as the main thread finishes
  * loading state.
@@ -83,13 +87,11 @@ struct MigrationIncomingState {
 size_t largest_page_size;
 bool   have_fault_thread;
 QemuThread fault_thread;
-QemuSemaphore  fault_thread_sem;
 /* Set this when we want the fault thread to quit */
 bool   fault_thread_quit;
 
 bool   have_listen_thread;
 QemuThread listen_thread;
-QemuSemaphore  listen_thread_sem;
 
 /* For the kernel to send us notifications */
 int   userfault_fd;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 30c3508f44..d08d396c63 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -78,6 +78,20 @@ int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp)
 &pnd);
 }
 
+/*
+ * NOTE: this routine is not thread safe, we can't call it concurrently. But it
+ * should be good enough for migration's purposes.
+ */
+void postcopy_thread_create(MigrationIncomingState *mis,
+QemuThread *thread, const char *name,
+void *(*fn)(void *), int joinable)
+{
+qemu_sem_init(&mis->thread_sync_sem, 0);
+qemu_thread_create(thread, name, fn, mis, joinable);
+qemu_sem_wait(&mis->thread_sync_sem);
+qemu_sem_destroy(&mis->thread_sync_sem);
+}
+
 /* Postcopy needs to detect accesses to pages that haven't yet been copied
  * across, and efficiently map new pages in, the techniques for doing this
  * are target OS specific.
@@ -902,7 +916,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
 trace_postcopy_ram_fault_thread_entry();
 rcu_register_thread();
 mis->last_rb = NULL; /* last RAMBlock we sent part of */
-qemu_sem_post(&mis->fault_thread_sem);
+qemu_sem_post(&mis->thread_sync_sem);
 
 struct pollfd *pfd;
 size_t pfd_len = 2 + mis->postcopy_remote_fds->len;
@@ -1173,11 +1187,8 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
 return -1;
 }
 
-qemu_sem_init(&mis->fault_thread_sem, 0);
-qemu_thread_create(&mis->fault_thread, "postcopy/fault",
-   postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
-qemu_sem_wait(&mis->fault_thread_sem);
-qemu_sem_destroy(&mis->fault_thread_sem);
+postcopy_thread_create(mis, &mis->fault_thread, "postcopy/fault",
+   postcopy_ram_fault_thread, QEMU_THREAD_JOINABLE);
 mis->have_fault_thread = true;
 
 /* Mark so that we get notified of accesses to unwritten areas */
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 6d2b3cf124..07684c0e1d 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -135,6 +135,10 @@ void postcopy_remove_notifier(NotifierWithReturn *n);
 /* Call the notifier list set by postcopy_add_start_notifier */
 int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
 
+void postcopy_thread_create(MigrationIncomingState *mis,
+QemuThread *thread, const char *name,
+void *(*fn)(void *), int joinable);
+
 struct PostCopyFD;
 
 /* ufd is a pointer to the struct uffd_msg *TODO: more Portable! */
diff --git a/migration/savevm.c b/migration/savevm.c
index 0ccd7e5e3f..967ff80547 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1863,7 +1863,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
 
 migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_POSTCOPY_ACTIVE);
-qemu_sem_post(&mis->listen_thread_sem);
+qemu_sem_post(&mis->thread_sync_sem);
 trace_postcopy_ram_listen_thread_start();
 
 rcu_register_thread();
@@ -1988,14 +1988,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 }
 
 

[PATCH v2 05/25] migration: Dump ramblock and offset too when non-same-page detected

2022-03-01 Thread Peter Xu
In ram_load_postcopy() we'll try to detect the non-same-page case and dump an
error.  This error is very helpful for debugging.  Add the ramblock & offset
to the error log too.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/ram.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 0fc6b8e349..3a216c2340 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3707,8 +3707,12 @@ static int ram_load_postcopy(QEMUFile *f)
 } else if (tmp_page->host_addr !=
host_page_from_ram_block_offset(block, addr)) {
 /* not the 1st TP within the HP */
-error_report("Non-same host page %p/%p", tmp_page->host_addr,
- host_page_from_ram_block_offset(block, addr));
+error_report("Non-same host page detected.  Target host page %p, "
+ "received host page %p "
+ "(rb %s offset 0x"RAM_ADDR_FMT" target_pages %d)",
+ tmp_page->host_addr,
+ host_page_from_ram_block_offset(block, addr),
+ block->idstr, addr, tmp_page->target_pages);
 ret = -EINVAL;
 break;
 }
-- 
2.32.0




[PATCH v2 12/25] migration: Export ram_load_postcopy()

2022-03-01 Thread Peter Xu
Will be reused in postcopy fast load thread.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/ram.c | 2 +-
 migration/ram.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index f1de1a06e4..5cb5dfc2cc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3645,7 +3645,7 @@ int ram_postcopy_incoming_init(MigrationIncomingState *mis)
  *
  * @f: QEMUFile where to send the data
  */
-static int ram_load_postcopy(QEMUFile *f)
+int ram_load_postcopy(QEMUFile *f)
 {
 int flags = 0, ret = 0;
 bool place_needed = false;
diff --git a/migration/ram.h b/migration/ram.h
index 2c6dc3675d..ded0a3a086 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -61,6 +61,7 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms);
 /* For incoming postcopy discard */
 int ram_discard_range(const char *block_name, uint64_t start, size_t length);
 int ram_postcopy_incoming_init(MigrationIncomingState *mis);
+int ram_load_postcopy(QEMUFile *f);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 
-- 
2.32.0




[PATCH v2 08/25] migration: Add pss.postcopy_requested status

2022-03-01 Thread Peter Xu
This boolean flag shows whether the page currently being migrated was requested
by postcopy.  Then ram_save_host_page() and code deeper in the stack can
reference the priority of this page.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
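As a note for reviewers, the flag's lifecycle can be mirrored in a minimal
sketch (the struct and function names below are illustrative stand-ins, not
the QEMU code): the background dirty scan clears it, pulling from the postcopy
request queue sets it, so the deeper send path can tell urgent pages from
background ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for PageSearchStatus. */
typedef struct {
    bool postcopy_requested;
} PssSketch;

static void find_dirty_block_sketch(PssSketch *pss)
{
    /* This is not a postcopy requested page */
    pss->postcopy_requested = false;
}

static void get_queued_page_sketch(PssSketch *pss)
{
    /* Explicitly requested by the destination via postcopy */
    pss->postcopy_requested = true;
}
```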
 migration/ram.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 9516dd655a..f1de1a06e4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -414,6 +414,8 @@ struct PageSearchStatus {
 unsigned long page;
 /* Set once we wrap around */
 bool complete_round;
+/* Whether current page is explicitly requested by postcopy */
+bool postcopy_requested;
 };
 typedef struct PageSearchStatus PageSearchStatus;
 
@@ -1487,6 +1489,9 @@ retry:
  */
 static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
 {
+/* This is not a postcopy requested page */
+pss->postcopy_requested = false;
+
 pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
 if (pss->complete_round && pss->block == rs->last_seen_block &&
 pss->page >= rs->last_page) {
@@ -1981,6 +1986,7 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
  * really rare.
  */
 pss->complete_round = false;
+pss->postcopy_requested = true;
 }
 
 return !!block;
-- 
2.32.0




[PATCH v2 10/25] migration: Enlarge postcopy recovery to capture !-EIO too

2022-03-01 Thread Peter Xu
We used to have quite a few places making sure -EIO happened and that's the
only way to trigger postcopy recovery.  That's based on the assumption that
we'll only return -EIO for channel issues.

It'll work in 99.99% of cases, but logically that won't cover some corner cases.
One example is e.g. ram_block_from_stream() could fail with an interrupted
network, then -EINVAL will be returned instead of -EIO.

I remember Dave Gilbert pointed that out before, but somehow it was
overlooked.  Nor have I encountered any error outside of -EIO myself.

However we'd better touch that up before it triggers a rare VM data loss during
live migration.

To cover as many of those cases as possible, remove the -EIO restriction on
triggering the postcopy recovery, because even if it's not a channel failure,
we can't do anything better than halting QEMU anyway - the corpse of the
process may even be used by a good hand to dig out useful memory regions, or
the admin could simply kill the process later on.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
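A note for reviewers: the widened gate can be condensed into a sketch (these
helpers are illustrative, not the QEMU functions).  The old predicate only
paused for -EIO, so an interrupted stream surfacing as -EINVAL, e.g. from
ram_block_from_stream(), would previously abort instead of pausing:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Old gate: only channel I/O errors could trigger postcopy recovery. */
static bool recovery_old(int ret, bool in_postcopy)
{
    return ret == -EIO && in_postcopy;   /* misses -EINVAL, -ENOMEM, ... */
}

/* New gate: any error during postcopy pauses for recovery. */
static bool recovery_new(int ret, bool in_postcopy)
{
    return ret != 0 && in_postcopy;
}
```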
 migration/migration.c| 4 ++--
 migration/postcopy-ram.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6e4cc9cc87..67520d3105 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2877,7 +2877,7 @@ retry:
 out:
 res = qemu_file_get_error(rp);
 if (res) {
-if (res == -EIO && migration_in_postcopy()) {
+if (res && migration_in_postcopy()) {
 /*
  * Maybe there is something we can do: it looks like a
  * network down issue, and we pause for a recovery.
@@ -3478,7 +3478,7 @@ static MigThrError migration_detect_error(MigrationState *s)
 error_free(local_error);
 }
 
-if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
+if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret) {
 /*
  * For postcopy, we allow the network to be down for a
  * while. After that, it can be continued by a
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index d08d396c63..b0d12d5053 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1039,7 +1039,7 @@ retry:
 msg.arg.pagefault.address);
 if (ret) {
 /* May be network failure, try to wait for recovery */
-if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
+if (postcopy_pause_fault_thread(mis)) {
 /* We got reconnected somehow, try to continue */
 goto retry;
 } else {
-- 
2.32.0




[PATCH v2 07/25] migration: Move static var in ram_block_from_stream() into global

2022-03-01 Thread Peter Xu
A static variable is very unfriendly to threading of ram_block_from_stream().
Move it into MigrationIncomingState.

Pass the incoming state pointer over to ram_block_from_stream() at both call
sites.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
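For illustration, the hazard and the fix can be sketched as below (the types
and names are stand-ins, not the QEMU code): with a function-local
`static RAMBlock *block;` every stream, and hence every thread, shared one
cache slot; hanging the cache off the caller-provided state gives each stream
its own slot.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-in for RAMBlock; pointers are only compared here. */
typedef struct RAMBlockSketch RAMBlockSketch;

typedef struct {
    /* Previously received RAM's RAMBlock pointer, per incoming state */
    RAMBlockSketch *last_recv_block;
} IncomingSketch;

/* A continuation reuses the per-state cache; a named block refreshes it. */
static RAMBlockSketch *block_from_stream_sketch(IncomingSketch *mis,
                                                RAMBlockSketch *named,
                                                int is_continuation)
{
    RAMBlockSketch *block = is_continuation ? mis->last_recv_block : named;

    mis->last_recv_block = block;
    return block;
}
```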
 migration/migration.h |  3 ++-
 migration/ram.c   | 13 +
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 8445e1d14a..d8b9850eae 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -66,7 +66,8 @@ typedef struct {
 /* State for the incoming migration */
 struct MigrationIncomingState {
 QEMUFile *from_src_file;
-
+/* Previously received RAM's RAMBlock pointer */
+RAMBlock *last_recv_block;
 /* A hook to allow cleanup at the end of incoming migration */
 void *transport_data;
 void (*transport_cleanup)(void *data);
diff --git a/migration/ram.c b/migration/ram.c
index 3a216c2340..9516dd655a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3185,12 +3185,14 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
  *
  * Returns a pointer from within the RCU-protected ram_list.
  *
+ * @mis: the migration incoming state pointer
  * @f: QEMUFile where to read the data from
  * @flags: Page flags (mostly to see if it's a continuation of previous block)
  */
-static inline RAMBlock *ram_block_from_stream(QEMUFile *f, int flags)
+static inline RAMBlock *ram_block_from_stream(MigrationIncomingState *mis,
+  QEMUFile *f, int flags)
 {
-static RAMBlock *block;
+RAMBlock *block = mis->last_recv_block;
 char id[256];
 uint8_t len;
 
@@ -3217,6 +3219,8 @@ static inline RAMBlock *ram_block_from_stream(QEMUFile *f, int flags)
 return NULL;
 }
 
+mis->last_recv_block = block;
+
 return block;
 }
 
@@ -3669,7 +3673,7 @@ static int ram_load_postcopy(QEMUFile *f)
 trace_ram_load_postcopy_loop((uint64_t)addr, flags);
 if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
  RAM_SAVE_FLAG_COMPRESS_PAGE)) {
-block = ram_block_from_stream(f, flags);
+block = ram_block_from_stream(mis, f, flags);
 if (!block) {
 ret = -EINVAL;
 break;
@@ -3881,6 +3885,7 @@ void colo_flush_ram_cache(void)
  */
 static int ram_load_precopy(QEMUFile *f)
 {
+MigrationIncomingState *mis = migration_incoming_get_current();
 int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0;
 /* ADVISE is earlier, it shows the source has the postcopy capability on */
 bool postcopy_advised = postcopy_is_advised();
@@ -3919,7 +3924,7 @@ static int ram_load_precopy(QEMUFile *f)
 
 if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
  RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
-RAMBlock *block = ram_block_from_stream(f, flags);
+RAMBlock *block = ram_block_from_stream(mis, f, flags);
 
 host = host_from_ram_block_offset(block, addr);
 /*
-- 
2.32.0




[PATCH v2 14/25] migration: Add migration_incoming_transport_cleanup()

2022-03-01 Thread Peter Xu
Add a helper to cleanup the transport listener.

While at it, also null-ify the cleanup hook and the data, so it is even safe
to call it multiple times.

Move the socket_address_list cleanup in as well, because it is a mirror of the
listener channels and exists only for the purpose of query-migrate.  Hence
whoever wants to clean up the listener transport should always clean up the
socket list too.

No functional change intended.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
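A reviewer note: the "null-ify so it is safe to call twice" idiom can be
shown in a self-contained sketch (the struct and helper names are
illustrative, not the QEMU code):

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    void (*transport_cleanup)(void *data);
    void *transport_data;
} TransportSketch;

static int cleanup_calls;   /* instrumentation for the example */

static void count_cleanup(void *data)
{
    (void)data;
    cleanup_calls++;
}

/* Run the hook once, then clear both fields so a second call is a no-op. */
static void transport_cleanup_sketch(TransportSketch *t)
{
    if (t->transport_cleanup) {
        t->transport_cleanup(t->transport_data);
        t->transport_data = NULL;
        t->transport_cleanup = NULL;
    }
}
```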
 migration/migration.c | 22 ++
 migration/migration.h |  1 +
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index b2e6446457..6bb321cdd3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -279,6 +279,19 @@ MigrationIncomingState *migration_incoming_get_current(void)
 return current_incoming;
 }
 
+void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
+{
+if (mis->socket_address_list) {
+qapi_free_SocketAddressList(mis->socket_address_list);
+mis->socket_address_list = NULL;
+}
+
+if (mis->transport_cleanup) {
+mis->transport_cleanup(mis->transport_data);
+mis->transport_data = mis->transport_cleanup = NULL;
+}
+}
+
 void migration_incoming_state_destroy(void)
 {
 struct MigrationIncomingState *mis = migration_incoming_get_current();
@@ -299,10 +312,8 @@ void migration_incoming_state_destroy(void)
 g_array_free(mis->postcopy_remote_fds, TRUE);
 mis->postcopy_remote_fds = NULL;
 }
-if (mis->transport_cleanup) {
-mis->transport_cleanup(mis->transport_data);
-}
 
+migration_incoming_transport_cleanup(mis);
 qemu_event_reset(&mis->main_thread_load_event);
 
 if (mis->page_requested) {
@@ -310,11 +321,6 @@ void migration_incoming_state_destroy(void)
 mis->page_requested = NULL;
 }
 
-if (mis->socket_address_list) {
-qapi_free_SocketAddressList(mis->socket_address_list);
-mis->socket_address_list = NULL;
-}
-
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index d677a750c9..f17ccc657c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -166,6 +166,7 @@ struct MigrationIncomingState {
 
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
+void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
 /*
  * Functions to work with blocktime context
  */
-- 
2.32.0




[PATCH v2 24/25] tests: Add postcopy preempt test

2022-03-01 Thread Peter Xu
Two tests are added: a normal postcopy preempt test, and a recovery test.

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 41 ++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 7b42f6fd90..09a9ce4401 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -470,6 +470,7 @@ typedef struct {
  */
 bool hide_stderr;
 bool use_shmem;
+bool postcopy_preempt;
 /* only launch the target process */
 bool only_target;
 /* Use dirty ring if true; dirty logging otherwise */
@@ -663,6 +664,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 MigrateStart *args)
 {
 g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+/* NOTE: args will be freed in test_migrate_start(), cache it */
+bool postcopy_preempt = args->postcopy_preempt;
 QTestState *from, *to;
 
 if (test_migrate_start(&from, &to, uri, args)) {
@@ -673,6 +676,11 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 migrate_set_capability(to, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-blocktime", true);
 
+if (postcopy_preempt) {
+migrate_set_capability(from, "postcopy-preempt", true);
+migrate_set_capability(to, "postcopy-preempt", true);
+}
+
 /* We want to pick a speed slow enough that the test completes
  * quickly, but that it doesn't complete precopy even on a slow
  * machine, so also set the downtime.
@@ -719,13 +727,29 @@ static void test_postcopy(void)
 migrate_postcopy_complete(from, to);
 }
 
-static void test_postcopy_recovery(void)
+static void test_postcopy_preempt(void)
+{
+MigrateStart *args = migrate_start_new();
+QTestState *from, *to;
+
+args->postcopy_preempt = true;
+
+if (migrate_postcopy_prepare(&from, &to, args)) {
+return;
+}
+migrate_postcopy_start(from, to);
+migrate_postcopy_complete(from, to);
+}
+
+/* @preempt: whether to use postcopy-preempt */
+static void test_postcopy_recovery(bool preempt)
 {
 MigrateStart *args = migrate_start_new();
 QTestState *from, *to;
 g_autofree char *uri = NULL;
 
 args->hide_stderr = true;
+args->postcopy_preempt = preempt;
 
 if (migrate_postcopy_prepare(&from, &to, args)) {
 return;
@@ -781,6 +805,16 @@ static void test_postcopy_recovery(void)
 migrate_postcopy_complete(from, to);
 }
 
+static void test_postcopy_recovery_normal(void)
+{
+test_postcopy_recovery(false);
+}
+
+static void test_postcopy_recovery_preempt(void)
+{
+test_postcopy_recovery(true);
+}
+
 static void test_baddest(void)
 {
 MigrateStart *args = migrate_start_new();
@@ -1458,7 +1492,10 @@ int main(int argc, char **argv)
 module_call_init(MODULE_INIT_QOM);
 
 qtest_add_func("/migration/postcopy/unix", test_postcopy);
-qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery_normal);
+qtest_add_func("/migration/postcopy/preempt/unix", test_postcopy_preempt);
+qtest_add_func("/migration/postcopy/preempt/recovery",
+   test_postcopy_recovery_preempt);
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix", test_precopy_unix);
 qtest_add_func("/migration/precopy/tcp", test_precopy_tcp);
-- 
2.32.0




[PATCH v2 13/25] migration: Move channel setup out of postcopy_try_recover()

2022-03-01 Thread Peter Xu
We used to use postcopy_try_recover() in place of migration_incoming_setup()
to set up incoming channels.  That's fine for the old world, but in the new
world there can be more than one channel that needs setup.  Better to move the
channel setup out of it so that postcopy_try_recover() only handles the last
phase of switching to the recovery phase.

To do that, in migration_fd_process_incoming() move the postcopy_try_recover()
call to after migration_incoming_setup(), which sets up the channels.  In
migration_ioc_process_incoming(), postpone the recovery routine to right
before we jump into migration_incoming_process().

A side benefit is we don't need to pass in QEMUFile* to postcopy_try_recover()
anymore.  Remove it.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
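For reviewers, the reordered control flow boils down to the sketch below
(names and the struct are illustrative, not the QEMU code): setup always runs
first, so by the time recovery is attempted `from_src_file` is guaranteed to
be set.  It returns 1 when a paused postcopy is resumed, 0 when a fresh
incoming migration is processed.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool from_src_file;
    bool postcopy_paused;
} IncomingFlowSketch;

static int process_incoming_sketch(IncomingFlowSketch *mis)
{
    mis->from_src_file = true;      /* migration_incoming_setup() */
    if (mis->postcopy_paused) {
        /* The invariant postcopy_try_recover() now asserts on. */
        assert(mis->from_src_file);
        return 1;
    }
    return 0;                       /* migration_incoming_process() */
}
```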
 migration/migration.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 67520d3105..b2e6446457 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -665,19 +665,20 @@ void migration_incoming_process(void)
 }
 
 /* Returns true if recovered from a paused migration, otherwise false */
-static bool postcopy_try_recover(QEMUFile *f)
+static bool postcopy_try_recover(void)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
 
 if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
 /* Resumed from a paused postcopy migration */
 
-mis->from_src_file = f;
+/* This should be set already in migration_incoming_setup() */
+assert(mis->from_src_file);
 /* Postcopy has standalone thread to do vm load */
-qemu_file_set_blocking(f, true);
+qemu_file_set_blocking(mis->from_src_file, true);
 
 /* Re-configure the return path */
-mis->to_src_file = qemu_file_get_return_path(f);
+mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
 
 migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
   MIGRATION_STATUS_POSTCOPY_RECOVER);
@@ -698,11 +699,10 @@ static bool postcopy_try_recover(QEMUFile *f)
 
 void migration_fd_process_incoming(QEMUFile *f, Error **errp)
 {
-if (postcopy_try_recover(f)) {
+if (!migration_incoming_setup(f, errp)) {
 return;
 }
-
-if (!migration_incoming_setup(f, errp)) {
+if (postcopy_try_recover()) {
 return;
 }
 migration_incoming_process();
@@ -718,11 +718,6 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 /* The first connection (multifd may have multiple) */
 QEMUFile *f = qemu_fopen_channel_input(ioc);
 
-/* If it's a recovery, we're done */
-if (postcopy_try_recover(f)) {
-return;
-}
-
 if (!migration_incoming_setup(f, errp)) {
 return;
 }
@@ -743,6 +738,10 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 }
 
 if (start_migration) {
+/* If it's a recovery, we're done */
+if (postcopy_try_recover()) {
+return;
+}
 migration_incoming_process();
 }
 }
-- 
2.32.0




[PATCH v2 09/25] migration: Move migrate_allow_multifd and helpers into migration.c

2022-03-01 Thread Peter Xu
This variable, along with its helpers, is used to detect whether multiple
channels will be supported for migration.  In follow-up patches, there'll be
another capability that requires multiple channels.  Hence move it outside the
multifd-specific code and make it public.  Meanwhile, rename it from "multifd"
to "multi_channels" to show its real meaning.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/migration.c | 22 +-
 migration/migration.h |  3 +++
 migration/multifd.c   | 19 ---
 migration/multifd.h   |  2 --
 4 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index bcc385b94b..6e4cc9cc87 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -180,6 +180,18 @@ static int migration_maybe_pause(MigrationState *s,
  int new_state);
 static void migrate_fd_cancel(MigrationState *s);
 
+static bool migrate_allow_multi_channels = true;
+
+void migrate_protocol_allow_multi_channels(bool allow)
+{
+migrate_allow_multi_channels = allow;
+}
+
+bool migrate_multi_channels_is_allowed(void)
+{
+return migrate_allow_multi_channels;
+}
+
 static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
 {
 uintptr_t a = (uintptr_t) ap, b = (uintptr_t) bp;
@@ -463,12 +475,12 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
 const char *p = NULL;
 
-migrate_protocol_allow_multifd(false); /* reset it anyway */
+migrate_protocol_allow_multi_channels(false); /* reset it anyway */
 qapi_event_send_migration(MIGRATION_STATUS_SETUP);
 if (strstart(uri, "tcp:", &p) ||
 strstart(uri, "unix:", NULL) ||
 strstart(uri, "vsock:", NULL)) {
-migrate_protocol_allow_multifd(true);
+migrate_protocol_allow_multi_channels(true);
 socket_start_incoming_migration(p ? p : uri, errp);
 #ifdef CONFIG_RDMA
 } else if (strstart(uri, "rdma:", &p)) {
@@ -1255,7 +1267,7 @@ static bool migrate_caps_check(bool *cap_list,
 
 /* incoming side only */
 if (runstate_check(RUN_STATE_INMIGRATE) &&
-!migrate_multifd_is_allowed() &&
+!migrate_multi_channels_is_allowed() &&
 cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
 error_setg(errp, "multifd is not supported by current protocol");
 return false;
@@ -2313,11 +2325,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 }
 }
 
-migrate_protocol_allow_multifd(false);
+migrate_protocol_allow_multi_channels(false);
 if (strstart(uri, "tcp:", &p) ||
 strstart(uri, "unix:", NULL) ||
 strstart(uri, "vsock:", NULL)) {
-migrate_protocol_allow_multifd(true);
+migrate_protocol_allow_multi_channels(true);
 socket_start_outgoing_migration(s, p ? p : uri, &local_err);
 #ifdef CONFIG_RDMA
 } else if (strstart(uri, "rdma:", &p)) {
diff --git a/migration/migration.h b/migration/migration.h
index d8b9850eae..d677a750c9 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -429,4 +429,7 @@ void migration_cancel(const Error *error);
 void populate_vfio_info(MigrationInfo *info);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
+bool migrate_multi_channels_is_allowed(void);
+void migrate_protocol_allow_multi_channels(bool allow);
+
 #endif
diff --git a/migration/multifd.c b/migration/multifd.c
index 76b57a7177..180586dcde 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -517,7 +517,7 @@ void multifd_save_cleanup(void)
 {
 int i;
 
-if (!migrate_use_multifd() || !migrate_multifd_is_allowed()) {
+if (!migrate_use_multifd() || !migrate_multi_channels_is_allowed()) {
 return;
 }
 multifd_send_terminate_threads(NULL);
@@ -858,17 +858,6 @@ cleanup:
 multifd_new_send_channel_cleanup(p, sioc, local_err);
 }
 
-static bool migrate_allow_multifd = true;
-void migrate_protocol_allow_multifd(bool allow)
-{
-migrate_allow_multifd = allow;
-}
-
-bool migrate_multifd_is_allowed(void)
-{
-return migrate_allow_multifd;
-}
-
 int multifd_save_setup(Error **errp)
 {
 int thread_count;
@@ -879,7 +868,7 @@ int multifd_save_setup(Error **errp)
 if (!migrate_use_multifd()) {
 return 0;
 }
-if (!migrate_multifd_is_allowed()) {
+if (!migrate_multi_channels_is_allowed()) {
 error_setg(errp, "multifd is not supported by current protocol");
 return -1;
 }
@@ -980,7 +969,7 @@ int multifd_load_cleanup(Error **errp)
 {
 int i;
 
-if (!migrate_use_multifd() || !migrate_multifd_is_allowed()) {
+if (!migrate_use_multifd() || !migrate_multi_channels_is_allowed()) {
 return 0;
 }
 multifd_recv_terminate_threads(NULL);
@@ -1129,7 +1118,7 @@ int multifd_load_setup(Error **errp)
 if (!migrate_use_multifd()) {
 return 0;
 }
-if (!migrate_multifd_is_allowed()) {
+if (!migrate_multi_channels_is_allowed()) {
 

[PATCH v2 21/25] migration: Parameter x-postcopy-preempt-break-huge

2022-03-01 Thread Peter Xu
Add a parameter that can conditionally disable the "break sending huge
page" behavior in postcopy preemption.  By default it's enabled.

It should only be used for debugging purposes, and we should never remove
the "x-" prefix.

Signed-off-by: Peter Xu 
---
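As a reviewer aid, the resulting gate can be condensed into a sketch (the
helper below is illustrative and elides other checks the real code makes):
interrupting a host huge page for an urgent page requires the preempt
capability, the default-on x-postcopy-preempt-break-huge knob, and a
huge-page ramblock.

```c
#include <assert.h>
#include <stdbool.h>

static bool needs_preempt_sketch(bool preempt_cap, bool break_huge,
                                 bool huge_page_block)
{
    if (!preempt_cap) {
        return false;           /* preemption not enabled at all */
    }
    if (!break_huge) {
        return false;           /* user disabled breaking huge pages */
    }
    return huge_page_block;     /* small pages never need breaking */
}
```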
 migration/migration.c | 2 ++
 migration/migration.h | 7 +++
 migration/ram.c   | 7 +++
 3 files changed, 16 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 78e1e6bfb9..cd4a150202 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4325,6 +4325,8 @@ static Property migration_properties[] = {
 DEFINE_PROP_SIZE("announce-step", MigrationState,
   parameters.announce_step,
   DEFAULT_MIGRATE_ANNOUNCE_STEP),
+DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
+  postcopy_preempt_break_huge, true),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
diff --git a/migration/migration.h b/migration/migration.h
index f898b8547a..6ee520642f 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -340,6 +340,13 @@ struct MigrationState {
 bool send_configuration;
 /* Whether we send section footer during migration */
 bool send_section_footer;
+/*
+ * Whether we allow break sending huge pages when postcopy preempt is
+ * enabled.  When disabled, we won't interrupt precopy within sending a
+ * host huge page, which is the old behavior of vanilla postcopy.
+ * NOTE: this parameter is ignored if postcopy preempt is not enabled.
+ */
+bool postcopy_preempt_break_huge;
 
 /* Needed by postcopy-pause state */
 QemuSemaphore postcopy_pause_sem;
diff --git a/migration/ram.c b/migration/ram.c
index 53dfd9be38..ede8aaac01 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2266,11 +2266,18 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
 
 static bool postcopy_needs_preempt(RAMState *rs, PageSearchStatus *pss)
 {
+MigrationState *ms = migrate_get_current();
+
 /* Not enabled eager preempt?  Then never do that. */
 if (!migrate_postcopy_preempt()) {
 return false;
 }
 
+/* If the user explicitly disabled breaking of huge page, skip */
+if (!ms->postcopy_preempt_break_huge) {
+return false;
+}
+
 /* If the ramblock we're sending is a small page?  Never bother. */
 if (qemu_ram_pagesize(pss->block) == TARGET_PAGE_SIZE) {
 return false;
-- 
2.32.0




[PATCH v2 15/25] migration: Allow migrate-recover to run multiple times

2022-03-01 Thread Peter Xu
Previously migration didn't have an easy way to clean up the listening
transport, so migrate recovery was only allowed to execute once.  That was done
with a trick flag, postcopy_recover_triggered.

Now the facility is already there.

Drop postcopy_recover_triggered and instead allow a new migrate-recover to
release the previous listener transport.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/migration.c | 13 ++---
 migration/migration.h |  1 -
 migration/savevm.c|  3 ---
 3 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6bb321cdd3..16086897aa 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2159,11 +2159,8 @@ void qmp_migrate_recover(const char *uri, Error **errp)
 return;
 }
 
-if (qatomic_cmpxchg(&mis->postcopy_recover_triggered,
-   false, true) == true) {
-error_setg(errp, "Migrate recovery is triggered already");
-return;
-}
+/* If there's an existing transport, release it */
+migration_incoming_transport_cleanup(mis);
 
 /*
  * Note that this call will never start a real migration; it will
@@ -2171,12 +2168,6 @@ void qmp_migrate_recover(const char *uri, Error **errp)
  * to continue using that newly established channel.
  */
 qemu_start_incoming_migration(uri, errp);
-
-/* Safe to dereference with the assert above */
-if (*errp) {
-/* Reset the flag so user could still retry */
-qatomic_set(&mis->postcopy_recover_triggered, false);
-}
 }
 
 void qmp_migrate_pause(Error **errp)
diff --git a/migration/migration.h b/migration/migration.h
index f17ccc657c..a863032b71 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -139,7 +139,6 @@ struct MigrationIncomingState {
 struct PostcopyBlocktimeContext *blocktime_ctx;
 
 /* notify PAUSED postcopy incoming migrations to try to continue */
-bool postcopy_recover_triggered;
 QemuSemaphore postcopy_pause_sem_dst;
 QemuSemaphore postcopy_pause_sem_fault;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 967ff80547..254aa78234 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2589,9 +2589,6 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
 
 assert(migrate_postcopy_ram());
 
-/* Clear the triggered bit to allow one recovery */
-mis->postcopy_recover_triggered = false;
-
 /*
  * Unregister yank with either from/to src would work, since ioc behind it
  * is the same
-- 
2.32.0




[PATCH v2 22/25] migration: Add helpers to detect TLS capability

2022-03-01 Thread Peter Xu
Add migrate_tls_enabled() to detect whether TLS is configured.

Add migrate_channel_requires_tls() to detect whether the specific channel
requires TLS.

No functional change intended.

Signed-off-by: Peter Xu 
---
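For reviewers, the factored-out predicate reduces to the sketch below (the
struct and the bool parameter are illustrative stand-ins for MigrationState
and the object_dynamic_cast() type check): a channel needs a TLS handshake iff
TLS creds are configured and the channel is not already a TLS channel.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    const char *tls_creds;      /* NULL or "" means TLS disabled */
} ParamsSketch;

static bool tls_enabled_sketch(const ParamsSketch *p)
{
    return p->tls_creds && p->tls_creds[0] != '\0';
}

static bool channel_requires_tls_sketch(const ParamsSketch *p,
                                        bool already_tls_channel)
{
    return tls_enabled_sketch(p) && !already_tls_channel;
}
```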
 migration/channel.c   | 10 ++
 migration/migration.c | 17 +
 migration/migration.h |  4 
 migration/multifd.c   |  7 +--
 4 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/migration/channel.c b/migration/channel.c
index c4fc000a1a..85ac053275 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -38,10 +38,7 @@ void migration_channel_process_incoming(QIOChannel *ioc)
 trace_migration_set_incoming_channel(
 ioc, object_get_typename(OBJECT(ioc)));
 
-if (s->parameters.tls_creds &&
-*s->parameters.tls_creds &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls(ioc)) {
 migration_tls_channel_process_incoming(s, ioc, &local_err);
 } else {
 migration_ioc_register_yank(ioc);
@@ -71,10 +68,7 @@ void migration_channel_connect(MigrationState *s,
 ioc, object_get_typename(OBJECT(ioc)), hostname, error);
 
 if (!error) {
-if (s->parameters.tls_creds &&
-*s->parameters.tls_creds &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls(ioc)) {
 migration_tls_channel_connect(s, ioc, hostname, &error);
 
 if (!error) {
diff --git a/migration/migration.c b/migration/migration.c
index cd4a150202..f30bad982c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -49,6 +49,7 @@
 #include "trace.h"
 #include "exec/target_page.h"
 #include "io/channel-buffer.h"
+#include "io/channel-tls.h"
 #include "migration/colo.h"
 #include "hw/boards.h"
 #include "hw/qdev-properties.h"
@@ -4246,6 +4247,22 @@ void migration_global_dump(Monitor *mon)
ms->clear_bitmap_shift);
 }
 
+bool migrate_tls_enabled(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->parameters.tls_creds && *s->parameters.tls_creds;
+}
+
+bool migrate_channel_requires_tls(QIOChannel *ioc)
+{
+if (!migrate_tls_enabled()) {
+return false;
+}
+
+return !object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS);
+}
+
 #define DEFINE_PROP_MIG_CAP(name, x) \
 DEFINE_PROP_BOOL(name, MigrationState, enabled_capabilities[x], false)
 
diff --git a/migration/migration.h b/migration/migration.h
index 6ee520642f..8b9ad7fe31 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -436,6 +436,10 @@ bool migrate_use_events(void);
 bool migrate_postcopy_blocktime(void);
 bool migrate_background_snapshot(void);
 bool migrate_postcopy_preempt(void);
+/* Whether TLS is enabled for migration? */
+bool migrate_tls_enabled(void);
+/* Whether the QIO channel requires further TLS handshake? */
+bool migrate_channel_requires_tls(QIOChannel *ioc);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_shut(MigrationIncomingState *mis,
diff --git a/migration/multifd.c b/migration/multifd.c
index 180586dcde..46dfcbfa1d 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -784,16 +784,11 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
 QIOChannel *ioc,
 Error *error)
 {
-MigrationState *s = migrate_get_current();
-
 trace_multifd_set_outgoing_channel(
 ioc, object_get_typename(OBJECT(ioc)), p->tls_hostname, error);
 
 if (!error) {
-if (s->parameters.tls_creds &&
-*s->parameters.tls_creds &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls(ioc)) {
 multifd_tls_channel_connect(p, ioc, &error);
 if (!error) {
 /*
-- 
2.32.0




[PATCH v2 11/25] migration: postcopy_pause_fault_thread() never fails

2022-03-01 Thread Peter Xu
Per the title, remove the return code and simplify the callers as the errors
will never be triggered.  No functional change intended.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/postcopy-ram.c | 25 -
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b0d12d5053..32c52f4b1d 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -891,15 +891,11 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
   affected_cpu);
 }
 
-static bool postcopy_pause_fault_thread(MigrationIncomingState *mis)
+static void postcopy_pause_fault_thread(MigrationIncomingState *mis)
 {
 trace_postcopy_pause_fault_thread();
-
 qemu_sem_wait(&mis->postcopy_pause_sem_fault);
-
 trace_postcopy_pause_fault_thread_continued();
-
-return true;
 }
 
 /*
@@ -959,13 +955,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
  * broken already using the event. We should hold until
  * the channel is rebuilt.
  */
-if (postcopy_pause_fault_thread(mis)) {
-/* Continue to read the userfaultfd */
-} else {
-error_report("%s: paused but don't allow to continue",
- __func__);
-break;
-}
+postcopy_pause_fault_thread(mis);
 }
 
 if (pfd[1].revents) {
@@ -1039,15 +1029,8 @@ retry:
 msg.arg.pagefault.address);
 if (ret) {
 /* May be network failure, try to wait for recovery */
-if (postcopy_pause_fault_thread(mis)) {
-/* We got reconnected somehow, try to continue */
-goto retry;
-} else {
-/* This is a unavoidable fault */
-error_report("%s: postcopy_request_page() get %d",
- __func__, ret);
-break;
-}
+postcopy_pause_fault_thread(mis);
+goto retry;
 }
 }
 
-- 
2.32.0




[PATCH v2 17/25] migration: Postcopy preemption preparation on channel creation

2022-03-01 Thread Peter Xu
Create a new socket so that postcopy is prepared to send requested
pages via this dedicated channel, without getting blocked behind precopy pages.

A new thread is also created on dest qemu to receive data from this new channel
based on the ram_load_postcopy() routine.

The ram_load_postcopy(POSTCOPY) branch and the thread have not started to
function yet; that'll be done in follow-up patches.

Clean up the new sockets on both src/dst QEMUs, and also look after the new
thread to make sure it's recycled properly.

Signed-off-by: Peter Xu 
---
 migration/migration.c| 62 +++
 migration/migration.h|  8 
 migration/postcopy-ram.c | 92 ++--
 migration/postcopy-ram.h | 10 +
 migration/ram.c  | 25 ---
 migration/ram.h  |  4 +-
 migration/socket.c   | 22 +-
 migration/socket.h   |  1 +
 migration/trace-events   |  3 ++
 9 files changed, 207 insertions(+), 20 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 4c22bad304..3d7f897b72 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -321,6 +321,12 @@ void migration_incoming_state_destroy(void)
 mis->page_requested = NULL;
 }
 
+if (mis->postcopy_qemufile_dst) {
+migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst);
+qemu_fclose(mis->postcopy_qemufile_dst);
+mis->postcopy_qemufile_dst = NULL;
+}
+
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
@@ -714,15 +720,21 @@ void migration_fd_process_incoming(QEMUFile *f, Error **errp)
 migration_incoming_process();
 }
 
+static bool migration_needs_multiple_sockets(void)
+{
+return migrate_use_multifd() || migrate_postcopy_preempt();
+}
+
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
 Error *local_err = NULL;
 bool start_migration;
+QEMUFile *f;
 
 if (!mis->from_src_file) {
 /* The first connection (multifd may have multiple) */
-QEMUFile *f = qemu_fopen_channel_input(ioc);
+f = qemu_fopen_channel_input(ioc);
 
 if (!migration_incoming_setup(f, errp)) {
 return;
@@ -730,13 +742,18 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 
 /*
  * Common migration only needs one channel, so we can start
- * right now.  Multifd needs more than one channel, we wait.
+ * right now.  Some features need more than one channel, we wait.
  */
-start_migration = !migrate_use_multifd();
+start_migration = !migration_needs_multiple_sockets();
 } else {
 /* Multiple connections */
-assert(migrate_use_multifd());
-start_migration = multifd_recv_new_channel(ioc, &local_err);
+assert(migration_needs_multiple_sockets());
+if (migrate_use_multifd()) {
+start_migration = multifd_recv_new_channel(ioc, &local_err);
+} else if (migrate_postcopy_preempt()) {
+f = qemu_fopen_channel_input(ioc);
+start_migration = postcopy_preempt_new_channel(mis, f);
+}
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -761,11 +778,20 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 bool migration_has_all_channels(void)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
-bool all_channels;
 
-all_channels = multifd_recv_all_channels_created();
+if (!mis->from_src_file) {
+return false;
+}
+
+if (migrate_use_multifd()) {
+return multifd_recv_all_channels_created();
+}
+
+if (migrate_postcopy_preempt()) {
+return mis->postcopy_qemufile_dst != NULL;
+}
 
-return all_channels && mis->from_src_file != NULL;
+return true;
 }
 
 /*
@@ -1858,6 +1884,12 @@ static void migrate_fd_cleanup(MigrationState *s)
 qemu_fclose(tmp);
 }
 
+if (s->postcopy_qemufile_src) {
+migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+qemu_fclose(s->postcopy_qemufile_src);
+s->postcopy_qemufile_src = NULL;
+}
+
 assert(!migration_is_active(s));
 
 if (s->state == MIGRATION_STATUS_CANCELLING) {
@@ -3233,6 +3265,11 @@ static void migration_completion(MigrationState *s)
 qemu_savevm_state_complete_postcopy(s->to_dst_file);
 qemu_mutex_unlock_iothread();
 
+/* Shutdown the postcopy fast path thread */
+if (migrate_postcopy_preempt()) {
+postcopy_preempt_shutdown_file(s);
+}
+
 trace_migration_completion_postcopy_end_after_complete();
 } else {
 goto fail;
@@ -4120,6 +4157,15 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
 }
 }
 
+/* This needs to be done before resuming a postcopy */
+if (postcopy_preempt_setup(s,

[PATCH v2 23/25] migration: Fail postcopy preempt with TLS for now

2022-03-01 Thread Peter Xu
The support is not there yet.  Until it is, fail properly when starting
postcopy.  Failing at postcopy-start still allows the user to proceed
with, e.g., pure TLS precopy even if postcopy-ram is set.

Signed-off-by: Peter Xu 
---
 migration/migration.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index f30bad982c..95cfc483c9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1802,6 +1802,12 @@ void qmp_migrate_start_postcopy(Error **errp)
  " started");
 return;
 }
+
+if (migrate_postcopy_preempt() && migrate_tls_enabled()) {
+error_setg(errp, "Postcopy preemption does not support TLS yet");
+return;
+}
+
 /*
  * we don't error if migration has finished since that would be racy
  * with issuing this command.
-- 
2.32.0




[PATCH v2 16/25] migration: Add postcopy-preempt capability

2022-03-01 Thread Peter Xu
Firstly, postcopy already preempts precopy, since we do unqueue_page()
before looking into the dirty bits.

However that's not enough.  For example, when host huge pages are enabled,
a postcopy request that arrives while a precopy huge page is being sent must
wait for the whole huge page to finish.  That can introduce quite some delay;
the bigger the huge page, the larger the delay.

This patch adds a new capability to allow postcopy requests to preempt the
precopy stream in the middle of sending a huge page, so that postcopy
requests can be serviced even faster.

Meanwhile to send it even faster, bypass the precopy stream by providing a
standalone postcopy socket for sending requested pages.

Since the new behavior is not compatible with the old one, it will not be
the default; it's enabled only when the new capability is set on both
src/dst QEMUs.

This patch only adds the capability itself, the logic will be added in follow
up patches.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/migration.c | 23 +++
 migration/migration.h |  1 +
 qapi/migration.json   |  8 +++-
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 16086897aa..4c22bad304 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1235,6 +1235,11 @@ static bool migrate_caps_check(bool *cap_list,
 error_setg(errp, "Postcopy is not compatible with ignore-shared");
 return false;
 }
+
+if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
+error_setg(errp, "Multifd is not supported in postcopy");
+return false;
+}
 }
 
 if (cap_list[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
@@ -1278,6 +1283,13 @@ static bool migrate_caps_check(bool *cap_list,
 return false;
 }
 
+if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT]) {
+if (!cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+error_setg(errp, "Postcopy preempt requires postcopy-ram");
+return false;
+}
+}
+
 return true;
 }
 
@@ -2622,6 +2634,15 @@ bool migrate_background_snapshot(void)
 return s->enabled_capabilities[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT];
 }
 
+bool migrate_postcopy_preempt(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT];
+}
+
 /* migration thread support */
 /*
  * Something bad happened to the RP stream, mark an error
@@ -4232,6 +4253,8 @@ static Property migration_properties[] = {
 DEFINE_PROP_MIG_CAP("x-compress", MIGRATION_CAPABILITY_COMPRESS),
 DEFINE_PROP_MIG_CAP("x-events", MIGRATION_CAPABILITY_EVENTS),
 DEFINE_PROP_MIG_CAP("x-postcopy-ram", MIGRATION_CAPABILITY_POSTCOPY_RAM),
+DEFINE_PROP_MIG_CAP("x-postcopy-preempt",
+MIGRATION_CAPABILITY_POSTCOPY_PREEMPT),
 DEFINE_PROP_MIG_CAP("x-colo", MIGRATION_CAPABILITY_X_COLO),
 DEFINE_PROP_MIG_CAP("x-release-ram", MIGRATION_CAPABILITY_RELEASE_RAM),
 DEFINE_PROP_MIG_CAP("x-block", MIGRATION_CAPABILITY_BLOCK),
diff --git a/migration/migration.h b/migration/migration.h
index a863032b71..af4bcb19c2 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -394,6 +394,7 @@ int migrate_decompress_threads(void);
 bool migrate_use_events(void);
 bool migrate_postcopy_blocktime(void);
 bool migrate_background_snapshot(void);
+bool migrate_postcopy_preempt(void);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_shut(MigrationIncomingState *mis,
diff --git a/qapi/migration.json b/qapi/migration.json
index 5975a0e104..50878b5f3b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -463,6 +463,12 @@
 #   procedure starts. The VM RAM is saved with running VM.
 #   (since 6.0)
 #
+# @postcopy-preempt: If enabled, the migration process will allow postcopy
+#requests to preempt precopy stream, so postcopy requests
+#will be handled faster.  This is a performance feature and
+#should not affect the correctness of postcopy migration.
+#(since 7.0)
+#
 # Features:
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
 #
@@ -476,7 +482,7 @@
'block', 'return-path', 'pause-before-switchover', 'multifd',
'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
{ 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
-   'validate-uuid', 'background-snapshot'] }
+   'validate-uuid', 'background-snapshot', 'postcopy-preempt'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.32.0




Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID

2022-03-01 Thread Gavin Shan

Hi Igor,

On 2/28/22 6:54 PM, Igor Mammedov wrote:

On Mon, 28 Feb 2022 12:26:53 +0800
Gavin Shan  wrote:

On 2/25/22 6:03 PM, Igor Mammedov wrote:

On Fri, 25 Feb 2022 16:41:43 +0800
Gavin Shan  wrote:

On 2/17/22 10:14 AM, Gavin Shan wrote:

On 1/26/22 5:14 PM, Igor Mammedov wrote:

On Wed, 26 Jan 2022 13:24:10 +0800
Gavin Shan  wrote:
 

The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id()
when it isn't provided explicitly.  However, the CPU topology isn't fully
considered in the default association, and it causes CPU topology broken
warnings when booting a Linux guest.

For example, the following warning messages are observed when the Linux guest
is booted with the following command lines.

     /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
     -accel kvm -machine virt,gic-version=host   \
     -cpu host   \
     -smp 6,sockets=2,cores=3,threads=1  \
     -m 1024M,slots=16,maxmem=64G    \
     -object memory-backend-ram,id=mem0,size=128M    \
     -object memory-backend-ram,id=mem1,size=128M    \
     -object memory-backend-ram,id=mem2,size=128M    \
     -object memory-backend-ram,id=mem3,size=128M    \
     -object memory-backend-ram,id=mem4,size=128M    \
     -object memory-backend-ram,id=mem5,size=384M    \
     -numa node,nodeid=0,memdev=mem0 \
     -numa node,nodeid=1,memdev=mem1 \
     -numa node,nodeid=2,memdev=mem2 \
     -numa node,nodeid=3,memdev=mem3 \
     -numa node,nodeid=4,memdev=mem4 \
     -numa node,nodeid=5,memdev=mem5
    :
     alternatives: patching kernel code
     BUG: arch topology borken
     the CLS domain not a subset of the MC domain
     
     BUG: arch topology borken
     the DIE domain not a subset of the NODE domain

With the current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5
are associated with NODE#0 to NODE#5 respectively.  That's incorrect because
CPU#0/1/2 should be associated with the same NUMA node, as they're seated
in the same socket.

This fixes the issue by considering the socket when the default CPU-to-NUMA
association is given.  With this applied, no more CPU topology broken warnings
are seen from the Linux guest.  The 6 CPUs are associated with NODE#0/1, but
there are no CPUs associated with NODE#2/3/4/5.
 

  From migration point of view it looks fine to me, and doesn't need a compat 
knob

since NUMA data (on virt-arm) only used to construct ACPI tables (and we don't
version those unless something is broken by it).

 

Signed-off-by: Gavin Shan 
---
    hw/arm/virt.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 141350bf21..b4a95522d3 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
    static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
    {
-    return idx % ms->numa_state->num_nodes;
+    return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);


I'd like for ARM folks to confirm whether above is correct
(i.e. socket is NUMA node boundary and also if above topo vars
could have odd values. Don't look at horribly complicated x86
as example, but it showed that vendors could stash pretty much
anything there, so we should consider it here as well and maybe
forbid that in smp virt-arm parser)
 


After doing some investigation, I don't think the socket is the NUMA node
boundary.  Unfortunately, I didn't find it documented that way anywhere after
checking the device-tree specification and the Linux CPU topology and NUMA
binding documents.

However, there are two options here according to the Linux (guest) kernel code:
(A) the socket is the NUMA node boundary; (B) the CPU die is the NUMA node
boundary.  They are equivalent, as CPU dies aren't supported on the arm/virt
machine.  Besides, the topology of a one-to-one association between socket and
NUMA node sounds natural and simple, so I think (A) is the best way to go.

Another thing I want to explain here is how the changes affect memory
allocation in the Linux guest.  Taking the command line included in the commit
log as an example, the first two NUMA nodes are bound to CPUs while the other
4 NUMA nodes are regarded as remote NUMA nodes to the CPUs.  A remote NUMA node
won't accommodate memory allocations until the memory in the near (local)
NUMA node is exhausted.  However, it's uncertain how the memory is hosted
if memory binding isn't applied.

Besides, I think the code should be improved like below to avoid the node
index overflowing ms->numa_state->num_nodes.

    static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
    {
-    return idx % ms->numa_state->num_nodes;
+    int node_idx;
+
+    node_idx = idx / (ms-

Re: [PATCH] target/riscv: fix inverted checks for ext_zb[abcs]

2022-03-01 Thread Philipp Tomsich
On Tue, 1 Mar 2022 at 02:28, Vineet Gupta  wrote:

> Hi Alistair,
>
> On 2/3/22 16:59, Alistair Francis wrote:
> > On Fri, Feb 4, 2022 at 1:42 AM Philipp Tomsich 
> wrote:
> >>
> >> While changing to the use of cfg_ptr, the conditions for
> REQUIRE_ZB[ABCS]
> >> inadvertently became inverted and slipped through the initial testing
> (which
> >> used RV64GC_XVentanaCondOps as a target).
> >> This fixes the regression.
> >>
> >> Tested against SPEC2017 w/ GCC 12 (prerelease) for
> RV64GC_zba_zbb_zbc_zbs.
> >>
> >> Fixes: 718143c126 ("target/riscv: add a MAINTAINERS entry for
> XVentanaCondOps")
> >>
> >> Signed-off-by: Philipp Tomsich 
> >
> > Reviewed-by: Alistair Francis 
> >
> >>
> >> ---
> >> We may want to squash this onto the affected commit, if it hasn't made
> >> it beyond the next-tree, yet.
> >
> > Yeah, agreed. I'll squash it in
> >
> > Alistair
>
> Has this already been committed upstream? I was running into a weird issue
> related to bitmanip and it seems this was missing in my local tree.
>

After checking just now, this has not made it onto master yet.
Note that rc0 is planned for two weeks from now, so I am not overly concerned
yet.

Philipp.


> Also the "Fixes: " entry in changelog doesn't seem OK; the issue seems
> to have been introduced in f2a32bec8f0da99 ("target/riscv: access cfg
> structure through DisasContext")
>
> Thx,
> -Vineet
>
> >
> >>
> >>   target/riscv/insn_trans/trans_rvb.c.inc | 8 
> >>   1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/target/riscv/insn_trans/trans_rvb.c.inc
> b/target/riscv/insn_trans/trans_rvb.c.inc
> >> index f9bd3b7ec4..e3c6b459d6 100644
> >> --- a/target/riscv/insn_trans/trans_rvb.c.inc
> >> +++ b/target/riscv/insn_trans/trans_rvb.c.inc
> >> @@ -19,25 +19,25 @@
> >>*/
> >>
> >>   #define REQUIRE_ZBA(ctx) do {\
> >> -if (ctx->cfg_ptr->ext_zba) { \
> >> +if (!ctx->cfg_ptr->ext_zba) { \
> >>   return false;\
> >>   }\
> >>   } while (0)
> >>
> >>   #define REQUIRE_ZBB(ctx) do {\
> >> -if (ctx->cfg_ptr->ext_zbb) { \
> >> +if (!ctx->cfg_ptr->ext_zbb) { \
> >>   return false;\
> >>   }\
> >>   } while (0)
> >>
> >>   #define REQUIRE_ZBC(ctx) do {\
> >> -if (ctx->cfg_ptr->ext_zbc) { \
> >> +if (!ctx->cfg_ptr->ext_zbc) { \
> >>   return false;\
> >>   }\
> >>   } while (0)
> >>
> >>   #define REQUIRE_ZBS(ctx) do {\
> >> -if (ctx->cfg_ptr->ext_zbs) { \
> >> +if (!ctx->cfg_ptr->ext_zbs) { \
> >>   return false;\
> >>   }\
> >>   } while (0)
> >> --
> >> 2.34.1
> >>
> >>
> >
> >
>
>


[PATCH v2 25/25] tests: Pass in MigrateStart** into test_migrate_start()

2022-03-01 Thread Peter Xu
test_migrate_start() releases the MigrateStart structure that is passed
in; however, that's not obvious to the caller, because after the call
returns the pointer can still be referenced.  It can easily be a source
of use-after-free.

Let's pass in a double pointer instead, so we can safely clear the
caller's pointer after the struct is released.

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 09a9ce4401..67f0601988 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -496,7 +496,7 @@ static void migrate_start_destroy(MigrateStart *args)
 }
 
 static int test_migrate_start(QTestState **from, QTestState **to,
-  const char *uri, MigrateStart *args)
+  const char *uri, MigrateStart **pargs)
 {
 g_autofree gchar *arch_source = NULL;
 g_autofree gchar *arch_target = NULL;
@@ -508,6 +508,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
 g_autofree char *shmem_path = NULL;
 const char *arch = qtest_get_arch();
 const char *machine_opts = NULL;
+MigrateStart *args = *pargs;
 const char *memory_size;
 int ret = 0;
 
@@ -622,6 +623,8 @@ static int test_migrate_start(QTestState **from, QTestState **to,
 
 out:
 migrate_start_destroy(args);
+/* This tells the caller that this structure is gone */
+*pargs = NULL;
 return ret;
 }
 
@@ -668,7 +671,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 bool postcopy_preempt = args->postcopy_preempt;
 QTestState *from, *to;
 
-if (test_migrate_start(&from, &to, uri, args)) {
+if (test_migrate_start(&from, &to, uri, &args)) {
 return -1;
 }
 
@@ -822,7 +825,7 @@ static void test_baddest(void)
 
 args->hide_stderr = true;
 
-if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", args)) {
+if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", &args)) {
 return;
 }
 migrate_qmp(from, "tcp:127.0.0.1:0", "{}");
@@ -838,7 +841,7 @@ static void test_precopy_unix_common(bool dirty_ring)
 
 args->use_dirty_ring = dirty_ring;
 
-if (test_migrate_start(&from, &to, uri, args)) {
+if (test_migrate_start(&from, &to, uri, &args)) {
 return;
 }
 
@@ -926,7 +929,7 @@ static void test_xbzrle(const char *uri)
 MigrateStart *args = migrate_start_new();
 QTestState *from, *to;
 
-if (test_migrate_start(&from, &to, uri, args)) {
+if (test_migrate_start(&from, &to, uri, &args)) {
 return;
 }
 
@@ -980,7 +983,7 @@ static void test_precopy_tcp(void)
 g_autofree char *uri = NULL;
 QTestState *from, *to;
 
-if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", args)) {
+if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", &args)) {
 return;
 }
 
@@ -1025,7 +1028,7 @@ static void test_migrate_fd_proto(void)
 QDict *rsp;
 const char *error_desc;
 
-if (test_migrate_start(&from, &to, "defer", args)) {
+if (test_migrate_start(&from, &to, "defer", &args)) {
 return;
 }
 
@@ -1105,7 +1108,7 @@ static void do_test_validate_uuid(MigrateStart *args, bool should_fail)
 g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
 QTestState *from, *to;
 
-if (test_migrate_start(&from, &to, uri, args)) {
+if (test_migrate_start(&from, &to, uri, &args)) {
 return;
 }
 
@@ -1197,7 +1200,7 @@ static void test_migrate_auto_converge(void)
  */
 const int64_t expected_threshold = max_bandwidth * downtime_limit / 1000;
 
-if (test_migrate_start(&from, &to, uri, args)) {
+if (test_migrate_start(&from, &to, uri, &args)) {
 return;
 }
 
@@ -1266,7 +1269,7 @@ static void test_multifd_tcp(const char *method)
 QDict *rsp;
 g_autofree char *uri = NULL;
 
-if (test_migrate_start(&from, &to, "defer", args)) {
+if (test_migrate_start(&from, &to, "defer", &args)) {
 return;
 }
 
@@ -1352,7 +1355,7 @@ static void test_multifd_tcp_cancel(void)
 
 args->hide_stderr = true;
 
-if (test_migrate_start(&from, &to, "defer", args)) {
+if (test_migrate_start(&from, &to, "defer", &args)) {
 return;
 }
 
@@ -1391,7 +1394,7 @@ static void test_multifd_tcp_cancel(void)
 args = migrate_start_new();
 args->only_target = true;
 
-if (test_migrate_start(&from, &to2, "defer", args)) {
+if (test_migrate_start(&from, &to2, "defer", &args)) {
 return;
 }
 
-- 
2.32.0




[PATCH v2 18/25] migration: Postcopy preemption enablement

2022-03-01 Thread Peter Xu
This patch enables postcopy-preempt feature.

It contains two major changes to the migration logic:

(1) Postcopy requests are now sent via a different socket from precopy
background migration stream, so as to be isolated from very high page
request delays.

(2) For huge-page-enabled hosts: postcopy requests can now interrupt the
partial sending of a huge host page on the src QEMU.

After this patch, we'll live migrate a VM with two channels for postcopy: (1)
PRECOPY channel, which is the default channel that transfers background pages;
and (2) POSTCOPY channel, which only transfers requested pages.

There's no strict rule on which channel to use; e.g., if a requested page is
already being transferred on the precopy channel, we keep using that precopy
channel even though the page was explicitly requested.  In 99% of the cases,
though, we prioritize the channels so that requested pages are sent via the
postcopy channel whenever possible.

On the source QEMU, when we find a postcopy request, we interrupt the
PRECOPY channel's sending process and quickly switch to the POSTCOPY channel.
After we have serviced all the high-priority postcopy pages, we switch back to
the PRECOPY channel and continue sending the interrupted huge page.
No new thread is introduced on the src QEMU.

On the destination QEMU, one new thread is introduced to receive page data from
the postcopy specific socket (done in the preparation patch).

This patch has a side effect: previously, after sending postcopy pages, we
assumed the guest would access the following pages, so we kept sending from
there.  Now that's changed: instead of going on from a postcopy requested
page, we go back and continue sending the precopy huge page (which may have
been partially sent before being interrupted by a postcopy request).

Whether that's a problem is debatable: the assumption that the guest will
continue to access the next page may not really suit huge pages, especially
large ones (e.g. 1GB pages), so that locality hint is largely meaningless
when huge pages are used.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/migration.c  |   2 +
 migration/migration.h  |   2 +-
 migration/ram.c| 250 +++--
 migration/trace-events |   7 ++
 4 files changed, 252 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3d7f897b72..d20db04097 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3153,6 +3153,8 @@ static int postcopy_start(MigrationState *ms)
   MIGRATION_STATUS_FAILED);
 }
 
+trace_postcopy_preempt_enabled(migrate_postcopy_preempt());
+
 return ret;
 
 fail_closefb:
diff --git a/migration/migration.h b/migration/migration.h
index caa910d956..b8aacfe3af 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -68,7 +68,7 @@ typedef struct {
 struct MigrationIncomingState {
 QEMUFile *from_src_file;
 /* Previously received RAM's RAMBlock pointer */
-RAMBlock *last_recv_block;
+RAMBlock *last_recv_block[RAM_CHANNEL_MAX];
 /* A hook to allow cleanup at the end of incoming migration */
 void *transport_data;
 void (*transport_cleanup)(void *data);
diff --git a/migration/ram.c b/migration/ram.c
index 713ef6e421..53dfd9be38 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -295,6 +295,20 @@ struct RAMSrcPageRequest {
 QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
 };
 
+typedef struct {
+/*
+ * Cached ramblock/offset values if preempted.  They're only meaningful if
+ * preempted==true below.
+ */
+RAMBlock *ram_block;
+unsigned long ram_page;
+/*
+ * Whether a postcopy preemption just happened.  Will be reset after
+ * precopy recovered to background migration.
+ */
+bool preempted;
+} PostcopyPreemptState;
+
 /* State of RAM for migration */
 struct RAMState {
 /* QEMUFile used for this migration */
@@ -349,6 +363,14 @@ struct RAMState {
 /* Queue of outstanding page requests from the destination */
 QemuMutex src_page_req_mutex;
 QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests;
+
+/* Postcopy preemption informations */
+PostcopyPreemptState postcopy_preempt_state;
+/*
+ * Current channel we're using on src VM.  Only valid if postcopy-preempt
+ * is enabled.
+ */
+unsigned int postcopy_channel;
 };
 typedef struct RAMState RAMState;
 
@@ -356,6 +378,11 @@ static RAMState *ram_state;
 
 static NotifierWithReturnList precopy_notifier_list;
 
+static void postcopy_preempt_reset(RAMState *rs)
+{
+memset(&rs->postcopy_preempt_state, 0, sizeof(PostcopyPreemptState));
+}
+
 /* Whether postcopy has queued requests? */
 static bool postcopy_has_request(RAMState *rs)
 {
@@ -1947,6 +1974,55 @@ void ram_write_tracking_stop(void)
 }
 #endif /* defined(__linux__) */
 
+/*
+ * Check whether tw

[PATCH v2 19/25] migration: Postcopy recover with preempt enabled

2022-03-01 Thread Peter Xu
To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
needs similar fault-tolerance handling.  When ram_load_postcopy() fails,
instead of stopping, the thread halts on a semaphore, ready to be kicked
again when recovery is detected.

A mutex is introduced to make sure there's no concurrent operation on the
socket.  To keep it simple, the fast ram load thread takes the mutex for its
whole procedure and only releases it while paused.  The fast-path socket is
then safely released by the main loading thread, with that mutex held, when
there are network failures during postcopy.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 migration/migration.c| 27 +++
 migration/migration.h| 19 +++
 migration/postcopy-ram.c | 24 ++--
 migration/qemu-file.c| 27 +++
 migration/qemu-file.h|  1 +
 migration/savevm.c   | 21 +++--
 migration/trace-events   |  2 ++
 7 files changed, 113 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d20db04097..69778cab23 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -215,9 +215,11 @@ void migration_object_init(void)
 current_incoming->postcopy_remote_fds =
 g_array_new(FALSE, TRUE, sizeof(struct PostCopyFD));
 qemu_mutex_init(¤t_incoming->rp_mutex);
+qemu_mutex_init(¤t_incoming->postcopy_prio_thread_mutex);
 qemu_event_init(¤t_incoming->main_thread_load_event, false);
 qemu_sem_init(¤t_incoming->postcopy_pause_sem_dst, 0);
 qemu_sem_init(¤t_incoming->postcopy_pause_sem_fault, 0);
+qemu_sem_init(¤t_incoming->postcopy_pause_sem_fast_load, 0);
 qemu_mutex_init(¤t_incoming->page_request_mutex);
 current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
@@ -697,9 +699,9 @@ static bool postcopy_try_recover(void)
 
 /*
  * Here, we only wake up the main loading thread (while the
- * fault thread will still be waiting), so that we can receive
+ * rest threads will still be waiting), so that we can receive
  * commands from source now, and answer it if needed. The
- * fault thread will be woken up afterwards until we are sure
+ * rest threads will be woken up afterwards until we are sure
  * that source is ready to reply to page requests.
  */
 qemu_sem_post(&mis->postcopy_pause_sem_dst);
@@ -3466,6 +3468,18 @@ static MigThrError postcopy_pause(MigrationState *s)
 qemu_file_shutdown(file);
 qemu_fclose(file);
 
+/*
+ * Do the same to postcopy fast path socket too if there is.  No
+ * locking needed because no racer as long as we do this before setting
+ * status to paused.
+ */
+if (s->postcopy_qemufile_src) {
+migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+qemu_file_shutdown(s->postcopy_qemufile_src);
+qemu_fclose(s->postcopy_qemufile_src);
+s->postcopy_qemufile_src = NULL;
+}
+
 migrate_set_state(&s->state, s->state,
   MIGRATION_STATUS_POSTCOPY_PAUSED);
 
@@ -3521,8 +3535,13 @@ static MigThrError migration_detect_error(MigrationState *s)
 return MIG_THR_ERR_FATAL;
 }
 
-/* Try to detect any file errors */
-ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
+/*
+ * Try to detect any file errors.  Note that postcopy_qemufile_src will
+ * be NULL when postcopy preempt is not enabled.
+ */
+ret = qemu_file_get_error_obj_any(s->to_dst_file,
+  s->postcopy_qemufile_src,
+  &local_error);
 if (!ret) {
 /* Everything is fine */
 assert(!local_error);
diff --git a/migration/migration.h b/migration/migration.h
index b8aacfe3af..91f845e9e4 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -118,6 +118,18 @@ struct MigrationIncomingState {
 /* Postcopy priority thread is used to receive postcopy requested pages */
 QemuThread postcopy_prio_thread;
 bool postcopy_prio_thread_created;
+/*
+ * Used to sync between the ram load main thread and the fast ram load
+ * thread.  It protects postcopy_qemufile_dst, which is the postcopy
+ * fast channel.
+ *
+ * The ram fast load thread will take it mostly for the whole lifecycle
+ * because it needs to continuously read data from the channel, and
+ * it'll only release this mutex if postcopy is interrupted, so that
+ * the ram load main thread will take this mutex over and properly
+ * release the broken channel.
+ */
+QemuMutex postcopy_prio_thread_mutex;
 /*
  * An array of temp host huge pages to be used, one for each postcopy
  * channel.
@@ -147,6 +159,13 @@ struct MigrationIncomingS

[PATCH] tests/Makefile.include: Let "make clean" remove the TCG tests, too

2022-03-01 Thread Thomas Huth
"make clean" should remove all binaries that have been built, but so
far it left the TCG tests in place. Let's make sure that they are now
removed, too.

Signed-off-by: Thomas Huth 
---
 tests/Makefile.include | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index e7153c8e91..7a932caf91 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -162,6 +162,6 @@ check-build: run-ninja
 check-clean:
rm -rf $(TESTS_VENV_DIR) $(TESTS_RESULTS_DIR)
 
-clean: check-clean
+clean: check-clean clean-tcg
 
 endif
-- 
2.27.0




Re: What is the correct way to handle the VirtIO config space in vhost-user?

2022-03-01 Thread Stefan Hajnoczi
On Mon, Feb 28, 2022 at 04:44:47PM +, Peter Maydell wrote:
> On Mon, 28 Feb 2022 at 16:32, Alex Bennée  wrote:
> > Stefan Hajnoczi  writes:
> > > On Fri, Feb 25, 2022 at 05:32:43PM +, Alex Bennée wrote:
> > >> (aside: this continues my QOM confusion about when things should be in a
> > >> class or instance init, up until this point I hadn't needed it in my
> > >> stub).
> > >
> > > Class init is a one-time per-class initializer function. It is mostly
> > > used for setting up callbacks/overridden methods from the base class.
> > >
> > > Instance init is like an object constructor in object-oriented
> > > programming.
> >
> > I phrased my statement poorly. What I meant to say is I sometimes find
> > QEMUs approach to using class over instance initialisation inconsistent.
> > I think I understand the "policy" as use class init until there is a
> > case where you can't (e.g. having individual control of each instance of
> > a device).
> 
> Do you have examples of inconsistency? (I'm sure there are some,
> we're inconsistent about almost everything...)

Phew, at least we're inconsistent about being inconsistent. If we were
inconsistent about absolutely everything that just wouldn't do!

Stefan


signature.asc
Description: PGP signature


Re: [PATCH] vhost-vsock: detach the virqueue element in case of error

2022-03-01 Thread Stefan Hajnoczi
On Mon, Feb 28, 2022 at 10:50:58AM +0100, Stefano Garzarella wrote:
> In vhost_vsock_common_send_transport_reset(), if an element popped from
> the virtqueue is invalid, we should call virtqueue_detach_element() to
> detach it from the virtqueue before freeing its memory.
> 
> Fixes: fc0b9b0e1c ("vhost-vsock: add virtio sockets device")
> Cc: qemu-sta...@nongnu.org
> Reported-by: VictorV 
> Signed-off-by: Stefano Garzarella 
> ---
>  hw/virtio/vhost-vsock-common.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH] hw/arm/virt: Validate memory size on the first NUMA node

2022-03-01 Thread Gavin Shan

Hi Igor,

On 2/28/22 5:08 PM, Igor Mammedov wrote:

On Mon, 28 Feb 2022 15:52:03 +0800
Gavin Shan  wrote:


When the memory size on the first NUMA node is less than 128MB, the
guest hangs inside EDK2 as the following logs show.

   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
   -accel kvm -machine virt,gic-version=host   \
   -cpu host -smp 8,sockets=2,cores=2,threads=2\
   -m 1024M,slots=16,maxmem=64G\
   -object memory-backend-ram,id=mem0,size=127M\
   -object memory-backend-ram,id=mem1,size=897M\
   -numa node,nodeid=0,memdev=mem0 \
   -numa node,nodeid=1,memdev=mem1 \
   -L /home/gavin/sandbox/qemu.main/build/pc-bios  \
:
   QemuVirtMemInfoPeiLibConstructor: System RAM @ 0x47F0 - 0x7FFF
   QemuVirtMemInfoPeiLibConstructor: System RAM @ 0x4000 - 0x47EF
   ASSERT [MemoryInit] /home/lacos/src/upstream/qemu/roms/edk2/ArmVirtPkg/Library/QemuVirtMemInfoLib/QemuVirtMemInfoPeiLibConstructor.c(93): NewSize >= 0x0800

This adds MachineClass::validate_numa_nodes() to validate the memory
size of the first NUMA node. The guest is prevented from booting and
the reason is reported for this specific case.


Unless it is an architecturally wrong thing (i.e. node size less than 128MB),
in which case limiting it in QEMU would be justified, I'd prefer the
firmware being fixed, or it reporting a more useful error message to the user.
 


[include EDK2 developers]

I don't think a 128MB node memory size is architecturally required.
I also thought EDK2 would be a better place to provide a precise error
message and discussed it through with the EDK2 developers. Let's see what
their thoughts are this time.
 

Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c   | 9 +
  hw/core/numa.c  | 5 +
  include/hw/boards.h | 1 +
  3 files changed, 15 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 46bf7ceddf..234e7fca28 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2491,6 +2491,14 @@ static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
  return idx % ms->numa_state->num_nodes;
  }
  
+static void virt_validate_numa_nodes(MachineState *ms)
+{
+if (ms->numa_state->nodes[0].node_mem < 128 * MiB) {
+error_report("The first NUMA node should have at least 128MB memory");
+exit(1);


perhaps error_fatal() would be better



Yes, I think so :)


+}
+}
+
  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
  {
  int n;
@@ -2836,6 +2844,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
  mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
  mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
  mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
+mc->validate_numa_nodes = virt_validate_numa_nodes;
  mc->kvm_type = virt_kvm_type;
  assert(!mc->get_hotplug_handler);
  mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 1aa05dcf42..543a2eaf11 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -724,6 +724,11 @@ void numa_complete_configuration(MachineState *ms)
  /* Validation succeeded, now fill in any missing distances. */
  complete_init_numa_distance(ms);
  }
+
+/* Validate NUMA nodes for the individual machine */
+if (mc->validate_numa_nodes) {
+mc->validate_numa_nodes(ms);
+}
  }
  }
  
diff --git a/include/hw/boards.h b/include/hw/boards.h

index c92ac8815c..9709a35eeb 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -282,6 +282,7 @@ struct MachineClass {
   unsigned cpu_index);
  const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
  int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx);
+void (*validate_numa_nodes)(MachineState *ms);
  ram_addr_t (*fixup_ram_size)(ram_addr_t size);
  };
  


Thanks,
Gavin




[PATCH v2 20/25] migration: Create the postcopy preempt channel asynchronously

2022-03-01 Thread Peter Xu
This patch allows the postcopy preempt channel to be created
asynchronously.  The benefit is that when the connection is slow, we won't
take the BQL (and potentially block all things like QMP) for a long time
without releasing.

A function postcopy_preempt_wait_channel() is introduced, allowing the
migration thread to wait for the channel creation.  The channel is always
created by the main thread, which will kick a new semaphore to tell the
migration thread that the channel has been created.

We'll need to wait for the new channel in two places: (1) when there's a
new postcopy migration that is starting, or (2) when there's a postcopy
migration to resume.

For the start of migration, we don't need to wait for this channel until
when we want to start postcopy, aka, postcopy_start().  We'll fail the
migration if we find that the channel creation failed (which should
probably not happen at all in 99% of the cases, because the main channel is
using the same network topology).

For a postcopy recovery, we'll need to wait in postcopy_pause().  In that
case if the channel creation failed, we can't fail the migration or we'll
crash the VM, instead we keep in PAUSED state, waiting for yet another
recovery.

Signed-off-by: Peter Xu 
---
 migration/migration.c| 16 
 migration/migration.h|  7 +
 migration/postcopy-ram.c | 56 +++-
 migration/postcopy-ram.h |  1 +
 4 files changed, 68 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 69778cab23..78e1e6bfb9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3016,6 +3016,12 @@ static int postcopy_start(MigrationState *ms)
 int64_t bandwidth = migrate_max_postcopy_bandwidth();
 bool restart_block = false;
 int cur_state = MIGRATION_STATUS_ACTIVE;
+
+if (postcopy_preempt_wait_channel(ms)) {
+migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+return -1;
+}
+
 if (!migrate_pause_before_switchover()) {
 migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -3497,6 +3503,14 @@ static MigThrError postcopy_pause(MigrationState *s)
 if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
 /* Woken up by a recover procedure. Give it a shot */
 
+if (postcopy_preempt_wait_channel(s)) {
+/*
+ * Preempt enabled, and new channel create failed; loop
+ * back to wait for another recovery.
+ */
+continue;
+}
+
 /*
  * Firstly, let's wake up the return path now, with a new
  * return path channel.
@@ -4356,6 +4370,7 @@ static void migration_instance_finalize(Object *obj)
 qemu_sem_destroy(&ms->postcopy_pause_sem);
 qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
 qemu_sem_destroy(&ms->rp_state.rp_sem);
+qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
 error_free(ms->error);
 }
 
@@ -4402,6 +4417,7 @@ static void migration_instance_init(Object *obj)
 qemu_sem_init(&ms->rp_state.rp_sem, 0);
 qemu_sem_init(&ms->rate_limit_sem, 0);
 qemu_sem_init(&ms->wait_unplug_sem, 0);
+qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);
 qemu_mutex_init(&ms->qemu_file_lock);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index 91f845e9e4..f898b8547a 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -219,6 +219,13 @@ struct MigrationState {
 QEMUFile *to_dst_file;
 /* Postcopy specific transfer channel */
 QEMUFile *postcopy_qemufile_src;
+/*
+ * It is posted when the preempt channel is established.  Note: this is
+ * used for both the start and the recovery of a postcopy migration.  We'll
+ * post to this sem every time a new preempt channel is created in the
+ * main thread, and we keep post() and wait() in pair.
+ */
+QemuSemaphore postcopy_qemufile_src_sem;
 QIOChannelBuffer *bioc;
 /*
  * Protects to_dst_file/from_dst_file pointers.  We need to make sure we
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index e20305a9e2..3ead5b1b3c 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1552,10 +1552,50 @@ bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
 return true;
 }
 
-int postcopy_preempt_setup(MigrationState *s, Error **errp)
+static void
+postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
 {
-QIOChannel *ioc;
+MigrationState *s = opaque;
+QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
+Error *local_err = NULL;
+
+if (qio_task_propagate_error(task, &local_err)) {
+/* Something wrong happened.. */
+object_unref(OBJECT(ioc));
+migrate_set_error(migrate_get_current(), local_err);
+error_free(local_err);
+} else {
+  

Re: [PATCH 1/5] crypto: perform permission checks under BQL

2022-03-01 Thread Kevin Wolf
Am 09.02.2022 um 11:54 hat Emanuele Giuseppe Esposito geschrieben:
> Move the permission API calls into driver-specific callbacks
> that always run under BQL. In this case, bdrv_crypto_luks
> needs to perform permission checks before and after
> qcrypto_block_amend_options(). The problem is that the caller,
> block_crypto_amend_options_generic_luks(), can also run in I/O
> from .bdrv_co_amend(). This does not comply with Global State-I/O API split,
> as permissions API must always run under BQL.
> 
> Firstly, introduce .bdrv_amend_pre_run() and .bdrv_amend_clean()
> callbacks. These two callbacks are guaranteed to be invoked under
> BQL, respectively before and after .bdrv_co_amend().
> They take care of performing the permission checks
> in the same way as they are currently done before and after
> qcrypto_block_amend_options().
> These callbacks are in preparation for next patch, where we
> delete the original permission check. Right now they just add redundant
> control.
> 
> Then, call .bdrv_amend_pre_run() before job_start in
> qmp_x_blockdev_amend(), so that it will be run before the job coroutine
> is created and stays in the main loop.
> As a cleanup, use JobDriver's .clean() callback to call
> .bdrv_amend_clean(), and run amend-specific cleanup callbacks under BQL.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  block/amend.c | 24 
>  block/crypto.c| 27 +++
>  include/block/block_int.h | 14 ++
>  3 files changed, 65 insertions(+)
> 
> diff --git a/block/amend.c b/block/amend.c
> index 392df9ef83..329bca53dc 100644
> --- a/block/amend.c
> +++ b/block/amend.c
> @@ -53,10 +53,29 @@ static int coroutine_fn blockdev_amend_run(Job *job, Error **errp)
>  return ret;
>  }
>  
> +static int blockdev_amend_pre_run(BlockdevAmendJob *s, Error **errp)
> +{
> +if (s->bs->drv->bdrv_amend_pre_run) {
> +return s->bs->drv->bdrv_amend_pre_run(s->bs, errp);
> +}
> +
> +return 0;
> +}
> +
> +static void blockdev_amend_clean(Job *job)
> +{
> +BlockdevAmendJob *s = container_of(job, BlockdevAmendJob, common);
> +
> +if (s->bs->drv->bdrv_amend_clean) {
> +s->bs->drv->bdrv_amend_clean(s->bs);
> +}
> +}
> +
>  static const JobDriver blockdev_amend_job_driver = {
>  .instance_size = sizeof(BlockdevAmendJob),
>  .job_type  = JOB_TYPE_AMEND,
>  .run   = blockdev_amend_run,
> +.clean = blockdev_amend_clean,
>  };
>  
>  void qmp_x_blockdev_amend(const char *job_id,
> @@ -113,5 +132,10 @@ void qmp_x_blockdev_amend(const char *job_id,
>  s->bs = bs,
>  s->opts = QAPI_CLONE(BlockdevAmendOptions, options),
>  s->force = has_force ? force : false;
> +
> +if (blockdev_amend_pre_run(s, errp)) {
> +return;
> +}
> +
>  job_start(&s->common);
>  }
> diff --git a/block/crypto.c b/block/crypto.c
> index c8ba4681e2..59f768ea8d 100644
> --- a/block/crypto.c
> +++ b/block/crypto.c
> @@ -777,6 +777,31 @@ block_crypto_get_specific_info_luks(BlockDriverState *bs, Error **errp)
>  return spec_info;
>  }
>  
> +static int
> +block_crypto_amend_prepare(BlockDriverState *bs, Error **errp)
> +{
> +BlockCrypto *crypto = bs->opaque;
> +
> +/* apply for exclusive read/write permissions to the underlying file*/

Missing space before the end of the comment.

> +crypto->updating_keys = true;
> +return bdrv_child_refresh_perms(bs, bs->file, errp);
> +}
> +
> +static void
> +block_crypto_amend_cleanup(BlockDriverState *bs)
> +{
> +BlockCrypto *crypto = bs->opaque;
> +Error *errp = NULL;
> +
> +/* release exclusive read/write permissions to the underlying file*/

And here.

I can fix this up while applying.

Kevin




Re: [PATCH v2 00/25] migration: Postcopy Preemption

2022-03-01 Thread Daniel P . Berrangé
On Tue, Mar 01, 2022 at 04:39:00PM +0800, Peter Xu wrote:
> This is v2 of postcopy preempt series.  It can also be found here:
> 
>   https://github.com/xzpeter/qemu/tree/postcopy-preempt
> 
> RFC: 
> https://lore.kernel.org/qemu-devel/20220119080929.39485-1-pet...@redhat.com
> V1:  
> https://lore.kernel.org/qemu-devel/20220216062809.57179-1-pet...@redhat.com
> 
> v1->v2 changelog:
> - Picked up more r-bs from Dave
> - Rename both fault threads to drop "qemu/" prefix [Dave]
> - Further rework on postcopy recovery, to be able to detect qemufile errors
>   from either main channel or postcopy one [Dave]
> - shutdown() qemufile before close on src postcopy channel when postcopy is
>   paused [Dave]
> - In postcopy_preempt_new_channel(), explicitly set the new channel in
>   blocking state, even if it's the default [Dave]
> - Make RAMState.postcopy_channel unsigned int [Dave]
> - Added patches:
>   - "migration: Create the postcopy preempt channel asynchronously"
>   - "migration: Parameter x-postcopy-preempt-break-huge"
>   - "migration: Add helpers to detect TLS capability"
>   - "migration: Fail postcopy preempt with TLS"
>   - "tests: Pass in MigrateStart** into test_migrate_start()"
> 
> Abstract
> 
> 
> This series adds a new migration capability called "postcopy-preempt".  It
> can be enabled when postcopy is enabled, and it'll simply (but greatly)
> speed up the postcopy page request handling process.

Is there no way we can just automatically enable this new feature, rather
than requiring apps to specify yet another new flag ?

> TODO List
> =
> 
> TLS support
> ---
> 
> I only noticed it's missing very recently.  Since soft freeze is coming, and
> obviously I'm still growing this series, so I tend to have the existing
> material discussed. Let's see if it can still catch the train for QEMU 7.0
> release (soft freeze on 2022-03-08)..

I don't like the idea of shipping something that is only half finished.
It means that when apps probe for the feature, they'll see preempt
capability present, but have no idea whether they're using a QEMU that
is broken when combined with TLS or not. We shouldn't merge something
just to meet the soft freeze deadline if we know key features are broken.

> Multi-channel for preemption threads
> 
> 
> Currently the postcopy preempt feature use only one extra channel and one
> extra thread on dest (no new thread on src QEMU).  It should be mostly good
> enough for major use cases, but when the postcopy queue is long enough
> (e.g. hundreds of vCPUs faulted on different pages) logically we could
> still observe more delays on average.  Whether growing threads/channels can
> solve it is debatable, but it sounds worth a try.  That's yet another
> thing we can think about after this patchset lands.

If we don't think about it upfront, then we'll possibly end up with
yet another tunable flag that apps have to worry about. It also
could make migration code even more complex if we have to support
two different scenarios. If we think multiple threads are going to
be a benefit, let's check that and, if so, design it into the exposed
application-facing interface from the start rather than retrofitting
afterwards.

> Logically the design provides space for that - the receiving postcopy
> preempt thread can understand all ram-layer migration protocol, and for
> multi channel and multi threads we could simply grow that into multiple
> threads handling the same protocol (with multiple PostcopyTmpPage).  The
> source needs more thought on synchronization, though, but it shouldn't
> affect the whole protocol layer, so should be easy to keep compatible.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|




Re: [PATCH v4 2/3] hw/acpi: add indication for i8042 in IA-PC boot flags of the FADT table

2022-03-01 Thread Igor Mammedov
On Mon, 28 Feb 2022 22:17:32 +0200
Liav Albani  wrote:

> This can allow the guest OS to determine more easily if i8042 controller
> is present in the system or not, so it doesn't need to do probing of the
> controller, but just initialize it immediately, before enumerating the
> ACPI AML namespace.
> 
> This change only applies to the x86/q35 machine type, as it uses FACP
> ACPI table with revision higher than 1, which should implement at least
> ACPI 2.0 features within the table, hence it can also set the IA-PC boot
> flags register according to the ACPI 2.0 specification.
> 
> Signed-off-by: Liav Albani 
> ---
>  hw/acpi/aml-build.c | 11 ++-
>  hw/i386/acpi-build.c|  9 +
>  hw/i386/acpi-microvm.c  |  9 +
commit message says it's q35 specific, so why does it touch microvm and piix4?

>  include/hw/acpi/acpi-defs.h |  1 +
>  4 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 8966e16320..2085905b83 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -2152,7 +2152,16 @@ void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
>  build_append_int_noprefix(tbl, 0, 1); /* DAY_ALRM */
>  build_append_int_noprefix(tbl, 0, 1); /* MON_ALRM */
>  build_append_int_noprefix(tbl, f->rtc_century, 1); /* CENTURY */
> -build_append_int_noprefix(tbl, 0, 2); /* IAPC_BOOT_ARCH */
> +/* IAPC_BOOT_ARCH */
> +/*
> + * This register is not defined in ACPI spec version 1.0, where the FACP
> + * revision == 1 also applies. Therefore, just ignore setting this
> + * register.
> + */
> +if (f->rev == 1) {
> +build_append_int_noprefix(tbl, 0, 2);
> +} else {
> +build_append_int_noprefix(tbl, f->iapc_boot_arch, 2);
> +}
>  build_append_int_noprefix(tbl, 0, 1); /* Reserved */
>  build_append_int_noprefix(tbl, f->flags, 4); /* Flags */
>  
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index ebd47aa26f..c72c7bb9bb 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -38,6 +38,7 @@
>  #include "hw/nvram/fw_cfg.h"
>  #include "hw/acpi/bios-linker-loader.h"
>  #include "hw/isa/isa.h"
> +#include "hw/input/i8042.h"
>  #include "hw/block/fdc.h"
>  #include "hw/acpi/memory_hotplug.h"
>  #include "sysemu/tpm.h"
> @@ -192,6 +193,14 @@ static void init_common_fadt_data(MachineState *ms, Object *o,
>  .address = object_property_get_uint(o, ACPI_PM_PROP_GPE0_BLK, 
> NULL)
>  },
>  };
> +/*
> + * second bit of 16 of the IAPC_BOOT_ARCH register indicates i8042 presence
> + * or equivalent micro controller. See table 5-10 of ACPI spec version 2.0
> + * (the earliest acpi revision that supports this).

 /* ACPI spec version 2.0, Table 5-10 */

is sufficient, the rest can be read from the spec.

> + */
> +fadt.iapc_boot_arch = object_resolve_path_type("", TYPE_I8042, NULL) ?
> +0x0002 : 0x;
> +
>  *data = fadt;
>  }
>  
> diff --git a/hw/i386/acpi-microvm.c b/hw/i386/acpi-microvm.c
> index 68ca7e7fc2..4bc72b1672 100644
> --- a/hw/i386/acpi-microvm.c
> +++ b/hw/i386/acpi-microvm.c
> @@ -31,6 +31,7 @@
>  #include "hw/acpi/generic_event_device.h"
>  #include "hw/acpi/utils.h"
>  #include "hw/acpi/erst.h"
> +#include "hw/input/i8042.h"
>  #include "hw/i386/fw_cfg.h"
>  #include "hw/i386/microvm.h"
>  #include "hw/pci/pci.h"
> @@ -189,6 +190,14 @@ static void acpi_build_microvm(AcpiBuildTables *tables,
>  .reset_val = ACPI_GED_RESET_VALUE,
>  };
>  
> +/*
> + * second bit of 16 of the IAPC_BOOT_ARCH register indicates i8042 
> presence
> + * or equivalent micro controller. See table 5-10 of APCI spec version 
> 2.0
> + * (the earliest acpi revision that supports this).
> + */
> +pmfadt.iapc_boot_arch = object_resolve_path_type("", TYPE_I8042, NULL) ?
> +0x0002 : 0x;
> +
>  table_offsets = g_array_new(false, true /* clear */,
>  sizeof(uint32_t));
>  bios_linker_loader_alloc(tables->linker,
> diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
> index c97e8633ad..2b42e4192b 100644
> --- a/include/hw/acpi/acpi-defs.h
> +++ b/include/hw/acpi/acpi-defs.h
> @@ -77,6 +77,7 @@ typedef struct AcpiFadtData {
>  uint16_t plvl2_lat;/* P_LVL2_LAT */
>  uint16_t plvl3_lat;/* P_LVL3_LAT */
>  uint16_t arm_boot_arch;/* ARM_BOOT_ARCH */
> +uint16_t iapc_boot_arch;   /* IAPC_BOOT_ARCH */
>  uint8_t minor_ver; /* FADT Minor Version */
>  
>  /*




[PATCH] tests/tcg/s390x: Fix the exrl-trt* tests with Clang

2022-03-01 Thread Thomas Huth
The exrl-trt* tests use two pre-initialized variables for the
results of the assembly code:

uint64_t r1 = 0xull;
uint64_t r2 = 0xull;

But then the assembly code copies over the full contents
of the register into the output variable, without taking
care of these pre-initialized values:

"lgr %[r1],%%r1\n"
"lgr %[r2],%%r2\n"

The code then finally compares the register contents to
a value that apparently depends on the pre-initialized values:

if (r2 != 0xffaaull) {
write(1, "bad r2\n", 7);
return 1;
}

This all works with GCC, since the 0x got into
the r2 register there by accident, but it fails completely with
Clang.

Let's fix this by declaring the r1 and r2 variables as proper
register variables instead, so the pre-initialized values get
correctly passed into the inline assembly code.

Signed-off-by: Thomas Huth 
---
 tests/tcg/s390x/exrl-trt.c  | 8 +++-
 tests/tcg/s390x/exrl-trtr.c | 8 +++-
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/tests/tcg/s390x/exrl-trt.c b/tests/tcg/s390x/exrl-trt.c
index 16711a3181..451f777b9d 100644
--- a/tests/tcg/s390x/exrl-trt.c
+++ b/tests/tcg/s390x/exrl-trt.c
@@ -5,8 +5,8 @@ int main(void)
 {
 char op1[] = "hello";
 char op2[256];
-uint64_t r1 = 0xull;
-uint64_t r2 = 0xull;
+register uint64_t r1 asm("r1") = 0xull;
+register uint64_t r2 asm("r2") = 0xull;
 uint64_t cc;
 int i;
 
@@ -21,8 +21,6 @@ int main(void)
 "j 2f\n"
 "1:  trt 0(1,%[op1]),%[op2]\n"
 "2:  exrl %[op1_len],1b\n"
-"lgr %[r1],%%r1\n"
-"lgr %[r2],%%r2\n"
 "ipm %[cc]\n"
 : [r1] "+r" (r1),
   [r2] "+r" (r2),
@@ -30,7 +28,7 @@ int main(void)
 : [op1] "a" (&op1),
   [op1_len] "a" (5),
   [op2] "Q" (op2)
-: "r1", "r2", "cc");
+: "cc");
 cc = (cc >> 28) & 3;
 if (cc != 2) {
 write(1, "bad cc\n", 7);
diff --git a/tests/tcg/s390x/exrl-trtr.c b/tests/tcg/s390x/exrl-trtr.c
index 5f30cda6bd..422f7f385a 100644
--- a/tests/tcg/s390x/exrl-trtr.c
+++ b/tests/tcg/s390x/exrl-trtr.c
@@ -5,8 +5,8 @@ int main(void)
 {
 char op1[] = {0, 1, 2, 3};
 char op2[256];
-uint64_t r1 = 0xull;
-uint64_t r2 = 0xull;
+register uint64_t r1 asm("r1") = 0xull;
+register uint64_t r2 asm("r2") = 0xull;
 uint64_t cc;
 int i;
 
@@ -21,8 +21,6 @@ int main(void)
 "j 2f\n"
 "1:  trtr 3(1,%[op1]),%[op2]\n"
 "2:  exrl %[op1_len],1b\n"
-"lgr %[r1],%%r1\n"
-"lgr %[r2],%%r2\n"
 "ipm %[cc]\n"
 : [r1] "+r" (r1),
   [r2] "+r" (r2),
@@ -30,7 +28,7 @@ int main(void)
 : [op1] "a" (&op1),
   [op1_len] "a" (3),
   [op2] "Q" (op2)
-: "r1", "r2", "cc");
+: "cc");
 cc = (cc >> 28) & 3;
 if (cc != 1) {
 write(1, "bad cc\n", 7);
-- 
2.27.0




Re: [PATCH] aio-posix: fix spurious ->poll_ready() callbacks in main loop

2022-03-01 Thread Stefan Hajnoczi
On Wed, Feb 23, 2022 at 03:57:03PM +, Stefan Hajnoczi wrote:
> When ->poll() succeeds the AioHandler is placed on the ready list with
> revents set to the magic value 0. This magic value causes
> aio_dispatch_handler() to invoke ->poll_ready() instead of ->io_read()
> for G_IO_IN or ->io_write() for G_IO_OUT.
> 
> This magic value 0 hack works for the IOThread where AioHandlers are
> placed on ->ready_list and processed by aio_dispatch_ready_handlers().
> It does not work for the main loop where all AioHandlers are processed
> by aio_dispatch_handlers(), even those that are not ready and have a
> revents value of 0.
> 
> As a result the main loop invokes ->poll_ready() on AioHandlers that are
> not ready. These spurious ->poll_ready() calls waste CPU cycles and
> could lead to crashes if the code assumes ->poll() must have succeeded
> before ->poll_ready() is called (a reasonable assumption but I haven't
> seen it in practice).
> 
> Stop using revents to track whether ->poll_ready() will be called on an
> AioHandler. Introduce a separate AioHandler->poll_ready field instead.
> This eliminates spurious ->poll_ready() calls in the main loop.
> 
> Fixes: 826cc32423db2a99d184dbf4f507c737d7e7a4ae ("aio-posix: split poll check from ready handler")
> Signed-off-by: Stefan Hajnoczi 
> ---
>  util/aio-posix.h |  1 +
>  util/aio-posix.c | 32 ++--
>  2 files changed, 19 insertions(+), 14 deletions(-)

Applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan


signature.asc
Description: PGP signature


[PATCH] tests/tcg/s390x: Fix mvc, mvo and pack tests with Clang

2022-03-01 Thread Thomas Huth
These instructions use addressing with a "base address", meaning
that if register r0 is used, it is always treated as zero, no matter
what value is stored in the register. So we have to make sure not
to use register r0 for these instructions in our tests. There was
no problem with GCC so far since it seems to always pick other
registers by default, but Clang likes to choose register r0, too,
so we have to use the "a" constraint to make sure that it does
not pick r0 here.

Signed-off-by: Thomas Huth 
---
 tests/tcg/s390x/mvc.c  | 4 ++--
 tests/tcg/s390x/mvo.c  | 4 ++--
 tests/tcg/s390x/pack.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tests/tcg/s390x/mvc.c b/tests/tcg/s390x/mvc.c
index aa552d52e5..7ae4c44550 100644
--- a/tests/tcg/s390x/mvc.c
+++ b/tests/tcg/s390x/mvc.c
@@ -20,8 +20,8 @@ static inline void mvc_256(const char *dst, const char *src)
 asm volatile (
 "mvc 0(256,%[dst]),0(%[src])\n"
 :
-: [dst] "d" (dst),
-  [src] "d" (src)
+: [dst] "a" (dst),
+  [src] "a" (src)
 : "memory");
 }
 
diff --git a/tests/tcg/s390x/mvo.c b/tests/tcg/s390x/mvo.c
index 5546fe2a97..0c3ecdde2e 100644
--- a/tests/tcg/s390x/mvo.c
+++ b/tests/tcg/s390x/mvo.c
@@ -11,8 +11,8 @@ int main(void)
 asm volatile (
 "mvo 0(4,%[dest]),0(3,%[src])\n"
 :
-: [dest] "d" (dest + 1),
-  [src] "d" (src + 1)
+: [dest] "a" (dest + 1),
+  [src] "a" (src + 1)
 : "memory");
 
 for (i = 0; i < sizeof(expected); i++) {
diff --git a/tests/tcg/s390x/pack.c b/tests/tcg/s390x/pack.c
index 4be36f29a7..55e7e214e8 100644
--- a/tests/tcg/s390x/pack.c
+++ b/tests/tcg/s390x/pack.c
@@ -9,7 +9,7 @@ int main(void)
 asm volatile(
 "pack 2(4,%[data]),2(4,%[data])\n"
 :
-: [data] "r" (&data[0])
+: [data] "a" (&data[0])
 : "memory");
 for (i = 0; i < 8; i++) {
 if (data[i] != exp[i]) {
-- 
2.27.0




Re: [PATCH] tests/tcg/s390x: Fix the exrl-trt* tests with Clang

2022-03-01 Thread David Hildenbrand
On 01.03.22 10:24, Thomas Huth wrote:
> The exrl-trt* tests use two pre-initialized variables for the
> results of the assembly code:
> 
> uint64_t r1 = 0xull;
> uint64_t r2 = 0xull;
> 
> But then the assembly code copies over the full contents
> of the register into the output variable, without taking
> care of these pre-initialized values:
> 
> "lgr %[r1],%%r1\n"
> "lgr %[r2],%%r2\n"
> 
> The code then finally compares the register contents to
> a value that apparently depends on the pre-initialized values:
> 
> if (r2 != 0xffaaull) {
> write(1, "bad r2\n", 7);
> return 1;
> }
> 
> This all works with GCC, since the 0x got into
> the r2 register there by accident, but it fails completely with
> Clang.
> 
> Let's fix this by declaring the r1 and r2 variables as proper
> register variables instead, so the pre-initialized values get
> correctly passed into the inline assembly code.
> 
> Signed-off-by: Thomas Huth 
> ---
>  tests/tcg/s390x/exrl-trt.c  | 8 +++-
>  tests/tcg/s390x/exrl-trtr.c | 8 +++-
>  2 files changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/tests/tcg/s390x/exrl-trt.c b/tests/tcg/s390x/exrl-trt.c
> index 16711a3181..451f777b9d 100644
> --- a/tests/tcg/s390x/exrl-trt.c
> +++ b/tests/tcg/s390x/exrl-trt.c
> @@ -5,8 +5,8 @@ int main(void)
>  {
>  char op1[] = "hello";
>  char op2[256];
> -uint64_t r1 = 0xull;
> -uint64_t r2 = 0xull;
> +register uint64_t r1 asm("r1") = 0xull;
> +register uint64_t r2 asm("r2") = 0xull;
>  uint64_t cc;
>  int i;
>  
> @@ -21,8 +21,6 @@ int main(void)
>  "j 2f\n"
>  "1:  trt 0(1,%[op1]),%[op2]\n"
>  "2:  exrl %[op1_len],1b\n"
> -"lgr %[r1],%%r1\n"
> -"lgr %[r2],%%r2\n"
>  "ipm %[cc]\n"
>  : [r1] "+r" (r1),
>[r2] "+r" (r2),
> @@ -30,7 +28,7 @@ int main(void)
>  : [op1] "a" (&op1),
>[op1_len] "a" (5),
>[op2] "Q" (op2)
> -: "r1", "r2", "cc");
> +: "cc");
>  cc = (cc >> 28) & 3;
>  if (cc != 2) {
>  write(1, "bad cc\n", 7);
> diff --git a/tests/tcg/s390x/exrl-trtr.c b/tests/tcg/s390x/exrl-trtr.c
> index 5f30cda6bd..422f7f385a 100644
> --- a/tests/tcg/s390x/exrl-trtr.c
> +++ b/tests/tcg/s390x/exrl-trtr.c
> @@ -5,8 +5,8 @@ int main(void)
>  {
>  char op1[] = {0, 1, 2, 3};
>  char op2[256];
> -uint64_t r1 = 0xull;
> -uint64_t r2 = 0xull;
> +register uint64_t r1 asm("r1") = 0xull;
> +register uint64_t r2 asm("r2") = 0xull;
>  uint64_t cc;
>  int i;
>  
> @@ -21,8 +21,6 @@ int main(void)
>  "j 2f\n"
>  "1:  trtr 3(1,%[op1]),%[op2]\n"
>  "2:  exrl %[op1_len],1b\n"
> -"lgr %[r1],%%r1\n"
> -"lgr %[r2],%%r2\n"
>  "ipm %[cc]\n"
>  : [r1] "+r" (r1),
>[r2] "+r" (r2),
> @@ -30,7 +28,7 @@ int main(void)
>  : [op1] "a" (&op1),
>[op1_len] "a" (3),
>[op2] "Q" (op2)
> -: "r1", "r2", "cc");
> +: "cc");
>  cc = (cc >> 28) & 3;
>  if (cc != 1) {
>  write(1, "bad cc\n", 7);

Reviewed-by: David Hildenbrand 

-- 
Thanks,

David / dhildenb




Re: [PATCH v4 2/3] hw/acpi: add indication for i8042 in IA-PC boot flags of the FADT table

2022-03-01 Thread Igor Mammedov
On Tue, 1 Mar 2022 08:29:05 +0530 (IST)
Ani Sinha  wrote:

> On Mon, 28 Feb 2022, Liav Albani wrote:
> 
> > This can allow the guest OS to determine more easily whether the i8042
> > controller is present in the system, so it doesn't need to probe the
> > controller, but can just initialize it immediately, before enumerating the
> > ACPI AML namespace.
> >
> > This change only applies to the x86/q35 machine type, as it uses FACP
> > ACPI table with revision higher than 1, which should implement at least
> > ACPI 2.0 features within the table, hence it can also set the IA-PC boot
> > flags register according to the ACPI 2.0 specification.
> >
> > Signed-off-by: Liav Albani 
> > ---
> >  hw/acpi/aml-build.c | 11 ++-
> >  hw/i386/acpi-build.c|  9 +
> >  hw/i386/acpi-microvm.c  |  9 +
> >  include/hw/acpi/acpi-defs.h |  1 +
> >  4 files changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> > index 8966e16320..2085905b83 100644
> > --- a/hw/acpi/aml-build.c
> > +++ b/hw/acpi/aml-build.c
> > @@ -2152,7 +2152,16 @@ void build_fadt(GArray *tbl, BIOSLinker *linker, 
> > const AcpiFadtData *f,
> >  build_append_int_noprefix(tbl, 0, 1); /* DAY_ALRM */
> >  build_append_int_noprefix(tbl, 0, 1); /* MON_ALRM */
> >  build_append_int_noprefix(tbl, f->rtc_century, 1); /* CENTURY */
> > -build_append_int_noprefix(tbl, 0, 2); /* IAPC_BOOT_ARCH */
> > +/* IAPC_BOOT_ARCH */
> > +/*
> > + * This register is not defined in ACPI spec version 1.0, where the 
> > FACP  
> 
> I'd say "this IAPC_BOOT_ARCH register" to be more specific.
> 
> > + * revision == 1 also applies. Therefore, just ignore setting this 
> > register.
> > + */

I'd drop this comment altogether, like it's done with the rest of the fields  
in this function

> > +if (f->rev == 1) {
> > +build_append_int_noprefix(tbl, 0, 2);
> > +} else {
maybe add here
/* Since ACPI 2.0 */

> > +build_append_int_noprefix(tbl, f->iapc_boot_arch, 2);
> > +}
> >  build_append_int_noprefix(tbl, 0, 1); /* Reserved */
> >  build_append_int_noprefix(tbl, f->flags, 4); /* Flags */
> >
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index ebd47aa26f..c72c7bb9bb 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -38,6 +38,7 @@
> >  #include "hw/nvram/fw_cfg.h"
> >  #include "hw/acpi/bios-linker-loader.h"
> >  #include "hw/isa/isa.h"
> > +#include "hw/input/i8042.h"
> >  #include "hw/block/fdc.h"
> >  #include "hw/acpi/memory_hotplug.h"
> >  #include "sysemu/tpm.h"
> > @@ -192,6 +193,14 @@ static void init_common_fadt_data(MachineState *ms, 
> > Object *o,
> >  .address = object_property_get_uint(o, ACPI_PM_PROP_GPE0_BLK, 
> > NULL)
> >  },
> >  };
> > +/*
> > + * second bit of 16 of the IAPC_BOOT_ARCH register indicates i8042 
> > presence  
> 
> again typo here.
> 
> > + * or equivalent micro controller. See table 5-10 of APCI spec version 
> > 2.0
> > + * (the earliest acpi revision that supports this).
> > + */
> > +fadt.iapc_boot_arch = object_resolve_path_type("", TYPE_I8042, NULL) ?
> > +0x0002 : 0x;  
> 
> I thought I said we need to make sure the logic still applies when there
> is more than one device of this type. Please fix this.
> 
> > +
> >  *data = fadt;
> >  }
> >
> > diff --git a/hw/i386/acpi-microvm.c b/hw/i386/acpi-microvm.c
> > index 68ca7e7fc2..4bc72b1672 100644
> > --- a/hw/i386/acpi-microvm.c
> > +++ b/hw/i386/acpi-microvm.c
> > @@ -31,6 +31,7 @@
> >  #include "hw/acpi/generic_event_device.h"
> >  #include "hw/acpi/utils.h"
> >  #include "hw/acpi/erst.h"
> > +#include "hw/input/i8042.h"
> >  #include "hw/i386/fw_cfg.h"
> >  #include "hw/i386/microvm.h"
> >  #include "hw/pci/pci.h"
> > @@ -189,6 +190,14 @@ static void acpi_build_microvm(AcpiBuildTables *tables,
> >  .reset_val = ACPI_GED_RESET_VALUE,
> >  };
> >
> > +/*
> > + * second bit of 16 of the IAPC_BOOT_ARCH register indicates i8042 
> > presence
> > + * or equivalent micro controller. See table 5-10 of APCI spec version 
> > 2.0
> > + * (the earliest acpi revision that supports this).
> > + */
> > +pmfadt.iapc_boot_arch = object_resolve_path_type("", TYPE_I8042, NULL) 
> > ?
> > +0x0002 : 0x;  
> 
> 
> Ditto.
> 
> > +
> >  table_offsets = g_array_new(false, true /* clear */,
> >  sizeof(uint32_t));
> >  bios_linker_loader_alloc(tables->linker,
> > diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
> > index c97e8633ad..2b42e4192b 100644
> > --- a/include/hw/acpi/acpi-defs.h
> > +++ b/include/hw/acpi/acpi-defs.h
> > @@ -77,6 +77,7 @@ typedef struct AcpiFadtData {
> >  uint16_t plvl2_lat;/* P_LVL2_LAT */
> >  uint16_t plvl3_lat;/* P_LVL3_LAT */
> >  uint16_t 

[PULL 01/18] tests/docker: restore TESTS/IMAGES filtering

2022-03-01 Thread Alex Bennée
This was broken in the re-factor:

  e86c9a64f4 ("tests/docker/Makefile.include: add a generic docker-run target")

Rather than unwind the changes, just apply the filters to the total set
of available images and tests. That way we don't inadvertently build
images only to leave them unused later.
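The pattern is just GNU make's `$(filter ...)` guarded by a `%` default, so an unset variable matches everything. A stand-alone sketch of the idea (the demo file name is invented, and it is a simplification of the real Makefile.include, not a copy):

```shell
cat > filter-demo.mk <<'EOF'
# use '>' as the recipe prefix so this demo needs no literal tabs
.RECIPEPREFIX = >
# default: '%' matches every word, so nothing is filtered out
IMAGES ?= %
__IMAGES := alpine debian11 fedora opensuse-leap
# restrict the total set of available images to the requested subset
DOCKER_IMAGES := $(if $(IMAGES),$(filter $(IMAGES),$(__IMAGES)),$(__IMAGES))
all:
> @echo $(DOCKER_IMAGES)
EOF
make -f filter-demo.mk                           # all four images
make -f filter-demo.mk IMAGES="debian% fedora"   # only the matching subset
```

Because the filter is applied to the full list up front, targets derived from `DOCKER_IMAGES` never get generated for images the user excluded.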

Signed-off-by: Alex Bennée 
Reported-by: Alex Williamson 
Reviewed-by: Richard Henderson 
Message-Id: <20220225172021.3493923-2-alex.ben...@linaro.org>

diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index f1a0c5db7a..0ec59b2193 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -8,13 +8,19 @@ COMMA := ,
 
 HOST_ARCH = $(if $(ARCH),$(ARCH),$(shell uname -m))
 
+# These variables can be set by the user to limit the set of docker
+# images and tests to a more restricted subset
+TESTS ?= %
+IMAGES ?= %
+
 DOCKER_SUFFIX := .docker
 DOCKER_FILES_DIR := $(SRC_PATH)/tests/docker/dockerfiles
 # we don't run tests on intermediate images (used as base by another image)
 DOCKER_PARTIAL_IMAGES := debian10 debian11
 # we don't directly build virtual images (they are used to build other images)
 DOCKER_VIRTUAL_IMAGES := debian-bootstrap debian-toolchain empty
-DOCKER_IMAGES := $(sort $(filter-out $(DOCKER_VIRTUAL_IMAGES), $(notdir 
$(basename $(wildcard $(DOCKER_FILES_DIR)/*.docker)
+__IMAGES := $(sort $(filter-out $(DOCKER_VIRTUAL_IMAGES), $(notdir $(basename 
$(wildcard $(DOCKER_FILES_DIR)/*.docker)
+DOCKER_IMAGES := $(if $(IMAGES), $(filter $(IMAGES), $(__IMAGES)), $(__IMAGES))
 DOCKER_TARGETS := $(patsubst %,docker-image-%,$(DOCKER_IMAGES))
 # Use a global constant ccache directory to speed up repetitive builds
 DOCKER_CCACHE_DIR := $$HOME/.cache/qemu-docker-ccache
@@ -23,16 +29,14 @@ DOCKER_DEFAULT_REGISTRY := 
registry.gitlab.com/qemu-project/qemu
 endif
 DOCKER_REGISTRY := $(if $(REGISTRY),$(REGISTRY),$(DOCKER_DEFAULT_REGISTRY))
 
-DOCKER_TESTS := $(notdir $(shell \
-   find $(SRC_PATH)/tests/docker/ -name 'test-*' -type f))
+__TESTS := $(notdir $(shell \
+   find $(SRC_PATH)/tests/docker/ -name 'test-*' -type f))
+DOCKER_TESTS := $(if $(TESTS), $(filter $(TESTS), $(__TESTS)), $(__TESTS))
 
 ENGINE := auto
 
 DOCKER_SCRIPT=$(SRC_PATH)/tests/docker/docker.py --engine $(ENGINE)
 
-TESTS ?= %
-IMAGES ?= %
-
 CUR_TIME := $(shell date +%Y-%m-%d-%H.%M.%S.)
 DOCKER_SRC_COPY := $(BUILD_DIR)/docker-src.$(CUR_TIME)
 
@@ -274,8 +278,8 @@ endif
@echo 'TARGET_LIST=a,b,cOverride target list in builds.'
@echo 'EXTRA_CONFIGURE_OPTS="..."'
@echo ' Extra configure options.'
-   @echo 'IMAGES="a b c ..":   Filters which images to build or run.'
-   @echo 'TESTS="x y z .." Filters which tests to run (for 
docker-test).'
+   @echo 'IMAGES="a b c ..":   Restrict available images to subset.'
+   @echo 'TESTS="x y z .." Restrict available tests to subset.'
@echo 'J=[0..9]*Overrides the -jN parameter for make 
commands'
@echo ' (default is 1)'
@echo 'DEBUG=1  Stop and drop to shell in the created 
container'
-- 
2.30.2




Re: [PATCH] tests/tcg/s390x: Fix mvc, mvo and pack tests with Clang

2022-03-01 Thread David Hildenbrand
On 01.03.22 10:39, Thomas Huth wrote:
> These instructions use addressing with a "base address", meaning
> that if register r0 is used, it is always treated as zero, no matter
> what value is stored in the register. So we have to make sure not
> to use register r0 for these instructions in our tests. There was
> no problem with GCC so far since it seems to always pick other
> registers by default, but Clang likes to choose register r0, too,
> so we have to use the "a" constraint to make sure that it does
> not pick r0 here.
> 
> Signed-off-by: Thomas Huth 
> ---
>  tests/tcg/s390x/mvc.c  | 4 ++--
>  tests/tcg/s390x/mvo.c  | 4 ++--
>  tests/tcg/s390x/pack.c | 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/tcg/s390x/mvc.c b/tests/tcg/s390x/mvc.c
> index aa552d52e5..7ae4c44550 100644
> --- a/tests/tcg/s390x/mvc.c
> +++ b/tests/tcg/s390x/mvc.c
> @@ -20,8 +20,8 @@ static inline void mvc_256(const char *dst, const char *src)
>  asm volatile (
>  "mvc 0(256,%[dst]),0(%[src])\n"
>  :
> -: [dst] "d" (dst),
> -  [src] "d" (src)
> +: [dst] "a" (dst),
> +  [src] "a" (src)
>  : "memory");
>  }
>  
> diff --git a/tests/tcg/s390x/mvo.c b/tests/tcg/s390x/mvo.c
> index 5546fe2a97..0c3ecdde2e 100644
> --- a/tests/tcg/s390x/mvo.c
> +++ b/tests/tcg/s390x/mvo.c
> @@ -11,8 +11,8 @@ int main(void)
>  asm volatile (
>  "mvo 0(4,%[dest]),0(3,%[src])\n"
>  :
> -: [dest] "d" (dest + 1),
> -  [src] "d" (src + 1)
> +: [dest] "a" (dest + 1),
> +  [src] "a" (src + 1)
>  : "memory");
>  
>  for (i = 0; i < sizeof(expected); i++) {
> diff --git a/tests/tcg/s390x/pack.c b/tests/tcg/s390x/pack.c
> index 4be36f29a7..55e7e214e8 100644
> --- a/tests/tcg/s390x/pack.c
> +++ b/tests/tcg/s390x/pack.c
> @@ -9,7 +9,7 @@ int main(void)
>  asm volatile(
>  "pack 2(4,%[data]),2(4,%[data])\n"
>  :
> -: [data] "r" (&data[0])
> +: [data] "a" (&data[0])
>  : "memory");
>  for (i = 0; i < 8; i++) {
>  if (data[i] != exp[i]) {

Reviewed-by: David Hildenbrand 

-- 
Thanks,

David / dhildenb




[PULL 00/18] testing and semihosting updates

2022-03-01 Thread Alex Bennée
The following changes since commit fa435db8ce1dff3b15e3f59a12f55f7b3a347b08:

  Merge remote-tracking branch 'remotes/jsnow-gitlab/tags/python-pull-request' 
into staging (2022-02-24 12:48:14 +)

are available in the Git repository at:

  https://github.com/stsquad/qemu.git tags/pull-testing-and-semihosting-280222-1

for you to fetch changes up to b904a9096f112795e47986448c145f5970d33c33:

  tests/tcg: port SYS_HEAPINFO to a system test (2022-02-28 16:42:42 +)


Testing and semihosting updates:

  - restore TESTS/IMAGES filtering to docker tests
  - add NOUSER to alpine image
  - bump lcitool version
  - move arm64/s390x cross build images to lcitool
  - add aarch32 runner CI scripts
  - expand testing to more vectors
  - update s390x jobs to focal for gitlab/travis
  - disable threadcount for all sh4
  - fix semihosting SYS_HEAPINFO and test


Alex Bennée (17):
  tests/docker: restore TESTS/IMAGES filtering
  tests/docker: add NOUSER for alpine image
  tests/lcitool: update to latest version
  tests/docker: update debian-arm64-cross with lcitool
  tests/docker: update debian-s390x-cross with lcitool
  tests/docker: introduce debian-riscv64-test-cross
  scripts/ci: add build env rules for aarch32 on aarch64
  scripts/ci: allow for a secondary runner
  gitlab: add a new aarch32 custom runner definition
  tests/tcg/ppc64: clean-up handling of byte-reverse
  tests/tcg: build sha1-vector with O3 and compare
  tests/tcg: add sha512 test
  tests/tcg: add vectorised sha512 versions
  gitlab: upgrade the job definition for s390x to 20.04
  tests/tcg: completely disable threadcount for sh4
  semihosting/arm-compat: replace heuristic for softmmu SYS_HEAPINFO
  tests/tcg: port SYS_HEAPINFO to a system test

Thomas Huth (1):
  travis.yml: Update the s390x jobs to Ubuntu Focal

 docs/devel/ci-jobs.rst.inc |   7 +
 include/hw/loader.h|  14 +
 hw/core/loader.c   |  86 ++
 semihosting/arm-compat-semi.c  | 124 +--
 tests/tcg/aarch64/system/semiheap.c|  93 ++
 tests/tcg/multiarch/sha512.c   | 990 +
 .gitlab-ci.d/container-cross.yml   |  20 +-
 .gitlab-ci.d/custom-runners.yml|   2 +-
 ...untu-18.04-s390x.yml => ubuntu-20.04-s390x.yml} |  28 +-
 .../custom-runners/ubuntu-20.40-aarch32.yml|  23 +
 .travis.yml|  12 +-
 MAINTAINERS|   1 +
 scripts/ci/setup/build-environment.yml |  25 +
 scripts/ci/setup/gitlab-runner.yml |  38 +
 tests/docker/Makefile.include  |  29 +-
 tests/docker/dockerfiles/debian-arm64-cross.docker | 186 +++-
 .../dockerfiles/debian-arm64-test-cross.docker |  13 -
 .../dockerfiles/debian-riscv64-test-cross.docker   |  12 +
 tests/docker/dockerfiles/debian-s390x-cross.docker | 181 +++-
 tests/docker/dockerfiles/opensuse-leap.docker  |   3 +-
 tests/docker/dockerfiles/ubuntu1804.docker |   3 +-
 tests/docker/dockerfiles/ubuntu2004.docker |   3 +-
 tests/lcitool/libvirt-ci   |   2 +-
 tests/lcitool/refresh  |  16 +
 tests/tcg/aarch64/Makefile.target  |  17 +
 tests/tcg/arm/Makefile.target  |  17 +
 tests/tcg/configure.sh |   4 +-
 tests/tcg/i386/Makefile.target |   9 +
 tests/tcg/ppc64/Makefile.target|  20 +-
 tests/tcg/ppc64le/Makefile.target  |   9 +-
 tests/tcg/s390x/Makefile.target|   9 +
 tests/tcg/sh4/Makefile.target  |   2 +
 tests/tcg/x86_64/Makefile.target   |   7 +
 33 files changed, 1816 insertions(+), 189 deletions(-)
 create mode 100644 tests/tcg/aarch64/system/semiheap.c
 create mode 100644 tests/tcg/multiarch/sha512.c
 rename .gitlab-ci.d/custom-runners/{ubuntu-18.04-s390x.yml => 
ubuntu-20.04-s390x.yml} (87%)
 create mode 100644 .gitlab-ci.d/custom-runners/ubuntu-20.40-aarch32.yml
 delete mode 100644 tests/docker/dockerfiles/debian-arm64-test-cross.docker
 create mode 100644 tests/docker/dockerfiles/debian-riscv64-test-cross.docker

-- 
2.30.2




[PULL 03/18] tests/lcitool: update to latest version

2022-03-01 Thread Alex Bennée
We will need an update shortly for some new images.

Signed-off-by: Alex Bennée 
Message-Id: <20220225172021.3493923-4-alex.ben...@linaro.org>

diff --git a/tests/docker/dockerfiles/opensuse-leap.docker 
b/tests/docker/dockerfiles/opensuse-leap.docker
index 1b78d8369a..e1ad9434a3 100644
--- a/tests/docker/dockerfiles/opensuse-leap.docker
+++ b/tests/docker/dockerfiles/opensuse-leap.docker
@@ -127,8 +127,7 @@ RUN zypper update -y && \
 ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/g++ && \
 ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/gcc
 
-RUN pip3 install \
- meson==0.56.0
+RUN pip3 install meson==0.56.0
 
 ENV LANG "en_US.UTF-8"
 ENV MAKE "/usr/bin/make"
diff --git a/tests/docker/dockerfiles/ubuntu1804.docker 
b/tests/docker/dockerfiles/ubuntu1804.docker
index 699f2dfc6a..0a622b467c 100644
--- a/tests/docker/dockerfiles/ubuntu1804.docker
+++ b/tests/docker/dockerfiles/ubuntu1804.docker
@@ -134,8 +134,7 @@ RUN export DEBIAN_FRONTEND=noninteractive && \
 ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/g++ && \
 ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/gcc
 
-RUN pip3 install \
- meson==0.56.0
+RUN pip3 install meson==0.56.0
 
 ENV LANG "en_US.UTF-8"
 ENV MAKE "/usr/bin/make"
diff --git a/tests/docker/dockerfiles/ubuntu2004.docker 
b/tests/docker/dockerfiles/ubuntu2004.docker
index 87513125b8..b9d06cb040 100644
--- a/tests/docker/dockerfiles/ubuntu2004.docker
+++ b/tests/docker/dockerfiles/ubuntu2004.docker
@@ -136,8 +136,7 @@ RUN export DEBIAN_FRONTEND=noninteractive && \
 ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/g++ && \
 ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/gcc
 
-RUN pip3 install \
- meson==0.56.0
+RUN pip3 install meson==0.56.0
 
 ENV LANG "en_US.UTF-8"
 ENV MAKE "/usr/bin/make"
diff --git a/tests/lcitool/libvirt-ci b/tests/lcitool/libvirt-ci
index 6dd9b6fab1..f83b916d5e 16
--- a/tests/lcitool/libvirt-ci
+++ b/tests/lcitool/libvirt-ci
@@ -1 +1 @@
-Subproject commit 6dd9b6fab1fe081b16bc975485d7a02c81ba5fbe
+Subproject commit f83b916d5efa4bd33fbf4b7ea41bf6d535cc63fb
-- 
2.30.2




Re: [PATCH v2 10/14] vdpa: Add custom IOTLB translations to SVQ

2022-03-01 Thread Eugenio Perez Martin
On Mon, Feb 28, 2022 at 8:37 AM Jason Wang  wrote:
>
>
> 在 2022/2/27 下午9:41, Eugenio Pérez 写道:
> > Use translations added in VhostIOVATree in SVQ.
> >
> > Only introduce usage here, not allocation and deallocation. As with
> > previous patches, we use the dead code paths of shadow_vqs_enabled to
> > avoid committing too many changes at once. These are impossible to take
> > at the moment.
> >
> > Signed-off-by: Eugenio Pérez 
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |   6 +-
> >   include/hw/virtio/vhost-vdpa.h |   3 +
> >   hw/virtio/vhost-shadow-virtqueue.c |  76 -
> >   hw/virtio/vhost-vdpa.c | 128 -
> >   4 files changed, 187 insertions(+), 26 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
> > b/hw/virtio/vhost-shadow-virtqueue.h
> > index 04c67685fd..b2f722d101 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -13,6 +13,7 @@
> >   #include "qemu/event_notifier.h"
> >   #include "hw/virtio/virtio.h"
> >   #include "standard-headers/linux/vhost_types.h"
> > +#include "hw/virtio/vhost-iova-tree.h"
> >
> >   /* Shadow virtqueue to relay notifications */
> >   typedef struct VhostShadowVirtqueue {
> > @@ -43,6 +44,9 @@ typedef struct VhostShadowVirtqueue {
> >   /* Virtio device */
> >   VirtIODevice *vdev;
> >
> > +/* IOVA mapping */
> > +VhostIOVATree *iova_tree;
> > +
> >   /* Map for use the guest's descriptors */
> >   VirtQueueElement **ring_id_maps;
> >
> > @@ -78,7 +82,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, 
> > VirtIODevice *vdev,
> >VirtQueue *vq);
> >   void vhost_svq_stop(VhostShadowVirtqueue *svq);
> >
> > -VhostShadowVirtqueue *vhost_svq_new(void);
> > +VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree);
> >
> >   void vhost_svq_free(gpointer vq);
> >   G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index 009a9f3b6b..ee8e939ad0 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -14,6 +14,7 @@
> >
> >   #include 
> >
> > +#include "hw/virtio/vhost-iova-tree.h"
> >   #include "hw/virtio/virtio.h"
> >   #include "standard-headers/linux/vhost_types.h"
> >
> > @@ -30,6 +31,8 @@ typedef struct vhost_vdpa {
> >   MemoryListener listener;
> >   struct vhost_vdpa_iova_range iova_range;
> >   bool shadow_vqs_enabled;
> > +/* IOVA mapping used by the Shadow Virtqueue */
> > +VhostIOVATree *iova_tree;
> >   GPtrArray *shadow_vqs;
> >   struct vhost_dev *dev;
> >   VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> > b/hw/virtio/vhost-shadow-virtqueue.c
> > index a38d313755..7e073773d1 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -11,6 +11,7 @@
> >   #include "hw/virtio/vhost-shadow-virtqueue.h"
> >
> >   #include "qemu/error-report.h"
> > +#include "qemu/log.h"
> >   #include "qemu/main-loop.h"
> >   #include "qemu/log.h"
> >   #include "linux-headers/linux/vhost.h"
> > @@ -84,7 +85,58 @@ static void 
> > vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> >   }
> >   }
> >
> > +/**
> > + * Translate addresses between the qemu's virtual address and the SVQ IOVA
> > + *
> > + * @svqShadow VirtQueue
> > + * @vaddr  Translated IOVA addresses
> > + * @iovec  Source qemu's VA addresses
> > + * @numLength of iovec and minimum length of vaddr
> > + */
> > +static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> > + void **addrs, const struct iovec 
> > *iovec,
> > + size_t num)
> > +{
> > +if (num == 0) {
> > +return true;
> > +}
> > +
> > +for (size_t i = 0; i < num; ++i) {
> > +DMAMap needle = {
> > +.translated_addr = (hwaddr)iovec[i].iov_base,
> > +.size = iovec[i].iov_len,
> > +};
> > +size_t off;
> > +
> > +const DMAMap *map = vhost_iova_tree_find_iova(svq->iova_tree, 
> > &needle);
> > +/*
> > + * Map cannot be NULL since iova map contains all guest space and
> > + * qemu already has a physical address mapped
> > + */
> > +if (unlikely(!map)) {
> > +qemu_log_mask(LOG_GUEST_ERROR,
> > +  "Invalid address 0x%"HWADDR_PRIx" given by 
> > guest",
> > +  needle.translated_addr);
> > +return false;
> > +}
> > +
> > +off = needle.translated_addr - map->translated_addr;
> > +addrs[i] = (void *)(map->iova + off);
> > +
> > +if (unlikely(int128_gt(int128_add(needle.translated_addr,
> > +  iovec[i].iov_len),
> > + 

[PULL 15/18] gitlab: upgrade the job definition for s390x to 20.04

2022-03-01 Thread Alex Bennée
The new s390x machine has more of everything, including the OS. As
18.04 will soon be going away we might as well get onto something
moderately modern.

Signed-off-by: Alex Bennée 
Acked-by: Christian Borntraeger 
Reviewed-by: Thomas Huth 
Acked-by: Cornelia Huck 
Reviewed-by: Philippe Mathieu-Daudé 
Cc: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-Id: <20220225172021.3493923-16-alex.ben...@linaro.org>

diff --git a/.gitlab-ci.d/custom-runners.yml b/.gitlab-ci.d/custom-runners.yml
index 056c374619..3e76a2034a 100644
--- a/.gitlab-ci.d/custom-runners.yml
+++ b/.gitlab-ci.d/custom-runners.yml
@@ -14,6 +14,6 @@ variables:
   GIT_STRATEGY: clone
 
 include:
-  - local: '/.gitlab-ci.d/custom-runners/ubuntu-18.04-s390x.yml'
+  - local: '/.gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml'
   - local: '/.gitlab-ci.d/custom-runners/ubuntu-20.04-aarch64.yml'
   - local: '/.gitlab-ci.d/custom-runners/centos-stream-8-x86_64.yml'
diff --git a/.gitlab-ci.d/custom-runners/ubuntu-18.04-s390x.yml 
b/.gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml
similarity index 87%
rename from .gitlab-ci.d/custom-runners/ubuntu-18.04-s390x.yml
rename to .gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml
index f39d874a1e..0333872113 100644
--- a/.gitlab-ci.d/custom-runners/ubuntu-18.04-s390x.yml
+++ b/.gitlab-ci.d/custom-runners/ubuntu-20.04-s390x.yml
@@ -1,12 +1,12 @@
-# All ubuntu-18.04 jobs should run successfully in an environment
+# All ubuntu-20.04 jobs should run successfully in an environment
 # setup by the scripts/ci/setup/build-environment.yml task
-# "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
+# "Install basic packages to build QEMU on Ubuntu 20.04/20.04"
 
-ubuntu-18.04-s390x-all-linux-static:
+ubuntu-20.04-s390x-all-linux-static:
  needs: []
  stage: build
  tags:
- - ubuntu_18.04
+ - ubuntu_20.04
  - s390x
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
@@ -21,11 +21,11 @@ ubuntu-18.04-s390x-all-linux-static:
  - make --output-sync -j`nproc` check V=1
  - make --output-sync -j`nproc` check-tcg V=1
 
-ubuntu-18.04-s390x-all:
+ubuntu-20.04-s390x-all:
  needs: []
  stage: build
  tags:
- - ubuntu_18.04
+ - ubuntu_20.04
  - s390x
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
@@ -37,11 +37,11 @@ ubuntu-18.04-s390x-all:
  - make --output-sync -j`nproc`
  - make --output-sync -j`nproc` check V=1
 
-ubuntu-18.04-s390x-alldbg:
+ubuntu-20.04-s390x-alldbg:
  needs: []
  stage: build
  tags:
- - ubuntu_18.04
+ - ubuntu_20.04
  - s390x
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
@@ -58,11 +58,11 @@ ubuntu-18.04-s390x-alldbg:
  - make --output-sync -j`nproc`
  - make --output-sync -j`nproc` check V=1
 
-ubuntu-18.04-s390x-clang:
+ubuntu-20.04-s390x-clang:
  needs: []
  stage: build
  tags:
- - ubuntu_18.04
+ - ubuntu_20.04
  - s390x
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
@@ -78,11 +78,11 @@ ubuntu-18.04-s390x-clang:
  - make --output-sync -j`nproc`
  - make --output-sync -j`nproc` check V=1
 
-ubuntu-18.04-s390x-tci:
+ubuntu-20.04-s390x-tci:
  needs: []
  stage: build
  tags:
- - ubuntu_18.04
+ - ubuntu_20.04
  - s390x
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
@@ -97,11 +97,11 @@ ubuntu-18.04-s390x-tci:
  - ../configure --disable-libssh --enable-tcg-interpreter
  - make --output-sync -j`nproc`
 
-ubuntu-18.04-s390x-notcg:
+ubuntu-20.04-s390x-notcg:
  needs: []
  stage: build
  tags:
- - ubuntu_18.04
+ - ubuntu_20.04
  - s390x
  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
-- 
2.30.2




[PULL 02/18] tests/docker: add NOUSER for alpine image

2022-03-01 Thread Alex Bennée
The alpine image doesn't have a standard useradd binary so disable
this convenience feature for it.

Signed-off-by: Alex Bennée 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Richard Henderson 
Message-Id: <20220225172021.3493923-3-alex.ben...@linaro.org>

diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index 0ec59b2193..286f0ac5b5 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -158,6 +158,9 @@ docker-image-debian-native: DOCKER_REGISTRY=
 docker-image-debian10: NOUSER=1
 docker-image-debian11: NOUSER=1
 
+# alpine has no adduser
+docker-image-alpine: NOUSER=1
+
 #
 # The build rule for hexagon-cross is special in so far for most of
 # the time we don't want to build it. While dockers caching does avoid
-- 
2.30.2




[PULL 10/18] tests/tcg/ppc64: clean-up handling of byte-reverse

2022-03-01 Thread Alex Bennée
Rather than having an else leg for the missing compiler case we can
simply not add the test, the same way as is done for ppc64le.
Also, while we are at it, fix up the compiler invocation.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-Id: <20220225172021.3493923-11-alex.ben...@linaro.org>

diff --git a/tests/tcg/ppc64/Makefile.target b/tests/tcg/ppc64/Makefile.target
index 0368007028..9d6dfc1e26 100644
--- a/tests/tcg/ppc64/Makefile.target
+++ b/tests/tcg/ppc64/Makefile.target
@@ -10,19 +10,14 @@ PPC64_TESTS=bcdsub non_signalling_xscv
 endif
 $(PPC64_TESTS): CFLAGS += -mpower8-vector
 
-PPC64_TESTS += byte_reverse
 PPC64_TESTS += mtfsf
+
 ifneq ($(DOCKER_IMAGE)$(CROSS_CC_HAS_POWER10),)
+PPC64_TESTS += byte_reverse
+endif
+byte_reverse: CFLAGS += -mcpu=power10
 run-byte_reverse: QEMU_OPTS+=-cpu POWER10
 run-plugin-byte_reverse-with-%: QEMU_OPTS+=-cpu POWER10
-else
-byte_reverse:
-   $(call skip-test, "BUILD of $@", "missing compiler support")
-run-byte_reverse:
-   $(call skip-test, "RUN of byte_reverse", "not built")
-run-plugin-byte_reverse-with-%:
-   $(call skip-test, "RUN of byte_reverse ($*)", "not built")
-endif
 
 PPC64_TESTS += signal_save_restore_xer
 
-- 
2.30.2




Re: [PATCH 1/3] util & iothread: Introduce event-loop abstract class

2022-03-01 Thread Stefan Hajnoczi
On Mon, Feb 28, 2022 at 08:05:52PM +0100, Nicolas Saenz Julienne wrote:
> On Thu, 2022-02-24 at 09:48 +, Stefan Hajnoczi wrote:
> > On Mon, Feb 21, 2022 at 06:08:43PM +0100, Nicolas Saenz Julienne wrote:
> > > diff --git a/qom/meson.build b/qom/meson.build
> > > index 062a3789d8..c20e5dd1cb 100644
> > > --- a/qom/meson.build
> > > +++ b/qom/meson.build
> > > @@ -4,6 +4,7 @@ qom_ss.add(files(
> > >'object.c',
> > >'object_interfaces.c',
> > >'qom-qobject.c',
> > > +  '../util/event-loop.c',
> > 
> > This looks strange. I expected util/event-loop.c to be in
> > util/meson.build and added to the util_ss SourceSet instead of qom_ss.
> > 
> > What is the reason for this?
> 
> Sorry I meant to move it into the qom directory while cleaning up the series
> but forgot about it.
> 
> That said, I can see how moving 'event-loop-backend' in qom_ss isn't the
> cleanest.

Yes, qom/ is meant for the QEMU Object Model infrastructure itself, not
for all the QOM classes that rely on it.

> So I tried moving it into util_ss, but for some reason nobody is calling
> 'type_init(even_loop_register_type)'. My guess is there's some compilation
> quirk I'm missing.

Maybe the issue is that libqemuutil.a (util_ss) object files are linked
on demand. If there are no symbol dependencies in the main QEMU code to
event-loop.o then it won't be linked into the executable. That may be
why event_loop_register_type() isn't being called (it's set up by an
__attribute__((constructor)) function in event-loop.o so it doesn't help
create a symbol dependency).

> Any suggestions? I wonder if util_ss is the right spot for 
> 'event-loop-backend'
> anyway, but I don't have a better idea.

What Paolo suggested sounds good: move event-loop.c next to iothread.c
in the top-level source directory.

Stefan




[PULL 11/18] tests/tcg: build sha1-vector with O3 and compare

2022-03-01 Thread Alex Bennée
The aim of this is to test code generation for vectorised operations.
Unfortunately gcc struggles to do much with the messy sha1 code (try
-fopt-info-vec-missed to see why). However it's better than nothing.

We assume the non-vectorised output is gold and, barring compiler bugs,
the outputs should match.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-Id: <20220225172021.3493923-12-alex.ben...@linaro.org>

diff --git a/tests/tcg/aarch64/Makefile.target 
b/tests/tcg/aarch64/Makefile.target
index 1d967901bd..df3f8e9438 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -50,6 +50,16 @@ sysregs: CFLAGS+=-march=armv8.1-a+sve
 AARCH64_TESTS += sve-ioctls
 sve-ioctls: CFLAGS+=-march=armv8.1-a+sve
 
+# Vector SHA1
+sha1-vector: CFLAGS=-O3
+sha1-vector: sha1.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
+run-sha1-vector: sha1-vector run-sha1
+   $(call run-test, $<, $(QEMU) $(QEMU_OPTS) $<, "$< on $(TARGET_NAME)")
+   $(call diff-out, sha1-vector, sha1.out)
+
+TESTS += sha1-vector
+
 ifneq ($(HAVE_GDB_BIN),)
 GDB_SCRIPT=$(SRC_PATH)/tests/guest-debug/run-test.py
 
diff --git a/tests/tcg/arm/Makefile.target b/tests/tcg/arm/Makefile.target
index f509d823d4..2dc94931c3 100644
--- a/tests/tcg/arm/Makefile.target
+++ b/tests/tcg/arm/Makefile.target
@@ -70,6 +70,15 @@ endif
 
 ARM_TESTS += commpage
 
+# Vector SHA1
+sha1-vector: CFLAGS=-O3
+sha1-vector: sha1.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
+run-sha1-vector: sha1-vector run-sha1
+   $(call run-test, $<, $(QEMU) $(QEMU_OPTS) $<, "$< on $(TARGET_NAME)")
+   $(call diff-out, sha1-vector, sha1.out)
+
+ARM_TESTS += sha1-vector
 TESTS += $(ARM_TESTS)
 
 # On ARM Linux only supports 4k pages
-- 
2.30.2




[PULL 04/18] tests/docker: update debian-arm64-cross with lcitool

2022-03-01 Thread Alex Bennée
Using lcitool, update debian-arm64-cross to a Debian 11 based system.
As a result we can drop debian-arm64-test-cross, which was used just
for building tests.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-Id: <20220225172021.3493923-5-alex.ben...@linaro.org>

diff --git a/.gitlab-ci.d/container-cross.yml b/.gitlab-ci.d/container-cross.yml
index a3b5b90552..ed620620f8 100644
--- a/.gitlab-ci.d/container-cross.yml
+++ b/.gitlab-ci.d/container-cross.yml
@@ -21,18 +21,10 @@ amd64-debian-user-cross-container:
 
 arm64-debian-cross-container:
   extends: .container_job_template
-  stage: containers-layer2
-  needs: ['amd64-debian10-container']
+  stage: containers
   variables:
 NAME: debian-arm64-cross
 
-arm64-test-debian-cross-container:
-  extends: .container_job_template
-  stage: containers-layer2
-  needs: ['amd64-debian11-container']
-  variables:
-NAME: debian-arm64-test-cross
-
 armel-debian-cross-container:
   extends: .container_job_template
   stage: containers-layer2
diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index 286f0ac5b5..1e6bdf 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -137,7 +137,6 @@ DOCKER_PARTIAL_IMAGES += fedora
 endif
 
 docker-image-debian-alpha-cross: docker-image-debian10
-docker-image-debian-arm64-cross: docker-image-debian10
 docker-image-debian-armel-cross: docker-image-debian10
 docker-image-debian-armhf-cross: docker-image-debian10
 docker-image-debian-hppa-cross: docker-image-debian10
@@ -213,14 +212,12 @@ docker-image-debian-nios2-cross: 
$(DOCKER_FILES_DIR)/debian-toolchain.docker \
 # Specialist build images, sometimes very limited tools
 docker-image-debian-tricore-cross: docker-image-debian10
 docker-image-debian-all-test-cross: docker-image-debian10
-docker-image-debian-arm64-test-cross: docker-image-debian11
 docker-image-debian-microblaze-cross: docker-image-debian10
 docker-image-debian-nios2-cross: docker-image-debian10
 docker-image-debian-powerpc-test-cross: docker-image-debian11
 
 # These images may be good enough for building tests but not for test builds
 DOCKER_PARTIAL_IMAGES += debian-alpha-cross
-DOCKER_PARTIAL_IMAGES += debian-arm64-test-cross
 DOCKER_PARTIAL_IMAGES += debian-powerpc-test-cross
 DOCKER_PARTIAL_IMAGES += debian-hppa-cross
 DOCKER_PARTIAL_IMAGES += debian-m68k-cross debian-mips64-cross
diff --git a/tests/docker/dockerfiles/debian-arm64-cross.docker 
b/tests/docker/dockerfiles/debian-arm64-cross.docker
index 166e24df13..589510a7be 100644
--- a/tests/docker/dockerfiles/debian-arm64-cross.docker
+++ b/tests/docker/dockerfiles/debian-arm64-cross.docker
@@ -1,32 +1,166 @@
+# THIS FILE WAS AUTO-GENERATED
 #
-# Docker arm64 cross-compiler target
+#  $ lcitool dockerfile --layers all --cross aarch64 debian-11 qemu
 #
-# This docker target builds on the debian Buster base image.
-#
-FROM qemu/debian10
+# https://gitlab.com/libvirt/libvirt-ci
 
-# Add the foreign architecture we want and install dependencies
-RUN dpkg --add-architecture arm64
-RUN apt update && \
-DEBIAN_FRONTEND=noninteractive eatmydata \
-apt install -y --no-install-recommends \
-crossbuild-essential-arm64
-RUN apt update && \
-DEBIAN_FRONTEND=noninteractive eatmydata \
-apt build-dep -yy -a arm64 --arch-only qemu
+FROM docker.io/library/debian:11-slim
 
-# Specify the cross prefix for this image (see tests/docker/common.rc)
-ENV QEMU_CONFIGURE_OPTS --cross-prefix=aarch64-linux-gnu-
-ENV DEF_TARGET_LIST aarch64-softmmu,aarch64-linux-user
+RUN export DEBIAN_FRONTEND=noninteractive && \
+apt-get update && \
+apt-get install -y eatmydata && \
+eatmydata apt-get dist-upgrade -y && \
+eatmydata apt-get install --no-install-recommends -y \
+bash \
+bc \
+bsdextrautils \
+bzip2 \
+ca-certificates \
+ccache \
+dbus \
+debianutils \
+diffutils \
+exuberant-ctags \
+findutils \
+gcovr \
+genisoimage \
+gettext \
+git \
+hostname \
+libpcre2-dev \
+libspice-protocol-dev \
+libtest-harness-perl \
+llvm \
+locales \
+make \
+meson \
+ncat \
+ninja-build \
+openssh-client \
+perl-base \
+pkgconf \
+python3 \
+python3-numpy \
+python3-opencv \
+python3-pillow \
+python3-pip \
+python3-sphinx \
+python3-sphinx-rtd-theme \
+python3-venv \
+python3-yaml \
+rpm2cpio \
+sed \
+sparse \
+tar \
+tesseract-ocr \
+tesseract-ocr-eng \
+texinfo && \
+eatmydata apt-get autoremove -y && \
+eatmydata apt-get autoclean -y && \
+sed -Ei 's,^# (en_US\.UTF-8 .

[PULL 14/18] travis.yml: Update the s390x jobs to Ubuntu Focal

2022-03-01 Thread Alex Bennée
From: Thomas Huth 

QEMU will soon drop support for Ubuntu 18.04, so let's update
the Travis jobs that were still using this version to 20.04 instead.

While we're at it, also remove an obsolete comment about Ubuntu
Xenial being the default for our Travis jobs.

Signed-off-by: Thomas Huth 
Signed-off-by: Alex Bennée 
Message-Id: <20220221153423.1028465-1-th...@redhat.com>
Reviewed-by: Richard Henderson 
Message-Id: <20220225172021.3493923-15-alex.ben...@linaro.org>

diff --git a/.travis.yml b/.travis.yml
index 41010ebe6b..c3c8048842 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,6 +1,3 @@
-# The current Travis default is a VM based 16.04 Xenial on GCE
-# Additional builds with specific requirements for a full VM need to
-# be added as additional matrix: entries later on
 os: linux
 dist: focal
 language: c
@@ -190,7 +187,7 @@ jobs:
 
 - name: "[s390x] GCC check-tcg"
   arch: s390x
-  dist: bionic
+  dist: focal
   addons:
 apt_packages:
   - libaio-dev
@@ -233,7 +230,7 @@ jobs:
 
 - name: "[s390x] GCC (other-softmmu)"
   arch: s390x
-  dist: bionic
+  dist: focal
   addons:
 apt_packages:
   - libaio-dev
@@ -263,10 +260,11 @@ jobs:
 
 - name: "[s390x] GCC (user)"
   arch: s390x
-  dist: bionic
+  dist: focal
   addons:
 apt_packages:
   - libgcrypt20-dev
+  - libglib2.0-dev
   - libgnutls28-dev
   - ninja-build
   env:
@@ -274,7 +272,7 @@ jobs:
 
 - name: "[s390x] Clang (disable-tcg)"
   arch: s390x
-  dist: bionic
+  dist: focal
   compiler: clang
   addons:
 apt_packages:
-- 
2.30.2




[PULL 05/18] tests/docker: update debian-s390x-cross with lcitool

2022-03-01 Thread Alex Bennée
A later compiler is needed for some upcoming tests, so we might as
well migrate to an lcitool-generated docker file.

Signed-off-by: Alex Bennée 
Cc: David Hildenbrand 
Reviewed-by: Richard Henderson 
Message-Id: <20220225172021.3493923-6-alex.ben...@linaro.org>

diff --git a/.gitlab-ci.d/container-cross.yml b/.gitlab-ci.d/container-cross.yml
index ed620620f8..d38f657131 100644
--- a/.gitlab-ci.d/container-cross.yml
+++ b/.gitlab-ci.d/container-cross.yml
@@ -133,8 +133,7 @@ riscv64-debian-cross-container:
 
 s390x-debian-cross-container:
   extends: .container_job_template
-  stage: containers-layer2
-  needs: ['amd64-debian10-container']
+  stage: containers
   variables:
 NAME: debian-s390x-cross
 
diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index 1e6bdf..cce9faab36 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -146,7 +146,6 @@ docker-image-debian-mips64-cross: docker-image-debian10
 docker-image-debian-mips64el-cross: docker-image-debian10
 docker-image-debian-mipsel-cross: docker-image-debian10
 docker-image-debian-ppc64el-cross: docker-image-debian10
-docker-image-debian-s390x-cross: docker-image-debian10
 docker-image-debian-sh4-cross: docker-image-debian10
 docker-image-debian-sparc64-cross: docker-image-debian10
 
diff --git a/tests/docker/dockerfiles/debian-s390x-cross.docker 
b/tests/docker/dockerfiles/debian-s390x-cross.docker
index 9f2ab51eb0..aa1bd6eb4c 100644
--- a/tests/docker/dockerfiles/debian-s390x-cross.docker
+++ b/tests/docker/dockerfiles/debian-s390x-cross.docker
@@ -1,33 +1,164 @@
+# THIS FILE WAS AUTO-GENERATED
 #
-# Docker s390 cross-compiler target
+#  $ lcitool dockerfile --layers all --cross s390x debian-11 qemu
 #
-# This docker target builds on the debian Stretch base image.
-#
-FROM qemu/debian10
+# https://gitlab.com/libvirt/libvirt-ci
+
+FROM docker.io/library/debian:11-slim
 
-# Add the s390x architecture
-RUN dpkg --add-architecture s390x
+RUN export DEBIAN_FRONTEND=noninteractive && \
+apt-get update && \
+apt-get install -y eatmydata && \
+eatmydata apt-get dist-upgrade -y && \
+eatmydata apt-get install --no-install-recommends -y \
+bash \
+bc \
+bsdextrautils \
+bzip2 \
+ca-certificates \
+ccache \
+dbus \
+debianutils \
+diffutils \
+exuberant-ctags \
+findutils \
+gcovr \
+genisoimage \
+gettext \
+git \
+hostname \
+libpcre2-dev \
+libspice-protocol-dev \
+libtest-harness-perl \
+llvm \
+locales \
+make \
+meson \
+ncat \
+ninja-build \
+openssh-client \
+perl-base \
+pkgconf \
+python3 \
+python3-numpy \
+python3-opencv \
+python3-pillow \
+python3-pip \
+python3-sphinx \
+python3-sphinx-rtd-theme \
+python3-venv \
+python3-yaml \
+rpm2cpio \
+sed \
+sparse \
+tar \
+tesseract-ocr \
+tesseract-ocr-eng \
+texinfo && \
+eatmydata apt-get autoremove -y && \
+eatmydata apt-get autoclean -y && \
+sed -Ei 's,^# (en_US\.UTF-8 .*)$,\1,' /etc/locale.gen && \
+dpkg-reconfigure locales
 
-# Grab the updated list of packages
-RUN apt update && apt dist-upgrade -yy
-RUN apt update && \
-DEBIAN_FRONTEND=noninteractive eatmydata \
-apt install -y --no-install-recommends \
-gcc-multilib-s390x-linux-gnu
+ENV LANG "en_US.UTF-8"
+ENV MAKE "/usr/bin/make"
+ENV NINJA "/usr/bin/ninja"
+ENV PYTHON "/usr/bin/python3"
+ENV CCACHE_WRAPPERSDIR "/usr/libexec/ccache-wrappers"
 
-RUN apt update && \
-DEBIAN_FRONTEND=noninteractive eatmydata \
-apt build-dep -yy -a s390x --arch-only qemu
+RUN export DEBIAN_FRONTEND=noninteractive && \
+dpkg --add-architecture s390x && \
+eatmydata apt-get update && \
+eatmydata apt-get dist-upgrade -y && \
+eatmydata apt-get install --no-install-recommends -y dpkg-dev && \
+eatmydata apt-get install --no-install-recommends -y \
+g++-s390x-linux-gnu \
+gcc-s390x-linux-gnu \
+libaio-dev:s390x \
+libasan5:s390x \
+libasound2-dev:s390x \
+libattr1-dev:s390x \
+libbpf-dev:s390x \
+libbrlapi-dev:s390x \
+libbz2-dev:s390x \
+libc6-dev:s390x \
+libcacard-dev:s390x \
+libcap-ng-dev:s390x \
+libcapstone-dev:s390x \
+libcurl4-gnutls-dev:s390x \
+libdaxctl-dev:s390x \
+libdrm-dev:s390x \
+libepoxy-dev:s390x \
+libfdt-dev:s390x \
+libffi-dev:s390x \
+libfuse3-dev:s390x \
+

[PULL 13/18] tests/tcg: add vectorised sha512 versions

2022-03-01 Thread Alex Bennée
This builds vectorised versions of sha512 to exercise the vector code:

  - aarch64 (AdvSimd)
  - i386 (SSE)
  - s390x (MVX)
  - ppc64/ppc64le (power10 vectors)

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-Id: <20220225172021.3493923-14-alex.ben...@linaro.org>

diff --git a/tests/tcg/aarch64/Makefile.target 
b/tests/tcg/aarch64/Makefile.target
index df3f8e9438..ac07acde66 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -60,6 +60,13 @@ run-sha1-vector: sha1-vector run-sha1
 
 TESTS += sha1-vector
 
+# Vector versions of sha512 (-O3 triggers vectorisation)
+sha512-vector: CFLAGS=-O3
+sha512-vector: sha512.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
+
+TESTS += sha512-vector
+
 ifneq ($(HAVE_GDB_BIN),)
 GDB_SCRIPT=$(SRC_PATH)/tests/guest-debug/run-test.py
 
diff --git a/tests/tcg/arm/Makefile.target b/tests/tcg/arm/Makefile.target
index 2dc94931c3..2f815120a5 100644
--- a/tests/tcg/arm/Makefile.target
+++ b/tests/tcg/arm/Makefile.target
@@ -79,6 +79,14 @@ run-sha1-vector: sha1-vector run-sha1
$(call diff-out, sha1-vector, sha1.out)
 
 ARM_TESTS += sha1-vector
+
+# Vector versions of sha512 (-O3 triggers vectorisation)
+sha512-vector: CFLAGS=-O3
+sha512-vector: sha512.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
+
+ARM_TESTS += sha512-vector
+
 TESTS += $(ARM_TESTS)
 
 # On ARM Linux only supports 4k pages
diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
index 38c10379af..e1c0310be6 100644
--- a/tests/tcg/i386/Makefile.target
+++ b/tests/tcg/i386/Makefile.target
@@ -71,3 +71,12 @@ TESTS=$(MULTIARCH_TESTS) $(I386_TESTS)
 
 # On i386 and x86_64 Linux only supports 4k pages (large pages are a different 
hack)
 EXTRA_RUNS+=run-test-mmap-4096
+
+sha512-sse: CFLAGS=-msse4.1 -O3
+sha512-sse: sha512.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
+
+run-sha512-sse: QEMU_OPTS+=-cpu max
+run-plugin-sha512-sse-with-%: QEMU_OPTS+=-cpu max
+
+TESTS+=sha512-sse
diff --git a/tests/tcg/ppc64/Makefile.target b/tests/tcg/ppc64/Makefile.target
index 9d6dfc1e26..c9498053df 100644
--- a/tests/tcg/ppc64/Makefile.target
+++ b/tests/tcg/ppc64/Makefile.target
@@ -13,12 +13,19 @@ $(PPC64_TESTS): CFLAGS += -mpower8-vector
 PPC64_TESTS += mtfsf
 
 ifneq ($(DOCKER_IMAGE)$(CROSS_CC_HAS_POWER10),)
-PPC64_TESTS += byte_reverse
+PPC64_TESTS += byte_reverse sha512-vector
 endif
 byte_reverse: CFLAGS += -mcpu=power10
 run-byte_reverse: QEMU_OPTS+=-cpu POWER10
 run-plugin-byte_reverse-with-%: QEMU_OPTS+=-cpu POWER10
 
+sha512-vector: CFLAGS +=-mcpu=power10 -O3
+sha512-vector: sha512.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
+
+run-sha512-vector: QEMU_OPTS+=-cpu POWER10
+run-plugin-sha512-vector-with-%: QEMU_OPTS+=-cpu POWER10
+
 PPC64_TESTS += signal_save_restore_xer
 
 TESTS += $(PPC64_TESTS)
diff --git a/tests/tcg/ppc64le/Makefile.target 
b/tests/tcg/ppc64le/Makefile.target
index 480ff0898d..12d85e946b 100644
--- a/tests/tcg/ppc64le/Makefile.target
+++ b/tests/tcg/ppc64le/Makefile.target
@@ -10,12 +10,19 @@ endif
 $(PPC64LE_TESTS): CFLAGS += -mpower8-vector
 
 ifneq ($(DOCKER_IMAGE)$(CROSS_CC_HAS_POWER10),)
-PPC64LE_TESTS += byte_reverse
+PPC64LE_TESTS += byte_reverse sha512-vector
 endif
 byte_reverse: CFLAGS += -mcpu=power10
 run-byte_reverse: QEMU_OPTS+=-cpu POWER10
 run-plugin-byte_reverse-with-%: QEMU_OPTS+=-cpu POWER10
 
+sha512-vector: CFLAGS +=-mcpu=power10 -O3
+sha512-vector: sha512.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
+
+run-sha512-vector: QEMU_OPTS+=-cpu POWER10
+run-plugin-sha512-vector-with-%: QEMU_OPTS+=-cpu POWER10
+
 PPC64LE_TESTS += mtfsf
 PPC64LE_TESTS += signal_save_restore_xer
 
diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 1a7238b4eb..e53b599b22 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -25,3 +25,12 @@ run-gdbstub-signals-s390x: signals-s390x
 
 EXTRA_RUNS += run-gdbstub-signals-s390x
 endif
+
+# MVX versions of sha512
+sha512-mvx: CFLAGS=-march=z13 -mvx -O3
+sha512-mvx: sha512.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
+
+run-sha512-mvx: QEMU_OPTS+=-cpu max
+
+TESTS+=sha512-mvx
diff --git a/tests/tcg/x86_64/Makefile.target b/tests/tcg/x86_64/Makefile.target
index 4a8a464c57..17cf168f0a 100644
--- a/tests/tcg/x86_64/Makefile.target
+++ b/tests/tcg/x86_64/Makefile.target
@@ -22,3 +22,10 @@ test-x86_64: test-i386.c test-i386.h test-i386-shift.h 
test-i386-muldiv.h
 
 vsyscall: $(SRC_PATH)/tests/tcg/x86_64/vsyscall.c
$(CC) $(CFLAGS) $< -o $@ $(LDFLAGS)
+
+# TCG does not yet support all SSE (SIGILL on pshufb)
+# sha512-sse: CFLAGS=-march=core2 -O3
+# sha512-sse: sha512.c
+#  $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
+
+TESTS+=sha512-sse
-- 
2.30.2




[PULL 08/18] scripts/ci: allow for a secondary runner

2022-03-01 Thread Alex Bennée
Some hardware can run multiple architecture profiles, so we can install
a secondary runner to build and run tests for those profiles. This
allows setting up a secondary service.

Signed-off-by: Alex Bennée 
Acked-by: Richard Henderson 
Message-Id: <20220225172021.3493923-9-alex.ben...@linaro.org>

diff --git a/scripts/ci/setup/gitlab-runner.yml 
b/scripts/ci/setup/gitlab-runner.yml
index 1127db516f..33128be85d 100644
--- a/scripts/ci/setup/gitlab-runner.yml
+++ b/scripts/ci/setup/gitlab-runner.yml
@@ -69,3 +69,41 @@
 name: gitlab-runner
 state: started
 enabled: yes
+
+- name: Download secondary gitlab-runner
+  get_url:
+dest: /usr/local/bin/gitlab-runner-arm
+url: "https://s3.amazonaws.com/gitlab-runner-downloads/v{{ 
gitlab_runner_version  }}/binaries/gitlab-runner-{{ gitlab_runner_os }}-arm"
+owner: gitlab-runner
+group: gitlab-runner
+mode: u=rwx,g=rwx,o=rx
+  when:
+- ansible_facts['distribution'] == 'Ubuntu'
+- ansible_facts['architecture'] == 'aarch64'
+- ansible_facts['distribution_version'] == '20.04'
+
+- name: Register secondary gitlab-runner
+  command: "/usr/local/bin/gitlab-runner-arm register --non-interactive 
--url {{ gitlab_runner_server_url }} --registration-token {{ 
gitlab_runner_registration_token }} --executor shell --tag-list aarch32,{{ 
ansible_facts[\"distribution\"]|lower }}_{{ 
ansible_facts[\"distribution_version\"] }} --description '{{ 
ansible_facts[\"distribution\"] }} {{ ansible_facts[\"distribution_version\"] 
}} {{ ansible_facts[\"architecture\"] }} ({{ ansible_facts[\"os_family\"] }})'"
+  when:
+- ansible_facts['distribution'] == 'Ubuntu'
+- ansible_facts['architecture'] == 'aarch64'
+- ansible_facts['distribution_version'] == '20.04'
+
+- name: Install the secondary gitlab-runner service using its own 
functionality
+  command: /usr/local/bin/gitlab-runner-arm install --user gitlab-runner 
--working-directory /home/gitlab-runner/arm -n gitlab-runner-arm
+  register: gitlab_runner_install_service_result
+  failed_when: "gitlab_runner_install_service_result.rc != 0 and \"already 
exists\" not in gitlab_runner_install_service_result.stderr"
+  when:
+- ansible_facts['distribution'] == 'Ubuntu'
+- ansible_facts['architecture'] == 'aarch64'
+- ansible_facts['distribution_version'] == '20.04'
+
+- name: Enable the secondary gitlab-runner service
+  service:
+name: gitlab-runner-arm
+state: started
+enabled: yes
+  when:
+- ansible_facts['distribution'] == 'Ubuntu'
+- ansible_facts['architecture'] == 'aarch64'
+- ansible_facts['distribution_version'] == '20.04'
-- 
2.30.2




[PULL 06/18] tests/docker: introduce debian-riscv64-test-cross

2022-03-01 Thread Alex Bennée
Cross-building QEMU for riscv64 still involves messing about with sid
and ports. However, for building tests we can have a slimmer
compiler-only container, which should be more stable.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-Id: <20220225172021.3493923-7-alex.ben...@linaro.org>

diff --git a/.gitlab-ci.d/container-cross.yml b/.gitlab-ci.d/container-cross.yml
index d38f657131..e622ac2d21 100644
--- a/.gitlab-ci.d/container-cross.yml
+++ b/.gitlab-ci.d/container-cross.yml
@@ -131,6 +131,13 @@ riscv64-debian-cross-container:
   variables:
 NAME: debian-riscv64-cross
 
+# we can however build TCG tests using a non-sid base
+riscv64-debian-test-cross-container:
+  extends: .container_job_template
+  stage: containers-layer2
+  variables:
+NAME: debian-riscv64-test-cross
+
 s390x-debian-cross-container:
   extends: .container_job_template
   stage: containers
diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index cce9faab36..e495b163a0 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -214,6 +214,7 @@ docker-image-debian-all-test-cross: docker-image-debian10
 docker-image-debian-microblaze-cross: docker-image-debian10
 docker-image-debian-nios2-cross: docker-image-debian10
 docker-image-debian-powerpc-test-cross: docker-image-debian11
+docker-image-debian-riscv64-test-cross: docker-image-debian11
 
 # These images may be good enough for building tests but not for test builds
 DOCKER_PARTIAL_IMAGES += debian-alpha-cross
@@ -222,6 +223,7 @@ DOCKER_PARTIAL_IMAGES += debian-hppa-cross
 DOCKER_PARTIAL_IMAGES += debian-m68k-cross debian-mips64-cross
 DOCKER_PARTIAL_IMAGES += debian-microblaze-cross
 DOCKER_PARTIAL_IMAGES += debian-nios2-cross
+DOCKER_PARTIAL_IMAGES += debian-riscv64-test-cross
 DOCKER_PARTIAL_IMAGES += debian-sh4-cross debian-sparc64-cross
 DOCKER_PARTIAL_IMAGES += debian-tricore-cross
 DOCKER_PARTIAL_IMAGES += debian-xtensa-cross
diff --git a/tests/docker/dockerfiles/debian-riscv64-test-cross.docker 
b/tests/docker/dockerfiles/debian-riscv64-test-cross.docker
new file mode 100644
index 00..1d90901298
--- /dev/null
+++ b/tests/docker/dockerfiles/debian-riscv64-test-cross.docker
@@ -0,0 +1,12 @@
+#
+# Docker cross-compiler target
+#
+# This docker target builds on the Debian Bullseye base image.
+#
+FROM qemu/debian11
+
+RUN apt update && \
+DEBIAN_FRONTEND=noninteractive eatmydata \
+apt install -y --no-install-recommends \
+gcc-riscv64-linux-gnu \
+libc6-dev-riscv64-cross
diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index adc95d6a44..0663bd19f4 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -180,7 +180,7 @@ for target in $target_list; do
   ;;
 riscv64-*)
   container_hosts=x86_64
-  container_image=debian-riscv64-cross
+  container_image=debian-riscv64-test-cross
   container_cross_cc=riscv64-linux-gnu-gcc
   ;;
 s390x-*)
-- 
2.30.2




[PULL 07/18] scripts/ci: add build env rules for aarch32 on aarch64

2022-03-01 Thread Alex Bennée
At least the current crop of AArch64 hardware can support running 32-bit
EL0 code. Before we can build and test, we need a minimal set of packages
installed. We can't use "apt build-dep" because it currently gets
confused trying to keep two sets of build-deps installed at once.
Instead we install a minimal set of libraries that will allow us to
continue.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-Id: <20220225172021.3493923-8-alex.ben...@linaro.org>

diff --git a/scripts/ci/setup/build-environment.yml 
b/scripts/ci/setup/build-environment.yml
index 599896cc5b..9182e0c253 100644
--- a/scripts/ci/setup/build-environment.yml
+++ b/scripts/ci/setup/build-environment.yml
@@ -19,6 +19,13 @@
   - '((ansible_version.major == 2) and (ansible_version.minor >= 8)) 
or (ansible_version.major >= 3)'
 msg: "Unsuitable ansible version, please use version 2.8.0 or later"
 
+- name: Add armhf foreign architecture to aarch64 hosts
+  command: dpkg --add-architecture armhf
+  when:
+- ansible_facts['distribution'] == 'Ubuntu'
+- ansible_facts['architecture'] == 'aarch64'
+- ansible_facts['distribution_version'] == '20.04'
+
 - name: Update apt cache / upgrade packages via apt
   apt:
 update_cache: yes
@@ -115,6 +122,24 @@
 - ansible_facts['distribution'] == 'Ubuntu'
 - ansible_facts['distribution_version'] == '20.04'
 
+- name: Install armhf cross-compile packages to build QEMU on AArch64 
Ubuntu 20.04
+  package:
+name:
+  - binutils-arm-linux-gnueabihf
+  - gcc-arm-linux-gnueabihf
+  - libblkid-dev:armhf
+  - libc6-dev:armhf
+  - libffi-dev:armhf
+  - libglib2.0-dev:armhf
+  - libmount-dev:armhf
+  - libpcre2-dev:armhf
+  - libpixman-1-dev:armhf
+  - zlib1g-dev:armhf
+  when:
+- ansible_facts['distribution'] == 'Ubuntu'
+- ansible_facts['distribution_version'] == '20.04'
+- ansible_facts['architecture'] == 'aarch64'
+
 - name: Install basic packages to build QEMU on EL8
   dnf:
 # This list of packages start with 
tests/docker/dockerfiles/centos8.docker
-- 
2.30.2




[PULL 09/18] gitlab: add a new aarch32 custom runner definition

2022-03-01 Thread Alex Bennée
Although running on aarch64 hardware, we can still target 32-bit builds
with a cross-compiler and run the resulting binaries.

Signed-off-by: Alex Bennée 
Message-Id: <20220225172021.3493923-10-alex.ben...@linaro.org>

diff --git a/docs/devel/ci-jobs.rst.inc b/docs/devel/ci-jobs.rst.inc
index db3f571d5f..92e25872aa 100644
--- a/docs/devel/ci-jobs.rst.inc
+++ b/docs/devel/ci-jobs.rst.inc
@@ -44,6 +44,13 @@ If you've got access to an aarch64 host that can be used as 
a gitlab-CI
 runner, you can set this variable to enable the tests that require this
 kind of host. The runner should be tagged with "aarch64".
 
+AARCH32_RUNNER_AVAILABLE
+
+If you've got access to an armhf host or an aarch64 host that can run
+aarch32 EL0 code to be used as a gitlab-CI runner, you can set this
+variable to enable the tests that require this kind of host. The
+runner should be tagged with "aarch32".
+
 S390X_RUNNER_AVAILABLE
 ~~~~~~~~~~~~~~~~~~~~~~
 If you've got access to an IBM Z host that can be used as a gitlab-CI
diff --git a/.gitlab-ci.d/custom-runners/ubuntu-20.40-aarch32.yml 
b/.gitlab-ci.d/custom-runners/ubuntu-20.40-aarch32.yml
new file mode 100644
index 00..9c589bc4cf
--- /dev/null
+++ b/.gitlab-ci.d/custom-runners/ubuntu-20.40-aarch32.yml
@@ -0,0 +1,23 @@
+# All ubuntu-20.04 jobs should run successfully in an environment
+# setup by the scripts/ci/setup/qemu/build-environment.yml task
+# "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
+
+ubuntu-20.04-aarch32-all:
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_20.04
+ - aarch32
+ rules:
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
+   when: manual
+   allow_failure: true
+ - if: "$AARCH32_RUNNER_AVAILABLE"
+   when: manual
+   allow_failure: true
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --cross-prefix=arm-linux-gnueabihf-
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
-- 
2.30.2




[PULL 12/18] tests/tcg: add sha512 test

2022-03-01 Thread Alex Bennée
This imports the sha512 algorithm and related tests from ccan, which
offers a cleaner hash implementation with its own validation tests,
with which we can exercise TCG code generation.

Signed-off-by: Alex Bennée 
Acked-by: Richard Henderson 
Message-Id: <20220225172021.3493923-13-alex.ben...@linaro.org>

diff --git a/tests/tcg/multiarch/sha512.c b/tests/tcg/multiarch/sha512.c
new file mode 100644
index 00..e1729828b9
--- /dev/null
+++ b/tests/tcg/multiarch/sha512.c
@@ -0,0 +1,990 @@
+/*
+ * sha512 test based on CCAN: https://ccodearchive.net/info/crypto/sha512.html
+ *
+ * src/crypto/sha512.cpp commit f914f1a746d7f91951c1da262a4a749dd3ebfa71
+ * Copyright (c) 2014 The Bitcoin Core developers
+ * Distributed under the MIT software license, see:
+ *  http://www.opensource.org/licenses/mit-license.php.
+ *
+ * SPDX-License-Identifier: MIT CC0-1.0
+ */
+#define _GNU_SOURCE /* See feature_test_macros(7) */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Required portions from endian.h */
+
+/**
+ * BSWAP_64 - reverse bytes in a constant uint64_t value.
+ * @val: constant value whose bytes to swap.
+ *
+ * Designed to be usable in constant-requiring initializers.
+ *
+ * Example:
+ *  struct mystruct {
+ *  char buf[BSWAP_64(0xff00000000000000ULL)];
+ *  };
+ */
+#define BSWAP_64(val)   \
+((((uint64_t)(val) & 0x00000000000000ffULL) << 56)  \
+ | (((uint64_t)(val) & 0x000000000000ff00ULL) << 40)\
+ | (((uint64_t)(val) & 0x0000000000ff0000ULL) << 24)\
+ | (((uint64_t)(val) & 0x00000000ff000000ULL) << 8) \
+ | (((uint64_t)(val) & 0x000000ff00000000ULL) >> 8) \
+ | (((uint64_t)(val) & 0x0000ff0000000000ULL) >> 24)\
+ | (((uint64_t)(val) & 0x00ff000000000000ULL) >> 40)\
+ | (((uint64_t)(val) & 0xff00000000000000ULL) >> 56))
+
+
+typedef uint64_t beint64_t;
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+
+/**
+ * CPU_TO_BE64 - convert a constant uint64_t value to big-endian
+ * @native: constant to convert
+ */
+#define CPU_TO_BE64(native) ((beint64_t)(native))
+/**
+ * BE64_TO_CPU - convert a big-endian uint64_t constant
+ * @le_val: big-endian constant to convert
+ */
+#define BE64_TO_CPU(le_val) ((uint64_t)(le_val))
+
+#else /* ... HAVE_LITTLE_ENDIAN */
+#define CPU_TO_BE64(native) ((beint64_t)BSWAP_64(native))
+#define BE64_TO_CPU(le_val) BSWAP_64((uint64_t)le_val)
+#endif /* HAVE_LITTE_ENDIAN */
+
+/**
+ * cpu_to_be64 - convert a uint64_t value to big endian.
+ * @native: value to convert
+ */
+static inline beint64_t cpu_to_be64(uint64_t native)
+{
+return CPU_TO_BE64(native);
+}
+
+/**
+ * be64_to_cpu - convert a big-endian uint64_t value
+ * @be_val: big-endian value to convert
+ */
+static inline uint64_t be64_to_cpu(beint64_t be_val)
+{
+return BE64_TO_CPU(be_val);
+}
+
+/* From compiler.h */
+
+#ifndef UNUSED
+/**
+ * UNUSED - a parameter is unused
+ *
+ * Some compilers (eg. gcc with -W or -Wunused) warn about unused
+ * function parameters.  This suppresses such warnings and indicates
+ * to the reader that it's deliberate.
+ *
+ * Example:
+ *  // This is used as a callback, so needs to have this prototype.
+ *  static int some_callback(void *unused UNUSED)
+ *  {
+ *  return 0;
+ *  }
+ */
+#define UNUSED __attribute__((__unused__))
+#endif
+
+/* From sha512.h */
+
+/**
+ * struct sha512 - structure representing a completed SHA512.
+ * @u.u8: an unsigned char array.
+ * @u.u64: a 64-bit integer array.
+ *
+ * Other fields may be added to the union in future.
+ */
+struct sha512 {
+union {
+uint64_t u64[8];
+unsigned char u8[64];
+} u;
+};
+
+/**
+ * sha512 - return sha512 of an object.
+ * @sha512: the sha512 to fill in
+ * @p: pointer to memory,
+ * @size: the number of bytes pointed to by @p
+ *
+ * The bytes pointed to by @p is SHA512 hashed into @sha512.  This is
+ * equivalent to sha512_init(), sha512_update() then sha512_done().
+ */
+void sha512(struct sha512 *sha, const void *p, size_t size);
+
+/**
+ * struct sha512_ctx - structure to store running context for sha512
+ */
+struct sha512_ctx {
+uint64_t s[8];
+union {
+uint64_t u64[16];
+unsigned char u8[128];
+} buf;
+size_t bytes;
+};
+
+/**
+ * sha512_init - initialize an SHA512 context.
+ * @ctx: the sha512_ctx to initialize
+ *
+ * This must be called before sha512_update or sha512_done, or
+ * alternately you can assign SHA512_INIT.
+ *
+ * If it was already initialized, this forgets anything which was
+ * hashed before.
+ *
+ * Example:
+ * static void hash_all(const char **arr, struct sha512 *hash)
+ * {
+ *  size_t i;
+ *  struct sha512_ctx ctx;
+ *
+ *  sha512_init(&ctx);
+ *  for (i = 0; arr[i]; i++)
+ *  sha512_update(&ctx, arr[i], strlen(arr[i]));
+ *  sha512_done(&ctx, hash);
+ * }
+ */
+void sha512_init(struct sha512_ctx *ctx);
+
+/**
+ * SHA512_INIT - initializer for an SHA512 context.
+ *
+ * This can be used to statically initialize an S

[PULL 17/18] semihosting/arm-compat: replace heuristic for softmmu SYS_HEAPINFO

2022-03-01 Thread Alex Bennée
The previous numbers were a guess at best and rather arbitrary, without
taking into account anything that might be loaded. Instead of using
guesses based on the state of registers, implement a new function that:

 a) scans the MemoryRegions for the largest RAM block
 b) iterates through all "ROM" blobs looking for the biggest gap

The "ROM" blobs include all code loaded via -kernel and the various
-device loader techniques.

Signed-off-by: Alex Bennée 
Cc: Andrew Strauss 
Cc: Keith Packard 
Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20220225172021.3493923-18-alex.ben...@linaro.org>

diff --git a/include/hw/loader.h b/include/hw/loader.h
index 4fa485bd61..5572108ba5 100644
--- a/include/hw/loader.h
+++ b/include/hw/loader.h
@@ -343,4 +343,18 @@ int rom_add_option(const char *file, int32_t bootindex);
  * overflow on real hardware too. */
 #define UBOOT_MAX_GUNZIP_BYTES (64 << 20)
 
+typedef struct RomGap {
+hwaddr base;
+size_t size;
+} RomGap;
+
+/**
+ * rom_find_largest_gap_between: return largest gap between ROMs in given range
+ *
+ * Given a range of addresses, this function finds the largest
+ * contiguous subrange which has no ROMs loaded to it. That is,
+ * it finds the biggest gap which is free for use for other things.
+ */
+RomGap rom_find_largest_gap_between(hwaddr base, size_t size);
+
 #endif
diff --git a/hw/core/loader.c b/hw/core/loader.c
index 19edb928e9..ca2f2431fb 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -1333,6 +1333,92 @@ static Rom *find_rom(hwaddr addr, size_t size)
 return NULL;
 }
 
+typedef struct RomSec {
+hwaddr base;
+int se; /* start/end flag */
+} RomSec;
+
+
+/*
+ * Sort into address order. We break ties between rom-startpoints
+ * and rom-endpoints in favour of the startpoint, by sorting the 0->1
+ * transition before the 1->0 transition. Either way round would
+ * work, but this way saves a little work later by avoiding
+ * dealing with "gaps" of 0 length.
+ */
+static gint sort_secs(gconstpointer a, gconstpointer b)
+{
+RomSec *ra = (RomSec *) a;
+RomSec *rb = (RomSec *) b;
+
+if (ra->base == rb->base) {
+return ra->se - rb->se;
+}
+return ra->base > rb->base ? 1 : -1;
+}
+
+static GList *add_romsec_to_list(GList *secs, hwaddr base, int se)
+{
+   RomSec *cand = g_new(RomSec, 1);
+   cand->base = base;
+   cand->se = se;
+   return g_list_prepend(secs, cand);
+}
+
+RomGap rom_find_largest_gap_between(hwaddr base, size_t size)
+{
+Rom *rom;
+RomSec *cand;
+RomGap res = {0, 0};
+hwaddr gapstart = base;
+GList *it, *secs = NULL;
+int count = 0;
+
+QTAILQ_FOREACH(rom, &roms, next) {
+/* Ignore blobs being loaded to special places */
+if (rom->mr || rom->fw_file) {
+continue;
+}
+/* ignore anything finishing below base */
+if (rom->addr + rom->romsize <= base) {
+continue;
+}
+/* ignore anything starting above the region */
+if (rom->addr >= base + size) {
+continue;
+}
+
+/* Save the start and end of each relevant ROM */
+secs = add_romsec_to_list(secs, rom->addr, 1);
+
+if (rom->addr + rom->romsize < base + size) {
+secs = add_romsec_to_list(secs, rom->addr + rom->romsize, -1);
+}
+}
+
+/* sentinel */
+secs = add_romsec_to_list(secs, base + size, 1);
+
+secs = g_list_sort(secs, sort_secs);
+
+for (it = g_list_first(secs); it; it = g_list_next(it)) {
+cand = (RomSec *) it->data;
+if (count == 0 && count + cand->se == 1) {
+size_t gap = cand->base - gapstart;
+if (gap > res.size) {
+res.base = gapstart;
+res.size = gap;
+}
+} else if (count == 1 && count + cand->se == 0) {
+gapstart = cand->base;
+}
+count += cand->se;
+}
+
+g_list_free_full(secs, g_free);
+return res;
+}
+
 /*
  * Copies memory from registered ROMs to dest. Any memory that is contained in
  * a ROM between addr and addr + size is copied. Note that this can involve
diff --git a/semihosting/arm-compat-semi.c b/semihosting/arm-compat-semi.c
index 37963becae..7a51fd0737 100644
--- a/semihosting/arm-compat-semi.c
+++ b/semihosting/arm-compat-semi.c
@@ -44,6 +44,7 @@
 #define COMMON_SEMI_HEAP_SIZE (128 * 1024 * 1024)
 #else
 #include "qemu/cutils.h"
+#include "hw/loader.h"
 #ifdef TARGET_ARM
 #include "hw/arm/boot.h"
 #endif
@@ -144,33 +145,69 @@ typedef struct GuestFD {
 static GArray *guestfd_array;
 
 #ifndef CONFIG_USER_ONLY
-#include "exec/address-spaces.h"
-/*
- * Find the base of a RAM region containing the specified address
+
+/**
+ * common_semi_find_bases: find information about ram and heap base
+ *
+ * This function attempts to provide meaningful numbers for RAM and
+ * HEAP base addresses. The rambase is simply the lowest addressable
+ * RAM position. For the heapbase we ask the loa

[PULL 16/18] tests/tcg: completely disable threadcount for sh4

2022-03-01 Thread Alex Bennée
The previous disabling of threadcount in 3bdc19af00 ("tests/tcg/sh4:
disable another unreliable test") for plugins only was too
conservative. It's all broken, so skip it entirely.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-Id: <20220225172021.3493923-17-alex.ben...@linaro.org>

diff --git a/tests/tcg/sh4/Makefile.target b/tests/tcg/sh4/Makefile.target
index 620ccc23c1..35ebe6b4e3 100644
--- a/tests/tcg/sh4/Makefile.target
+++ b/tests/tcg/sh4/Makefile.target
@@ -20,5 +20,7 @@ run-plugin-linux-test-with-%:
 	$(call skip-test, $<, "BROKEN")
 
 # This test is currently unreliable: https://gitlab.com/qemu-project/qemu/-/issues/856
+run-threadcount:
+	$(call skip-test, $<, "BROKEN")
 run-plugin-threadcount-with-%:
 	$(call skip-test, $<, "BROKEN")
-- 
2.30.2




Re: [PATCH v4 2/3] hw/acpi: add indication for i8042 in IA-PC boot flags of the FADT table

2022-03-01 Thread Ani Sinha



On Tue, 1 Mar 2022, Igor Mammedov wrote:

> On Mon, 28 Feb 2022 22:17:32 +0200
> Liav Albani  wrote:
>
> > This can allow the guest OS to determine more easily if i8042 controller
> > is present in the system or not, so it doesn't need to do probing of the
> > controller, but just initialize it immediately, before enumerating the
> > ACPI AML namespace.
> >
> > This change only applies to the x86/q35 machine type, as it uses FACP
> > ACPI table with revision higher than 1, which should implement at least
> > ACPI 2.0 features within the table, hence it can also set the IA-PC boot
> > flags register according to the ACPI 2.0 specification.
> >
> > Signed-off-by: Liav Albani 
> > ---
> >  hw/acpi/aml-build.c | 11 ++-
> >  hw/i386/acpi-build.c|  9 +
> >  hw/i386/acpi-microvm.c  |  9 +
> commit message says it's q35 specific, so why does it touch microvm and piix4?

Igor is correct. Although I see that currently there are no 8042 devices
for microvms, maybe we should be conservative and add the code to detect
the device anyway. In that case, the change could affect microvms too when
such devices get added in the future.


echo -e "info qtree\r\nquit\r\n" | ./qemu-system-x86_64 -machine microvm
-monitor stdio 2>/dev/null | grep 8042






[PULL 18/18] tests/tcg: port SYS_HEAPINFO to a system test

2022-03-01 Thread Alex Bennée
This allows us to check that our new SYS_HEAPINFO implementation
generates sane values.

Signed-off-by: Alex Bennée 
Reviewed-by: Peter Maydell 
Message-Id: <20220225172021.3493923-19-alex.ben...@linaro.org>

diff --git a/tests/tcg/aarch64/system/semiheap.c 
b/tests/tcg/aarch64/system/semiheap.c
new file mode 100644
index 00..4ed258476d
--- /dev/null
+++ b/tests/tcg/aarch64/system/semiheap.c
@@ -0,0 +1,93 @@
+/*
+ * Semihosting System HEAPINFO Test
+ *
+ * Copyright (c) 2021 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+#include <minilib.h>
+
+#define SYS_HEAPINFO 0x16
+
+uintptr_t __semi_call(uintptr_t type, uintptr_t arg0)
+{
+    register uintptr_t t asm("x0") = type;
+    register uintptr_t a0 asm("x1") = arg0;
+    asm("hlt 0xf000"
+        : "=r" (t)
+        : "r" (t), "r" (a0)
+        : "memory" );
+
+    return t;
+}
+
+int main(int argc, char *argv[argc])
+{
+    struct {
+        void *heap_base;
+        void *heap_limit;
+        void *stack_base;
+        void *stack_limit;
+    } info = { };
+    void *ptr_to_info = (void *) &info;
+    uint32_t *ptr_to_heap;
+    int i;
+
+    ml_printf("Semihosting Heap Info Test\n");
+
+    __semi_call(SYS_HEAPINFO, (uintptr_t) &ptr_to_info);
+
+    if (info.heap_base == NULL || info.heap_limit == NULL) {
+        ml_printf("null heap: %p -> %p\n", info.heap_base, info.heap_limit);
+        return -1;
+    }
+
+    /* Error if heap base is above limit */
+    if ((uintptr_t) info.heap_base >= (uintptr_t) info.heap_limit) {
+        ml_printf("heap base %p >= heap_limit %p\n",
+                  info.heap_base, info.heap_limit);
+        return -2;
+    }
+
+    if (info.stack_base == NULL) {
+        ml_printf("null stack: %p -> %p\n", info.stack_base, info.stack_limit);
+        return -3;
+    }
+
+    /*
+     * boot.S put our stack somewhere inside the data segment of the
+     * ELF file, and we know that SYS_HEAPINFO won't pick a range
+     * that overlaps with part of a loaded ELF file. So the info
+     * struct (on the stack) should not be inside the reported heap.
+     */
+    if (ptr_to_info > info.heap_base && ptr_to_info < info.heap_limit) {
+        ml_printf("info appears to be inside the heap: %p in %p:%p\n",
+                  ptr_to_info, info.heap_base, info.heap_limit);
+        return -4;
+    }
+
+    ml_printf("heap: %p -> %p\n", info.heap_base, info.heap_limit);
+    ml_printf("stack: %p <- %p\n", info.stack_limit, info.stack_base);
+
+    /* finally can we read/write the heap */
+    ptr_to_heap = (uint32_t *) info.heap_base;
+    for (i = 0; i < 512; i++) {
+        *ptr_to_heap++ = i;
+    }
+    ptr_to_heap = (uint32_t *) info.heap_base;
+    for (i = 0; i < 512; i++) {
+        uint32_t tmp = *ptr_to_heap;
+        if (tmp != i) {
+            ml_printf("unexpected value in heap: %d @ %p", tmp, ptr_to_heap);
+            return -5;
+        }
+        ptr_to_heap++;
+    }
+    ml_printf("r/w to heap up to %p\n", ptr_to_heap);
+
+    ml_printf("Passed HeapInfo checks\n");
+    return 0;
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index fa8adc2618..68adaac373 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3549,6 +3549,7 @@ S: Maintained
 F: semihosting/
 F: include/semihosting/
 F: tests/tcg/multiarch/arm-compat-semi/
+F: tests/tcg/aarch64/system/semiheap.c
 
 Multi-process QEMU
 M: Elena Ufimtseva 
-- 
2.30.2




Re: [PATCH v2 00/25] migration: Postcopy Preemption

2022-03-01 Thread Daniel P . Berrangé
On Tue, Mar 01, 2022 at 06:17:49PM +0800, Peter Xu wrote:
> On Tue, Mar 01, 2022 at 09:25:55AM +, Daniel P. Berrangé wrote:
> > On Tue, Mar 01, 2022 at 04:39:00PM +0800, Peter Xu wrote:
> > > This is v2 of postcopy preempt series.  It can also be found here:
> > > 
> > >   https://github.com/xzpeter/qemu/tree/postcopy-preempt
> > > 
> > > RFC: 
> > > https://lore.kernel.org/qemu-devel/20220119080929.39485-1-pet...@redhat.com
> > > V1:  
> > > https://lore.kernel.org/qemu-devel/20220216062809.57179-1-pet...@redhat.com
> > > 
> > > v1->v2 changelog:
> > > - Picked up more r-bs from Dave
> > > - Rename both fault threads to drop "qemu/" prefix [Dave]
> > > - Further rework on postcopy recovery, to be able to detect qemufile 
> > > errors
> > >   from either main channel or postcopy one [Dave]
> > > - shutdown() qemufile before close on src postcopy channel when postcopy 
> > > is
> > >   paused [Dave]
> > > - In postcopy_preempt_new_channel(), explicitly set the new channel in
> > >   blocking state, even if it's the default [Dave]
> > > - Make RAMState.postcopy_channel unsigned int [Dave]
> > > - Added patches:
> > >   - "migration: Create the postcopy preempt channel asynchronously"
> > >   - "migration: Parameter x-postcopy-preempt-break-huge"
> > >   - "migration: Add helpers to detect TLS capability"
> > >   - "migration: Fail postcopy preempt with TLS"
> > >   - "tests: Pass in MigrateStart** into test_migrate_start()"
> > > 
> > > Abstract
> > > 
> > > 
> > > This series added a new migration capability called "postcopy-preempt".  
> > > It can
> > > be enabled when postcopy is enabled, and it'll simply (but greatly) speed 
> > > up
> > > postcopy page requests handling process.
> > 
> > Is there no way we can just automatically enable this new feature, rather
> > than requiring apps to specify yet another new flag ?
> 
> I didn't make it the default for now, but I do have thought about making it
> the default when it consolidates a bit, perhaps on a new machine type.

Toggling based on machine type feels strange. If it needs a toggle, that
implies it is not something that can be transparently enabled without
the app being aware of it. Given that, toggling based on machine type
would be inappropriate, as existing apps expect to work with new QEMU.

> I also didn't know whether there's other limitations of it.  For example,
> will a new socket pair be a problem for any VM environment (either a
> limitation from the management app, container, and so on)?  I think it's
> the same to multifd in that aspect, but I never explored.

If it needs extra sockets that is something apps will need to be aware
of unfortunately and explicitly opt-in to :-( Migration is often
tunnelled/proxied over other channels, so whatever does that needs to
be aware of possibility of seeing extra sockets.

> > > TODO List
> > > =
> > > 
> > > TLS support
> > > ---
> > > 
> > > I only noticed its missing very recently.  Since soft freeze is coming, 
> > > and
> > > obviously I'm still growing this series, so I tend to have the existing
> > > material discussed. Let's see if it can still catch the train for QEMU 7.0
> > > release (soft freeze on 2022-03-08)..
> > 
> > I don't like the idea of shipping something that is only half finished.
> > It means that when apps probe for the feature, they'll see preempt
> > capability present, but have no idea whether they're using a QEMU that
> > is broken when combined with TLS or not. We shouldn't merge something
> > just to meet the soft freeze deadline if we know key features are broken.
> 
> IMHO merging and declaring support are two problems.
> 
> To me, it's always fine to merge the code that implements the foundation of a
> feature.  The feature can be worked upon in the future.
> 
> Requiring a feature to be "complete" sometimes can cause burden to not only
> the author of the series but also reviewers.  It's IMHO not necessary to
> bind these two ideas.
> 
> It's sometimes also hard to define "complete": take the TLS as example, no
> one probably even noticed that it won't work with TLS and I just noticed it
> merely these two days..  We obviously can't merge partial patchset, but if
> the patchset is well isolated, then it's not a blocker for merging, imho.
> 
> Per my understanding, what you worried is when we declare it supported but
> later we never know when TLS will be ready for it.  One solution is I can
> rename the capability as x-, then after the TLS side ready I drop the x-
> prefix.  Then Libvirt or any mgmt software doesn't need to support this
> until we drop the x-, so there's no risk of compatibility.
> 
> Would that sound okay to you?

If it has an x- prefix then we can basically ignore it from a mgmt app
POV until it is actually finished.

> I can always step back and work on TLS first before it's merged, but again
> I don't think it's required.

Apps increasingly consider use of TLS to be a mandatory feature for
migration, so until that wor

Re: [PATCH v2 04/18] tests/docker: update debian-arm64-cross with lci-tool

2022-03-01 Thread Daniel P . Berrangé
On Mon, Feb 28, 2022 at 02:39:17PM +, Alex Bennée wrote:
> 
> Daniel P. Berrangé  writes:
> 
> > $SUBJECT  =~ s/lci-tool/lcitool/
> >
> > On Fri, Feb 25, 2022 at 05:20:07PM +, Alex Bennée wrote:
> >> Using lci-tool update debian-arm64-cross to a Debian 11 based system.
> >
> > Likewise
> >
> >> As a result we can drop debian-arm64-test-cross just for building
> >> tests.
> >> 
> >> Signed-off-by: Alex Bennée 
> >> Reviewed-by: Richard Henderson 
> >> Message-Id: <20220211160309.335014-5-alex.ben...@linaro.org>
> >> ---
> >>  .gitlab-ci.d/container-cross.yml  |  10 +-
> >>  tests/docker/Makefile.include |   3 -
> >>  .../dockerfiles/debian-arm64-cross.docker | 186 +++---
> >>  .../debian-arm64-test-cross.docker|  13 --
> >>  tests/lcitool/refresh |  11 ++
> >>  tests/tcg/configure.sh|   2 +-
> >>  6 files changed, 173 insertions(+), 52 deletions(-)
> >>  delete mode 100644 tests/docker/dockerfiles/debian-arm64-test-cross.docker
> >> 


> > This cross dockerfile is a fully self-contained image.
> >
> > Traditionally QEMU has had a split image for Debian cross targets,
> > where there is a base with common native packages, and then a
> > layer for the cross packages.
> >
> > lcitool is capable of generating the image in this split format
> > using the arg
> >
> >--layers {all,native,foreign}
> >
> > Personally I think it is simpler to just use the fully self
> > contained image, as it would simplify our gitlab pipeline
> > to only need 1 build stage for containers.  The cost is that
> > we'll not be sharing layers for native packages and more wall
> > clock time building since we're installing the same native
> > packages over & over.
> >
> > I'm not saying to change your patch, I just wanted to point
> > out the possibility in case someone cares strongly about
> > keeping a split layer model for cross containers.
> 
> My thinking on our layered approach has evolved over the years. One of
> the problems is when the two layers get out of sync and you run into
> build issues due to different states of cached layers.

Oh, I'd not even thought about that possibility but yes, it makes
sense. We could have cached the base layer and when we do an
'apt-get update' in the cross layer we'll end up pulling in new
copies of packages otherwise present in the base layer, partly
defeating the point of having two layers.

Regards,
Daniel
-- 
|: https://berrange.com       -o-  https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org        -o-          https://fstop138.berrange.com :|
|: https://entangle-photo.org -o-  https://www.instagram.com/dberrange :|




Re: [PATCH v2 08/14] util: Add iova_tree_alloc

2022-03-01 Thread Eugenio Perez Martin
On Mon, Feb 28, 2022 at 7:39 AM Jason Wang  wrote:
>
>
> 在 2022/2/27 下午9:41, Eugenio Pérez 写道:
> > This iova tree function allows looking for a hole in allocated
> > regions and returns a totally new translation for a given translated
> > address.
> >
> > Its usage is mainly to allow devices to access the qemu address space,
> > remapping guest's one into a new iova space where qemu can add chunks of
> > addresses.
> >
> > Signed-off-by: Eugenio Pérez 
> > Reviewed-by: Peter Xu 
> > ---
> >   include/qemu/iova-tree.h |  18 ++
> >   util/iova-tree.c | 133 +++
> >   2 files changed, 151 insertions(+)
> >
> > diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> > index 8249edd764..a623136cd8 100644
> > --- a/include/qemu/iova-tree.h
> > +++ b/include/qemu/iova-tree.h
> > @@ -29,6 +29,7 @@
> >   #define  IOVA_OK   (0)
> >   #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
> >   #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
> > +#define  IOVA_ERR_NOMEM(-3) /* Cannot allocate */
> >
> >   typedef struct IOVATree IOVATree;
> >   typedef struct DMAMap {
> > @@ -119,6 +120,23 @@ const DMAMap *iova_tree_find_address(const IOVATree 
> > *tree, hwaddr iova);
> >*/
> >   void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
> >
> > +/**
> > + * iova_tree_alloc:
>
>
> Should be iova_tree_alloc_map.
>

That's right, I'll change. It's also missing from the patch subject.

>
> > + *
> > + * @tree: the iova tree to allocate from
> > + * @map: the new map (as translated addr & size) to allocate in the iova 
> > region
> > + * @iova_begin: the minimum address of the allocation
> > + * @iova_end: the maximum addressable direction of the allocation
> > + *
> > + * Allocates a new region of a given size, between iova_min and iova_max.
> > + *
> > + * Return: Same as iova_tree_insert, but cannot overlap and can return 
> > error if
> > + * iova tree is out of free contiguous range. The caller gets the assigned 
> > iova
> > + * in map->iova.
> > + */
> > +int iova_tree_alloc_map(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> > +hwaddr iova_end);
> > +
> >   /**
> >* iova_tree_destroy:
> >*
> > diff --git a/util/iova-tree.c b/util/iova-tree.c
> > index 23ea35b7a4..302b01f1cc 100644
> > --- a/util/iova-tree.c
> > +++ b/util/iova-tree.c
> > @@ -16,6 +16,39 @@ struct IOVATree {
> >   GTree *tree;
> >   };
> >
> > +/* Args to pass to iova_tree_alloc foreach function. */
> > +struct IOVATreeAllocArgs {
> > +/* Size of the desired allocation */
> > +size_t new_size;
> > +
> > +/* The minimum address allowed in the allocation */
> > +hwaddr iova_begin;
> > +
> > +/* Map at the left of the hole, can be NULL if "this" is first one */
> > +const DMAMap *prev;
> > +
> > +/* Map at the right of the hole, can be NULL if "prev" is the last one 
> > */
> > +const DMAMap *this;
> > +
> > +/* If found, we fill in the IOVA here */
> > +hwaddr iova_result;
> > +
> > +/* Whether have we found a valid IOVA */
> > +bool iova_found;
> > +};
> > +
> > +/**
> > + * Iterate args to the next hole
> > + *
> > + * @args  The alloc arguments
> > + * @next  The next mapping in the tree. Can be NULL to signal the last one
> > + */
> > +static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
> > + const DMAMap *next) {
> > +args->prev = args->this;
> > +args->this = next;
> > +}
> > +
> >   static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer 
> > data)
> >   {
> >   const DMAMap *m1 = a, *m2 = b;
> > @@ -107,6 +140,106 @@ int iova_tree_remove(IOVATree *tree, const DMAMap 
> > *map)
> >   return IOVA_OK;
> >   }
> >
> > +/**
> > + * Try to find an unallocated IOVA range between prev and this elements.
> > + *
> > + * @args Arguments to allocation
> > + *
> > + * Cases:
> > + *
> > + * (1) !prev, !this: No entries allocated, always succeed
> > + *
> > + * (2) !prev, this: We're iterating at the 1st element.
> > + *
> > + * (3) prev, !this: We're iterating at the last element.
> > + *
> > + * (4) prev, this: this is the most common case, we'll try to find a hole
> > + * between "prev" and "this" mapping.
> > + *
> > + * Note that this function assumes the last valid iova is HWADDR_MAX, but 
> > it
> > + * searches linearly so it's easy to discard the result if it's not the 
> > case.
> > + */
> > +static void iova_tree_alloc_map_in_hole(struct IOVATreeAllocArgs *args)
> > +{
> > +const DMAMap *prev = args->prev, *this = args->this;
> > +uint64_t hole_start, hole_last;
> > +
> > +if (this && this->iova + this->size < args->iova_begin) {
> > +return;
> > +}
> > +
> > +hole_start = MAX(prev ? prev->iova + prev->size + 1 : 0, 
> > args->iova_begin);
> > +hole_last = this ? this->iova : HWADDR_MAX;
>
>
> Do we need to use iova_last instead of HWADDR_M

Re: [PATCH v2 00/25] migration: Postcopy Preemption

2022-03-01 Thread Peter Xu
On Tue, Mar 01, 2022 at 09:25:55AM +, Daniel P. Berrangé wrote:
> On Tue, Mar 01, 2022 at 04:39:00PM +0800, Peter Xu wrote:
> > This is v2 of postcopy preempt series.  It can also be found here:
> > 
> >   https://github.com/xzpeter/qemu/tree/postcopy-preempt
> > 
> > RFC: 
> > https://lore.kernel.org/qemu-devel/20220119080929.39485-1-pet...@redhat.com
> > V1:  
> > https://lore.kernel.org/qemu-devel/20220216062809.57179-1-pet...@redhat.com
> > 
> > v1->v2 changelog:
> > - Picked up more r-bs from Dave
> > - Rename both fault threads to drop "qemu/" prefix [Dave]
> > - Further rework on postcopy recovery, to be able to detect qemufile errors
> >   from either main channel or postcopy one [Dave]
> > - shutdown() qemufile before close on src postcopy channel when postcopy is
> >   paused [Dave]
> > - In postcopy_preempt_new_channel(), explicitly set the new channel in
> >   blocking state, even if it's the default [Dave]
> > - Make RAMState.postcopy_channel unsigned int [Dave]
> > - Added patches:
> >   - "migration: Create the postcopy preempt channel asynchronously"
> >   - "migration: Parameter x-postcopy-preempt-break-huge"
> >   - "migration: Add helpers to detect TLS capability"
> >   - "migration: Fail postcopy preempt with TLS"
> >   - "tests: Pass in MigrateStart** into test_migrate_start()"
> > 
> > Abstract
> > 
> > 
> > This series added a new migration capability called "postcopy-preempt".  It 
> > can
> > be enabled when postcopy is enabled, and it'll simply (but greatly) speed up
> > postcopy page requests handling process.
> 
> Is there no way we can just automatically enable this new feature, rather
> than requiring apps to specify yet another new flag ?

I didn't make it the default for now, but I do have thought about making it
the default when it consolidates a bit, perhaps on a new machine type.

I also didn't know whether there are other limitations to it.  For example,
will a new socket pair be a problem for any VM environment (either a
limitation from the management app, container, and so on)?  I think it's
the same to multifd in that aspect, but I never explored.

> 
> > TODO List
> > =
> > 
> > TLS support
> > ---
> > 
> > I only noticed its missing very recently.  Since soft freeze is coming, and
> > obviously I'm still growing this series, so I tend to have the existing
> > material discussed. Let's see if it can still catch the train for QEMU 7.0
> > release (soft freeze on 2022-03-08)..
> 
> I don't like the idea of shipping something that is only half finished.
> It means that when apps probe for the feature, they'll see preempt
> capability present, but have no idea whether they're using a QEMU that
> is broken when combined with TLS or not. We shouldn't merge something
> just to meet the soft freeze deadline if we know key features are broken.

IMHO merging and declaring support are two problems.

To me, it's always fine to merge the code that implements the foundation of a
feature.  The feature can be worked upon in the future.

Requiring a feature to be "complete" sometimes can cause burden to not only
the author of the series but also reviewers.  It's IMHO not necessary to
bind these two ideas.

It's sometimes also hard to define "complete": take the TLS as example, no
one probably even noticed that it won't work with TLS and I just noticed it
merely these two days..  We obviously can't merge partial patchset, but if
the patchset is well isolated, then it's not a blocker for merging, imho.

Per my understanding, what you worried is when we declare it supported but
later we never know when TLS will be ready for it.  One solution is I can
rename the capability as x-, then after the TLS side ready I drop the x-
prefix.  Then Libvirt or any mgmt software doesn't need to support this
until we drop the x-, so there's no risk of compatibility.

Would that sound okay to you?

I can always step back and work on TLS first before it's merged, but again
I don't think it's required.

> 
> > Multi-channel for preemption threads
> > 
> > 
> > Currently the postcopy preempt feature use only one extra channel and one
> > extra thread on dest (no new thread on src QEMU).  It should be mostly good
> > enough for major use cases, but when the postcopy queue is long enough
> > (e.g. hundreds of vCPUs faulted on different pages) logically we could
> > still observe more delays in average.  Whether growing threads/channels can
> > solve it is debatable, but sounds worthwhile a try.  That's yet another
> > thing we can think about after this patchset lands.
> 
> If we don't think about it upfront, then we'll possibly end up with
> yet another tunable flag that apps have to worry about. It also
> could make migration code even more complex if we have to support
> two different scenarios. If we think multiple threads are goign to
> be a benefit lets check that and if so, design it into the exposed
> application facing interface from the 

Re: [PATCH v7 3/4] tests/tcg/s390x: Tests for Miscellaneous-Instruction-Extensions Facility 3

2022-03-01 Thread Thomas Huth

On 28/02/2022 19.31, David Miller wrote:

Had it on my TODO list for this morning, thank you.


Thanks! Please send it as an additional patch on top of my s390x-next, since I 
already sent a pull request for the other patches yesterday:


 https://gitlab.com/thuth/qemu/-/commits/s390x-next/

On Mon, Feb 28, 2022 at 12:59 PM Richard Henderson 
mailto:richard.hender...@linaro.org>> wrote:


On 2/28/22 00:14, Thomas Huth wrote:
 > Full patch can be seen here:
 >
 > https://gitlab.com/thuth/qemu/-/commit/38af118ea2fef0c473



 > static inline void mvcrl_8(const char *dst, const char *src)
 > {
 >     asm volatile (
 >     "llill %%r0, 8\n"
 >     ".insn sse, 0xE50A, 0(%[dst]), 0(%[src])"
 >     : : [dst] "d" (dst), [src] "d" (src)
 >     : "memory");
 > }

Need clobber of r0 here.


Right. This test fails with Clang, indeed, as I discovered today, since 
Clang uses r0 more often than GCC, as it seems. I've already sent some 
patches for some other tests today, so there'll be another s390x pull 
request next week for TCG tests fixups :-)


 Thomas



 > #define Fi3(S, ASM) uint64_t S(uint64_t a, uint64_t b, uint64_t c) \
 > {                            \
 >     uint64_t res = 0;        \
 >     asm (                    \
 >          "lg %%r2, %[a]\n"   \
 >          "lg %%r3, %[b]\n"   \
 >          "lg %%r0, %[c]\n"   \
 >          "ltgr %%r0, %%r0\n" \
 >          ASM                 \
 >          "stg %%r0, %[res] " \
 >          : [res] "=m" (res)  \
 >          : [a] "m" (a),      \
 >            [b] "m" (b),      \
 >            [c] "m" (c)       \
 >          : "r0", "r2",       \
 >            "r3", "r4"        \
 >     );                       \
 >     return res;              \
 > }
 >
 > Fi3 (_selre,     ".insn rrf, 0xB9F0, %%r0, %%r3, %%r2, 8\n")
 > Fi3 (_selgrz,    ".insn rrf, 0xB9E3, %%r0, %%r3, %%r2, 8\n")
 > Fi3 (_selfhrnz,  ".insn rrf, 0xB9C0, %%r0, %%r3, %%r2, 7\n")

This isn't actively broken, but could use the same treatment as NCRK et al:

#define Fi3(S, ASM) uint64_t S(uint64_t a, uint64_t b, uint64_t c)  \
{                                                                   \
      uint64_t res;                                                 \
      asm("ltgr %[c], %[c]\n\t" ASM                                 \
          : [res] "=&r" (res)                                       \
          : [a] "r" (a), [b] "r" (b), [c] "r" (c)                   \
          : "cc");                                                  \
      return res;                                                   \
}

Fi3(_selre,   ".insn rrf, 0xB9F0, %[res], %[a], %[b], 8")

etc.


r~






Re: [PATCH 0/5] block layer: permission API refactoring in preparation

2022-03-01 Thread Kevin Wolf
Am 09.02.2022 um 11:54 hat Emanuele Giuseppe Esposito geschrieben:
> This series aims to refactor and fix permission API related bugs that came
> up in the series "block layer: split block APIs in global state and I/O".
> up in the serie "block layer: split block APIs in global state and I/O".
> In that serie, we are splitting all block layer headers in
> Global State (GS) APIs, holding always the BQL and running in the
> main loop, and I/O running in iothreads.
> 
> The patches in this series are taken from v6 of the API split,
> to reduce its size and apply these fixes independently.
> 
> Patches 1 and 2 take care of crypto and amend jobs, since they
> incorrectly use the permission API also in iothreads.
> Patches 3-4-5 take care of bdrv_invalidate_cache and callers,
> since this function also checks for permissions while being
> called by an iothread.

Thanks, applied to the block branch.

Kevin




Re: [PULL 0/7] aspeed queue

2022-03-01 Thread Peter Maydell
On Mon, 28 Feb 2022 at 07:12, Cédric Le Goater  wrote:
>
> The following changes since commit c13b8e9973635f34f3ce4356af27a311c993729c:
>
>   Merge remote-tracking branch 
> 'remotes/alistair/tags/pull-riscv-to-apply-20220216' into staging (2022-02-16 
> 09:57:11 +)
>
> are available in the Git repository at:
>
>   https://github.com/legoater/qemu/ tags/pull-aspeed-20220227
>
> for you to fetch changes up to 3671342a38f21316a2bda62e7d607bbaedd60fd8:
>
>   aspeed/sdmc: Add trace events (2022-02-26 18:40:51 +0100)
>
> 
> aspeed queue:
>
> * Removal of the swift-bmc machine
> * New Secure Boot Controller model
> * Improvements on the rainier machine
> * Various small cleanups
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM



[PATCH v2 0/6] hw/nvme: enhanced protection information (64-bit guard)

2022-03-01 Thread Klaus Jensen
From: Klaus Jensen 

This adds support for one possible new protection information format
introduced in TP4068 (and integrated in NVMe 2.0): the 64-bit CRC guard
and 48-bit reference tag. This version does not support storage tags.

Like the CRC16 support already present, this uses a software
implementation of CRC64 (so it is naturally pretty slow). But it's good
enough for verification purposes.

This goes hand-in-hand with the support that Keith submitted for the
Linux kernel[1].

  [1]: 
https://lore.kernel.org/linux-nvme/20220201190128.3075065-1-kbu...@kernel.org/

Changes since v1

- Check metadata size depending on pi guard type selected. (Keith)

Klaus Jensen (3):
  hw/nvme: move dif/pi prototypes into dif.h
  hw/nvme: move format parameter parsing
  hw/nvme: add pi tuple size helper

Naveen Nagar (3):
  hw/nvme: add host behavior support feature
  hw/nvme: add support for the lbafee hbs feature
  hw/nvme: 64-bit pi support

 hw/nvme/ctrl.c   | 235 +--
 hw/nvme/dif.c| 378 +--
 hw/nvme/dif.h| 191 ++
 hw/nvme/ns.c |  50 --
 hw/nvme/nvme.h   |  58 +--
 hw/nvme/trace-events |  12 +-
 include/block/nvme.h |  81 --
 7 files changed, 793 insertions(+), 212 deletions(-)
 create mode 100644 hw/nvme/dif.h

-- 
2.35.1




[PATCH v2 1/6] hw/nvme: move dif/pi prototypes into dif.h

2022-03-01 Thread Klaus Jensen
From: Klaus Jensen 

Move dif/pi data structures and inlines to dif.h.

Reviewed-by: Keith Busch 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c |  1 +
 hw/nvme/dif.c  |  1 +
 hw/nvme/dif.h  | 53 ++
 hw/nvme/nvme.h | 50 ---
 4 files changed, 55 insertions(+), 50 deletions(-)
 create mode 100644 hw/nvme/dif.h

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 98aac98bef5f..d08af3bdc1a2 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -163,6 +163,7 @@
 #include "migration/vmstate.h"
 
 #include "nvme.h"
+#include "dif.h"
 #include "trace.h"
 
#define NVME_MAX_IOQPAIRS 0xffff
diff --git a/hw/nvme/dif.c b/hw/nvme/dif.c
index 5dbd18b2a4a5..cd0cea2b5ebd 100644
--- a/hw/nvme/dif.c
+++ b/hw/nvme/dif.c
@@ -13,6 +13,7 @@
 #include "sysemu/block-backend.h"
 
 #include "nvme.h"
+#include "dif.h"
 #include "trace.h"
 
 uint16_t nvme_check_prinfo(NvmeNamespace *ns, uint8_t prinfo, uint64_t slba,
diff --git a/hw/nvme/dif.h b/hw/nvme/dif.h
new file mode 100644
index ..e36fea30e71e
--- /dev/null
+++ b/hw/nvme/dif.h
@@ -0,0 +1,53 @@
+#ifndef HW_NVME_DIF_H
+#define HW_NVME_DIF_H
+
+/* from Linux kernel (crypto/crct10dif_common.c) */
+static const uint16_t t10_dif_crc_table[256] = {
+0x0000, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
+0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
+0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
+0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
+0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
+0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
+0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
+0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
+0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
+0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
+0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
+0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
+0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
+0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
+0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
+0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
+0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
+0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
+0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
+0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
+0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
+0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
+0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
+0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
+0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
+0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
+0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
+0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
+0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
+0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
+0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
+0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
+};
+
+uint16_t nvme_check_prinfo(NvmeNamespace *ns, uint8_t prinfo, uint64_t slba,
+   uint32_t reftag);
+uint16_t nvme_dif_mangle_mdata(NvmeNamespace *ns, uint8_t *mbuf, size_t mlen,
+   uint64_t slba);
+void nvme_dif_pract_generate_dif(NvmeNamespace *ns, uint8_t *buf, size_t len,
+ uint8_t *mbuf, size_t mlen, uint16_t apptag,
+ uint32_t *reftag);
+uint16_t nvme_dif_check(NvmeNamespace *ns, uint8_t *buf, size_t len,
+uint8_t *mbuf, size_t mlen, uint8_t prinfo,
+uint64_t slba, uint16_t apptag,
+uint16_t appmask, uint32_t *reftag);
+uint16_t nvme_dif_rw(NvmeCtrl *n, NvmeRequest *req);
+
+#endif /* HW_NVME_DIF_H */
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 90c0bb7ce236..801176a2bd5e 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -513,54 +513,4 @@ void nvme_rw_complete_cb(void *opaque, int ret);
 uint16_t nvme_map_dptr(NvmeCtrl *n, NvmeSg *sg, size_t len,
NvmeCmd *cmd);
 
-/* from Linux kernel (crypto/crct10dif_common.c) */
-static const uint16_t t10_dif_crc_table[256] = {
-0x, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
-0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
-0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
-0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
-0xA99A

[PATCH v2 5/6] hw/nvme: add pi tuple size helper

2022-03-01 Thread Klaus Jensen
From: Klaus Jensen 

A subsequent patch will introduce a new tuple size, so add a helper and
use that instead of sizeof() and magic numbers.
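For illustration, a helper of this shape might look like the following standalone sketch. The names (`ns_stub`, `pi_tuple_size`) and the 16-byte size for a 64-bit guard format are assumptions for the sketch, not the patch's exact code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the namespace state the real helper reads. */
enum pi_format { PI_GUARD16, PI_GUARD64 };

struct ns_stub {
    enum pi_format pif;
};

/* One place that knows the protection information tuple size: 8 bytes
 * for the classic 16-bit guard tuple, 16 bytes for a 64-bit guard one,
 * instead of sizeof(NvmeDifTuple) and literal 8s scattered around. */
static size_t pi_tuple_size(const struct ns_stub *ns)
{
    return ns->pif == PI_GUARD64 ? 16 : 8;
}
```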

Reviewed-by: Keith Busch 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 14 --
 hw/nvme/dif.c  | 16 
 hw/nvme/dif.h  |  5 +
 3 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 52ab3450b975..f1683960b87e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1068,7 +1068,8 @@ static uint16_t nvme_map_data(NvmeCtrl *n, uint32_t nlb, 
NvmeRequest *req)
 size_t len = nvme_l2b(ns, nlb);
 uint16_t status;
 
-if (nvme_ns_ext(ns) && !(pi && pract && ns->lbaf.ms == 8)) {
+if (nvme_ns_ext(ns) &&
+!(pi && pract && ns->lbaf.ms == nvme_pi_tuple_size(ns))) {
 NvmeSg sg;
 
 len += nvme_m2b(ns, nlb);
@@ -1247,7 +1248,8 @@ uint16_t nvme_bounce_data(NvmeCtrl *n, void *ptr, 
uint32_t len,
 bool pi = !!NVME_ID_NS_DPS_TYPE(ns->id_ns.dps);
 bool pract = !!(le16_to_cpu(rw->control) & NVME_RW_PRINFO_PRACT);
 
-if (nvme_ns_ext(ns) && !(pi && pract && ns->lbaf.ms == 8)) {
+if (nvme_ns_ext(ns) &&
+!(pi && pract && ns->lbaf.ms == nvme_pi_tuple_size(ns))) {
 return nvme_tx_interleaved(n, &req->sg, ptr, len, ns->lbasz,
ns->lbaf.ms, 0, dir);
 }
@@ -2184,7 +2186,7 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
  * tuple.
  */
 if (!(ns->id_ns.dps & NVME_ID_NS_DPS_FIRST_EIGHT)) {
-pil = ns->lbaf.ms - sizeof(NvmeDifTuple);
+pil = ns->lbaf.ms - nvme_pi_tuple_size(ns);
 }
 
 for (bufp = buf; mbufp < end; bufp += ns->lbaf.ms, mbufp += 
ns->lbaf.ms) {
@@ -3167,7 +3169,7 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
 if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
 bool pract = prinfo & NVME_PRINFO_PRACT;
 
-if (pract && ns->lbaf.ms == 8) {
+if (pract && ns->lbaf.ms == nvme_pi_tuple_size(ns)) {
 mapped_size = data_size;
 }
 }
@@ -3244,7 +3246,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
*req, bool append,
 if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
 bool pract = prinfo & NVME_PRINFO_PRACT;
 
-if (pract && ns->lbaf.ms == 8) {
+if (pract && ns->lbaf.ms == nvme_pi_tuple_size(ns)) {
 mapped_size -= nvme_m2b(ns, nlb);
 }
 }
@@ -5553,7 +,7 @@ static uint16_t nvme_format_check(NvmeNamespace *ns, 
uint8_t lbaf, uint8_t pi)
 return NVME_INVALID_FORMAT | NVME_DNR;
 }
 
-if (pi && (ns->id_ns.lbaf[lbaf].ms < sizeof(NvmeDifTuple))) {
+if (pi && (ns->id_ns.lbaf[lbaf].ms < nvme_pi_tuple_size(ns))) {
 return NVME_INVALID_FORMAT | NVME_DNR;
 }
 
diff --git a/hw/nvme/dif.c b/hw/nvme/dif.c
index cd0cea2b5ebd..891385f33f20 100644
--- a/hw/nvme/dif.c
+++ b/hw/nvme/dif.c
@@ -48,7 +48,7 @@ void nvme_dif_pract_generate_dif(NvmeNamespace *ns, uint8_t 
*buf, size_t len,
 int16_t pil = 0;
 
 if (!(ns->id_ns.dps & NVME_ID_NS_DPS_FIRST_EIGHT)) {
-pil = ns->lbaf.ms - sizeof(NvmeDifTuple);
+pil = ns->lbaf.ms - nvme_pi_tuple_size(ns);
 }
 
 trace_pci_nvme_dif_pract_generate_dif(len, ns->lbasz, ns->lbasz + pil,
@@ -145,7 +145,7 @@ uint16_t nvme_dif_check(NvmeNamespace *ns, uint8_t *buf, 
size_t len,
 }
 
 if (!(ns->id_ns.dps & NVME_ID_NS_DPS_FIRST_EIGHT)) {
-pil = ns->lbaf.ms - sizeof(NvmeDifTuple);
+pil = ns->lbaf.ms - nvme_pi_tuple_size(ns);
 }
 
 trace_pci_nvme_dif_check(prinfo, ns->lbasz + pil);
@@ -184,7 +184,7 @@ uint16_t nvme_dif_mangle_mdata(NvmeNamespace *ns, uint8_t 
*mbuf, size_t mlen,
 
 
 if (!(ns->id_ns.dps & NVME_ID_NS_DPS_FIRST_EIGHT)) {
-pil = ns->lbaf.ms - sizeof(NvmeDifTuple);
+pil = ns->lbaf.ms - nvme_pi_tuple_size(ns);
 }
 
 do {
@@ -210,7 +210,7 @@ uint16_t nvme_dif_mangle_mdata(NvmeNamespace *ns, uint8_t 
*mbuf, size_t mlen,
 end = mbufp + mlen;
 
 for (; mbufp < end; mbufp += ns->lbaf.ms) {
-memset(mbufp + pil, 0xff, sizeof(NvmeDifTuple));
+memset(mbufp + pil, 0xff, nvme_pi_tuple_size(ns));
 }
 }
 
@@ -284,7 +284,7 @@ static void nvme_dif_rw_check_cb(void *opaque, int ret)
 goto out;
 }
 
-if (prinfo & NVME_PRINFO_PRACT && ns->lbaf.ms == 8) {
+if (prinfo & NVME_PRINFO_PRACT && ns->lbaf.ms == nvme_pi_tuple_size(ns)) {
 goto out;
 }
 
@@ -388,7 +388,7 @@ uint16_t nvme_dif_rw(NvmeCtrl *n, NvmeRequest *req)
 
 if (pract) {
 uint8_t *mbuf, *end;
-int16_t pil = ns->lbaf.ms - sizeof(NvmeDifTuple);
+int16_t pil = ns->lbaf.ms - nvme_pi_tuple_size(ns);
 
 status = nvme_check_prinfo(ns, prinfo, slba, reftag);
 if (status) {
@@ -428,7 +428,7 @@ uint16_t nvme_dif_rw(NvmeCtrl *n, 

[PATCH v2 6/6] hw/nvme: 64-bit pi support

2022-03-01 Thread Klaus Jensen
From: Naveen Nagar 

This adds support for one possible new protection information format
introduced in TP4068 (and integrated in NVMe 2.0): the 64-bit CRC guard
and 48-bit reference tag. This version does not support storage tags.

Like the CRC16 support already present, this uses a software
implementation of CRC64 (so it is naturally pretty slow). But it's good
enough for verification purposes.

This may go nicely hand-in-hand with the support that Keith submitted
for the Linux kernel[1].

  [1]: 
https://lore.kernel.org/linux-nvme/20220126165214.ga1782...@dhcp-10-100-145-180.wdc.com/T/
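As an aside on the table-driven approach: the static t10_dif_crc_table in dif.h can be reproduced bit-by-bit from the T10-DIF generator polynomial 0x8BB7. A sketch of the derivation (the function is illustrative; only the polynomial and the table values come from the source):

```c
#include <stdint.h>

/* Sketch: derive one entry of the T10-DIF CRC16 lookup table from the
 * generator polynomial 0x8BB7 (non-reflected, MSB-first). The first
 * entries come out as 0x0000, 0x8BB7, 0x9CD9, 0x176E, matching the
 * table copied from crypto/crct10dif_common.c. */
static uint16_t t10dif_table_entry(uint8_t index)
{
    uint16_t crc = (uint16_t)(index << 8);

    for (int bit = 0; bit < 8; bit++) {
        if (crc & 0x8000) {
            crc = (uint16_t)((crc << 1) ^ 0x8BB7);
        } else {
            crc = (uint16_t)(crc << 1);
        }
    }

    return crc;
}
```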

Signed-off-by: Naveen Nagar 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 163 +++
 hw/nvme/dif.c| 363 +--
 hw/nvme/dif.h| 143 -
 hw/nvme/ns.c |  35 -
 hw/nvme/nvme.h   |   3 +
 hw/nvme/trace-events |  12 +-
 include/block/nvme.h |  67 ++--
 7 files changed, 648 insertions(+), 138 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f1683960b87e..03760ddeae8c 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -2050,9 +2050,12 @@ static void nvme_verify_cb(void *opaque, int ret)
 uint8_t prinfo = NVME_RW_PRINFO(le16_to_cpu(rw->control));
 uint16_t apptag = le16_to_cpu(rw->apptag);
 uint16_t appmask = le16_to_cpu(rw->appmask);
-uint32_t reftag = le32_to_cpu(rw->reftag);
+uint64_t reftag = le32_to_cpu(rw->reftag);
+uint64_t cdw3 = le32_to_cpu(rw->cdw3);
 uint16_t status;
 
+reftag |= cdw3 << 32;
+
 trace_pci_nvme_verify_cb(nvme_cid(req), prinfo, apptag, appmask, reftag);
 
 if (ret) {
@@ -2141,7 +2144,8 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
 uint8_t prinfo = NVME_RW_PRINFO(le16_to_cpu(rw->control));
 uint16_t apptag = le16_to_cpu(rw->apptag);
 uint16_t appmask = le16_to_cpu(rw->appmask);
-uint32_t reftag = le32_to_cpu(rw->reftag);
+uint64_t reftag = le32_to_cpu(rw->reftag);
+uint64_t cdw3 = le32_to_cpu(rw->cdw3);
 struct nvme_compare_ctx *ctx = req->opaque;
 g_autofree uint8_t *buf = NULL;
 BlockBackend *blk = ns->blkconf.blk;
@@ -2149,6 +2153,8 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
 BlockAcctStats *stats = blk_get_stats(blk);
 uint16_t status = NVME_SUCCESS;
 
+reftag |= cdw3 << 32;
+
 trace_pci_nvme_compare_mdata_cb(nvme_cid(req));
 
 if (ret) {
@@ -2527,7 +2533,8 @@ typedef struct NvmeCopyAIOCB {
 QEMUBH *bh;
 int ret;
 
-NvmeCopySourceRange *ranges;
+void *ranges;
+unsigned int format;
 int nr;
 int idx;
 
@@ -2538,7 +2545,7 @@ typedef struct NvmeCopyAIOCB {
 BlockAcctCookie write;
 } acct;
 
-uint32_t reftag;
+uint64_t reftag;
 uint64_t slba;
 
 NvmeZone *zone;
@@ -2592,13 +2599,101 @@ static void nvme_copy_bh(void *opaque)
 
 static void nvme_copy_cb(void *opaque, int ret);
 
+static void nvme_copy_source_range_parse_format0(void *ranges, int idx,
+ uint64_t *slba, uint32_t *nlb,
+ uint16_t *apptag,
+ uint16_t *appmask,
+ uint64_t *reftag)
+{
+NvmeCopySourceRangeFormat0 *_ranges = ranges;
+
+if (slba) {
+*slba = le64_to_cpu(_ranges[idx].slba);
+}
+
+if (nlb) {
+*nlb = le16_to_cpu(_ranges[idx].nlb) + 1;
+}
+
+if (apptag) {
+*apptag = le16_to_cpu(_ranges[idx].apptag);
+}
+
+if (appmask) {
+*appmask = le16_to_cpu(_ranges[idx].appmask);
+}
+
+if (reftag) {
+*reftag = le32_to_cpu(_ranges[idx].reftag);
+}
+}
+
+static void nvme_copy_source_range_parse_format1(void *ranges, int idx,
+ uint64_t *slba, uint32_t *nlb,
+ uint16_t *apptag,
+ uint16_t *appmask,
+ uint64_t *reftag)
+{
+NvmeCopySourceRangeFormat1 *_ranges = ranges;
+
+if (slba) {
+*slba = le64_to_cpu(_ranges[idx].slba);
+}
+
+if (nlb) {
+*nlb = le16_to_cpu(_ranges[idx].nlb) + 1;
+}
+
+if (apptag) {
+*apptag = le16_to_cpu(_ranges[idx].apptag);
+}
+
+if (appmask) {
+*appmask = le16_to_cpu(_ranges[idx].appmask);
+}
+
+if (reftag) {
+*reftag = 0;
+
+*reftag |= (uint64_t)_ranges[idx].sr[4] << 40;
+*reftag |= (uint64_t)_ranges[idx].sr[5] << 32;
+*reftag |= (uint64_t)_ranges[idx].sr[6] << 24;
+*reftag |= (uint64_t)_ranges[idx].sr[7] << 16;
+*reftag |= (uint64_t)_ranges[idx].sr[8] << 8;
+*reftag |= (uint64_t)_ranges[idx].sr[9];
+}
+}
+
+static void nvme_copy_source_range_parse(void *ranges, int idx, uint8_t format,
+ uint64_t

[PATCH v2 3/6] hw/nvme: move format parameter parsing

2022-03-01 Thread Klaus Jensen
From: Klaus Jensen 

There is no need to extract the format command parameters for each
namespace. Move it to the entry point.
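The cdw10 bit layout being hoisted is small enough to sketch standalone. Field positions are as in the patch; the struct wrapper and function name are just for illustration:

```c
#include <stdint.h>

/* Decode the Format NVM fields from command dword 10 once, at the
 * entry point, instead of re-extracting them per namespace. */
struct format_params {
    uint8_t lbaf;   /* bits 3:0 - LBA format index */
    uint8_t mset;   /* bit  4   - metadata settings */
    uint8_t pi;     /* bits 7:5 - protection information type */
    uint8_t pil;    /* bit  8   - protection information location */
};

static struct format_params parse_format_dw10(uint32_t dw10)
{
    return (struct format_params){
        .lbaf = dw10 & 0xf,
        .mset = (dw10 >> 4) & 0x1,
        .pi   = (dw10 >> 5) & 0x7,
        .pil  = (dw10 >> 8) & 0x1,
    };
}
```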

Reviewed-by: Keith Busch 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 71c60482c75f..d8701ebf2fa8 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5452,6 +5452,11 @@ typedef struct NvmeFormatAIOCB {
 uint32_t nsid;
 bool broadcast;
 int64_t offset;
+
+uint8_t lbaf;
+uint8_t mset;
+uint8_t pi;
+uint8_t pil;
 } NvmeFormatAIOCB;
 
 static void nvme_format_bh(void *opaque);
@@ -5471,14 +5476,9 @@ static const AIOCBInfo nvme_format_aiocb_info = {
 .get_aio_context = nvme_get_aio_context,
 };
 
-static void nvme_format_set(NvmeNamespace *ns, NvmeCmd *cmd)
+static void nvme_format_set(NvmeNamespace *ns, uint8_t lbaf, uint8_t mset,
+uint8_t pi, uint8_t pil)
 {
-uint32_t dw10 = le32_to_cpu(cmd->cdw10);
-uint8_t lbaf = dw10 & 0xf;
-uint8_t pi = (dw10 >> 5) & 0x7;
-uint8_t mset = (dw10 >> 4) & 0x1;
-uint8_t pil = (dw10 >> 8) & 0x1;
-
 trace_pci_nvme_format_set(ns->params.nsid, lbaf, mset, pi, pil);
 
 ns->id_ns.dps = (pil << 3) | pi;
@@ -5490,7 +5490,6 @@ static void nvme_format_set(NvmeNamespace *ns, NvmeCmd 
*cmd)
 static void nvme_format_ns_cb(void *opaque, int ret)
 {
 NvmeFormatAIOCB *iocb = opaque;
-NvmeRequest *req = iocb->req;
 NvmeNamespace *ns = iocb->ns;
 int bytes;
 
@@ -5512,7 +5511,7 @@ static void nvme_format_ns_cb(void *opaque, int ret)
 return;
 }
 
-nvme_format_set(ns, &req->cmd);
+nvme_format_set(ns, iocb->lbaf, iocb->mset, iocb->pi, iocb->pil);
 ns->status = 0x0;
 iocb->ns = NULL;
 iocb->offset = 0;
@@ -5548,9 +5547,6 @@ static void nvme_format_bh(void *opaque)
 NvmeFormatAIOCB *iocb = opaque;
 NvmeRequest *req = iocb->req;
 NvmeCtrl *n = nvme_ctrl(req);
-uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
-uint8_t lbaf = dw10 & 0xf;
-uint8_t pi = (dw10 >> 5) & 0x7;
 uint16_t status;
 int i;
 
@@ -5572,7 +5568,7 @@ static void nvme_format_bh(void *opaque)
 goto done;
 }
 
-status = nvme_format_check(iocb->ns, lbaf, pi);
+status = nvme_format_check(iocb->ns, iocb->lbaf, iocb->pi);
 if (status) {
 req->status = status;
 goto done;
@@ -5595,6 +5591,11 @@ static uint16_t nvme_format(NvmeCtrl *n, NvmeRequest 
*req)
 {
 NvmeFormatAIOCB *iocb;
 uint32_t nsid = le32_to_cpu(req->cmd.nsid);
+uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
+uint8_t lbaf = dw10 & 0xf;
+uint8_t mset = (dw10 >> 4) & 0x1;
+uint8_t pi = (dw10 >> 5) & 0x7;
+uint8_t pil = (dw10 >> 8) & 0x1;
 uint16_t status;
 
 iocb = qemu_aio_get(&nvme_format_aiocb_info, NULL, nvme_misc_cb, req);
@@ -5604,6 +5605,10 @@ static uint16_t nvme_format(NvmeCtrl *n, NvmeRequest 
*req)
 iocb->ret = 0;
 iocb->ns = NULL;
 iocb->nsid = 0;
+iocb->lbaf = lbaf;
+iocb->mset = mset;
+iocb->pi = pi;
+iocb->pil = pil;
 iocb->broadcast = (nsid == NVME_NSID_BROADCAST);
 iocb->offset = 0;
 
-- 
2.35.1




[PATCH v2 2/6] hw/nvme: add host behavior support feature

2022-03-01 Thread Klaus Jensen
From: Naveen Nagar 

Add support for getting and setting the Host Behavior Support feature.
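The feature data is a fixed 512-byte structure. A standalone sketch of the layout this patch introduces; the field expansions in the comments are my reading of the NVMe spec, and the size check mirrors the QEMU_BUILD_BUG_ON added in the diff:

```c
#include <assert.h>
#include <stdint.h>

/* Host Behavior Support feature data: three defined bytes followed by
 * reserved padding up to the fixed 512-byte feature size. */
struct host_behavior_support {
    uint8_t acre;        /* advanced command retry enable */
    uint8_t etdas;       /* extended telemetry data area 4 supported */
    uint8_t lbafee;      /* LBA format extension enable */
    uint8_t rsvd3[509];
};

static_assert(sizeof(struct host_behavior_support) == 512,
              "feature data must be exactly 512 bytes");
```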

Reviewed-by: Keith Busch 
Signed-off-by: Naveen Nagar 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 8 
 hw/nvme/nvme.h   | 4 +++-
 include/block/nvme.h | 9 +
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index d08af3bdc1a2..71c60482c75f 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -196,6 +196,7 @@ static const bool nvme_feature_support[NVME_FID_MAX] = {
 [NVME_WRITE_ATOMICITY]  = true,
 [NVME_ASYNCHRONOUS_EVENT_CONF]  = true,
 [NVME_TIMESTAMP]= true,
+[NVME_HOST_BEHAVIOR_SUPPORT]= true,
 [NVME_COMMAND_SET_PROFILE]  = true,
 };
 
@@ -206,6 +207,7 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
 [NVME_NUMBER_OF_QUEUES] = NVME_FEAT_CAP_CHANGE,
 [NVME_ASYNCHRONOUS_EVENT_CONF]  = NVME_FEAT_CAP_CHANGE,
 [NVME_TIMESTAMP]= NVME_FEAT_CAP_CHANGE,
+[NVME_HOST_BEHAVIOR_SUPPORT]= NVME_FEAT_CAP_CHANGE,
 [NVME_COMMAND_SET_PROFILE]  = NVME_FEAT_CAP_CHANGE,
 };
 
@@ -5091,6 +5093,9 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest 
*req)
 goto out;
 case NVME_TIMESTAMP:
 return nvme_get_feature_timestamp(n, req);
+case NVME_HOST_BEHAVIOR_SUPPORT:
+return nvme_c2h(n, (uint8_t *)&n->features.hbs,
+sizeof(n->features.hbs), req);
 default:
 break;
 }
@@ -5281,6 +5286,9 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest 
*req)
 break;
 case NVME_TIMESTAMP:
 return nvme_set_feature_timestamp(n, req);
+case NVME_HOST_BEHAVIOR_SUPPORT:
+return nvme_h2c(n, (uint8_t *)&n->features.hbs,
+sizeof(n->features.hbs), req);
 case NVME_COMMAND_SET_PROFILE:
 if (dw11 & 0x1ff) {
 trace_pci_nvme_err_invalid_iocsci(dw11 & 0x1ff);
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 801176a2bd5e..103407038e74 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -468,7 +468,9 @@ typedef struct NvmeCtrl {
 uint16_t temp_thresh_hi;
 uint16_t temp_thresh_low;
 };
-uint32_tasync_config;
+
+uint32_tasync_config;
+NvmeHostBehaviorSupport hbs;
 } features;
 } NvmeCtrl;
 
diff --git a/include/block/nvme.h b/include/block/nvme.h
index cd068ac89142..e527c728f975 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1216,6 +1216,7 @@ enum NvmeFeatureIds {
 NVME_WRITE_ATOMICITY= 0xa,
 NVME_ASYNCHRONOUS_EVENT_CONF= 0xb,
 NVME_TIMESTAMP  = 0xe,
+NVME_HOST_BEHAVIOR_SUPPORT  = 0x16,
 NVME_COMMAND_SET_PROFILE= 0x19,
 NVME_SOFTWARE_PROGRESS_MARKER   = 0x80,
 NVME_FID_MAX= 0x100,
@@ -1257,6 +1258,13 @@ typedef struct QEMU_PACKED NvmeRangeType {
 uint8_t rsvd48[16];
 } NvmeRangeType;
 
+typedef struct NvmeHostBehaviorSupport {
+uint8_t acre;
+uint8_t etdas;
+uint8_t lbafee;
+uint8_t rsvd3[509];
+} NvmeHostBehaviorSupport;
+
 typedef struct QEMU_PACKED NvmeLBAF {
 uint16_tms;
 uint8_t ds;
@@ -1520,6 +1528,7 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeDsmCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeCopyCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeRangeType) != 64);
+QEMU_BUILD_BUG_ON(sizeof(NvmeHostBehaviorSupport) != 512);
 QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512);
 QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512);
-- 
2.35.1




[PATCH v2 4/6] hw/nvme: add support for the lbafee hbs feature

2022-03-01 Thread Klaus Jensen
From: Naveen Nagar 

Add support for up to 64 LBA formats through the LBAFEE field of the
Host Behavior Support feature.
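With up to 64 formats the index no longer fits in FLBAS bits 3:0 alone, so the patch splits it across the byte. That encoding can be sketched as follows (helper name is made up; the bit layout is taken from the nvme_format_set change below):

```c
#include <stdint.h>

/* Build the FLBAS byte for an extended (6-bit) LBA format index:
 * low nibble in bits 3:0, upper two bits in bits 6:5, MSET in bit 4. */
static uint8_t make_flbas(uint8_t lbaf, uint8_t mset)
{
    uint8_t lbafl = lbaf & 0xf;
    uint8_t lbafu = lbaf >> 4;

    return (uint8_t)((lbafu << 5) | ((mset & 1) << 4) | lbafl);
}
```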

Reviewed-by: Keith Busch 
Signed-off-by: Naveen Nagar 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 34 +++---
 hw/nvme/ns.c | 15 +--
 hw/nvme/nvme.h   |  1 +
 include/block/nvme.h |  7 +--
 4 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index d8701ebf2fa8..52ab3450b975 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5165,6 +5165,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest 
*req)
 uint32_t nsid = le32_to_cpu(cmd->nsid);
 uint8_t fid = NVME_GETSETFEAT_FID(dw10);
 uint8_t save = NVME_SETFEAT_SAVE(dw10);
+uint16_t status;
 int i;
 
 trace_pci_nvme_setfeat(nvme_cid(req), nsid, fid, save, dw11);
@@ -5287,8 +5288,26 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, 
NvmeRequest *req)
 case NVME_TIMESTAMP:
 return nvme_set_feature_timestamp(n, req);
 case NVME_HOST_BEHAVIOR_SUPPORT:
-return nvme_h2c(n, (uint8_t *)&n->features.hbs,
-sizeof(n->features.hbs), req);
+status = nvme_h2c(n, (uint8_t *)&n->features.hbs,
+  sizeof(n->features.hbs), req);
+if (status) {
+return status;
+}
+
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
+ns = nvme_ns(n, i);
+
+if (!ns) {
+continue;
+}
+
+ns->id_ns.nlbaf = ns->nlbaf - 1;
+if (!n->features.hbs.lbafee) {
+ns->id_ns.nlbaf = MIN(ns->id_ns.nlbaf, 15);
+}
+}
+
+return status;
 case NVME_COMMAND_SET_PROFILE:
 if (dw11 & 0x1ff) {
 trace_pci_nvme_err_invalid_iocsci(dw11 & 0x1ff);
@@ -5479,10 +5498,13 @@ static const AIOCBInfo nvme_format_aiocb_info = {
 static void nvme_format_set(NvmeNamespace *ns, uint8_t lbaf, uint8_t mset,
 uint8_t pi, uint8_t pil)
 {
+uint8_t lbafl = lbaf & 0xf;
+uint8_t lbafu = lbaf >> 4;
+
 trace_pci_nvme_format_set(ns->params.nsid, lbaf, mset, pi, pil);
 
 ns->id_ns.dps = (pil << 3) | pi;
-ns->id_ns.flbas = lbaf | (mset << 4);
+ns->id_ns.flbas = (lbafu << 5) | (mset << 4) | lbafl;
 
 nvme_ns_init_format(ns);
 }
@@ -5596,6 +5618,7 @@ static uint16_t nvme_format(NvmeCtrl *n, NvmeRequest *req)
 uint8_t mset = (dw10 >> 4) & 0x1;
 uint8_t pi = (dw10 >> 5) & 0x7;
 uint8_t pil = (dw10 >> 8) & 0x1;
+uint8_t lbafu = (dw10 >> 12) & 0x3;
 uint16_t status;
 
 iocb = qemu_aio_get(&nvme_format_aiocb_info, NULL, nvme_misc_cb, req);
@@ -5612,6 +5635,10 @@ static uint16_t nvme_format(NvmeCtrl *n, NvmeRequest 
*req)
 iocb->broadcast = (nsid == NVME_NSID_BROADCAST);
 iocb->offset = 0;
 
+if (n->features.hbs.lbafee) {
+iocb->lbaf |= lbafu << 4;
+}
+
 if (!iocb->broadcast) {
 if (!nvme_nsid_valid(n, nsid)) {
 status = NVME_INVALID_NSID | NVME_DNR;
@@ -6587,6 +6614,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->cntlid = cpu_to_le16(n->cntlid);
 
 id->oaes = cpu_to_le32(NVME_OAES_NS_ATTR);
+id->ctratt |= cpu_to_le32(NVME_CTRATT_ELBAS);
 
 id->rab = 6;
 
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index ee673f1a5bef..8dfb55130beb 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -112,10 +112,11 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 [7] = { .ds = 12, .ms = 64 },
 };
 
+ns->nlbaf = 8;
+
 memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-id_ns->nlbaf = 7;
 
-for (i = 0; i <= id_ns->nlbaf; i++) {
+for (i = 0; i < ns->nlbaf; i++) {
 NvmeLBAF *lbaf = &id_ns->lbaf[i];
 if (lbaf->ds == ds) {
 if (lbaf->ms == ms) {
@@ -126,12 +127,14 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 }
 
 /* add non-standard lba format */
-id_ns->nlbaf++;
-id_ns->lbaf[id_ns->nlbaf].ds = ds;
-id_ns->lbaf[id_ns->nlbaf].ms = ms;
-id_ns->flbas |= id_ns->nlbaf;
+id_ns->lbaf[ns->nlbaf].ds = ds;
+id_ns->lbaf[ns->nlbaf].ms = ms;
+ns->nlbaf++;
+
+id_ns->flbas |= i;
 
 lbaf_found:
+id_ns->nlbaf = ns->nlbaf - 1;
 nvme_ns_init_format(ns);
 
 return 0;
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 103407038e74..e715c3255a29 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -128,6 +128,7 @@ typedef struct NvmeNamespace {
 int64_t  moff;
 NvmeIdNs id_ns;
 NvmeLBAF lbaf;
+unsigned int nlbaf;
 size_t   lbasz;
 const uint32_t *iocs;
 uint8_t  csi;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index e527c728f975..37afc9be9b18 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -,6 +,10 @@ enum NvmeIdCtrlOaes {
 NVME_OAES_NS_ATTR   = 1 << 8,
 };
 
+enum NvmeIdCtrlCtratt {
+NVME_CTRATT_ELBAS   = 1 << 15,

Re: [PATCH v2 00/25] migration: Postcopy Preemption

2022-03-01 Thread Peter Xu
On Tue, Mar 01, 2022 at 10:27:10AM +, Daniel P. Berrangé wrote:
> > I also didn't know whether there's other limitations of it.  For example,
> > will a new socket pair be a problem for any VM environment (either a
> > limitation from the management app, container, and so on)?  I think it's
> > the same to multifd in that aspect, but I never explored.
> 
> If it needs extra sockets that is something apps will need to be aware
> of unfortunately and explicitly opt-in to :-( Migration is often
> tunnelled/proxied over other channels, so whatever does that needs to
> be aware of possibility of seeing extra sockets.

Ah, then probably it can never be the default.  But for sure it could be
nice if the higher level can opt in and make it a default at some point, as
long as it knows the network topology is safe to do so.

> 
> > > > TODO List
> > > > =
> > > > 
> > > > TLS support
> > > > ---
> > > > 
> > > > I only noticed its missing very recently.  Since soft freeze is coming, 
> > > > and
> > > > obviously I'm still growing this series, so I tend to have the existing
> > > > material discussed. Let's see if it can still catch the train for QEMU 
> > > > 7.0
> > > > release (soft freeze on 2022-03-08)..
> > > 
> > > I don't like the idea of shipping something that is only half finished.
> > > It means that when apps probe for the feature, they'll see preempt
> > > capability present, but have no idea whether they're using a QEMU that
> > > is broken when combined with TLS or not. We shouldn't merge something
> > > just to meet the soft freeze deadline if we know key features are broken.
> > 
> > IMHO merging and declaring support are two problems.
> > 
> > To me, it's always fine to merge the code that implements the foundation of
> > a
> > feature.  The feature can be worked upon in the future.
> > 
> > Requiring a feature to be "complete" sometimes can cause burden to not only
> > the author of the series but also reviewers.  It's IMHO not necessary to
> > bind these two ideas.
> > 
> > It's sometimes also hard to define "complete": take the TLS as example, no
> > one probably even noticed that it won't work with TLS and I just noticed it
> > merely these two days..  We obviously can't merge partial patchset, but if
> > the patchset is well isolated, then it's not a blocker for merging, imho.
> > 
> > Per my understanding, what you worried is when we declare it supported but
> > later we never know when TLS will be ready for it.  One solution is I can
> > rename the capability as x-, then after the TLS side ready I drop the x-
> > prefix.  Then Libvirt or any mgmt software doesn't need to support this
> > until we drop the x-, so there's no risk of compatibility.
> > 
> > Would that sound okay to you?
> 
> If it has an x- prefix then we can basically ignore it from a mgmt app
> POV until it is actually finished.
> 
> > I can always step back and work on TLS first before it's merged, but again
> > I don't think it's required.
> 
> Apps increasingly consider use of TLS to be a mandatory feature for
> migration, so until that works, this preempt has to be considered
> unsupported & unfinished IMHO. So either TLS should be ready when
> it merges, or it should be clearly marked unsupported at the QAPI
> level.

Yes, I fully agree with it, and for huge vm migrations I think TLS is in
many cases mandatory.

I do plan to work on it right afterwards if this series lands, but as the
series grows I just noticed maybe we should start landing some code that's
already solid.  Landing the code has another benefit: I want to make sure
the code merged at least won't affect the existing features.

So what I'm curious about is why TLS is getting quite some attention in the
past few years but I didn't even see any selftests included in migration-test
for TLS.  That's something I wanted to look into, maybe even before adding the
preempt+tls support. But maybe I just missed something, as I didn't use tls
a lot in the past.

Thanks,

-- 
Peter Xu




Re: [PATCH v2 03/14] vhost: Add Shadow VirtQueue call forwarding capabilities

2022-03-01 Thread Eugenio Perez Martin
On Mon, Feb 28, 2022 at 4:18 AM Jason Wang  wrote:
>
>
> 在 2022/2/27 下午9:41, Eugenio Pérez 写道:
> > This will make qemu aware of the device used buffers, allowing it to
> > write the guest memory with its contents if needed.
> >
> > Signed-off-by: Eugenio Pérez 
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  4 
> >   hw/virtio/vhost-shadow-virtqueue.c | 34 ++
> >   hw/virtio/vhost-vdpa.c | 31 +--
> >   3 files changed, 67 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
> > b/hw/virtio/vhost-shadow-virtqueue.h
> > index 1cbc87d5d8..1d4c160d0a 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -28,9 +28,13 @@ typedef struct VhostShadowVirtqueue {
> >* So shadow virtqueue must not clean it, or we would lose VirtQueue 
> > one.
> >*/
> >   EventNotifier svq_kick;
> > +
> > +/* Guest's call notifier, where the SVQ calls guest. */
> > +EventNotifier svq_call;
> >   } VhostShadowVirtqueue;
> >
> >   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int 
> > svq_kick_fd);
> > +void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int 
> > call_fd);
> >
> >   void vhost_svq_stop(VhostShadowVirtqueue *svq);
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> > b/hw/virtio/vhost-shadow-virtqueue.c
> > index a5d0659f86..54c701a196 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -23,6 +23,38 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> >   event_notifier_set(&svq->hdev_kick);
> >   }
> >
> > +/* Forward vhost notifications */
> > +static void vhost_svq_handle_call(EventNotifier *n)
> > +{
> > +VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > + hdev_call);
> > +event_notifier_test_and_clear(n);
> > +event_notifier_set(&svq->svq_call);
> > +}
> > +
> > +/**
> > + * Set the call notifier for the SVQ to call the guest
> > + *
> > + * @svq Shadow virtqueue
> > + * @call_fd call notifier
> > + *
> > + * Called on BQL context.
> > + */
> > +void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int 
> > call_fd)
>
>
> I think we need to have consistent naming for both kick and call. Note
> that in patch 2 we had
>
> vhost_svq_set_svq_kick_fd
>
> Maybe it's better to use vhost_svq_set_guest_call_fd() here.
>

I think the same, I will replace it for the next version.

>
> > +{
> > +if (call_fd == VHOST_FILE_UNBIND) {
> > +/*
> > + * Fail event_notifier_set if called handling device call.
> > + *
> > + * SVQ still needs device notifications, since it needs to keep
> > + * forwarding used buffers even with the unbind.
> > + */
> > +memset(&svq->svq_call, 0, sizeof(svq->svq_call));
>
>
> I may miss something but shouldn't we stop polling svq_call here like
>
> event_notifier_set_handle(&svq->svq_call, false);
>

SVQ never polls that descriptor: it uses that descriptor to call (i.e.,
notify) the guest at vhost_svq_flush when SVQ marks descriptors as used.

svq_kick, svq_call: Descriptors that the guest sends to SVQ
hdev_kick, hdev_call: Descriptors that qemu/SVQ sends to the device.

I admit it is confusing when reading the code but I cannot come up
with a better naming. Maybe it helps to add a diagram at the top of
the file like:

+---+-> svq_kick_fd ->+-+-> hdev_kick ->+-+
| Guest | | SVQ |   | Dev |
+---+<- svq_call_fd <-+-+<- hdev_call <-+-+
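The forwarding itself is just "clear one notifier, set the other". Modeled with plain Linux eventfds instead of QEMU's EventNotifier wrappers (a sketch, not the QEMU code), it looks roughly like:

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Minimal model of vhost_svq_handle_call: consume (clear) the
 * device-side notifier, then raise the guest-side one. */
static void forward_call(int hdev_call_fd, int svq_call_fd)
{
    uint64_t cnt;

    if (read(hdev_call_fd, &cnt, sizeof(cnt)) == sizeof(cnt)) {
        uint64_t one = 1;
        (void)write(svq_call_fd, &one, sizeof(one));
    }
}
```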

Thanks!

> ?
>
> Thanks
>
>




Re: [PATCH v4 2/3] hw/acpi: add indication for i8042 in IA-PC boot flags of the FADT table

2022-03-01 Thread Michael S. Tsirkin
On Tue, Mar 01, 2022 at 09:43:54AM +0100, Igor Mammedov wrote:
> On Mon, 28 Feb 2022 22:17:32 +0200
> Liav Albani  wrote:
> 
> > This can allow the guest OS to determine more easily if an i8042 controller
> > is present in the system or not, so it doesn't need to do probing of the
> > controller, but just initialize it immediately, before enumerating the
> > ACPI AML namespace.
> > 
> > This change only applies to the x86/q35 machine type, as it uses FACP
> > ACPI table with revision higher than 1, which should implement at least
> > ACPI 2.0 features within the table, hence it can also set the IA-PC boot
> > flags register according to the ACPI 2.0 specification.
> > 
> > Signed-off-by: Liav Albani 
> > ---
> >  hw/acpi/aml-build.c | 11 ++-
> >  hw/i386/acpi-build.c|  9 +
> >  hw/i386/acpi-microvm.c  |  9 +
> commit message says it's q35 specific, so why it touched microvm and piix4?
> 
> >  include/hw/acpi/acpi-defs.h |  1 +
> >  4 files changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> > index 8966e16320..2085905b83 100644
> > --- a/hw/acpi/aml-build.c
> > +++ b/hw/acpi/aml-build.c
> > @@ -2152,7 +2152,16 @@ void build_fadt(GArray *tbl, BIOSLinker *linker, 
> > const AcpiFadtData *f,
> >  build_append_int_noprefix(tbl, 0, 1); /* DAY_ALRM */
> >  build_append_int_noprefix(tbl, 0, 1); /* MON_ALRM */
> >  build_append_int_noprefix(tbl, f->rtc_century, 1); /* CENTURY */
> > -build_append_int_noprefix(tbl, 0, 2); /* IAPC_BOOT_ARCH */
> > +/* IAPC_BOOT_ARCH */
> > +/*
> > + * This register is not defined in ACPI spec version 1.0, where the 
> > FACP
> > + * revision == 1 also applies. Therefore, just ignore setting this 
> > register.
> > + */
> > +if (f->rev == 1) {
> > +build_append_int_noprefix(tbl, 0, 2);
> > +} else {
> > +build_append_int_noprefix(tbl, f->iapc_boot_arch, 2);
> > +}
> >  build_append_int_noprefix(tbl, 0, 1); /* Reserved */
> >  build_append_int_noprefix(tbl, f->flags, 4); /* Flags */
> >  
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index ebd47aa26f..c72c7bb9bb 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -38,6 +38,7 @@
> >  #include "hw/nvram/fw_cfg.h"
> >  #include "hw/acpi/bios-linker-loader.h"
> >  #include "hw/isa/isa.h"
> > +#include "hw/input/i8042.h"
> >  #include "hw/block/fdc.h"
> >  #include "hw/acpi/memory_hotplug.h"
> >  #include "sysemu/tpm.h"
> > @@ -192,6 +193,14 @@ static void init_common_fadt_data(MachineState *ms, 
> > Object *o,
> >  .address = object_property_get_uint(o, ACPI_PM_PROP_GPE0_BLK, 
> > NULL)
> >  },
> >  };
> > +/*
> > + * second bit of 16 of the IAPC_BOOT_ARCH register indicates i8042 
> > presence
> > + * or equivalent micro controller. See table 5-10 of APCI spec version 
> > 2.0
> > + * (the earliest acpi revision that supports this).
> 
>  /* APCI spec version 2.0, Table 5-10 */
> 
> is sufficient, the rest could be read from spec/

ACPI though, not APCI.
The comment can be shorter and clearer, but I feel quoting the spec
and including the table name is actually a good idea; pls quote verbatim:

/* ACPI spec version 2.0, Table 5-10: Fixed ACPI Description Table Boot 
Architecture Flags */
/* Bit offset 1 -  port 60 and 64 based keyboard controller, usually 
implemented as an 8042 or equivalent micro-controller. */

> 
> > + */
> > +fadt.iapc_boot_arch = object_resolve_path_type("", TYPE_I8042, NULL) ?
> > +0x0002 : 0x0000;

and make it 0x1 << 1 - clearer that this is bit 1. Leading zeroes are
not helpful since compiler does not check there's a correct number of
these.
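
Put together, the suggestion might look like this sketch (the constant and
helper names are made-up illustrations, not QEMU's actual code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* Named flag for bit 1 of IAPC_BOOT_ARCH ("8042 present on ports 60/64").
 * The name is illustrative; QEMU would define it near the other FADT bits. */
#define FADT_IAPC_BOOT_ARCH_8042 (0x1 << 1)

/* One shared helper also avoids duplicating the same expression in
 * acpi-build.c and acpi-microvm.c. */
static uint16_t iapc_boot_arch_flags(bool has_i8042)
{
    return has_i8042 ? FADT_IAPC_BOOT_ARCH_8042 : 0;
}
```

The object_resolve_path_type("", TYPE_I8042, NULL) probe would then feed
has_i8042 at both call sites.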

> > +
> >  *data = fadt;
> >  }
> >  
> > diff --git a/hw/i386/acpi-microvm.c b/hw/i386/acpi-microvm.c
> > index 68ca7e7fc2..4bc72b1672 100644
> > --- a/hw/i386/acpi-microvm.c
> > +++ b/hw/i386/acpi-microvm.c
> > @@ -31,6 +31,7 @@
> >  #include "hw/acpi/generic_event_device.h"
> >  #include "hw/acpi/utils.h"
> >  #include "hw/acpi/erst.h"
> > +#include "hw/input/i8042.h"
> >  #include "hw/i386/fw_cfg.h"
> >  #include "hw/i386/microvm.h"
> >  #include "hw/pci/pci.h"
> > @@ -189,6 +190,14 @@ static void acpi_build_microvm(AcpiBuildTables *tables,
> >  .reset_val = ACPI_GED_RESET_VALUE,
> >  };
> >  
> > +/*
> > + * second bit of 16 of the IAPC_BOOT_ARCH register indicates i8042 
> > presence
> > + * or equivalent micro controller. See table 5-10 of APCI spec version 
> > 2.0
> > + * (the earliest acpi revision that supports this).
> > + */
> > +pmfadt.iapc_boot_arch = object_resolve_path_type("", TYPE_I8042, NULL) 
> > ?
> > +0x0002 : 0x0000;
> > +

let's avoid code duplication pls.

> >  table_offsets = g_array_new(false, true /* clear */,
> >  sizeof(uint32_t));
> >  bios_linker

Re: [PATCH v4 3/3] tests/acpi: i386: update FACP table differences

2022-03-01 Thread Michael S. Tsirkin
On Tue, Mar 01, 2022 at 08:29:57AM +0530, Ani Sinha wrote:
> 
> 
> On Mon, 28 Feb 2022, Liav Albani wrote:
> 
> > After changing the IAPC boot flags register to indicate support of i8042
> > in the machine chipset to help the guest OS to determine its existence
> > "faster", we need to have the updated FACP ACPI binary images in tree.
> >
> > @@ -1,32 +1,32 @@
> >  /*
> >   * Intel ACPI Component Architecture
> >   * AML/ASL+ Disassembler version 20211217 (64-bit version)
> >   * Copyright (c) 2000 - 2021 Intel Corporation
> >   *
> > - * Disassembly of tests/data/acpi/q35/FACP, Wed Feb 23 22:37:39 2022
> > + * Disassembly of /tmp/aml-BBFBI1, Wed Feb 23 22:37:39 2022

cut this out pls

> >   *
> >   * ACPI Data Table [FACP]
> >   *
> >   * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue 
> > (in hex)
> >   */
> >
> >  [000h 0000   4]Signature : "FACP"[Fixed ACPI 
> > Description Table (FADT)]
> >  [004h 0004   4] Table Length : 000000F4
> >  [008h 0008   1] Revision : 03
> > -[009h 0009   1] Checksum : B9
> > +[009h 0009   1] Checksum : B7

and this

> >  [00Ah 0010   6]   Oem ID : "BOCHS "
> >  [010h 0016   8] Oem Table ID : "BXPC"
> >  [018h 0024   4] Oem Revision : 00000001
> >  [01Ch 0028   4]  Asl Compiler ID : "BXPC"
> >  [020h 0032   4]Asl Compiler Revision : 00000001
> >
> >  [024h 0036   4] FACS Address : 00000000
> >  [028h 0040   4] DSDT Address : 00000000
> >  [02Ch 0044   1]Model : 01
> >  [02Dh 0045   1]   PM Profile : 00 [Unspecified]
> >  [02Eh 0046   2]SCI Interrupt : 0009
> >  [030h 0048   4] SMI Command Port : 000000B2
> >  [034h 0052   1]ACPI Enable Value : 02
> >  [035h 0053   1]   ACPI Disable Value : 03
> >  [036h 0054   1]   S4BIOS Command : 00
> >  [037h 0055   1]  P-State Control : 00
> > @@ -42,35 +42,35 @@
> >  [059h 0089   1] PM1 Control Block Length : 02
> >  [05Ah 0090   1] PM2 Control Block Length : 00
> >  [05Bh 0091   1]PM Timer Block Length : 04
> >  [05Ch 0092   1]GPE0 Block Length : 10
> >  [05Dh 0093   1]GPE1 Block Length : 00
> >  [05Eh 0094   1] GPE1 Base Offset : 00
> >  [05Fh 0095   1] _CST Support : 00
> >  [060h 0096   2]   C2 Latency : 0FFF
> >  [062h 0098   2]   C3 Latency : 0FFF
> >  [064h 0100   2]   CPU Cache Size : 0000
> >  [066h 0102   2]   Cache Flush Stride : 0000
> >  [068h 0104   1]Duty Cycle Offset : 00
> >  [069h 0105   1] Duty Cycle Width : 00
> >  [06Ah 0106   1]  RTC Day Alarm Index : 00
> >  [06Bh 0107   1]RTC Month Alarm Index : 00
> >  [06Ch 0108   1]RTC Century Index : 32
> > -[06Dh 0109   2]   Boot Flags (decoded below) : 0000
> > +[06Dh 0109   2]   Boot Flags (decoded below) : 0002
> > Legacy Devices Supported (V2) : 0
> > -8042 Present on ports 60/64 (V2) : 0
> > +8042 Present on ports 60/64 (V2) : 1
> >  VGA Not Present (V4) : 0
> >MSI Not Supported (V4) : 0
> >  PCIe ASPM Not Supported (V4) : 0
> > CMOS RTC Not Present (V5) : 0


leaving just this

> >  [06Fh 0111   1] Reserved : 00
> >  [070h 0112   4]Flags (decoded below) : 000084A5
> >WBINVD instruction is operational (V1) : 1
> >WBINVD flushes all caches (V1) : 0
> >  All CPUs support C1 (V1) : 1
> >C2 works on MP system (V1) : 0
> >  Control Method Power Button (V1) : 0
> >  Control Method Sleep Button (V1) : 1
> >  RTC wake not in fixed reg space (V1) : 0
> >  RTC can wake system from S4 (V1) : 1
> >  32-bit PM Timer (V1) : 0
> >Docking Supported (V1) : 0
> > @@ -148,32 +148,32 @@
> >  [0DCh 0220   1] Space ID : 01 [SystemIO]
> >  [0DDh 0221   1]Bit Width : 80
> >  [0DEh 0222   1]   Bit Offset : 00
> >  [0DFh 0223   1] Encoded Access Width : 00 [Undefined/Legacy]
> >  [0E0h 0224   8]  Address : 0620
> >
> >  [0E8h 0232  12]   GPE1 Block : [Generic Address Structure]
> >  [0E8h 0232   1] Space ID : 00 [SystemMemory]
> >  [0E9h 0233   1]Bit Width : 00
> >  [0EAh 0234   1]   Bit Offset : 00
> >  [0EBh 0235   1] Encoded Access Width : 00 [Undefined/Legacy]
> >  [0ECh 0236   8]  Address : 0000000000000000
> >
> >  Raw Table Data: Length 244 (0xF4)
> >
> > -: 46 41 43 50 F4 00 00 00

Re: [PATCH v7 02/31] main loop: macros to mark GS and I/O functions

2022-03-01 Thread Kevin Wolf
Am 11.02.2022 um 15:51 hat Emanuele Giuseppe Esposito geschrieben:
> Right now, IO_CODE and IO_OR_GS_CODE are nops, as there isn't
> really a way to check that a function is only called in I/O.
> On the other hand, we can use qemu_in_main_thread to check if
> we are in the main loop.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  include/qemu/main-loop.h | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
> index bc42b5939d..77adc51627 100644
> --- a/include/qemu/main-loop.h
> +++ b/include/qemu/main-loop.h
> @@ -269,6 +269,15 @@ bool qemu_mutex_iothread_locked(void);
>   */
>  bool qemu_in_main_thread(void);
>  
> +/* Mark and check that the function is part of the global state API. */
> +#define GLOBAL_STATE_CODE() assert(qemu_in_main_thread())
> +
> +/* Mark and check that the function is part of the I/O API. */
> +#define IO_CODE() /* nop */
> +
> +/* Mark and check that the function is part of the "I/O OR GS" API. */
> +#define IO_OR_GS_CODE() /* nop */
> +

I don't think it is actually a problem with the current macro expansions
and the places where they are used are limited, but if you have to
respin, I'd consider wrapping things in the usual do { ... } while (0)
just to be sure.

Kevin
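
For illustration, this is the classic pitfall that do { ... } while (0)
guards against, should a macro ever grow to more than one statement
(a sketch, not QEMU code):

```c
#include <assert.h>

struct pair { int a, b; };

/* Without the wrapper, only the first statement ends up under the if. */
#define INIT_BAD(p)   (p)->a = 1; (p)->b = 1
/* Wrapped, the expansion is one statement and behaves as expected. */
#define INIT_GOOD(p)  do { (p)->a = 1; (p)->b = 1; } while (0)

static struct pair demo_bad(int cond)
{
    struct pair p = {0, 0};
    if (cond)
        INIT_BAD(&p);   /* (&p)->b = 1 runs even when cond is false */
    return p;
}

static struct pair demo_good(int cond)
{
    struct pair p = {0, 0};
    if (cond)
        INIT_GOOD(&p);  /* both assignments are guarded by the if */
    return p;
}
```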




Re: [PATCH v9 3/3] qapi/monitor: allow VNC display id in set/expire_password

2022-03-01 Thread Dr. David Alan Gilbert
* Fabian Ebner (f.eb...@proxmox.com) wrote:
> From: Stefan Reiter 
> 
> It is possible to specify more than one VNC server on the command line,
> either with an explicit ID or the auto-generated ones à la "default",
> "vnc2", "vnc3", ...
> 
> It is not possible to change the password on one of these extra VNC
> displays though. Fix this by adding a "display" parameter to the
> "set_password" and "expire_password" QMP and HMP commands.
> 
> For HMP, the display is specified using the "-d" value flag.
> 
> For QMP, the schema is updated to explicitly express the supported
> variants of the commands with protocol-discriminated unions.
> 
> Signed-off-by: Stefan Reiter 
> [FE: update "Since: " from 6.2 to 7.0
>  make @connected a common member of @SetPasswordOptions]
> Signed-off-by: Fabian Ebner 

Reviewed-by: Dr. David Alan Gilbert 

> ---
> 
> v8 -> v9:
> * Make @connected a common member of @SetPasswordOptions.
> * Use s rather than V to indicate that the flag takes a string value.
> 
>  hmp-commands.hx| 24 ++--
>  monitor/hmp-cmds.c | 40 +--
>  monitor/qmp-cmds.c | 34 +++-
>  qapi/ui.json   | 96 +++---
>  4 files changed, 129 insertions(+), 65 deletions(-)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 70a9136ac2..8476277aa9 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1514,33 +1514,35 @@ ERST
>  
>  {
>  .name   = "set_password",
> -.args_type  = "protocol:s,password:s,connected:s?",
> -.params = "protocol password action-if-connected",
> +.args_type  = "protocol:s,password:s,display:-ds,connected:s?",
> +.params = "protocol password [-d display] [action-if-connected]",
>  .help   = "set spice/vnc password",
>  .cmd= hmp_set_password,
>  },
>  
>  SRST
> -``set_password [ vnc | spice ] password [ action-if-connected ]``
> -  Change spice/vnc password.  *action-if-connected* specifies what
> -  should happen in case a connection is established: *fail* makes the
> -  password change fail.  *disconnect* changes the password and
> +``set_password [ vnc | spice ] password [ -d display ] [ action-if-connected 
> ]``
> +  Change spice/vnc password.  *display* can be used with 'vnc' to specify
> +  which display to set the password on.  *action-if-connected* specifies
> +  what should happen in case a connection is established: *fail* makes
> +  the password change fail.  *disconnect* changes the password and
>disconnects the client.  *keep* changes the password and keeps the
>connection up.  *keep* is the default.
>  ERST
>  
>  {
>  .name   = "expire_password",
> -.args_type  = "protocol:s,time:s",
> -.params = "protocol time",
> +.args_type  = "protocol:s,time:s,display:-ds",
> +.params = "protocol time [-d display]",
>  .help   = "set spice/vnc password expire-time",
>  .cmd= hmp_expire_password,
>  },
>  
>  SRST
> -``expire_password [ vnc | spice ]`` *expire-time*
> -  Specify when a password for spice/vnc becomes
> -  invalid. *expire-time* accepts:
> +``expire_password [ vnc | spice ] expire-time [ -d display ]``
> +  Specify when a password for spice/vnc becomes invalid.
> +  *display* behaves the same as in ``set_password``.
> +  *expire-time* accepts:
>  
>``now``
>  Invalidate password instantly.
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index ff78741b75..634968498b 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -1396,24 +1396,33 @@ void hmp_set_password(Monitor *mon, const QDict 
> *qdict)
>  {
>  const char *protocol  = qdict_get_str(qdict, "protocol");
>  const char *password  = qdict_get_str(qdict, "password");
> +const char *display = qdict_get_try_str(qdict, "display");
>  const char *connected = qdict_get_try_str(qdict, "connected");
>  Error *err = NULL;
> -DisplayProtocol proto;
> -SetPasswordAction conn;
>  
> -proto = qapi_enum_parse(&DisplayProtocol_lookup, protocol,
> -DISPLAY_PROTOCOL_VNC, &err);
> +SetPasswordOptions opts = {
> +.password = (char *)password,
> +.has_connected = !!connected,
> +};
> +
> +opts.connected = qapi_enum_parse(&SetPasswordAction_lookup, connected,
> + SET_PASSWORD_ACTION_KEEP, &err);
>  if (err) {
>  goto out;
>  }
>  
> -conn = qapi_enum_parse(&SetPasswordAction_lookup, connected,
> -   SET_PASSWORD_ACTION_KEEP, &err);
> +opts.protocol = qapi_enum_parse(&DisplayProtocol_lookup, protocol,
> +DISPLAY_PROTOCOL_VNC, &err);
>  if (err) {
>  goto out;
>  }
>  
> -qmp_set_password(proto, password, !!connected, conn, &err);
> +if (opts.protocol == DISPLAY_PROTOCOL_VNC) {
> +opts.u.vnc.has_displa

Re: [PATCH v2 00/14] vDPA shadow virtqueue

2022-03-01 Thread Eugenio Perez Martin
On Mon, Feb 28, 2022 at 3:32 AM Jason Wang  wrote:
>
> On Sun, Feb 27, 2022 at 9:42 PM Eugenio Pérez  wrote:
> >
> > This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > is intended as a new method of tracking the memory the devices touch
> > during a migration process: instead of relying on the vhost device's dirty
> > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > descriptors between VM and device. This way qemu is the effective
> > writer of guests memory, like in qemu's virtio device operation.
> >
> > When SVQ is enabled qemu offers a new virtual address space to the
> > device to read and write into, and it maps new vrings and the guest
> > memory in it. SVQ also intercepts kicks and calls between the device
> > and the guest. Used buffers relay would cause dirty memory being
> > tracked.
> >
> > This effectively means that vDPA device passthrough is intercepted by
> > qemu. While SVQ should only be enabled at migration time, the switching
> > from regular mode to SVQ mode is left for a future series.
> >
> > It is based on the ideas of DPDK SW assisted LM, in the series of
> > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, they do
> > not map the shadow vq in guest's VA, but in qemu's.
> >
> > For qemu to use shadow virtqueues the guest virtio driver must not use
> > features like event_idx, indirect descriptors, packed and in_order.
> > These features are easy to implement on top of this base, but is left
> > for a future series for simplicity.
> >
> > SVQ needs to be enabled at qemu start time with vdpa cmdline parameter:
> >
> > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,x-svq=off
> >
> > The first three patches enables notifications forwarding with
> > assistance of qemu. It's easy to enable only this if the relevant
> > cmdline part of the last patch is applied on top of these.
> >
> > Next four patches implement the actual buffer forwarding. However,
> > addresses are not translated from HVA, so they will need a host device with
> > an iommu allowing them to access all of the HVA range.
> >
> > The last part of the series uses properly the host iommu, so qemu
> > creates a new iova address space in the device's range and translates
> > the buffers in it. Finally, it adds the cmdline parameter.
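> >
> > The IOVA-to-HVA step described above amounts to a range lookup; roughly
> > sketched below (names and layout are assumptions for illustration, not
> > QEMU's iova-tree API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each mapping says: this IOVA range, as seen by the device, is backed
 * by this qemu virtual address. */
typedef struct {
    uint64_t iova;   /* start of the range in the device address space */
    uint64_t size;
    void    *hva;    /* qemu virtual address backing the range */
} IovaMapping;

static void *iova_to_hva(const IovaMapping *map, size_t n, uint64_t iova)
{
    for (size_t i = 0; i < n; i++) {
        if (iova >= map[i].iova && iova - map[i].iova < map[i].size) {
            return (char *)map[i].hva + (iova - map[i].iova);
        }
    }
    return NULL;  /* unmapped: a real device access here would fault */
}
```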
> >
> > Some simple performance tests with netperf were done. They used a nested
> > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > baseline average of ~9980.13Mbps:
> > Recv   Send    Send
> > Socket Socket  Message  Elapsed
> > Size   Size    Size     Time     Throughput
> > bytes  bytes   bytes    secs.    10^6bits/sec
> >
> > 131072  16384  16384    30.01    9910.61
> > 131072  16384  16384    30.00    10030.94
> > 131072  16384  16384    30.01    9998.84
> >
> > To enable the notifications interception reduced performance to an
> > average of ~9577.73Mbit/s:
> > Recv   Send    Send
> > Socket Socket  Message  Elapsed
> > Size   Size    Size     Time     Throughput
> > bytes  bytes   bytes    secs.    10^6bits/sec
> >
> > 131072  16384  16384    30.00    9563.03
> > 131072  16384  16384    30.01    9626.65
> > 131072  16384  16384    30.01    9543.51
> >
> > Finally, to enable buffers forwarding reduced the throughput again to
> > ~8902.92Mbit/s:
> > Recv   Send    Send
> > Socket Socket  Message  Elapsed
> > Size   Size    Size     Time     Throughput
> > bytes  bytes   bytes    secs.    10^6bits/sec
> >
> > 131072  16384  16384    30.01    8643.19
> > 131072  16384  16384    30.01    9033.56
> > 131072  16384  16384    30.01    9032.02
> >
> > However, many performance improvements were left out of this series for
> > simplicity, so the difference in performance should shrink in the future.
>
> I think the performance should be acceptable as a start.
>
> >
> > Comments are welcome.
> >
> > TODO in future series:
> > * Event, indirect, packed, and others features of virtio.
> > * To support different set of features between the device<->SVQ and the
> >   SVQ<->guest communication.
> > * Support of device host notifier memory regions.
> > * To separate buffer forwarding into its own AIO context, so we can
> >   throw more threads to that task and we don't need to stop the main
> >   event loop.
> > * Support multiqueue virtio-net vdpa.
> > * Proper documentation.
> >
> > Changes from v1:
> > * Feature set at device->SVQ is now the same as SVQ->guest.
> > * Size of SVQ is not max available device size anymore, but guest's
> >   negotiated.
> > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > * Make SVQ a public struct
> > * Come back to previous approach to iova-tree
> > * Some assertions are now fail paths. Some errors are now log_guest.
> > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > * Refactor some errors and messages. Add missing error unwindings.
> > * Add memory barrier at _F_NO_NOTIFY set.
> > * Stop checking for features flags out of transport range.
> > v1 link:
> > ht

Re: [PATCH] hw/arm/virt: Validate memory size on the first NUMA node

2022-03-01 Thread Gerd Hoffmann
  Hi,

> > Unless it is an architecturally wrong thing, i.e. node size less than 128MB,
> > in which case limiting it in QEMU would be justified, I'd prefer the
> > firmware being fixed, or it reporting a more useful error message to the user.
> 
> [include EDK2 developers]
> 
> I don't think 128MB node memory size is architecturally required.
> I also thought EDK2 would be a better place to provide a precise error
> message and discussed it through with EDK2 developers. Let's see what
> their thoughts are this time.

Useful error reporting that early in the firmware initialization is a
rather hard problem, it's much easier for qemu to catch those problems
and print a useful error message.

Fixing the firmware would be possible.  The firmware simply uses the
memory of the first numa node in the early initialization phase, which
could be changed to look for additional numa nodes.  It's IMHO simply
not worth the trouble though.  numa nodes with less memory than 128M
simply don't happen in practice, except when QE does questionable
scalability testing (scale up the number of numa nodes without also
scaling up the total amount of memory, ending up with rather tiny
numa nodes and a configuration nobody actually uses in practice).

take care,
  Gerd




Re: [PATCH v4 2/3] hw/acpi: add indication for i8042 in IA-PC boot flags of the FADT table

2022-03-01 Thread Igor Mammedov
On Tue, 1 Mar 2022 06:19:51 -0500
"Michael S. Tsirkin"  wrote:

> On Tue, Mar 01, 2022 at 09:43:54AM +0100, Igor Mammedov wrote:
> > On Mon, 28 Feb 2022 22:17:32 +0200
> > Liav Albani  wrote:
> >   
> > > This can allow the guest OS to determine more easily if i8042 controller
> > > is present in the system or not, so it doesn't need to do probing of the
> > > controller, but just initialize it immediately, before enumerating the
> > > ACPI AML namespace.
> > > 
> > > This change only applies to the x86/q35 machine type, as it uses FACP
> > > ACPI table with revision higher than 1, which should implement at least
> > > ACPI 2.0 features within the table, hence it can also set the IA-PC boot
> > > flags register according to the ACPI 2.0 specification.
> > > 
> > > Signed-off-by: Liav Albani 
> > > ---
> > >  hw/acpi/aml-build.c | 11 ++-
> > >  hw/i386/acpi-build.c|  9 +
> > >  hw/i386/acpi-microvm.c  |  9 +  
> > commit message says it's q35 specific, so why does it touch microvm and piix4?
> >   
> > >  include/hw/acpi/acpi-defs.h |  1 +
> > >  4 files changed, 29 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> > > index 8966e16320..2085905b83 100644
> > > --- a/hw/acpi/aml-build.c
> > > +++ b/hw/acpi/aml-build.c
> > > @@ -2152,7 +2152,16 @@ void build_fadt(GArray *tbl, BIOSLinker *linker, 
> > > const AcpiFadtData *f,
> > >  build_append_int_noprefix(tbl, 0, 1); /* DAY_ALRM */
> > >  build_append_int_noprefix(tbl, 0, 1); /* MON_ALRM */
> > >  build_append_int_noprefix(tbl, f->rtc_century, 1); /* CENTURY */
> > > -build_append_int_noprefix(tbl, 0, 2); /* IAPC_BOOT_ARCH */
> > > +/* IAPC_BOOT_ARCH */
> > > +/*
> > > + * This register is not defined in ACPI spec version 1.0, where the 
> > > FACP
> > > + * revision == 1 also applies. Therefore, just ignore setting this 
> > > register.
> > > + */
> > > +if (f->rev == 1) {
> > > +build_append_int_noprefix(tbl, 0, 2);
> > > +} else {
> > > +build_append_int_noprefix(tbl, f->iapc_boot_arch, 2);
> > > +}
> > >  build_append_int_noprefix(tbl, 0, 1); /* Reserved */
> > >  build_append_int_noprefix(tbl, f->flags, 4); /* Flags */
> > >  
> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > index ebd47aa26f..c72c7bb9bb 100644
> > > --- a/hw/i386/acpi-build.c
> > > +++ b/hw/i386/acpi-build.c
> > > @@ -38,6 +38,7 @@
> > >  #include "hw/nvram/fw_cfg.h"
> > >  #include "hw/acpi/bios-linker-loader.h"
> > >  #include "hw/isa/isa.h"
> > > +#include "hw/input/i8042.h"
> > >  #include "hw/block/fdc.h"
> > >  #include "hw/acpi/memory_hotplug.h"
> > >  #include "sysemu/tpm.h"
> > > @@ -192,6 +193,14 @@ static void init_common_fadt_data(MachineState *ms, 
> > > Object *o,
> > >  .address = object_property_get_uint(o, 
> > > ACPI_PM_PROP_GPE0_BLK, NULL)
> > >  },
> > >  };
> > > +/*
> > > + * second bit of 16 of the IAPC_BOOT_ARCH register indicates i8042 
> > > presence
> > > + * or equivalent micro controller. See table 5-10 of APCI spec 
> > > version 2.0
> > > + * (the earliest acpi revision that supports this).  
> > 
> >  /* APCI spec version 2.0, Table 5-10 */
> > 
> > is sufficient, the rest could be read from spec/  
> 
> ACPI though, not APCI.
> The comment can be shorter and clearer,

> but I feel quoting spec
> and including table name is a good idea actually, but pls quote verbatim:
I don't do that and don't ask it of others.

The reason being that pointing to where to look in the spec and having
a verbatim copy of the field name is sufficient for looking it up, and
QEMU does not end up with half of the spec copied in (+unintentional mistakes).
(As reviewer I will check if whatever written in patch actually matches
spec anyways)

That's why I typically use
  'spec ver, verbatim field name[, chapter/table name]'
policy. The latter optional part is usually used for pointing
to the description of values.
 
> /* ACPI spec version 2.0, Table 5-10: Fixed ACPI Description Table Boot 
> Architecture Flags */
> /* Bit offset 1 -  port 60 and 64 based keyboard controller, usually 
> implemented as an 8042 or equivalent micro-controller. */
> 
> >   
> > > + */
> > > +fadt.iapc_boot_arch = object_resolve_path_type("", TYPE_I8042, NULL) 
> > > ?
> > > +0x0002 : 0x0000;  
> 
> and make it 0x1 << 1 - clearer that this is bit 1. Leading zeroes are
> not helpful since compiler does not check there's a correct number of
> these.
> 
> > > +
> > >  *data = fadt;
> > >  }
> > >  
> > > diff --git a/hw/i386/acpi-microvm.c b/hw/i386/acpi-microvm.c
> > > index 68ca7e7fc2..4bc72b1672 100644
> > > --- a/hw/i386/acpi-microvm.c
> > > +++ b/hw/i386/acpi-microvm.c
> > > @@ -31,6 +31,7 @@
> > >  #include "hw/acpi/generic_event_device.h"
> > >  #include "hw/acpi/utils.h"
> > >  #include "hw/acpi/erst.h"
> > > +#include "hw/input/i8042.h"
> > >  #include "hw/i38

Re: [PATCH v4 2/3] hw/acpi: add indication for i8042 in IA-PC boot flags of the FADT table

2022-03-01 Thread Igor Mammedov
On Tue, 1 Mar 2022 15:22:17 +0530 (IST)
Ani Sinha  wrote:

> On Tue, 1 Mar 2022, Igor Mammedov wrote:
> 
> > On Mon, 28 Feb 2022 22:17:32 +0200
> > Liav Albani  wrote:
> >  
> > > This can allow the guest OS to determine more easily if i8042 controller
> > > is present in the system or not, so it doesn't need to do probing of the
> > > controller, but just initialize it immediately, before enumerating the
> > > ACPI AML namespace.
> > >
> > > This change only applies to the x86/q35 machine type, as it uses FACP
> > > ACPI table with revision higher than 1, which should implement at least
> > > ACPI 2.0 features within the table, hence it can also set the IA-PC boot
> > > flags register according to the ACPI 2.0 specification.
> > >
> > > Signed-off-by: Liav Albani 
> > > ---
> > >  hw/acpi/aml-build.c | 11 ++-
> > >  hw/i386/acpi-build.c|  9 +
> > >  hw/i386/acpi-microvm.c  |  9 +  
> > commit message says it's q35 specific, so why does it touch microvm and piix4?  
> 
> Igor is correct. Although I see that currently there are no 8042 devices
> for microvms, maybe we should be conservative and add the code to detect
> the device anyway. In that case, the change could affect microvms too when
> such devices get added in the future.

when that case actually arises implement it then, so I'd say don't generalize
that unless it's actually used within series.
Or planned to be used in near future in which case commit message should
mention that.

> 
> echo -e "info qtree\r\nquit\r\n" | ./qemu-system-x86_64 -machine microvm
> -monitor stdio 2>/dev/null | grep 8042
> 
> 
> 




Re: Portable inline asm to get address of TLS variable

2022-03-01 Thread Florian Weimer
* Stefan Hajnoczi:

>> But going against ABI and toolchain in this way is really no long-term
>> solution.  You need to switch to stackless co-routines, or we need to
>> provide proper ABI-level support for this.  Today it's the thread
>> pointer, tomorrow it's the shadow stack pointer, and the day after that,
>> it's the SafeStack pointer.  And further down the road, it's some thread
>> state for garbage collection support.  Or something like that.
>
> Yes, understood :(. This does feel like solving an undefined behavior
> problem by adding more undefined behavior on top!
>
> Stackless coroutines have been tried in the past using Continuation
> Passing C (https://github.com/kerneis/cpc). Ideally we'd use a solution
> built into the compiler though. I'm concerned that CPC might not be
> supported or available everywhere QEMU needs to run now and in the
> future.

That seems to require an entirely different toolchain (based on CIL).
It's one way to solve the ABI issues, but perhaps not the direction
you want to go in.

> I took a quick look at C++20 coroutines since they are available in
> compilers but the primitives look hard to use even from C++, let alone
> from C.

Could you go into detail about what makes them hard to use?  Is it because
coroutines are infectious across the call stack?

Thanks,
Florian




[PATCH v8 07/14] target/riscv: rvk: add support for zkne/zknd extension in RV64

2022-03-01 Thread Weiwei Li
 - add aes64dsm, aes64ds, aes64im, aes64es, aes64esm, aes64ks2, aes64ks1i 
instructions

Co-authored-by: Ruibo Lu 
Co-authored-by: Zewen Ye 
Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Richard Henderson 
---
 target/riscv/crypto_helper.c| 169 
 target/riscv/helper.h   |   8 ++
 target/riscv/insn32.decode  |  12 ++
 target/riscv/insn_trans/trans_rvk.c.inc |  47 +++
 4 files changed, 236 insertions(+)

diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c
index 220d51c742..cb4783a1e9 100644
--- a/target/riscv/crypto_helper.c
+++ b/target/riscv/crypto_helper.c
@@ -102,4 +102,173 @@ target_ulong HELPER(aes32dsi)(target_ulong rs1, 
target_ulong rs2,
 {
 return aes32_operation(shamt, rs1, rs2, false, false);
 }
+
+#define BY(X, I) ((X >> (8 * I)) & 0xFF)
+
+#define AES_SHIFROWS_LO(RS1, RS2) ( \
+(((RS1 >> 24) & 0xFF) << 56) | (((RS2 >> 48) & 0xFF) << 48) | \
+(((RS2 >> 8) & 0xFF) << 40) | (((RS1 >> 32) & 0xFF) << 32) | \
+(((RS2 >> 56) & 0xFF) << 24) | (((RS2 >> 16) & 0xFF) << 16) | \
+(((RS1 >> 40) & 0xFF) << 8) | (((RS1 >> 0) & 0xFF) << 0))
+
+#define AES_INVSHIFROWS_LO(RS1, RS2) ( \
+(((RS2 >> 24) & 0xFF) << 56) | (((RS2 >> 48) & 0xFF) << 48) | \
+(((RS1 >> 8) & 0xFF) << 40) | (((RS1 >> 32) & 0xFF) << 32) | \
+(((RS1 >> 56) & 0xFF) << 24) | (((RS2 >> 16) & 0xFF) << 16) | \
+(((RS2 >> 40) & 0xFF) << 8) | (((RS1 >> 0) & 0xFF) << 0))
+
+#define AES_MIXBYTE(COL, B0, B1, B2, B3) ( \
+BY(COL, B3) ^ BY(COL, B2) ^ AES_GFMUL(BY(COL, B1), 3) ^ \
+AES_GFMUL(BY(COL, B0), 2))
+
+#define AES_MIXCOLUMN(COL) ( \
+AES_MIXBYTE(COL, 3, 0, 1, 2) << 24 | \
+AES_MIXBYTE(COL, 2, 3, 0, 1) << 16 | \
+AES_MIXBYTE(COL, 1, 2, 3, 0) << 8 | AES_MIXBYTE(COL, 0, 1, 2, 3) << 0)
+
+#define AES_INVMIXBYTE(COL, B0, B1, B2, B3) ( \
+AES_GFMUL(BY(COL, B3), 0x9) ^ AES_GFMUL(BY(COL, B2), 0xd) ^ \
+AES_GFMUL(BY(COL, B1), 0xb) ^ AES_GFMUL(BY(COL, B0), 0xe))
+
+#define AES_INVMIXCOLUMN(COL) ( \
+AES_INVMIXBYTE(COL, 3, 0, 1, 2) << 24 | \
+AES_INVMIXBYTE(COL, 2, 3, 0, 1) << 16 | \
+AES_INVMIXBYTE(COL, 1, 2, 3, 0) << 8 | \
+AES_INVMIXBYTE(COL, 0, 1, 2, 3) << 0)
+
+static inline target_ulong aes64_operation(target_ulong rs1, target_ulong rs2,
+   bool enc, bool mix)
+{
+uint64_t RS1 = rs1;
+uint64_t RS2 = rs2;
+uint64_t result;
+uint64_t temp;
+uint32_t col_0;
+uint32_t col_1;
+
+if (enc) {
+temp = AES_SHIFROWS_LO(RS1, RS2);
+temp = (((uint64_t)AES_sbox[(temp >> 0) & 0xFF] << 0) |
+((uint64_t)AES_sbox[(temp >> 8) & 0xFF] << 8) |
+((uint64_t)AES_sbox[(temp >> 16) & 0xFF] << 16) |
+((uint64_t)AES_sbox[(temp >> 24) & 0xFF] << 24) |
+((uint64_t)AES_sbox[(temp >> 32) & 0xFF] << 32) |
+((uint64_t)AES_sbox[(temp >> 40) & 0xFF] << 40) |
+((uint64_t)AES_sbox[(temp >> 48) & 0xFF] << 48) |
+((uint64_t)AES_sbox[(temp >> 56) & 0xFF] << 56));
+if (mix) {
+col_0 = temp & 0xFFFFFFFF;
+col_1 = temp >> 32;
+
+col_0 = AES_MIXCOLUMN(col_0);
+col_1 = AES_MIXCOLUMN(col_1);
+
+result = ((uint64_t)col_1 << 32) | col_0;
+} else {
+result = temp;
+}
+} else {
+temp = AES_INVSHIFROWS_LO(RS1, RS2);
+temp = (((uint64_t)AES_isbox[(temp >> 0) & 0xFF] << 0) |
+((uint64_t)AES_isbox[(temp >> 8) & 0xFF] << 8) |
+((uint64_t)AES_isbox[(temp >> 16) & 0xFF] << 16) |
+((uint64_t)AES_isbox[(temp >> 24) & 0xFF] << 24) |
+((uint64_t)AES_isbox[(temp >> 32) & 0xFF] << 32) |
+((uint64_t)AES_isbox[(temp >> 40) & 0xFF] << 40) |
+((uint64_t)AES_isbox[(temp >> 48) & 0xFF] << 48) |
+((uint64_t)AES_isbox[(temp >> 56) & 0xFF] << 56));
+if (mix) {
+col_0 = temp & 0xFFFFFFFF;
+col_1 = temp >> 32;
+
+col_0 = AES_INVMIXCOLUMN(col_0);
+col_1 = AES_INVMIXCOLUMN(col_1);
+
+result = ((uint64_t)col_1 << 32) | col_0;
+} else {
+result = temp;
+}
+}
+
+return result;
+}
+
+target_ulong HELPER(aes64esm)(target_ulong rs1, target_ulong rs2)
+{
+return aes64_operation(rs1, rs2, true, true);
+}
+
+target_ulong HELPER(aes64es)(target_ulong rs1, target_ulong rs2)
+{
+return aes64_operation(rs1, rs2, true, false);
+}
+
+target_ulong HELPER(aes64ds)(target_ulong rs1, target_ulong rs2)
+{
+return aes64_operation(rs1, rs2, false, false);
+}
+
+target_ulong HELPER(aes64dsm)(target_ulong rs1, target_ulong rs2)
+{
+return aes64_operation(rs1, rs2, false, true);
+}
+
+target_ulong HELPER(aes64ks2)(target_ulong rs1, target_ulong rs2)
+{
+uint64_t RS1 = rs1;
+uint64_t RS2 = rs2;
+uint32_t rs1_hi = RS1

[PATCH v8 03/14] target/riscv: rvk: add support for zbkc extension

2022-03-01 Thread Weiwei Li
 - reuse partial instructions of zbc extension, update extension check for them

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Alistair Francis 
Reviewed-by: Richard Henderson 
---
 target/riscv/insn32.decode  | 3 ++-
 target/riscv/insn_trans/trans_rvb.c.inc | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index fdceaf621a..3a49acab37 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -770,9 +770,10 @@ clzw   0110000 00000 ..... 001 ..... 0011011 @r2
ctzw   0110000 00001 ..... 001 ..... 0011011 @r2
cpopw  0110000 00010 ..... 001 ..... 0011011 @r2
 
-# *** RV32 Zbc Standard Extension ***
+# *** RV32 Zbc/Zbkc Standard Extension ***
clmul  0000101 ..... ..... 001 ..... 0110011 @r
clmulh 0000101 ..... ..... 011 ..... 0110011 @r
+# *** RV32 extra Zbc Standard Extension ***
clmulr 0000101 ..... ..... 010 ..... 0110011 @r
 
 # *** RV32 Zbs Standard Extension ***
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc 
b/target/riscv/insn_trans/trans_rvb.c.inc
index a6b733d5ff..1980bfe971 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -489,7 +489,7 @@ static bool trans_slli_uw(DisasContext *ctx, arg_slli_uw *a)
 
 static bool trans_clmul(DisasContext *ctx, arg_clmul *a)
 {
-REQUIRE_ZBC(ctx);
+REQUIRE_EITHER_EXT(ctx, zbc, zbkc);
 return gen_arith(ctx, a, EXT_NONE, gen_helper_clmul, NULL);
 }
 
@@ -501,7 +501,7 @@ static void gen_clmulh(TCGv dst, TCGv src1, TCGv src2)
 
 static bool trans_clmulh(DisasContext *ctx, arg_clmulr *a)
 {
-    REQUIRE_ZBC(ctx);
+    REQUIRE_EITHER_EXT(ctx, zbc, zbkc);
     return gen_arith(ctx, a, EXT_NONE, gen_clmulh, NULL);
 }
 
-- 
2.17.1




[PATCH v8 09/14] target/riscv: rvk: add support for sha512 related instructions for RV32 in zknh extension

2022-03-01 Thread Weiwei Li
 - add sha512sum0r, sha512sig0l, sha512sum1r, sha512sig1l, sha512sig0h and 
sha512sig1h instructions

Co-authored-by: Zewen Ye 
Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/insn32.decode  |   6 ++
 target/riscv/insn_trans/trans_rvk.c.inc | 100 
 2 files changed, 106 insertions(+)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index db28ecdd2b..02a0c71890 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -862,3 +862,9 @@ sha256sig0  00 01000 00010 ..... 001 ..... 0010011 @r2
 sha256sig1  00 01000 00011 ..... 001 ..... 0010011 @r2
 sha256sum0  00 01000 00000 ..... 001 ..... 0010011 @r2
 sha256sum1  00 01000 00001 ..... 001 ..... 0010011 @r2
+sha512sum0r 01 01000 ..... ..... 000 ..... 0110011 @r
+sha512sum1r 01 01001 ..... ..... 000 ..... 0110011 @r
+sha512sig0l 01 01010 ..... ..... 000 ..... 0110011 @r
+sha512sig0h 01 01110 ..... ..... 000 ..... 0110011 @r
+sha512sig1l 01 01011 ..... ..... 000 ..... 0110011 @r
+sha512sig1h 01 01111 ..... ..... 000 ..... 0110011 @r
diff --git a/target/riscv/insn_trans/trans_rvk.c.inc b/target/riscv/insn_trans/trans_rvk.c.inc
index beea7f8e96..bb89a53f52 100644
--- a/target/riscv/insn_trans/trans_rvk.c.inc
+++ b/target/riscv/insn_trans/trans_rvk.c.inc
@@ -167,3 +167,103 @@ static bool trans_sha256sum1(DisasContext *ctx, arg_sha256sum1 *a)
     REQUIRE_ZKNH(ctx);
     return gen_sha256(ctx, a, EXT_NONE, tcg_gen_rotri_i32, 6, 11, 25);
 }
+
+static bool gen_sha512_rv32(DisasContext *ctx, arg_r *a, DisasExtend ext,
+                            void (*func1)(TCGv_i64, TCGv_i64, int64_t),
+                            void (*func2)(TCGv_i64, TCGv_i64, int64_t),
+                            int64_t num1, int64_t num2, int64_t num3)
+{
+    TCGv dest = dest_gpr(ctx, a->rd);
+    TCGv src1 = get_gpr(ctx, a->rs1, ext);
+    TCGv src2 = get_gpr(ctx, a->rs2, ext);
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_concat_tl_i64(t0, src1, src2);
+    func1(t1, t0, num1);
+    func2(t2, t0, num2);
+    tcg_gen_xor_i64(t1, t1, t2);
+    tcg_gen_rotri_i64(t2, t0, num3);
+    tcg_gen_xor_i64(t1, t1, t2);
+    tcg_gen_trunc_i64_tl(dest, t1);
+
+    gen_set_gpr(ctx, a->rd, dest);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    return true;
+}
+
+static bool trans_sha512sum0r(DisasContext *ctx, arg_sha512sum0r *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_ZKNH(ctx);
+    return gen_sha512_rv32(ctx, a, EXT_NONE, tcg_gen_rotli_i64,
+                           tcg_gen_rotli_i64, 25, 30, 28);
+}
+
+static bool trans_sha512sum1r(DisasContext *ctx, arg_sha512sum1r *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_ZKNH(ctx);
+    return gen_sha512_rv32(ctx, a, EXT_NONE, tcg_gen_rotli_i64,
+                           tcg_gen_rotri_i64, 23, 14, 18);
+}
+
+static bool trans_sha512sig0l(DisasContext *ctx, arg_sha512sig0l *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_ZKNH(ctx);
+    return gen_sha512_rv32(ctx, a, EXT_NONE, tcg_gen_rotri_i64,
+                           tcg_gen_rotri_i64, 1, 7, 8);
+}
+
+static bool trans_sha512sig1l(DisasContext *ctx, arg_sha512sig1l *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_ZKNH(ctx);
+    return gen_sha512_rv32(ctx, a, EXT_NONE, tcg_gen_rotli_i64,
+                           tcg_gen_rotri_i64, 3, 6, 19);
+}
+
+static bool gen_sha512h_rv32(DisasContext *ctx, arg_r *a, DisasExtend ext,
+                             void (*func)(TCGv_i64, TCGv_i64, int64_t),
+                             int64_t num1, int64_t num2, int64_t num3)
+{
+    TCGv dest = dest_gpr(ctx, a->rd);
+    TCGv src1 = get_gpr(ctx, a->rs1, ext);
+    TCGv src2 = get_gpr(ctx, a->rs2, ext);
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_concat_tl_i64(t0, src1, src2);
+    func(t1, t0, num1);
+    tcg_gen_ext32u_i64(t2, t0);
+    tcg_gen_shri_i64(t2, t2, num2);
+    tcg_gen_xor_i64(t1, t1, t2);
+    tcg_gen_rotri_i64(t2, t0, num3);
+    tcg_gen_xor_i64(t1, t1, t2);
+    tcg_gen_trunc_i64_tl(dest, t1);
+
+    gen_set_gpr(ctx, a->rd, dest);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    return true;
+}
+
+static bool trans_sha512sig0h(DisasContext *ctx, arg_sha512sig0h *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_ZKNH(ctx);
+    return gen_sha512h_rv32(ctx, a, EXT_NONE, tcg_gen_rotri_i64, 1, 7, 8);
+}
+
+static bool trans_sha512sig1h(DisasContext *ctx, arg_sha512sig1h *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_ZKNH(ctx);
+    return gen_sha512h_rv32(ctx, a, EXT_NONE, tcg_gen_rotli_i64, 3, 6, 19);
+}
-- 
2.17.1




[PATCH v8 01/14] target/riscv: rvk: add cfg properties for zbk* and zk*

2022-03-01 Thread Weiwei Li
Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Acked-by: Alistair Francis 
---
 target/riscv/cpu.c | 23 +++
 target/riscv/cpu.h | 13 +
 2 files changed, 36 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index ddda4906ff..9e8bbce6f1 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -592,6 +592,29 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
         cpu->cfg.ext_zfinx = true;
     }
 
+    if (cpu->cfg.ext_zk) {
+        cpu->cfg.ext_zkn = true;
+        cpu->cfg.ext_zkr = true;
+        cpu->cfg.ext_zkt = true;
+    }
+
+    if (cpu->cfg.ext_zkn) {
+        cpu->cfg.ext_zbkb = true;
+        cpu->cfg.ext_zbkc = true;
+        cpu->cfg.ext_zbkx = true;
+        cpu->cfg.ext_zkne = true;
+        cpu->cfg.ext_zknd = true;
+        cpu->cfg.ext_zknh = true;
+    }
+
+    if (cpu->cfg.ext_zks) {
+        cpu->cfg.ext_zbkb = true;
+        cpu->cfg.ext_zbkc = true;
+        cpu->cfg.ext_zbkx = true;
+        cpu->cfg.ext_zksed = true;
+        cpu->cfg.ext_zksh = true;
+    }
+
     /* Set the ISA extensions, checks should have happened above */
     if (cpu->cfg.ext_i) {
         ext |= RVI;
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 9ba05042ed..ef4de326f2 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -355,7 +355,20 @@ struct RISCVCPUConfig {
     bool ext_zba;
     bool ext_zbb;
     bool ext_zbc;
+    bool ext_zbkb;
+    bool ext_zbkc;
+    bool ext_zbkx;
     bool ext_zbs;
+    bool ext_zk;
+    bool ext_zkn;
+    bool ext_zknd;
+    bool ext_zkne;
+    bool ext_zknh;
+    bool ext_zkr;
+    bool ext_zks;
+    bool ext_zksed;
+    bool ext_zksh;
+    bool ext_zkt;
     bool ext_counters;
     bool ext_ifencei;
     bool ext_icsr;
-- 
2.17.1




[PATCH v8 04/14] target/riscv: rvk: add support for zbkx extension

2022-03-01 Thread Weiwei Li
 - add xperm4 and xperm8 instructions

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Richard Henderson 
---
 target/riscv/bitmanip_helper.c  | 27 +
 target/riscv/helper.h   |  2 ++
 target/riscv/insn32.decode  |  4 
 target/riscv/insn_trans/trans_rvb.c.inc | 18 +
 4 files changed, 51 insertions(+)

diff --git a/target/riscv/bitmanip_helper.c b/target/riscv/bitmanip_helper.c
index e003e8b25b..b99c4a39a1 100644
--- a/target/riscv/bitmanip_helper.c
+++ b/target/riscv/bitmanip_helper.c
@@ -102,3 +102,30 @@ target_ulong HELPER(zip)(target_ulong rs1)
     x = do_shuf_stage(x, shuf_masks[0], shuf_masks[0] >> 1, 1);
     return x;
 }
+
+static inline target_ulong do_xperm(target_ulong rs1, target_ulong rs2,
+                                    uint32_t sz_log2)
+{
+    target_ulong r = 0;
+    target_ulong sz = 1LL << sz_log2;
+    target_ulong mask = (1LL << sz) - 1;
+    target_ulong pos;
+
+    for (int i = 0; i < TARGET_LONG_BITS; i += sz) {
+        pos = ((rs2 >> i) & mask) << sz_log2;
+        if (pos < sizeof(target_ulong) * 8) {
+            r |= ((rs1 >> pos) & mask) << i;
+        }
+    }
+    return r;
+}
+
+target_ulong HELPER(xperm4)(target_ulong rs1, target_ulong rs2)
+{
+    return do_xperm(rs1, rs2, 2);
+}
+
+target_ulong HELPER(xperm8)(target_ulong rs1, target_ulong rs2)
+{
+    return do_xperm(rs1, rs2, 3);
+}
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 7331d32dbf..a1d28b257f 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -69,6 +69,8 @@ DEF_HELPER_FLAGS_2(clmulr, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_1(brev8, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(unzip, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(zip, TCG_CALL_NO_RWG_SE, tl, tl)
+DEF_HELPER_FLAGS_2(xperm4, TCG_CALL_NO_RWG_SE, tl, tl, tl)
+DEF_HELPER_FLAGS_2(xperm8, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 
 /* Floating Point - Half Precision */
 DEF_HELPER_FLAGS_3(fadd_h, TCG_CALL_NO_RWG, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 3a49acab37..75ffac9c81 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -776,6 +776,10 @@ clmulh     0000101 ..... ..... 011 ..... 0110011 @r
 # *** RV32 extra Zbc Standard Extension ***
 clmulr     0000101 ..... ..... 010 ..... 0110011 @r
 
+# *** RV32 Zbkx Standard Extension ***
+xperm4     0010100 ..... ..... 010 ..... 0110011 @r
+xperm8     0010100 ..... ..... 100 ..... 0110011 @r
+
 # *** RV32 Zbs Standard Extension ***
 bclr       0100100 ..... ..... 001 ..... 0110011 @r
 bclri      01001. ........... 001 ..... 0010011 @sh
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index 1980bfe971..54927ba763 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -48,6 +48,12 @@
 }\
 } while (0)
 
+#define REQUIRE_ZBKX(ctx) do {                   \
+    if (!ctx->cfg_ptr->ext_zbkx) {               \
+        return false;                            \
+    }                                            \
+} while (0)
+
 static void gen_clz(TCGv ret, TCGv arg1)
 {
 tcg_gen_clzi_tl(ret, arg1, TARGET_LONG_BITS);
@@ -574,3 +580,15 @@ static bool trans_zip(DisasContext *ctx, arg_zip *a)
     REQUIRE_ZBKB(ctx);
     return gen_unary(ctx, a, EXT_NONE, gen_helper_zip);
 }
+
+static bool trans_xperm4(DisasContext *ctx, arg_xperm4 *a)
+{
+    REQUIRE_ZBKX(ctx);
+    return gen_arith(ctx, a, EXT_NONE, gen_helper_xperm4, NULL);
+}
+
+static bool trans_xperm8(DisasContext *ctx, arg_xperm8 *a)
+{
+    REQUIRE_ZBKX(ctx);
+    return gen_arith(ctx, a, EXT_NONE, gen_helper_xperm8, NULL);
+}
-- 
2.17.1




[PATCH v8 06/14] target/riscv: rvk: add support for zknd/zkne extension in RV32

2022-03-01 Thread Weiwei Li
 - add aes32esmi, aes32esi, aes32dsmi and aes32dsi instructions

Co-authored-by: Zewen Ye 
Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Richard Henderson 
---
 target/riscv/crypto_helper.c| 105 
 target/riscv/helper.h   |   6 ++
 target/riscv/insn32.decode  |  11 +++
 target/riscv/insn_trans/trans_rvk.c.inc |  67 +++
 target/riscv/meson.build|   3 +-
 target/riscv/translate.c|   1 +
 6 files changed, 192 insertions(+), 1 deletion(-)
 create mode 100644 target/riscv/crypto_helper.c
 create mode 100644 target/riscv/insn_trans/trans_rvk.c.inc

diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c
new file mode 100644
index 00..220d51c742
--- /dev/null
+++ b/target/riscv/crypto_helper.c
@@ -0,0 +1,105 @@
+/*
+ * RISC-V Crypto Emulation Helpers for QEMU.
+ *
+ * Copyright (c) 2021 Ruibo Lu, luruibo2...@163.com
+ * Copyright (c) 2021 Zewen Ye, lust...@foxmail.com
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+#include "crypto/aes.h"
+#include "crypto/sm4.h"
+
+#define AES_XTIME(a) \
+    ((a << 1) ^ ((a & 0x80) ? 0x1b : 0))
+
+#define AES_GFMUL(a, b) (( \
+    (((b) & 0x1) ? (a) : 0) ^ \
+    (((b) & 0x2) ? AES_XTIME(a) : 0) ^ \
+    (((b) & 0x4) ? AES_XTIME(AES_XTIME(a)) : 0) ^ \
+    (((b) & 0x8) ? AES_XTIME(AES_XTIME(AES_XTIME(a))) : 0)) & 0xFF)
+
+static inline uint32_t aes_mixcolumn_byte(uint8_t x, bool fwd)
+{
+    uint32_t u;
+
+    if (fwd) {
+        u = (AES_GFMUL(x, 3) << 24) | (x << 16) | (x << 8) |
+            (AES_GFMUL(x, 2) << 0);
+    } else {
+        u = (AES_GFMUL(x, 0xb) << 24) | (AES_GFMUL(x, 0xd) << 16) |
+            (AES_GFMUL(x, 0x9) << 8) | (AES_GFMUL(x, 0xe) << 0);
+    }
+    return u;
+}
+
+#define sext32_xlen(x) (target_ulong)(int32_t)(x)
+
+static inline target_ulong aes32_operation(target_ulong shamt,
+                                           target_ulong rs1, target_ulong rs2,
+                                           bool enc, bool mix)
+{
+    uint8_t si = rs2 >> shamt;
+    uint8_t so;
+    uint32_t mixed;
+    target_ulong res;
+
+    if (enc) {
+        so = AES_sbox[si];
+        if (mix) {
+            mixed = aes_mixcolumn_byte(so, true);
+        } else {
+            mixed = so;
+        }
+    } else {
+        so = AES_isbox[si];
+        if (mix) {
+            mixed = aes_mixcolumn_byte(so, false);
+        } else {
+            mixed = so;
+        }
+    }
+    mixed = rol32(mixed, shamt);
+    res = rs1 ^ mixed;
+
+    return sext32_xlen(res);
+}
+
+target_ulong HELPER(aes32esmi)(target_ulong rs1, target_ulong rs2,
+                               target_ulong shamt)
+{
+    return aes32_operation(shamt, rs1, rs2, true, true);
+}
+
+target_ulong HELPER(aes32esi)(target_ulong rs1, target_ulong rs2,
+                              target_ulong shamt)
+{
+    return aes32_operation(shamt, rs1, rs2, true, false);
+}
+
+target_ulong HELPER(aes32dsmi)(target_ulong rs1, target_ulong rs2,
+                               target_ulong shamt)
+{
+    return aes32_operation(shamt, rs1, rs2, false, true);
+}
+
+target_ulong HELPER(aes32dsi)(target_ulong rs1, target_ulong rs2,
+                              target_ulong shamt)
+{
+    return aes32_operation(shamt, rs1, rs2, false, false);
+}
+#undef sext32_xlen
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a1d28b257f..d31bfadb3e 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1115,3 +1115,9 @@ DEF_HELPER_5(divu_i128, tl, env, tl, tl, tl, tl)
 DEF_HELPER_5(divs_i128, tl, env, tl, tl, tl, tl)
 DEF_HELPER_5(remu_i128, tl, env, tl, tl, tl, tl)
 DEF_HELPER_5(rems_i128, tl, env, tl, tl, tl, tl)
+
+/* Crypto functions */
+DEF_HELPER_FLAGS_3(aes32esmi, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl)
+DEF_HELPER_FLAGS_3(aes32esi, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl)
+DEF_HELPER_FLAGS_3(aes32dsmi, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl)
+DEF_HELPER_FLAGS_3(aes32dsi, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 75ffac9c81..0f2e661583 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -35,6 +35,7 @@
%imm_b    31:s1 7:1 25:6 8:4 !function=ex_shift_1
%imm_j    31:s1 12:8 20:1 21:10  !function=ex_s
