[PATCH] migration/multifd: Ensure packet->ramblock is null-terminated

2024-09-19 Thread Fabiano Rosas
Coverity points out that the current use of strncpy to write the
ramblock name can leave the field without a terminating '\0' if idstr
itself is not null-terminated (e.g. if it occupies all 256 bytes).

This is currently harmless because the packet->ramblock field is never
touched again on the source side. The destination side reads only up
to the field's size from the stream and forces the last byte to be 0.

We're still exposed to a future programming error if this field is
ever passed to a function that expects a null-terminated string.

Change from strncpy to QEMU's pstrcpy, which puts a '\0' at the end of
the string and doesn't fill the extra space with zeros.

(there's no spillage between iterations of fill_packet because after
commit 87bb9e953e ("migration/multifd: Isolate ram pages packet data")
the packet is always zeroed before filling)
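
For reference, a rough sketch of the semantics pstrcpy() provides (an
illustration only; "pstrcpy_sketch" is a made-up name and this is not
QEMU's actual util/cutils.c implementation):

static void pstrcpy_sketch(char *buf, int buf_size, const char *str)
{
    int i;

    if (buf_size <= 0) {
        return;
    }
    /* Copy at most buf_size - 1 bytes... */
    for (i = 0; i < buf_size - 1 && str[i]; i++) {
        buf[i] = str[i];
    }
    /* ...and always null-terminate, without zero-filling the tail. */
    buf[i] = '\0';
}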

Resolves: Coverity CID 1560071
Reported-by: Peter Maydell 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd-nocomp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index 07c63f4a72..55191152f9 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -17,6 +17,7 @@
 #include "multifd.h"
 #include "options.h"
 #include "qapi/error.h"
+#include "qemu/cutils.h"
 #include "qemu/error-report.h"
 #include "trace.h"
 
@@ -201,7 +202,8 @@ void multifd_ram_fill_packet(MultiFDSendParams *p)
 packet->zero_pages = cpu_to_be32(zero_num);
 
 if (pages->block) {
-strncpy(packet->ramblock, pages->block->idstr, 256);
+pstrcpy(packet->ramblock, sizeof(packet->ramblock),
+pages->block->idstr);
 }
 
 for (int i = 0; i < pages->num; i++) {
-- 
2.35.3




Re: [PULL v2 0/6] Migration 20240917 patches

2024-09-19 Thread Fabiano Rosas
Peter Maydell  writes:

> On Thu, 19 Sept 2024 at 12:59, Peter Xu  wrote:
>>
>> On Thu, Sep 19, 2024 at 10:08:25AM +0100, Peter Maydell wrote:
>> > Thanks for looking at the issues with the migration tests.
>> > This run went through first time without my needing to retry any
>> > jobs, so fingers crossed that we have at least improved the reliability.
>> > (I have a feeling there's still something funny with the k8s runners,
>> > but that's not migration-test specific, it's just that test tends
>> > to be the longest running and so most likely to be affected.)
>>
>> Kudos all go to Fabiano for debugging the hard problem.
>>
>> And yes, please let either of us know if it fails again, we can either keep
>> looking, or still can disable it when necessary (if it takes long to debug).
>
> On the subject of potential races in the migration code,
> there's a couple of outstanding Coverity issues that might
> be worth looking at. If they're false-positives let me know
> and I can reclassify them in Coverity.
>
> CID 1527402: In migrate_fd_cleanup() Coverity thinks there's
> a race because we read s->to_dst_file in the "if (s->to_dst_file)"
> check without holding the qemu_file_lock. This might be a
> false-positive because the race Coverity identifies happens
> if two threads both call migrate_fd_cleanup() at the same
> time, which is probably not permitted. (But OTOH taking a
> mutex gets you for free any necessary memory barriers...)

Yes, we shouldn't rely on mental gymnastics to prove that there's no
concurrent access.

@peterx, that RH bug you showed me could very well be caused by this
race, except that I don't see how fd_cleanup could race with
itself. Just taking the lock would probably save us from even having to
think about it.
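
The shape of such a fix would be something like the sketch below
(illustration only, assuming the existing qemu_file_lock in
MigrationState; the helper name is hypothetical and the real cleanup
path does more than this):

static void migrate_fd_cleanup_close_file(MigrationState *s)
{
    QEMUFile *tmp = NULL;

    /* Read and clear to_dst_file only while holding qemu_file_lock. */
    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
        tmp = s->to_dst_file;
        s->to_dst_file = NULL;
    }

    if (tmp) {
        qemu_fclose(tmp);
    }
}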

>
> CID 1527413: In postcopy_pause_incoming() we read
> mis->postcopy_qemufile_dst without holding the
> postcopy_prio_thread_mutex which we use to protect the write
> to that field, so Coverity thinks there's a race if two
> threads call this function at once.

At first sight, it seems like a real problem. We did a good pass on
these races on the source side, but the destination side hasn't been
investigated yet.
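
If the mutex that protects the write is indeed the right lock to pair
with the read, the fix could look roughly like this (a sketch only; the
helper name is hypothetical and the destination side still needs a
proper audit):

static void postcopy_kick_preempt_channel(MigrationIncomingState *mis)
{
    /* Read postcopy_qemufile_dst under the lock that protects the write. */
    WITH_QEMU_LOCK_GUARD(&mis->postcopy_prio_thread_mutex) {
        if (mis->postcopy_qemufile_dst) {
            qemu_file_shutdown(mis->postcopy_qemufile_dst);
        }
    }
}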

Unfortunately, these QEMUFile races are not trivial to fix due to
several design pain points, such as:

- the QEMUFile pointer validity being sometimes used to imply no error
  has happened before;

- the various shutdown() calls that serve both as a way to kick a read()
  that's stuck, but also to cause some other part of the code to realise
  there has been an error (due to the point above);

- the yank feature which has weird semantics regarding whether it
  operates on an iochannel or qemufile;

- migrate_fd_cancel() that _can_ run concurrently with anything else;

- the need to ensure the other end of migration also reacts to
  error/cancel on this side;

>
> (The only other migration Coverity issue is CID 1560071,
> which is the "better to use pstrcpy()" not-really-a-bug
> we discussed in another thread.)
>
> thanks
> -- PMM



Re: [PATCH 2/3] migration: Remove unused zero-blocks capability

2024-09-19 Thread Fabiano Rosas
'zero-blocks'"}}
>>> 
>>> If we had somehow rejected the capability when it made no sense,
>>> removing it now that it never makes sense would be obviously fine.
>>> 
>>> The straight & narrow path is to deprecate now, remove later.
>>
>> I wonder whether we can make this one simpler, as IIUC this cap depends on
>> the block migration feature, which properly went through the deprecation
>> process and got removed in the previous release.
>>
>> IOW, currently QEMU behaves the same with this cap on/off, ignoring it
>> completely.  I think it means the deprecation message (even if we provide
>> some for two extra releases..) wouldn't be anything helpful as anyone who
>> uses this feature already got affected before this patch.. this feature,
>> together with block migration, are simply all gone already?
>
> We break compatibility for users who supply capability @zero-blocks even
> though they are not using block migration.
>
> Before this patch, the capability is silently ignored.
>
> Afterwards, we reject it.
>
> This harmless misuse was *not* affected by our prior removal of block
> migration.
>
> It *is* affected by the proposed removal of the capability.

How does this policy_skip thing work? Could we automatically warn
whenever a capability has the 'deprecated' feature in migration.json?

Also, some of the incompatibility errors in migrate_caps_check() could
be simplified with something like a new:
'features': [ 'conflicts': [ 'cap1', 'cap2' ] ]
to indicate which caps are mutually incompatible.

>
> We either treat this in strict accordance with our rules: deprecate now,
> remove later.  Or we bend them:
>
>>> If we believe nothing relies on it, we can bend the rules and remove
>>> right away.
>
> Not for me to decide.
>

I'm fine either way, but in any case:

-- >8 --
>From 3ff313a52e37b8cb407c900d7a1aa266560aebb7 Mon Sep 17 00:00:00 2001
From: Fabiano Rosas 
Date: Thu, 19 Sep 2024 09:49:44 -0300
Subject: [PATCH] migration: Deprecate zero-blocks capability

The zero-blocks capability was meant to be used along with block
migration, which has already been removed in commit eef0bae3a7
("migration: Remove block migration").

Setting zero-blocks is currently a noop, but outright removal of the
capability would cause an error for users that still set it. Put the
capability through the deprecation process.

Signed-off-by: Fabiano Rosas 
---
 docs/about/deprecated.rst | 6 ++
 migration/options.c   | 4 
 qapi/migration.json   | 5 -
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index ed31d4b0b2..47cabb6fcc 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -476,3 +476,9 @@ usage of providing a file descriptor to a plain file has been
 deprecated in favor of explicitly using the ``file:`` URI with the
 file descriptor being passed as an ``fdset``. Refer to the ``add-fd``
 command documentation for details on the ``fdset`` usage.
+
+``zero-blocks`` capability (since 9.2)
+''''''''''''''''''''''''''''''''''''''
+
+The ``zero-blocks`` capability was part of the block migration which
+doesn't exist anymore since it was removed in QEMU v9.1.
diff --git a/migration/options.c b/migration/options.c
index 147cd2b8fd..b828bad0d9 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -457,6 +457,10 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
 ERRP_GUARD();
 MigrationIncomingState *mis = migration_incoming_get_current();
 
+if (new_caps[MIGRATION_CAPABILITY_ZERO_BLOCKS]) {
+warn_report("zero-blocks capability is deprecated");
+}
+
 #ifndef CONFIG_REPLICATION
 if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
 error_setg(errp, "QEMU compiled without replication module"
diff --git a/qapi/migration.json b/qapi/migration.json
index b66cccf107..3af6aa1740 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -479,11 +479,14 @@
 # Features:
 #
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
+# @deprecated: Member @zero-blocks is deprecated as being part of
+# block migration which was already removed.
 #
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge',
+   { 'name': 'zero-blocks', 'features': [ 'deprecated' ] },
'events', 'postcopy-ram',
{ 'name': 'x-colo', 'features': [ 'unstable' ] },
'release-ram',
-- 
2.35.3




Re: [PATCH 3/3] migration: Remove unused socket_send_channel_create_sync

2024-09-19 Thread Fabiano Rosas
d...@treblig.org writes:

> From: "Dr. David Alan Gilbert" 
>
> socket_send_channel_create_sync's only use was removed by
>   d0edb8a173 ("migration: Create the postcopy preempt channel asynchronously")
>
> Remove it.
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 1/3] migration: Remove migrate_cap_set

2024-09-19 Thread Fabiano Rosas
d...@treblig.org writes:

> From: "Dr. David Alan Gilbert" 
>
> migrate_cap_set has been unused since
>   18d154f575 ("migration: Remove 'blk/-b' option from migrate commands")
>
> Remove it.
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Fabiano Rosas 



Re: [PATCH v1 3/7] qapi/migration: Introduce the iteration-count

2024-09-18 Thread Fabiano Rosas
Yong Huang  writes:

> On Tue, Sep 17, 2024 at 4:35 AM Fabiano Rosas  wrote:
>
>> Hyman Huang  writes:
>>
>> > The original migration information dirty-sync-count could
>> > no longer reflect iteration count due to the introduction
>> > of background synchronization in the next commit;
>> > add the iteration count to compensate.
>>
>> I agree with the overall idea, but I feel we're lacking some information
>> on what determines whether some of the lines below want to use the
>> iteration count vs. the dirty sync count. Since this patch increments
>> both variables at the same place, they can still be used interchangeably
>> unless we add some words to explain the distinction.
>>
>> So to clarify:
>>
>> What do we call an iteration? A call to save_live_iterate(),
>> migration_iteration_run() or something else?
>>
>> Why should dirty-sync-count ever have reflected "iteration count"? It
>> might have been this way by coincidence, but did we ever use it in that
>> sense (aside from info migrate, maybe)?
>>
>
> Unfortunately, I found that Libvirt already regards the "dirty-sync-count"
> as the "iteration count", so if we substitute the "dirty-sync-count"
> with "iteration count" to represent its original meaning, this could
> break backward compatibility.
>
> To avoid this side effect, we could keep "dirty-sync-count" with its
> original meaning and introduce a new field like "dirty-sync-count-internal"
> to represent the *real* "dirty-sync-count"?
>
> diff --git a/migration/migration.c b/migration/migration.c
> index f97f6352d2..663315d7e6 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1196,8 +1196,9 @@ static void populate_ram_info(MigrationInfo *info,
> MigrationState *s)
>  info->ram->normal_bytes = info->ram->normal * page_size;
>  info->ram->mbps = s->mbps;
>  info->ram->dirty_sync_count =
> +stat64_get(&mig_stats.iteration_count);

ok

> +info->ram->dirty_sync_count_internal =
>  stat64_get(&mig_stats.dirty_sync_count);

Does this need to be exposed at all? If it does then it'll need a name
that doesn't have "internal" in it.

> -info->ram->iteration_count = stat64_get(&mig_stats.iteration_count);
>  info->ram->dirty_sync_missed_zero_copy =
>  stat64_get(&mig_stats.dirty_sync_missed_zero_copy);
>  info->ram->postcopy_requests =
>
>
>>
>> With the new counter, what kind of meaning can a user extract from that
>> number aside from "some undescribed thing happened N times" (this might
>> be included in the migration.json docs)?
>>
>> >
>> > Signed-off-by: Hyman Huang 
>> > ---
>> >  migration/migration-stats.h  |  4 
>> >  migration/migration.c|  1 +
>> >  migration/ram.c  | 12 
>> >  qapi/migration.json  |  6 +-
>> >  tests/qtest/migration-test.c |  2 +-
>> >  5 files changed, 19 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/migration/migration-stats.h b/migration/migration-stats.h
>> > index 05290ade76..43ee0f4f05 100644
>> > --- a/migration/migration-stats.h
>> > +++ b/migration/migration-stats.h
>> > @@ -50,6 +50,10 @@ typedef struct {
>> >   * Number of times we have synchronized guest bitmaps.
>> >   */
>> >  Stat64 dirty_sync_count;
>> > +/*
>> > + * Number of migration iteration processed.
>> > + */
>> > +Stat64 iteration_count;
>> >  /*
>> >   * Number of times zero copy failed to send any page using zero
>> >   * copy.
>> > diff --git a/migration/migration.c b/migration/migration.c
>> > index 3dea06d577..055d527ff6 100644
>> > --- a/migration/migration.c
>> > +++ b/migration/migration.c
>> > @@ -1197,6 +1197,7 @@ static void populate_ram_info(MigrationInfo *info,
>> MigrationState *s)
>> >  info->ram->mbps = s->mbps;
>> >  info->ram->dirty_sync_count =
>> >  stat64_get(&mig_stats.dirty_sync_count);
>> > +info->ram->iteration_count = stat64_get(&mig_stats.iteration_count);
>> >  info->ram->dirty_sync_missed_zero_copy =
>> >  stat64_get(&mig_stats.dirty_sync_missed_zero_copy);
>> >  info->ram->postcopy_requests =
>> > diff --git a/migration/ram.c 

Re: [PATCH 2/2] migration/multifd: Fix rb->receivedmap cleanup race

2024-09-17 Thread Fabiano Rosas
Peter Xu  writes:

> On Tue, Sep 17, 2024 at 03:58:02PM -0300, Fabiano Rosas wrote:
>> Fix a segmentation fault in multifd when rb->receivedmap is cleared
>> too early.
>> 
>> After commit 5ef7e26bdb ("migration/multifd: solve zero page causing
>> multiple page faults"), multifd started using the rb->receivedmap
>> bitmap, which belongs to ram.c and is initialized and *freed* from the
>> ram SaveVMHandlers.
>> 
>> Multifd threads are live until migration_incoming_state_destroy(),
>> which is called after qemu_loadvm_state_cleanup(), leading to a crash
>> when accessing rb->receivedmap.
>> 
>> process_incoming_migration_co()...
>>   qemu_loadvm_state()  multifd_nocomp_recv()
>> qemu_loadvm_state_cleanup()  ramblock_recv_bitmap_set_offset()
>>   rb->receivedmap = NULL   set_bit_atomic(..., rb->receivedmap)
>>   ...
>>   migration_incoming_state_destroy()
>> multifd_recv_cleanup()
>>   multifd_recv_terminate_threads(NULL)
>> 
>> Move the loadvm cleanup into migration_incoming_state_destroy(), after
>> multifd_recv_cleanup() to ensure multifd threads have already exited
>> when rb->receivedmap is cleared.
>> 
>> Adjust the postcopy listen thread comment to indicate that we still
>> want to skip the cpu synchronization.
>> 
>> CC: qemu-sta...@nongnu.org
>> Fixes: 5ef7e26bdb ("migration/multifd: solve zero page causing multiple page faults")
>> Signed-off-by: Fabiano Rosas 
>
> Reviewed-by: Peter Xu 
>
> One trivial question below..
>
>> ---
>>  migration/migration.c | 1 +
>>  migration/savevm.c| 6 --
>>  2 files changed, 5 insertions(+), 2 deletions(-)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 3dea06d577..b190a574b1 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -378,6 +378,7 @@ void migration_incoming_state_destroy(void)
>>  struct MigrationIncomingState *mis = migration_incoming_get_current();
>>  
>>  multifd_recv_cleanup();
>
> Would you mind if I add a comment, squashed in here, when I queue this?
>
>/*
> * RAM state cleanup needs to happen after multifd cleanup, because
> * multifd threads can use some of its states (receivedmap).
> */

Yeah, that's ok.

>
>> +qemu_loadvm_state_cleanup();
>>  
>>  if (mis->to_src_file) {
>>  /* Tell source that we are done */
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index d0759694fd..7e1e27182a 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -2979,7 +2979,10 @@ int qemu_loadvm_state(QEMUFile *f)
>>  trace_qemu_loadvm_state_post_main(ret);
>>  
>>  if (mis->have_listen_thread) {
>> -/* Listen thread still going, can't clean up yet */
>> +/*
>> + * Postcopy listen thread still going, don't synchronize the
>> + * cpus yet.
>> + */
>>  return ret;
>>  }
>>  
>> @@ -3022,7 +3025,6 @@ int qemu_loadvm_state(QEMUFile *f)
>>  }
>>  }
>>  
>> -qemu_loadvm_state_cleanup();
>>  cpu_synchronize_all_post_init();
>>  
>>  return ret;
>> -- 
>> 2.35.3
>> 



Re: [PATCH 1/2] migration/savevm: Remove extra load cleanup calls

2024-09-17 Thread Fabiano Rosas
Peter Xu  writes:

> On Tue, Sep 17, 2024 at 03:58:01PM -0300, Fabiano Rosas wrote:
>> There are two qemu_loadvm_state_cleanup() calls that were introduced
>> when qemu_loadvm_state_setup() was still called before loading the
>> configuration section, so there was state to be cleaned up if the
>> header checks failed.
>> 
>> However, commit 9e14b84908 ("migration/savevm: load_header before
>> load_setup") has moved that configuration section part to
>> qemu_loadvm_state_header() which now happens before
>> qemu_loadvm_state_setup().
>> 
>> Remove the cleanup calls that are now misplaced.
>> 
>> Fixes: 9e14b84908 ("migration/savevm: load_header before load_setup")
>
> Considering it's a cleanup, do you mind if I further remove this Fixes but
> just mention it in the commit message (so as to make Michael's life easier when
> backporting)?

Sure, go ahead.

>
>> Reviewed-by: Peter Xu 
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  migration/savevm.c | 2 --
>>  1 file changed, 2 deletions(-)
>> 
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index d500eae979..d0759694fd 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -2732,13 +2732,11 @@ static int qemu_loadvm_state_header(QEMUFile *f)
>>  if (migrate_get_current()->send_configuration) {
>>  if (qemu_get_byte(f) != QEMU_VM_CONFIGURATION) {
>>  error_report("Configuration section missing");
>> -qemu_loadvm_state_cleanup();
>>  return -EINVAL;
>>  }
>>  ret = vmstate_load_state(f, &vmstate_configuration, &savevm_state, 
>> 0);
>>  
>>  if (ret) {
>> -qemu_loadvm_state_cleanup();
>>  return ret;
>>  }
>>  }
>> -- 
>> 2.35.3
>> 



[PATCH 2/2] migration/multifd: Fix rb->receivedmap cleanup race

2024-09-17 Thread Fabiano Rosas
Fix a segmentation fault in multifd when rb->receivedmap is cleared
too early.

After commit 5ef7e26bdb ("migration/multifd: solve zero page causing
multiple page faults"), multifd started using the rb->receivedmap
bitmap, which belongs to ram.c and is initialized and *freed* from the
ram SaveVMHandlers.

Multifd threads are live until migration_incoming_state_destroy(),
which is called after qemu_loadvm_state_cleanup(), leading to a crash
when accessing rb->receivedmap.

process_incoming_migration_co()...
  qemu_loadvm_state()  multifd_nocomp_recv()
qemu_loadvm_state_cleanup()  ramblock_recv_bitmap_set_offset()
  rb->receivedmap = NULL   set_bit_atomic(..., rb->receivedmap)
  ...
  migration_incoming_state_destroy()
multifd_recv_cleanup()
  multifd_recv_terminate_threads(NULL)

Move the loadvm cleanup into migration_incoming_state_destroy(), after
multifd_recv_cleanup() to ensure multifd threads have already exited
when rb->receivedmap is cleared.

Adjust the postcopy listen thread comment to indicate that we still
want to skip the cpu synchronization.

CC: qemu-sta...@nongnu.org
Fixes: 5ef7e26bdb ("migration/multifd: solve zero page causing multiple page faults")
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c | 1 +
 migration/savevm.c| 6 --
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3dea06d577..b190a574b1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -378,6 +378,7 @@ void migration_incoming_state_destroy(void)
 struct MigrationIncomingState *mis = migration_incoming_get_current();
 
 multifd_recv_cleanup();
+qemu_loadvm_state_cleanup();
 
 if (mis->to_src_file) {
 /* Tell source that we are done */
diff --git a/migration/savevm.c b/migration/savevm.c
index d0759694fd..7e1e27182a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2979,7 +2979,10 @@ int qemu_loadvm_state(QEMUFile *f)
 trace_qemu_loadvm_state_post_main(ret);
 
 if (mis->have_listen_thread) {
-/* Listen thread still going, can't clean up yet */
+/*
+ * Postcopy listen thread still going, don't synchronize the
+ * cpus yet.
+ */
 return ret;
 }
 
@@ -3022,7 +3025,6 @@ int qemu_loadvm_state(QEMUFile *f)
 }
 }
 
-qemu_loadvm_state_cleanup();
 cpu_synchronize_all_post_init();
 
 return ret;
-- 
2.35.3




[PATCH 1/2] migration/savevm: Remove extra load cleanup calls

2024-09-17 Thread Fabiano Rosas
There are two qemu_loadvm_state_cleanup() calls that were introduced
when qemu_loadvm_state_setup() was still called before loading the
configuration section, so there was state to be cleaned up if the
header checks failed.

However, commit 9e14b84908 ("migration/savevm: load_header before
load_setup") has moved that configuration section part to
qemu_loadvm_state_header() which now happens before
qemu_loadvm_state_setup().

Remove the cleanup calls that are now misplaced.

Fixes: 9e14b84908 ("migration/savevm: load_header before load_setup")
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/savevm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index d500eae979..d0759694fd 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2732,13 +2732,11 @@ static int qemu_loadvm_state_header(QEMUFile *f)
 if (migrate_get_current()->send_configuration) {
 if (qemu_get_byte(f) != QEMU_VM_CONFIGURATION) {
 error_report("Configuration section missing");
-qemu_loadvm_state_cleanup();
 return -EINVAL;
 }
 ret = vmstate_load_state(f, &vmstate_configuration, &savevm_state, 0);
 
 if (ret) {
-qemu_loadvm_state_cleanup();
 return ret;
 }
 }
-- 
2.35.3




[PATCH 0/2] migration/multifd: Fix rb->receivedmap cleanup race

2024-09-17 Thread Fabiano Rosas
v2: Keep skipping the cpu_synchronize_all_post_init() call if the
postcopy listen thread is live. Don't CC stable on the first patch.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1457418838

v1:
https://lore.kernel.org/r/20240913220542.18305-1-faro...@suse.de

This fixes the crash we've been seeing recently in migration-test. The
first patch is a cleanup to have only one place calling
qemu_loadvm_state_cleanup() and the second patch reorders the cleanup
calls to make multifd_recv_cleanup() run first and stop the recv
threads.

Fabiano Rosas (2):
  migration/savevm: Remove extra load cleanup calls
  migration/multifd: Fix rb->receivedmap cleanup race

 migration/migration.c | 1 +
 migration/savevm.c| 8 
 2 files changed, 5 insertions(+), 4 deletions(-)

-- 
2.35.3




Re: [PATCH v4 1/4] KVM: Dynamic sized kvm memslots array

2024-09-17 Thread Fabiano Rosas
Peter Xu  writes:

> Zhiyi reported an infinite loop issue in VFIO use case.  The cause of that
> was a separate discussion, however during that I found a regression of
> dirty sync slowness when profiling.
>
> Each KVMMemoryListener maintains an array of kvm memslots.  Currently it's
> statically allocated to be the max supported by the kernel.  However after
> Linux commit 4fc096a99e ("KVM: Raise the maximum number of user memslots"),
> the max supported memslots reported now grows to some number large enough
> so that it may not be wise to always statically allocate with the max
> reported.
>
> What's worse, QEMU kvm code still walks all the allocated memslots entries
> to do any form of lookups.  It can drastically slow down all memslot
> operations because each of such loop can run over 32K times on the new
> kernels.
>
> Fix this issue by making the memslots to be allocated dynamically.
>
> Here the initial size was set to 16 because it should cover the basic VM
> usages, so that the hope is the majority VM use case may not even need to
> grow at all (e.g. if one starts a VM with ./qemu-system-x86_64 by default
> it'll consume 9 memslots), however not too large to waste memory.
>
> There can also be even better way to address this, but so far this is the
> simplest and should be already better even than before we grow the max
> supported memslots.  For example, in the case of above issue when VFIO was
> attached on a 32GB system, there are only ~10 memslots used.  So it could
> be good enough as of now.
>
> In the above VFIO context, measurement shows that the precopy dirty sync
> shrank from ~86ms to ~3ms after this patch was applied.  It should also apply
> to any KVM enabled VM even without VFIO.
>
> NOTE: we don't have a FIXES tag for this patch because there's no real
> commit that regressed this in QEMU. Such behavior existed for a long time,
> but only start to be a problem when the kernel reports very large
> nr_slots_max value.  However that's pretty common now (the kernel change
> was merged in 2021) so we attached cc:stable because we'll want this change
> to be backported to stable branches.
>
> Cc: qemu-stable 
> Reported-by: Zhiyi Guo 
> Tested-by: Zhiyi Guo 
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 2/2] migration/multifd: Fix rb->receivedmap cleanup race

2024-09-17 Thread Fabiano Rosas
Peter Xu  writes:

> On Fri, Sep 13, 2024 at 07:05:42PM -0300, Fabiano Rosas wrote:
>> Fix a segmentation fault in multifd when rb->receivedmap is cleared
>> too early.
>> 
>> After commit 5ef7e26bdb ("migration/multifd: solve zero page causing
>> multiple page faults"), multifd started using the rb->receivedmap
>> bitmap, which belongs to ram.c and is initialized and *freed* from the
>> ram SaveVMHandlers.
>> 
>> Multifd threads are live until migration_incoming_state_destroy(),
>> which is called after qemu_loadvm_state_cleanup(), leading to a crash
>> when accessing rb->receivedmap.
>> 
>> process_incoming_migration_co()...
>>   qemu_loadvm_state()  multifd_nocomp_recv()
>> qemu_loadvm_state_cleanup()  ramblock_recv_bitmap_set_offset()
>>   rb->receivedmap = NULL   set_bit_atomic(..., rb->receivedmap)
>>   ...
>>   migration_incoming_state_destroy()
>> multifd_recv_cleanup()
>>   multifd_recv_terminate_threads(NULL)
>> 
>> Move the loadvm cleanup into migration_incoming_state_destroy(), after
>> multifd_recv_cleanup() to ensure multifd thread have already exited
>> when rb->receivedmap is cleared.
>> 
>> The have_listen_thread logic can now be removed because its purpose
>> was to delay cleanup until postcopy_ram_listen_thread() had finished.
>> 
>> CC: qemu-sta...@nongnu.org
>> Fixes: 5ef7e26bdb ("migration/multifd: solve zero page causing multiple page faults")
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  migration/migration.c | 1 +
>>  migration/migration.h | 1 -
>>  migration/savevm.c| 9 -
>>  3 files changed, 1 insertion(+), 10 deletions(-)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 3dea06d577..b190a574b1 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -378,6 +378,7 @@ void migration_incoming_state_destroy(void)
>>  struct MigrationIncomingState *mis = migration_incoming_get_current();
>>  
>>  multifd_recv_cleanup();
>> +qemu_loadvm_state_cleanup();
>>  
>>  if (mis->to_src_file) {
>>  /* Tell source that we are done */
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 38aa1402d5..20b0a5b66e 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -101,7 +101,6 @@ struct MigrationIncomingState {
>>  /* Set this when we want the fault thread to quit */
>>  bool   fault_thread_quit;
>>  
>> -bool   have_listen_thread;
>>  QemuThread listen_thread;
>>  
>>  /* For the kernel to send us notifications */
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index d0759694fd..532ee5e4b0 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -2076,10 +2076,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
>>   * got a bad migration state).
>>   */
>>  migration_incoming_state_destroy();
>> -qemu_loadvm_state_cleanup();
>>  
>>  rcu_unregister_thread();
>> -mis->have_listen_thread = false;
>>  postcopy_state_set(POSTCOPY_INCOMING_END);
>>  
>>  object_unref(OBJECT(migr));
>> @@ -2130,7 +2128,6 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>>  return -1;
>>  }
>>  
>> -mis->have_listen_thread = true;
>>  postcopy_thread_create(mis, &mis->listen_thread, "mig/dst/listen",
>> postcopy_ram_listen_thread, 
>> QEMU_THREAD_DETACHED);
>>  trace_loadvm_postcopy_handle_listen("return");
>> @@ -2978,11 +2975,6 @@ int qemu_loadvm_state(QEMUFile *f)
>>  
>>  trace_qemu_loadvm_state_post_main(ret);
>>  
>> -if (mis->have_listen_thread) {
>> -/* Listen thread still going, can't clean up yet */
>> -return ret;
>> -}
>
> Hmm, I wonder whether we would still need this.  IIUC it's not only about
> cleanup, but also that when postcopy is involved, dst QEMU postpones doing
> any of the rest in the qemu_loadvm_state_main() call.
>
> E.g. cpu put, aka, cpu_synchronize_all_post_init(), is also done in
> loadvm_postcopy_handle_run_bh() later.
>
> IOW, I'd then expect when this patch applied we'll put cpu twice?
>
> I think the should_send_vmdesc() part is fine, as it returns false for
> postcopy anyway.  However not sure on the cpu post_init above.

I'm not sure either, but there are several ioctls in there, so it's
probably better to skip them. I'll keep the have_listen_thread check and
adjust the comment.



Re: [PATCH 1/2] migration/savevm: Remove extra load cleanup calls

2024-09-17 Thread Fabiano Rosas
Peter Xu  writes:

> On Fri, Sep 13, 2024 at 07:05:41PM -0300, Fabiano Rosas wrote:
>> There are two qemu_loadvm_state_cleanup() calls that were introduced
>> when qemu_loadvm_state_setup() was still called before loading the
>> configuration section, so there was state to be cleaned up if the
>> header checks failed.
>> 
>> However, commit 9e14b84908 ("migration/savevm: load_header before
>> load_setup") has moved that configuration section part to
>> qemu_loadvm_state_header() which now happens before
>> qemu_loadvm_state_setup().
>> 
>> Remove the cleanup calls that are now misplaced.
>> 
>> CC: qemu-sta...@nongnu.org
>> Fixes: 9e14b84908 ("migration/savevm: load_header before load_setup")
>> Signed-off-by: Fabiano Rosas 
>
> Reviewed-by: Peter Xu 
>
> We don't need to copy stable, am I right?  IIUC it's a good cleanup,
> however not a bug fix, as qemu_loadvm_state_cleanup() can be invoked
> without calling _setup() safely?

Hm, I think you're right. If we fail in the header part the multifd
threads will still be waiting for the ram code to release them.



Re: [PATCH v1 1/7] migration: Introduce structs for background sync

2024-09-16 Thread Fabiano Rosas
Hyman Huang  writes:

> shadow_bmap, iter_bmap and iter_dirty_pages are introduced
> to satisfy the need for background sync.
>
> Meanwhile, introduce enumeration of sync method.
>
> Signed-off-by: Hyman Huang 
> ---
>  include/exec/ramblock.h | 45 +
>  migration/ram.c |  6 ++
>  2 files changed, 51 insertions(+)
>
> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> index 0babd105c0..0e327bc0ae 100644
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -24,6 +24,30 @@
>  #include "qemu/rcu.h"
>  #include "exec/ramlist.h"
>  
> +/* Possible bits for cpu_physical_memory_sync_dirty_bitmap */
> +
> +/*
> + * The old-fashioned sync, which is, in turn, used for CPU
> + * throttle and memory transfer.

I'm not sure I follow what "in turn" is supposed to mean in this
sentence. Could you clarify?

> + */
> +#define RAMBLOCK_SYN_LEGACY_ITER   (1U << 0)

So ITER is as opposed to background? I'm a bit confused with the terms.

> +
> +/*
> + * The modern sync, which is, in turn, used for CPU throttle
> + * and memory transfer.
> + */
> +#define RAMBLOCK_SYN_MODERN_ITER   (1U << 1)
> +
> +/* The modern sync, which is used for CPU throttle only */
> +#define RAMBLOCK_SYN_MODERN_BACKGROUND(1U << 2)

What's the plan for the "legacy" part? To be removed soon? Do we want to
remove it now? Maybe better to not use the modern/legacy terms unless we
want to give the impression that the legacy one is discontinued.

> +
> +#define RAMBLOCK_SYN_MASK  (0x7)
> +
> +typedef enum RAMBlockSynMode {
> +RAMBLOCK_SYN_LEGACY, /* Old-fashined mode */
> +RAMBLOCK_SYN_MODERN, /* Background-sync-supported mode */
> +} RAMBlockSynMode;

I'm also wondering whether we need this enum + the flags or one of them
would suffice. I'm looking at code like this in the following patches,
for instance:

+if (sync_mode == RAMBLOCK_SYN_MODERN) {
+if (background) {
+flag = RAMBLOCK_SYN_MODERN_BACKGROUND;
+} else {
+flag = RAMBLOCK_SYN_MODERN_ITER;
+}
+}

Couldn't we use LEGACY/BG/ITER?
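
For example, something along these lines (just a sketch of the naming
idea, not a drop-in replacement; the enum and value names are made up):

/* Hypothetical single enum replacing RAMBlockSynMode plus the flag bits */
typedef enum RAMBlockSyncMode {
    RAMBLOCK_SYNC_LEGACY_ITER,  /* old-fashioned sync: throttle + transfer */
    RAMBLOCK_SYNC_ITER,         /* background-capable sync: throttle + transfer */
    RAMBLOCK_SYNC_BACKGROUND,   /* background sync: CPU throttle only */
} RAMBlockSyncMode;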

> +
>  struct RAMBlock {
>  struct rcu_head rcu;
>  struct MemoryRegion *mr;
> @@ -89,6 +113,27 @@ struct RAMBlock {
>   * could not have been valid on the source.
>   */
>  ram_addr_t postcopy_length;
> +
> +/*
> + * Used to backup the bmap during background sync to see whether any 
> dirty
> + * pages were sent during that time.
> + */
> +unsigned long *shadow_bmap;
> +
> +/*
> + * The bitmap "bmap," which was initially used for both sync and memory
> + * transfer, will be replaced by two bitmaps: the previously used "bmap"
> + * and the recently added "iter_bmap." Only the memory transfer is
> + * conducted with the previously used "bmap"; the recently added
> + * "iter_bmap" is utilized for dirty bitmap sync.
> + */
> +unsigned long *iter_bmap;
> +
> +/* Number of new dirty pages during iteration */
> +uint64_t iter_dirty_pages;
> +
> +/* If background sync has shown up during iteration */
> +bool background_sync_shown_up;
>  };
>  #endif
>  #endif
> diff --git a/migration/ram.c b/migration/ram.c
> index 67ca3d5d51..f29faa82d6 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2362,6 +2362,10 @@ static void ram_bitmaps_destroy(void)
>  block->bmap = NULL;
>  g_free(block->file_bmap);
>  block->file_bmap = NULL;
> +g_free(block->shadow_bmap);
> +block->shadow_bmap = NULL;
> +g_free(block->iter_bmap);
> +block->iter_bmap = NULL;
>  }
>  }
>  
> @@ -2753,6 +2757,8 @@ static void ram_list_init_bitmaps(void)
>  }
>  block->clear_bmap_shift = shift;
>  block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
> +block->shadow_bmap = bitmap_new(pages);
> +block->iter_bmap = bitmap_new(pages);
>  }
>  }
>  }



Re: [PATCH v1 6/7] qapi/migration: Introduce cpu-responsive-throttle parameter

2024-09-16 Thread Fabiano Rosas
Hyman Huang  writes:

> To enable the responsive throttle that will be implemented
> in the next commit, introduce the cpu-responsive-throttle
> parameter.
>
> Signed-off-by: Hyman Huang 
> ---
>  migration/migration-hmp-cmds.c |  8 
>  migration/options.c| 20 
>  migration/options.h|  1 +
>  qapi/migration.json| 16 +++-
>  4 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 28165cfc9e..1fe6c74d66 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -264,6 +264,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>  monitor_printf(mon, "%s: %s\n",
>  
> MigrationParameter_str(MIGRATION_PARAMETER_CPU_THROTTLE_TAILSLOW),
>  params->cpu_throttle_tailslow ? "on" : "off");
> +assert(params->has_cpu_responsive_throttle);
> +monitor_printf(mon, "%s: %s\n",
> +
> MigrationParameter_str(MIGRATION_PARAMETER_CPU_RESPONSIVE_THROTTLE),
> +params->cpu_responsive_throttle ? "on" : "off");
>  assert(params->has_max_cpu_throttle);
>  monitor_printf(mon, "%s: %u\n",
>  MigrationParameter_str(MIGRATION_PARAMETER_MAX_CPU_THROTTLE),
> @@ -512,6 +516,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>  p->has_cpu_throttle_tailslow = true;
>  visit_type_bool(v, param, &p->cpu_throttle_tailslow, &err);
>  break;
> +case MIGRATION_PARAMETER_CPU_RESPONSIVE_THROTTLE:
> +p->has_cpu_responsive_throttle = true;
> +visit_type_bool(v, param, &p->cpu_responsive_throttle, &err);
> +break;
>  case MIGRATION_PARAMETER_MAX_CPU_THROTTLE:
>  p->has_max_cpu_throttle = true;
>  visit_type_uint8(v, param, &p->max_cpu_throttle, &err);
> diff --git a/migration/options.c b/migration/options.c
> index 147cd2b8fd..b4c269bf1d 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -111,6 +111,8 @@ Property migration_properties[] = {
>DEFAULT_MIGRATE_CPU_THROTTLE_INCREMENT),
>  DEFINE_PROP_BOOL("x-cpu-throttle-tailslow", MigrationState,
>parameters.cpu_throttle_tailslow, false),
> +DEFINE_PROP_BOOL("x-cpu-responsive-throttle", MigrationState,
> +  parameters.cpu_responsive_throttle, false),
>  DEFINE_PROP_SIZE("x-max-bandwidth", MigrationState,
>parameters.max_bandwidth, MAX_THROTTLE),
>  DEFINE_PROP_SIZE("avail-switchover-bandwidth", MigrationState,
> @@ -705,6 +707,13 @@ uint8_t migrate_cpu_throttle_initial(void)
>  return s->parameters.cpu_throttle_initial;
>  }
>  
> +bool migrate_responsive_throttle(void)
> +{
> +MigrationState *s = migrate_get_current();
> +
> +return s->parameters.cpu_responsive_throttle;
> +}
> +
>  bool migrate_cpu_throttle_tailslow(void)
>  {
>  MigrationState *s = migrate_get_current();
> @@ -891,6 +900,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>  params->cpu_throttle_increment = s->parameters.cpu_throttle_increment;
>  params->has_cpu_throttle_tailslow = true;
>  params->cpu_throttle_tailslow = s->parameters.cpu_throttle_tailslow;
> +params->has_cpu_responsive_throttle = true;
> +params->cpu_responsive_throttle = s->parameters.cpu_responsive_throttle;
>  params->tls_creds = g_strdup(s->parameters.tls_creds);
>  params->tls_hostname = g_strdup(s->parameters.tls_hostname);
>  params->tls_authz = g_strdup(s->parameters.tls_authz ?
> @@ -959,6 +970,7 @@ void migrate_params_init(MigrationParameters *params)
>  params->has_cpu_throttle_initial = true;
>  params->has_cpu_throttle_increment = true;
>  params->has_cpu_throttle_tailslow = true;
> +params->has_cpu_responsive_throttle = true;
>  params->has_max_bandwidth = true;
>  params->has_downtime_limit = true;
>  params->has_x_checkpoint_delay = true;
> @@ -1191,6 +1203,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>  dest->cpu_throttle_tailslow = params->cpu_throttle_tailslow;
>  }
>  
> +if (params->has_cpu_responsive_throttle) {
> +dest->cpu_responsive_throttle = params->cpu_responsive_throttle;
> +}
> +
>  if (params->tls_creds) {
>  assert(params->tls_creds->type == QTYPE_QSTRING);
>  dest->tls_creds = params->tls_creds->u.s;
> @@ -1302,6 +1318,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>  s->parameters.cpu_throttle_tailslow = params->cpu_throttle_tailslow;
>  }
>  
> +if (params->has_cpu_responsive_throttle) {
> +s->parameters.cpu_responsive_throttle = 
> params->cpu_responsive_throttle;
> +}
> +
>  if (params->tls_creds) {
>  g_free(s->parameters.tls_creds);
>  assert(params->

Re: [PATCH v1 5/7] migration: Support background dirty bitmap sync and throttle

2024-09-16 Thread Fabiano Rosas
Hyman Huang  writes:

> When a VM is configured with huge memory, the current throttle logic
> doesn't seem to scale, because migration_trigger_throttle()
> is only called once per iteration, so it won't be invoked for a long
> time if one iteration takes a long time.
>
> The background sync and throttle aim to fix the above issue by
> synchronizing the remote dirty bitmap and triggering the throttle
> once it detects that an iteration is lasting a long time.
>
> This is a trade-off between synchronization overhead and CPU throttle
> impact.
>
> Signed-off-by: Hyman Huang 
> ---
>  migration/migration.c| 12 +++
>  tests/qtest/migration-test.c | 39 
>  2 files changed, 51 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 055d527ff6..af8b22fa15 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1416,6 +1416,7 @@ static void migrate_fd_cleanup(MigrationState *s)
>  
>  trace_migrate_fd_cleanup();
>  bql_unlock();
> +migration_background_sync_cleanup();
>  if (s->migration_thread_running) {
>  qemu_thread_join(&s->thread);
>  s->migration_thread_running = false;
> @@ -3263,6 +3264,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>  
>  if ((!pending_size || pending_size < s->threshold_size) && 
> can_switchover) {
>  trace_migration_thread_low_pending(pending_size);
> +migration_background_sync_cleanup();

This one is redundant with the migrate_fd_cleanup() call at the end of
migration_iteration_finish().

>  migration_completion(s);
>  return MIG_ITERATE_BREAK;
>  }
> @@ -3508,6 +3510,16 @@ static void *migration_thread(void *opaque)
>  ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>  bql_unlock();
>  
> +if (!migrate_dirty_limit()) {
> +/*
> + * Initiate the background sync watcher in order to guarantee
> + * that the CPU throttling acts appropriately. Dirty Limit
> + * doesn't use CPU throttle to make guest down, so ignore that
> + * case.
> + */
> +migration_background_sync_setup();
> +}
> +
>  qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> MIGRATION_STATUS_ACTIVE);
>  
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index b796a90cad..e0e94d26be 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -281,6 +281,11 @@ static uint64_t get_migration_pass(QTestState *who)
>  return read_ram_property_int(who, "iteration-count");
>  }
>  
> +static uint64_t get_dirty_sync_count(QTestState *who)
> +{
> +return read_ram_property_int(who, "dirty-sync-count");
> +}
> +
>  static void read_blocktime(QTestState *who)
>  {
>  QDict *rsp_return;
> @@ -468,6 +473,12 @@ static void migrate_ensure_converge(QTestState *who)
>  migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
>  }
>  
> +static void migrate_ensure_iteration_last_long(QTestState *who)
> +{
> +/* Set 10Byte/s bandwidth limit to make the iteration last long enough */
> +migrate_set_parameter_int(who, "max-bandwidth", 10);
> +}
> +
>  /*
>   * Our goal is to ensure that we run a single full migration
>   * iteration, and also dirty memory, ensuring that at least
> @@ -2791,6 +2802,10 @@ static void test_migrate_auto_converge(void)
>   * so we need to decrease a bandwidth.
>   */
>  const int64_t init_pct = 5, inc_pct = 25, max_pct = 95;
> +uint64_t prev_iter_cnt = 0, iter_cnt;
> +uint64_t iter_cnt_changes = 0;
> +uint64_t prev_dirty_sync_cnt = 0, dirty_sync_cnt;
> +uint64_t dirty_sync_cnt_changes = 0;
>  
>  if (test_migrate_start(&from, &to, uri, &args)) {
>  return;
> @@ -2827,6 +2842,30 @@ static void test_migrate_auto_converge(void)
>  } while (true);
>  /* The first percentage of throttling should be at least init_pct */
>  g_assert_cmpint(percentage, >=, init_pct);
> +
> +/* Make sure the iteration take a long time enough */
> +migrate_ensure_iteration_last_long(from);
> +
> +/*
> + * End the loop when the dirty sync count or iteration count changes.
> + */
> +while (iter_cnt_changes < 2 && dirty_sync_cnt_changes < 2) {
> +usleep(1000 * 1000);
> +iter_cnt = get_migration_pass(from);
> +iter_cnt_changes += (iter_cnt != prev_iter_cnt);
> +prev_iter_cnt = iter_cnt;
> +
> +dirty_sync_cnt = get_dirty_sync_count(from);
> +dirty_sync_cnt_changes += (dirty_sync_cnt != prev_dirty_sync_cnt);
> +prev_dirty_sync_cnt = dirty_sync_cnt;
> +}
> +
> +/*
> + * The dirty sync count must have changed because we are in the same
> + * iteration.
> + */
> +g_assert_cmpint(iter_cnt_changes , < , dirty_sync_cnt_changes);
> +
>  /* Now, when we tested that throttling works, let it conve

Re: [PATCH v1 3/7] qapi/migration: Introduce the iteration-count

2024-09-16 Thread Fabiano Rosas
Hyman Huang  writes:

> The original migration information dirty-sync-count could
> no longer reflect iteration count due to the introduction
> of background synchronization in the next commit;
> add the iteration count to compensate.

I agree with the overall idea, but I feel we're lacking some information
on what determines whether some of the lines below want to use the
iteration count vs. the dirty sync count. Since this patch increments
both variables at the same place, they can still be used interchangeably
unless we add some words to explain the distinction.

So to clarify: 

What do we call an iteration? A call to save_live_iterate(),
migration_iteration_run() or something else?

Why should dirty-sync-count ever have reflected "iteration count"? It
might have been this way by coincidence, but did we ever use it in that
sense (aside from info migrate, maybe)?

With the new counter, what kind of meaning can a user extract from that
number aside from "some undescribed thing happened N times" (this might
be included in the migration.json docs)?

>
> Signed-off-by: Hyman Huang 
> ---
>  migration/migration-stats.h  |  4 
>  migration/migration.c|  1 +
>  migration/ram.c  | 12 
>  qapi/migration.json  |  6 +-
>  tests/qtest/migration-test.c |  2 +-
>  5 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/migration/migration-stats.h b/migration/migration-stats.h
> index 05290ade76..43ee0f4f05 100644
> --- a/migration/migration-stats.h
> +++ b/migration/migration-stats.h
> @@ -50,6 +50,10 @@ typedef struct {
>   * Number of times we have synchronized guest bitmaps.
>   */
>  Stat64 dirty_sync_count;
> +/*
> + * Number of migration iteration processed.
> + */
> +Stat64 iteration_count;
>  /*
>   * Number of times zero copy failed to send any page using zero
>   * copy.
> diff --git a/migration/migration.c b/migration/migration.c
> index 3dea06d577..055d527ff6 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1197,6 +1197,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
>  info->ram->mbps = s->mbps;
>  info->ram->dirty_sync_count =
>  stat64_get(&mig_stats.dirty_sync_count);
> +info->ram->iteration_count = stat64_get(&mig_stats.iteration_count);
>  info->ram->dirty_sync_missed_zero_copy =
>  stat64_get(&mig_stats.dirty_sync_missed_zero_copy);
>  info->ram->postcopy_requests =
> diff --git a/migration/ram.c b/migration/ram.c
> index e205806a5f..ca5a1b5f16 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -594,7 +594,7 @@ static void xbzrle_cache_zero_page(ram_addr_t current_addr)
>  /* We don't care if this fails to allocate a new cache page
>   * as long as it updated an old one */
>  cache_insert(XBZRLE.cache, current_addr, XBZRLE.zero_target_page,
> - stat64_get(&mig_stats.dirty_sync_count));
> + stat64_get(&mig_stats.iteration_count));
>  }
>  
>  #define ENCODING_FLAG_XBZRLE 0x1
> @@ -620,7 +620,7 @@ static int save_xbzrle_page(RAMState *rs, PageSearchStatus *pss,
>  int encoded_len = 0, bytes_xbzrle;
>  uint8_t *prev_cached_page;
>  QEMUFile *file = pss->pss_channel;
> -uint64_t generation = stat64_get(&mig_stats.dirty_sync_count);
> +uint64_t generation = stat64_get(&mig_stats.iteration_count);
>  
>  if (!cache_is_cached(XBZRLE.cache, current_addr, generation)) {
>  xbzrle_counters.cache_miss++;
> @@ -1079,6 +1079,10 @@ static void migration_bitmap_sync(RAMState *rs,
>  RAMBlock *block;
>  int64_t end_time;
>  
> +if (!background) {
> +stat64_add(&mig_stats.iteration_count, 1);
> +}
> +
>  stat64_add(&mig_stats.dirty_sync_count, 1);
>  
>  if (!rs->time_last_bitmap_sync) {
> @@ -1115,8 +1119,8 @@ static void migration_bitmap_sync(RAMState *rs,
>  rs->num_dirty_pages_period = 0;
>  rs->bytes_xfer_prev = migration_transferred_bytes();
>  }
> -if (migrate_events()) {
> -uint64_t generation = stat64_get(&mig_stats.dirty_sync_count);
> +if (!background && migrate_events()) {
> +uint64_t generation = stat64_get(&mig_stats.iteration_count);
>  qapi_event_send_migration_pass(generation);
>  }
>  }
> diff --git a/qapi/migration.json b/qapi/migration.json
> index b66cccf107..95b490706c 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -60,6 +60,9 @@
>  # between 0 and @dirty-sync-count * @multifd-channels.  (since
>  # 7.1)
>  #
> +# @iteration-count: The number of iterations since migration started.
> +# (since 9.2)
> +#
>  # Since: 0.14
>  ##
>  { 'struct': 'MigrationStats',
> @@ -72,7 +75,8 @@
> 'multifd-bytes': 'uint64', 'pages-per-second': 'uint64',
> 'precopy-bytes': 'uint64', 'downtime-bytes': 'uint64',
> 'postcopy-bytes': 'uint64',
> -   'dirty-sync-missed-zero-copy': 

Re: [PATCH v3 1/4] KVM: Dynamic sized kvm memslots array

2024-09-16 Thread Fabiano Rosas
Fabiano Rosas  writes:

> Peter Xu  writes:
>
>> Zhiyi reported an infinite loop issue in VFIO use case.  The cause of that
>> was a separate discussion, however during that I found a regression of
>> dirty sync slowness when profiling.
>>
>> Each KVMMemoryListener maintains an array of kvm memslots.  Currently it's
>> statically allocated to be the max supported by the kernel.  However after
>> Linux commit 4fc096a99e ("KVM: Raise the maximum number of user memslots"),
>> the max supported memslots reported now grows to some number large enough
>> so that it may not be wise to always statically allocate with the max
>> reported.
>>
>> What's worse, QEMU kvm code still walks all the allocated memslots entries
>> to do any form of lookups.  It can drastically slow down all memslot
>> operations because each of such loop can run over 32K times on the new
>> kernels.
>>
>> Fix this issue by making the memslots to be allocated dynamically.
>>
>> Here the initial size was set to 16 because it should cover the basic VM
>> usages, so that the hope is the majority VM use case may not even need to
>> grow at all (e.g. if one starts a VM with ./qemu-system-x86_64 by default
>> it'll consume 9 memslots), however not too large to waste memory.
>>
>> There can also be even better way to address this, but so far this is the
>> simplest and should be already better even than before we grow the max
>> supported memslots.  For example, in the case of above issue when VFIO was
>> attached on a 32GB system, there are only ~10 memslots used.  So it could
>> be good enough as of now.
>>
>> In the above VFIO context, measurement shows that the precopy dirty sync
>> shrank from ~86ms to ~3ms after this patch was applied.  It should also apply
>> to any KVM enabled VM even without VFIO.
>>
>> NOTE: we don't have a FIXES tag for this patch because there's no real
>> commit that regressed this in QEMU. Such behavior existed for a long time,
>> but only start to be a problem when the kernel reports very large
>> nr_slots_max value.  However that's pretty common now (the kernel change
>> was merged in 2021) so we attached cc:stable because we'll want this change
>> to be backported to stable branches.
>>
>> Cc: qemu-stable 
>> Reported-by: Zhiyi Guo 
>> Tested-by: Zhiyi Guo 
>> Signed-off-by: Peter Xu 
>> ---
>>  include/sysemu/kvm_int.h |  1 +
>>  accel/kvm/kvm-all.c  | 99 ++--
>>  accel/kvm/trace-events   |  1 +
>>  3 files changed, 86 insertions(+), 15 deletions(-)
>>
>> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
>> index 1d8fb1473b..48e496b3d4 100644
>> --- a/include/sysemu/kvm_int.h
>> +++ b/include/sysemu/kvm_int.h
>> @@ -46,6 +46,7 @@ typedef struct KVMMemoryListener {
>>  MemoryListener listener;
>>  KVMSlot *slots;
>>  unsigned int nr_used_slots;
>> +unsigned int nr_slots_allocated;
>>  int as_id;
>>  QSIMPLEQ_HEAD(, KVMMemoryUpdate) transaction_add;
>>  QSIMPLEQ_HEAD(, KVMMemoryUpdate) transaction_del;
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index 75d11a07b2..c51a3f18db 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -69,6 +69,9 @@
>>  #define KVM_GUESTDBG_BLOCKIRQ 0
>>  #endif
>>  
>> +/* Default num of memslots to be allocated when VM starts */
>> +#define  KVM_MEMSLOTS_NR_ALLOC_DEFAULT  16
>> +
>>  struct KVMParkedVcpu {
>>  unsigned long vcpu_id;
>>  int kvm_fd;
>> @@ -165,6 +168,57 @@ void kvm_resample_fd_notify(int gsi)
>>  }
>>  }
>>  
>> +/**
>> + * kvm_slots_grow(): Grow the slots[] array in the KVMMemoryListener
>> + *
>> + * @kml: The KVMMemoryListener* to grow the slots[] array
>> + * @nr_slots_new: The new size of slots[] array
>> + *
>> + * Returns: True if the array grows larger, false otherwise.
>> + */
>> +static bool kvm_slots_grow(KVMMemoryListener *kml, unsigned int nr_slots_new)
>> +{
>> +unsigned int i, cur = kml->nr_slots_allocated;
>> +KVMSlot *slots;
>> +
>> +if (nr_slots_new > kvm_state->nr_slots) {
>> +nr_slots_new = kvm_state->nr_slots;
>> +}
>> +
>> +if (cur >= nr_slots_new) {
>> +/* Big enough, no need to grow, or we reached max */
>> +return false;
>> +}
>> +
>> +if (cur == 0) {

Re: [PATCH v3 1/4] KVM: Dynamic sized kvm memslots array

2024-09-16 Thread Fabiano Rosas
Peter Xu  writes:

> Zhiyi reported an infinite loop issue in VFIO use case.  The cause of that
> was a separate discussion, however during that I found a regression of
> dirty sync slowness when profiling.
>
> Each KVMMemoryListener maintains an array of kvm memslots.  Currently it's
> statically allocated to be the max supported by the kernel.  However after
> Linux commit 4fc096a99e ("KVM: Raise the maximum number of user memslots"),
> the max supported memslots reported now grows to some number large enough
> so that it may not be wise to always statically allocate with the max
> reported.
>
> What's worse, QEMU kvm code still walks all the allocated memslots entries
> to do any form of lookups.  It can drastically slow down all memslot
> operations because each of such loop can run over 32K times on the new
> kernels.
>
> Fix this issue by making the memslots to be allocated dynamically.
>
> Here the initial size was set to 16 because it should cover the basic VM
> usages, so that the hope is the majority VM use case may not even need to
> grow at all (e.g. if one starts a VM with ./qemu-system-x86_64 by default
> it'll consume 9 memslots), however not too large to waste memory.
>
> There can also be even better way to address this, but so far this is the
> simplest and should be already better even than before we grow the max
> supported memslots.  For example, in the case of above issue when VFIO was
> attached on a 32GB system, there are only ~10 memslots used.  So it could
> be good enough as of now.
>
> In the above VFIO context, measurement shows that the precopy dirty sync
> shrank from ~86ms to ~3ms after this patch was applied.  It should also apply
> to any KVM enabled VM even without VFIO.
>
> NOTE: we don't have a FIXES tag for this patch because there's no real
> commit that regressed this in QEMU. Such behavior existed for a long time,
> but only start to be a problem when the kernel reports very large
> nr_slots_max value.  However that's pretty common now (the kernel change
> was merged in 2021) so we attached cc:stable because we'll want this change
> to be backported to stable branches.
>
> Cc: qemu-stable 
> Reported-by: Zhiyi Guo 
> Tested-by: Zhiyi Guo 
> Signed-off-by: Peter Xu 
> ---
>  include/sysemu/kvm_int.h |  1 +
>  accel/kvm/kvm-all.c  | 99 ++--
>  accel/kvm/trace-events   |  1 +
>  3 files changed, 86 insertions(+), 15 deletions(-)
>
> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
> index 1d8fb1473b..48e496b3d4 100644
> --- a/include/sysemu/kvm_int.h
> +++ b/include/sysemu/kvm_int.h
> @@ -46,6 +46,7 @@ typedef struct KVMMemoryListener {
>  MemoryListener listener;
>  KVMSlot *slots;
>  unsigned int nr_used_slots;
> +unsigned int nr_slots_allocated;
>  int as_id;
>  QSIMPLEQ_HEAD(, KVMMemoryUpdate) transaction_add;
>  QSIMPLEQ_HEAD(, KVMMemoryUpdate) transaction_del;
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 75d11a07b2..c51a3f18db 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -69,6 +69,9 @@
>  #define KVM_GUESTDBG_BLOCKIRQ 0
>  #endif
>  
> +/* Default num of memslots to be allocated when VM starts */
> +#define  KVM_MEMSLOTS_NR_ALLOC_DEFAULT  16
> +
>  struct KVMParkedVcpu {
>  unsigned long vcpu_id;
>  int kvm_fd;
> @@ -165,6 +168,57 @@ void kvm_resample_fd_notify(int gsi)
>  }
>  }
>  
> +/**
> + * kvm_slots_grow(): Grow the slots[] array in the KVMMemoryListener
> + *
> + * @kml: The KVMMemoryListener* to grow the slots[] array
> + * @nr_slots_new: The new size of slots[] array
> + *
> + * Returns: True if the array grows larger, false otherwise.
> + */
> +static bool kvm_slots_grow(KVMMemoryListener *kml, unsigned int nr_slots_new)
> +{
> +unsigned int i, cur = kml->nr_slots_allocated;
> +KVMSlot *slots;
> +
> +if (nr_slots_new > kvm_state->nr_slots) {
> +nr_slots_new = kvm_state->nr_slots;
> +}
> +
> +if (cur >= nr_slots_new) {
> +/* Big enough, no need to grow, or we reached max */
> +return false;
> +}
> +
> +if (cur == 0) {
> +slots = g_new0(KVMSlot, nr_slots_new);
> +} else {
> +assert(kml->slots);
> +slots = g_renew(KVMSlot, kml->slots, nr_slots_new);
> +/*
> + * g_renew() doesn't initialize extended buffers, however kvm
> + * memslots require fields to be zero-initialized. E.g. pointers,
> + * memory_size field, etc.
> + */
> +memset(&slots[cur], 0x0, sizeof(slots[0]) * (nr_slots_new - cur));
> +}
> +
> +for (i = cur; i < nr_slots_new; i++) {
> +slots[i].slot = i;
> +}
> +
> +kml->slots = slots;
> +kml->nr_slots_allocated = nr_slots_new;
> +trace_kvm_slots_grow(cur, nr_slots_new);
> +
> +return true;
> +}
> +
> +static bool kvm_slots_double(KVMMemoryListener *kml)
> +{
> +return kvm_slots_grow(kml, kml->nr_slots_allocated * 2);
> +}
> +
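
To make the intended usage concrete, here is a minimal sketch (not part of
the patch) of how a slot-allocation helper could consume the dynamically
sized array: it walks only the entries allocated so far and falls back to
kvm_slots_double() when they are all busy. The helper name and the
"memory_size == 0 means free" convention are assumptions for illustration.

static KVMSlot *kvm_get_free_slot_sketch(KVMMemoryListener *kml)
{
    unsigned int i;

    do {
        /* Walk only the entries that have actually been allocated */
        for (i = 0; i < kml->nr_slots_allocated; i++) {
            if (!kml->slots[i].memory_size) {
                return &kml->slots[i];
            }
        }
        /* Every allocated entry is busy: try to double the array */
    } while (kvm_slots_double(kml));

    /* kvm_slots_grow() capped the array at kvm_state->nr_slots */
    return NULL;
}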

Re: [External] Re: [PATCH v5 08/13] migration/multifd: Add new migration option for multifd DSA offloading.

2024-09-16 Thread Fabiano Rosas
Yichen Wang  writes:

> On Wed, Jul 24, 2024 at 7:50 AM Markus Armbruster  wrote:
>>
>> Fabiano Rosas  writes:
>>
>> > Yichen Wang  writes:
>> >
>> >> On Thu, Jul 11, 2024 at 2:53 PM Yichen Wang  
>> >> wrote:
>> >>
>> >>> diff --git a/migration/options.c b/migration/options.c
>> >>> index 645f55003d..f839493016 100644
>> >>> --- a/migration/options.c
>> >>> +++ b/migration/options.c
>> >>> @@ -29,6 +29,7 @@
>> >>>  #include "ram.h"
>> >>>  #include "options.h"
>> >>>  #include "sysemu/kvm.h"
>> >>> +#include 
>> >>>
>> >>>  /* Maximum migrate downtime set to 2000 seconds */
>> >>>  #define MAX_MIGRATE_DOWNTIME_SECONDS 2000
>> >>> @@ -162,6 +163,10 @@ Property migration_properties[] = {
>> >>>  DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", 
>> >>> MigrationState,
>> >>> parameters.zero_page_detection,
>> >>> ZERO_PAGE_DETECTION_MULTIFD),
>> >>> +/* DEFINE_PROP_ARRAY("dsa-accel-path", MigrationState, x, */
>> >>> +/*parameters.dsa_accel_path, qdev_prop_string, 
>> >>> char *), */
>> >
>> > This is mostly correct, I think, you just need to create a field in
>> > MigrationState to keep the length (instead of x). However, I found out
>> > just now that this only works with QMP. Let me ask for other's
>> > opinions...
>> >
>> >>> +/* DEFINE_PROP_STRING("dsa-accel-path", MigrationState, */
>> >>> +/*parameters.dsa_accel_path), */
>> >>>
>> >>>  /* Migration capabilities */
>> >>>  DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
>> >>
>> >> I changed the dsa-accel-path to be a ['str'], i.e. strList* in C.
>> >> However, I am having a hard time about how to define the proper
>> >> properties here. I don't know what MACRO to use and I can't find good
>> >> examples... Need some guidance about how to proceed. Basically I will
>> >> need this to pass something like '-global
>> >> migration.dsa-accel-path="/dev/dsa/wq0.0"' in cmdline, or
>> >> "migrate_set_parameter dsa-accel-path" in QEMU CLI. Don't know how to
>> >> pass strList there.
>> >>
>> >> Thanks very much!
>> >
>> > @Daniel, @Markus, any idea here?
>> >
>> > If I'm reading this commit[1] right, it seems we decided to disallow
>> > passing of arrays without JSON, which affects -global on the
>> > command-line and HMP.
>> >
>> > 1- b06f8b500d (qdev: Rework array properties based on list visitor,
>> > 2023-11-09)
>> >
>> > QMP shell:
>> > (QEMU) migrate-set-parameters dsa-accel-path=['a','b']
>> > {"return": {}}
>> >
>> > HMP:
>> > (qemu) migrate_set_parameter dsa-accel-path "['a','b']"
>> > qemu-system-x86_64: ../qapi/string-input-visitor.c:343: parse_type_str:
>> > Assertion `siv->lm == LM_NONE' failed.
>>
>> HMP migrate_set_parameter doesn't support JSON.  It uses the string
>> input visitor to parse the value, which can only do lists of integers.
>>
>> The string visitors have been thorns in my side since forever.
>>
>> > Any recommendation? I believe all migration parameters so far can be set
>> > via those means, I don't think we can allow only this one to be
>> > QMP-only.
>> >
>> > Or am I just missing something?
>>
>> I don't think the string input visitor can be compatibly extended to
>> arbitrary lists.
>>
>> We could replace HMP migrate_set_parameter by migrate_set_parameters.
>> The new command parses its single argument into a struct
>> MigrateSetParameters with keyval_parse(),
>> qobject_input_visitor_new_keyval(), and
>> visit_type_MigrateSetParameters().
>>
>
> I tried Fabiano's suggestion, and put a uint32_t in the MigrationState data
> structure. I got exactly the same: "qemu-system-x86_64.dsa:
> ../../../qapi/string-input-visitor.c:343: parse_type_str: Assertion
> `siv->lm == LM_NONE' failed.". Steve's patch is more to be a read-only
> field from HMP, so probably I can't do that.

What do you mean by read-only field? I thought his usage was the same as
what we want for dsa-accel-path:

(qemu) migrate_set_parameter cpr-exec-command abc def
(qemu) info migrate_parameters 
...
cpr-exec-command: abc def

(gdb) p valuestr
$3 = 0x5766a8d0 "abc def"
(gdb) p *p->cpr_exec_command 
$6 = {next = 0x5823d300, value = 0x5765f690 "abc"}
(gdb) p *p->cpr_exec_command.next
$7 = {next = 0x5805be20, value = 0x57fefc80 "def"}
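
For reference, here is a rough sketch of the replacement HMP command Markus
describes above, built on keyval_parse() plus the keyval input visitor.
This is only an illustration of the approach, not an actual patch: the
function and argument names are assumptions, and a list parameter would be
given with the keyval index syntax, e.g.
"dsa-accel-path.0=/dev/dsa/wq0.0,dsa-accel-path.1=/dev/dsa/wq1.0".

void hmp_migrate_set_parameters_sketch(Monitor *mon, const QDict *qdict)
{
    const char *args = qdict_get_str(qdict, "args");
    MigrateSetParameters *params = NULL;
    Error *err = NULL;
    QDict *opts;
    Visitor *v;

    /* Split "key=val,key.0=val,..." into a QDict */
    opts = keyval_parse(args, NULL, NULL, &err);
    if (err) {
        goto out;
    }

    /* The keyval visitor understands the key.0/key.1 list syntax */
    v = qobject_input_visitor_new_keyval(QOBJECT(opts));
    visit_type_MigrateSetParameters(v, NULL, &params, &err);
    visit_free(v);
    qobject_unref(opts);
    if (err) {
        goto out;
    }

    qmp_migrate_set_parameters(params, &err);
    qapi_free_MigrateSetParameters(params);

out:
    hmp_handle_error(mon, err);
}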



[PATCH 2/2] migration/multifd: Fix rb->receivedmap cleanup race

2024-09-13 Thread Fabiano Rosas
Fix a segmentation fault in multifd when rb->receivedmap is cleared
too early.

After commit 5ef7e26bdb ("migration/multifd: solve zero page causing
multiple page faults"), multifd started using the rb->receivedmap
bitmap, which belongs to ram.c and is initialized and *freed* from the
ram SaveVMHandlers.

Multifd threads are live until migration_incoming_state_destroy(),
which is called after qemu_loadvm_state_cleanup(), leading to a crash
when accessing rb->receivedmap.

process_incoming_migration_co()...
  qemu_loadvm_state()  multifd_nocomp_recv()
qemu_loadvm_state_cleanup()  ramblock_recv_bitmap_set_offset()
  rb->receivedmap = NULL   set_bit_atomic(..., rb->receivedmap)
  ...
  migration_incoming_state_destroy()
multifd_recv_cleanup()
  multifd_recv_terminate_threads(NULL)

Move the loadvm cleanup into migration_incoming_state_destroy(), after
multifd_recv_cleanup(), to ensure the multifd threads have already exited
when rb->receivedmap is cleared.

The have_listen_thread logic can now be removed because its purpose
was to delay cleanup until postcopy_ram_listen_thread() had finished.

CC: qemu-sta...@nongnu.org
Fixes: 5ef7e26bdb ("migration/multifd: solve zero page causing multiple page 
faults")
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c | 1 +
 migration/migration.h | 1 -
 migration/savevm.c| 9 -
 3 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3dea06d577..b190a574b1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -378,6 +378,7 @@ void migration_incoming_state_destroy(void)
 struct MigrationIncomingState *mis = migration_incoming_get_current();
 
 multifd_recv_cleanup();
+qemu_loadvm_state_cleanup();
 
 if (mis->to_src_file) {
 /* Tell source that we are done */
diff --git a/migration/migration.h b/migration/migration.h
index 38aa1402d5..20b0a5b66e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -101,7 +101,6 @@ struct MigrationIncomingState {
 /* Set this when we want the fault thread to quit */
 bool   fault_thread_quit;
 
-bool   have_listen_thread;
 QemuThread listen_thread;
 
 /* For the kernel to send us notifications */
diff --git a/migration/savevm.c b/migration/savevm.c
index d0759694fd..532ee5e4b0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2076,10 +2076,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
  * got a bad migration state).
  */
 migration_incoming_state_destroy();
-qemu_loadvm_state_cleanup();
 
 rcu_unregister_thread();
-mis->have_listen_thread = false;
 postcopy_state_set(POSTCOPY_INCOMING_END);
 
 object_unref(OBJECT(migr));
@@ -2130,7 +2128,6 @@ static int 
loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 return -1;
 }
 
-mis->have_listen_thread = true;
 postcopy_thread_create(mis, &mis->listen_thread, "mig/dst/listen",
postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
 trace_loadvm_postcopy_handle_listen("return");
@@ -2978,11 +2975,6 @@ int qemu_loadvm_state(QEMUFile *f)
 
 trace_qemu_loadvm_state_post_main(ret);
 
-if (mis->have_listen_thread) {
-/* Listen thread still going, can't clean up yet */
-return ret;
-}
-
 if (ret == 0) {
 ret = qemu_file_get_error(f);
 }
@@ -3022,7 +3014,6 @@ int qemu_loadvm_state(QEMUFile *f)
 }
 }
 
-qemu_loadvm_state_cleanup();
 cpu_synchronize_all_post_init();
 
 return ret;
-- 
2.35.3
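
For clarity, the resulting teardown order on the destination side looks
roughly like the sketch below (only the two calls this patch is concerned
with are shown, everything else in the function is elided):

void migration_incoming_state_destroy(void)
{
    /* ... */
    multifd_recv_cleanup();       /* recv threads have exited after this,
                                     so nothing can touch rb->receivedmap */
    qemu_loadvm_state_cleanup();  /* now it is safe to free rb->receivedmap */
    /* ... */
}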




[PATCH 1/2] migration/savevm: Remove extra load cleanup calls

2024-09-13 Thread Fabiano Rosas
There are two qemu_loadvm_state_cleanup() calls that were introduced
when qemu_loadvm_state_setup() was still called before loading the
configuration section, so there was state to be cleaned up if the
header checks failed.

However, commit 9e14b84908 ("migration/savevm: load_header before
load_setup") has moved that configuration section part to
qemu_loadvm_state_header() which now happens before
qemu_loadvm_state_setup().

Remove the cleanup calls that are now misplaced.

CC: qemu-sta...@nongnu.org
Fixes: 9e14b84908 ("migration/savevm: load_header before load_setup")
Signed-off-by: Fabiano Rosas 
---
 migration/savevm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index d500eae979..d0759694fd 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2732,13 +2732,11 @@ static int qemu_loadvm_state_header(QEMUFile *f)
 if (migrate_get_current()->send_configuration) {
 if (qemu_get_byte(f) != QEMU_VM_CONFIGURATION) {
 error_report("Configuration section missing");
-qemu_loadvm_state_cleanup();
 return -EINVAL;
 }
 ret = vmstate_load_state(f, &vmstate_configuration, &savevm_state, 0);
 
 if (ret) {
-qemu_loadvm_state_cleanup();
 return ret;
 }
 }
-- 
2.35.3




[PATCH 0/2] migration/multifd: Fix rb->receivedmap cleanup race

2024-09-13 Thread Fabiano Rosas
This fixes the crash we've been seeing recently in migration-test. The
first patch is a cleanup to have only one place calling
qemu_loadvm_state_cleanup() and the second patch reorders the cleanup
calls to make multifd_recv_cleanup() run first and stop the recv
threads.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1453038652

Fabiano Rosas (2):
  migration/savevm: Remove extra load cleanup calls
  migration/multifd: Fix rb->receivedmap cleanup race

 migration/migration.c |  1 +
 migration/migration.h |  1 -
 migration/savevm.c| 11 ---
 3 files changed, 1 insertion(+), 12 deletions(-)

-- 
2.35.3




Re: [PATCH v2 12/17] migration/multifd: Device state transfer support - send side

2024-09-13 Thread Fabiano Rosas
Peter Xu  writes:

> On Fri, Sep 13, 2024 at 12:04:00PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Fri, Sep 13, 2024 at 10:21:39AM -0300, Fabiano Rosas wrote:
>> >> Peter Xu  writes:
>> >> 
>> >> > On Thu, Sep 12, 2024 at 03:43:39PM -0300, Fabiano Rosas wrote:
>> >> >> Peter Xu  writes:
>> >> >> 
>> >> >> Hi Peter, sorry if I'm not very enthusiastic by this, I'm sure you
>> >> >> understand the rework is a little frustrating.
>> >> >
>> >> > That's OK.
>> >> >
>> >> > [For some reason my email sync program decided to give up working for
>> >> >  hours.  I got more time looking at a tsc bug, which is good, but then I
>> >> >  miss a lot of emails..]
>> >> >
>> >> >> 
>> >> >> > On Wed, Aug 28, 2024 at 09:41:17PM -0300, Fabiano Rosas wrote:
>> >> >> >> > +size_t multifd_device_state_payload_size(void)
>> >> >> >> > +{
>> >> >> >> > +return sizeof(MultiFDDeviceState_t);
>> >> >> >> > +}
>> >> >> >> 
>> >> >> >> This will not be necessary because the payload size is the same as 
>> >> >> >> the
>> >> >> >> data type. We only need it for the special case where the 
>> >> >> >> MultiFDPages_t
>> >> >> >> is smaller than the total ram payload size.
>> >> >> >
>> >> >> > Today I was thinking maybe we should really clean this up, as the 
>> >> >> > current
>> >> >> > multifd_send_data_alloc() is indeed too tricky (blame me.. who 
>> >> >> > requested
>> >> >> > that more or less).  Knowing that VFIO can use dynamic buffers with 
>> >> >> > ->idstr
>> >> >> > and ->buf (I was thinking it could be buf[1M].. but I was wrong...) 
>> >> >> > made
>> >> >> > that feeling stronger.
>> >> >> 
>> >> >> If we're going to commit bad code and then rewrite it a week later, we
>> >> >> could have just let the original series from Maciej merge without any 
>> >> >> of
>> >> >> this.
>> >> >
>> >> > Why it's "bad code"?
>> >> >
>> >> > It runs pretty well, I don't think it's bad code.  You wrote it, and I
>> >> > don't think it's bad at all.
>> >> 
>> >> Code that forces us to do arithmetic in order to properly allocate
>> >> memory and comes with a big comment explaining how we're dodging
>> >> compiler warnings is bad code in my book.
>> >> 
>> >> >
>> >> > But now we're rethinking after reading Maciej's new series.
>> >> >Personally I don't think it's a major problem.
>> >> >
>> >> > Note that we're not changing the design back: what was initially 
>> >> > proposed
>> >> > was the client submitting an array of multifd objects.  I still don't 
>> >> > think
>> >> > that's right.
>> >> >
>> >> > What the change goes so far is we make the union a struct, however 
>> >> > that's
>> >> > still N+2 objects not 2*N, where 2 came from RAM+VFIO.  I think the
>> >> > important bits are still there (from your previous refactor series).
>> >> >
>> >> 
>> >> You fail to appreciate that before the RFC series, multifd already
>> >> allocated N for the pages.
>> >
>> > It depends on how you see it, IMHO.  I think it allocates N not for the
>> > "pages" but for the "threads", because the threads can be busy with those
>> > buffers, no matter if it's "page" or "device data".
>> 
>> Each MultiFD*Params had a p->pages, so N channels, N p->pages. The
>> device state series would add p->device_state, one per channel. So 2N +
>> 2 (for the extra slot).
>
> Then it makes sense to have SendData covering pages+device_state.  I think
> it's what we have now, but maybe I missed the point.

I misunderstood you. You're saying that you see the N as per-thread
instead of per-client-per-thread. T

Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-13 Thread Fabiano Rosas
Peter Xu  writes:

> On Fri, Sep 13, 2024 at 12:17:40PM -0300, Fabiano Rosas wrote:
>> Fabiano Rosas  writes:
>> 
>> > Peter Xu  writes:
>> >
>> >> On Thu, Sep 12, 2024 at 07:52:48PM -0300, Fabiano Rosas wrote:
>> >>> Fabiano Rosas  writes:
>> >>> 
>> >>> > Peter Xu  writes:
>> >>> >
>> >>> >> On Thu, Sep 12, 2024 at 09:13:16AM +0100, Peter Maydell wrote:
>> >>> >>> On Wed, 11 Sept 2024 at 22:26, Fabiano Rosas  wrote:
>> >>> >>> > I don't think we're discussing total CI time at this point, so the 
>> >>> >>> > math
>> >>> >>> > doesn't really add up. We're not looking into making the CI finish
>> >>> >>> > faster. We're looking into making migration-test finish faster. 
>> >>> >>> > That
>> >>> >>> > would reduce timeouts in CI, speed-up make check and reduce the 
>> >>> >>> > chance
>> >>> >>> > of random race conditions* affecting other people/staging runs.
>> >>> >>> 
>> >>> >>> Right. The reason migration-test appears on my radar is because
>> >>> >>> it is very frequently the thing that shows up as "this sometimes
>> >>> >>> just fails or just times out and if you hit retry it goes away
>> >>> >>> again". That might not be migration-test's fault specifically,
>> >>> >>> because those retries tend to be certain CI configs (s390,
>> >>> >>> the i686-tci one), and I have some theories about what might be
>> >>> >>> causing it (e.g. build system runs 4 migration-tests in parallel,
>> >>> >>> which means 8 QEMU processes which is too many for the number
>> >>> >>> of host CPUs). But right now I look at CI job failures and my 
>> >>> >>> reaction
>> >>> >>> is "oh, it's the migration-test failing yet again" :-(
>> >>> >>> 
>> >>> >>> For some examples from this week:
>> >>> >>> 
>> >>> >>> https://gitlab.com/qemu-project/qemu/-/jobs/7802183144
>> >>> >>> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373  <[1]
>> >>> >>> https://gitlab.com/qemu-project/qemu/-/jobs/7786579152  <[2]
>> >>> >>> https://gitlab.com/qemu-project/qemu/-/jobs/7786579155
>> >>> >>
>> >>> >> Ah right, the TIMEOUT is unfortunate, especially if tests can be run 
>> >>> >> in
>> >>> >> parallel.  It indeed sounds like no good way to finally solve.. I 
>> >>> >> don't
>> >>> >> also see how speeding up / reducing tests in migration test would 
>> >>> >> help, as
>> >>> >> that's (from some degree..) is the same as tuning the timeout value 
>> >>> >> bigger.
>> >>> >> When the tests are less it'll fit into 480s window, but maybe it's too
>> >>> >> quick now we wonder whether we should shrink it to e.g. 90s, but then 
>> >>> >> it
>> >>> >> can timeout again when on a busy host with less capability of 
>> >>> >> concurrency.
>> >>> >>
>> >>> >> But indeed there're two ERRORs ([1,2] above)..  I collected some more 
>> >>> >> info
>> >>> >> here before the log expires:
>> >>> >>
>> >>> >> =8<
>> >>> >>
>> >>> >> *** /i386/migration/multifd/tcp/plain/cancel, qtest-i386 on s390 host
>> >>> >>
>> >>> >> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373
>> >>> >>
>> >>> >> 101/953 qemu:qtest+qtest-i386 / qtest-i386/migration-test 
>> >>> >> ERROR  144.32s   killed by signal 6 SIGABRT
>> >>> >>>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
>> >>> >>>>>  
>> >>> >>>>> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/tests/dbus-vms

Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-13 Thread Fabiano Rosas
Fabiano Rosas  writes:

> Peter Xu  writes:
>
>> On Thu, Sep 12, 2024 at 07:52:48PM -0300, Fabiano Rosas wrote:
>>> Fabiano Rosas  writes:
>>> 
>>> > Peter Xu  writes:
>>> >
>>> >> On Thu, Sep 12, 2024 at 09:13:16AM +0100, Peter Maydell wrote:
>>> >>> On Wed, 11 Sept 2024 at 22:26, Fabiano Rosas  wrote:
>>> >>> > I don't think we're discussing total CI time at this point, so the 
>>> >>> > math
>>> >>> > doesn't really add up. We're not looking into making the CI finish
>>> >>> > faster. We're looking into making migration-test finish faster. That
>>> >>> > would reduce timeouts in CI, speed-up make check and reduce the chance
>>> >>> > of random race conditions* affecting other people/staging runs.
>>> >>> 
>>> >>> Right. The reason migration-test appears on my radar is because
>>> >>> it is very frequently the thing that shows up as "this sometimes
>>> >>> just fails or just times out and if you hit retry it goes away
>>> >>> again". That might not be migration-test's fault specifically,
>>> >>> because those retries tend to be certain CI configs (s390,
>>> >>> the i686-tci one), and I have some theories about what might be
>>> >>> causing it (e.g. build system runs 4 migration-tests in parallel,
>>> >>> which means 8 QEMU processes which is too many for the number
>>> >>> of host CPUs). But right now I look at CI job failures and my reaction
>>> >>> is "oh, it's the migration-test failing yet again" :-(
>>> >>> 
>>> >>> For some examples from this week:
>>> >>> 
>>> >>> https://gitlab.com/qemu-project/qemu/-/jobs/7802183144
>>> >>> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373  <[1]
>>> >>> https://gitlab.com/qemu-project/qemu/-/jobs/7786579152  <[2]
>>> >>> https://gitlab.com/qemu-project/qemu/-/jobs/7786579155
>>> >>
>>> >> Ah right, the TIMEOUT is unfortunate, especially if tests can be run in
>>> >> parallel.  It indeed sounds like no good way to finally solve.. I don't
>>> >> also see how speeding up / reducing tests in migration test would help, 
>>> >> as
>>> >> that's (from some degree..) is the same as tuning the timeout value 
>>> >> bigger.
>>> >> When the tests are less it'll fit into 480s window, but maybe it's too
>>> >> quick now we wonder whether we should shrink it to e.g. 90s, but then it
>>> >> can timeout again when on a busy host with less capability of 
>>> >> concurrency.
>>> >>
>>> >> But indeed there're two ERRORs ([1,2] above)..  I collected some more 
>>> >> info
>>> >> here before the log expires:
>>> >>
>>> >> =8<
>>> >>
>>> >> *** /i386/migration/multifd/tcp/plain/cancel, qtest-i386 on s390 host
>>> >>
>>> >> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373
>>> >>
>>> >> 101/953 qemu:qtest+qtest-i386 / qtest-i386/migration-test
>>> >>  ERROR  144.32s   killed by signal 6 SIGABRT
>>> >>>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
>>> >>>>> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
>>> >>>>>  
>>> >>>>> PYTHON=/home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/build/pyvenv/bin/python3
>>> >>>>>  QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=144 
>>> >>>>> QTEST_QEMU_BINARY=./qemu-system-i386 
>>> >>>>> /home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/build/tests/qtest/migration-test
>>> >>>>>  --tap -k
>>> >> ― ✀  
>>> >> ―
>>> >> stderr:
>>> >> warning: fd: migration to a file is deprecated. Use file: instead.
>>> >> warning: fd: migration to a file is deprecated. Use file: instead.
>>> >> ../tests/qtest/libqtest.c:205: kill_qemu() de

Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-13 Thread Fabiano Rosas
Peter Xu  writes:

> On Thu, Sep 12, 2024 at 07:52:48PM -0300, Fabiano Rosas wrote:
>> Fabiano Rosas  writes:
>> 
>> > Peter Xu  writes:
>> >
>> >> On Thu, Sep 12, 2024 at 09:13:16AM +0100, Peter Maydell wrote:
>> >>> On Wed, 11 Sept 2024 at 22:26, Fabiano Rosas  wrote:
>> >>> > I don't think we're discussing total CI time at this point, so the math
>> >>> > doesn't really add up. We're not looking into making the CI finish
>> >>> > faster. We're looking into making migration-test finish faster. That
>> >>> > would reduce timeouts in CI, speed-up make check and reduce the chance
>> >>> > of random race conditions* affecting other people/staging runs.
>> >>> 
>> >>> Right. The reason migration-test appears on my radar is because
>> >>> it is very frequently the thing that shows up as "this sometimes
>> >>> just fails or just times out and if you hit retry it goes away
>> >>> again". That might not be migration-test's fault specifically,
>> >>> because those retries tend to be certain CI configs (s390,
>> >>> the i686-tci one), and I have some theories about what might be
>> >>> causing it (e.g. build system runs 4 migration-tests in parallel,
>> >>> which means 8 QEMU processes which is too many for the number
>> >>> of host CPUs). But right now I look at CI job failures and my reaction
>> >>> is "oh, it's the migration-test failing yet again" :-(
>> >>> 
>> >>> For some examples from this week:
>> >>> 
>> >>> https://gitlab.com/qemu-project/qemu/-/jobs/7802183144
>> >>> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373  <[1]
>> >>> https://gitlab.com/qemu-project/qemu/-/jobs/7786579152  <[2]
>> >>> https://gitlab.com/qemu-project/qemu/-/jobs/7786579155
>> >>
>> >> Ah right, the TIMEOUT is unfortunate, especially if tests can be run in
>> >> parallel.  It indeed sounds like no good way to finally solve.. I don't
>> >> also see how speeding up / reducing tests in migration test would help, as
>> >> that's (from some degree..) is the same as tuning the timeout value 
>> >> bigger.
>> >> When the tests are less it'll fit into 480s window, but maybe it's too
>> >> quick now we wonder whether we should shrink it to e.g. 90s, but then it
>> >> can timeout again when on a busy host with less capability of concurrency.
>> >>
>> >> But indeed there're two ERRORs ([1,2] above)..  I collected some more info
>> >> here before the log expires:
>> >>
>> >> =8<
>> >>
>> >> *** /i386/migration/multifd/tcp/plain/cancel, qtest-i386 on s390 host
>> >>
>> >> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373
>> >>
>> >> 101/953 qemu:qtest+qtest-i386 / qtest-i386/migration-test 
>> >> ERROR  144.32s   killed by signal 6 SIGABRT
>> >>>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
>> >>>>> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
>> >>>>>  
>> >>>>> PYTHON=/home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/build/pyvenv/bin/python3
>> >>>>>  QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=144 
>> >>>>> QTEST_QEMU_BINARY=./qemu-system-i386 
>> >>>>> /home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/build/tests/qtest/migration-test
>> >>>>>  --tap -k
>> >> ― ✀  
>> >> ―
>> >> stderr:
>> >> warning: fd: migration to a file is deprecated. Use file: instead.
>> >> warning: fd: migration to a file is deprecated. Use file: instead.
>> >> ../tests/qtest/libqtest.c:205: kill_qemu() detected QEMU death from 
>> >> signal 11 (Segmentation fault) (core dumped)
>> >> (test program exited with status code -6)
>> >> TAP parsing error: Too few tests run (expected 53, got 39)
>> >> ――
>> >>
>> >> # Start of plain tests
>> &g

Re: [PATCH v2 12/17] migration/multifd: Device state transfer support - send side

2024-09-13 Thread Fabiano Rosas
Peter Xu  writes:

> On Fri, Sep 13, 2024 at 10:21:39AM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Thu, Sep 12, 2024 at 03:43:39PM -0300, Fabiano Rosas wrote:
>> >> Peter Xu  writes:
>> >> 
>> >> Hi Peter, sorry if I'm not very enthusiastic by this, I'm sure you
>> >> understand the rework is a little frustrating.
>> >
>> > That's OK.
>> >
>> > [For some reason my email sync program decided to give up working for
>> >  hours.  I got more time looking at a tsc bug, which is good, but then I
>> >  miss a lot of emails..]
>> >
>> >> 
>> >> > On Wed, Aug 28, 2024 at 09:41:17PM -0300, Fabiano Rosas wrote:
>> >> >> > +size_t multifd_device_state_payload_size(void)
>> >> >> > +{
>> >> >> > +return sizeof(MultiFDDeviceState_t);
>> >> >> > +}
>> >> >> 
>> >> >> This will not be necessary because the payload size is the same as the
>> >> >> data type. We only need it for the special case where the 
>> >> >> MultiFDPages_t
>> >> >> is smaller than the total ram payload size.
>> >> >
>> >> > Today I was thinking maybe we should really clean this up, as the 
>> >> > current
>> >> > multifd_send_data_alloc() is indeed too tricky (blame me.. who requested
>> >> > that more or less).  Knowing that VFIO can use dynamic buffers with 
>> >> > ->idstr
>> >> > and ->buf (I was thinking it could be buf[1M].. but I was wrong...) made
>> >> > that feeling stronger.
>> >> 
>> >> If we're going to commit bad code and then rewrite it a week later, we
>> >> could have just let the original series from Maciej merge without any of
>> >> this.
>> >
>> > Why it's "bad code"?
>> >
>> > It runs pretty well, I don't think it's bad code.  You wrote it, and I
>> > don't think it's bad at all.
>> 
>> Code that forces us to do arithmetic in order to properly allocate
>> memory and comes with a big comment explaining how we're dodging
>> compiler warnings is bad code in my book.
>> 
>> >
>> > But now we're rethinking after reading Maciej's new series.
>> >Personally I don't think it's a major problem.
>> >
>> > Note that we're not changing the design back: what was initially proposed
>> > was the client submitting an array of multifd objects.  I still don't think
>> > that's right.
>> >
>> > What the change goes so far is we make the union a struct, however that's
>> > still N+2 objects not 2*N, where 2 came from RAM+VFIO.  I think the
>> > important bits are still there (from your previous refactor series).
>> >
>> 
>> You fail to appreciate that before the RFC series, multifd already
>> allocated N for the pages.
>
> It depends on how you see it, IMHO.  I think it allocates N not for the
> "pages" but for the "threads", because the threads can be busy with those
> buffers, no matter if it's "page" or "device data".

Each MultiFD*Params had a p->pages, so N channels, N p->pages. The
device state series would add p->device_state, one per channel. So 2N +
2 (for the extra slot).

>
>> The device state adds another client, so that
>> would be another N anyway. The problem the RFC tried to solve was that
>> multifd channels owned that 2N, so the array was added to move the
>> memory into the client's ownership. IOW, it wasn't even the RFC series
>> that made it 2N, that was the multifd design all along. Now in hindsight
>> I don't think we should have gone with the memory saving discussion.
>
> I think I could have made that feeling that I only wanted to save memory,
> if so, I'm sorry.  But do you still remember I mentioned "we can make it a
> struct, too" before your series landed?  Then you think it's ok to keep
> using union, and I'm ok too! At least at that time.  I don't think that's a
> huge deal.  I don't think each route we go must be perfect, but we try the
> best to make it as good.

Yep, I did agree with all of this. I'm just saying I now think I
shouldn't have.

>
> I don't think any discussion must not happen.  I agree memory consumption
> is not the 1st thing to worry, but I don't see why it can't be disc

Re: [PATCH v2 12/17] migration/multifd: Device state transfer support - send side

2024-09-13 Thread Fabiano Rosas
Peter Xu  writes:

> On Thu, Sep 12, 2024 at 03:43:39PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> Hi Peter, sorry if I'm not very enthusiastic by this, I'm sure you
>> understand the rework is a little frustrating.
>
> That's OK.
>
> [For some reason my email sync program decided to give up working for
>  hours.  I got more time looking at a tsc bug, which is good, but then I
>  miss a lot of emails..]
>
>> 
>> > On Wed, Aug 28, 2024 at 09:41:17PM -0300, Fabiano Rosas wrote:
>> >> > +size_t multifd_device_state_payload_size(void)
>> >> > +{
>> >> > +return sizeof(MultiFDDeviceState_t);
>> >> > +}
>> >> 
>> >> This will not be necessary because the payload size is the same as the
>> >> data type. We only need it for the special case where the MultiFDPages_t
>> >> is smaller than the total ram payload size.
>> >
>> > Today I was thinking maybe we should really clean this up, as the current
>> > multifd_send_data_alloc() is indeed too tricky (blame me.. who requested
>> > that more or less).  Knowing that VFIO can use dynamic buffers with ->idstr
>> > and ->buf (I was thinking it could be buf[1M].. but I was wrong...) made
>> > that feeling stronger.
>> 
>> If we're going to commit bad code and then rewrite it a week later, we
>> could have just let the original series from Maciej merge without any of
>> this.
>
> Why it's "bad code"?
>
> It runs pretty well, I don't think it's bad code.  You wrote it, and I
> don't think it's bad at all.

Code that forces us to do arithmetic in order to properly allocate
memory and comes with a big comment explaining how we're dodging
compiler warnings is bad code in my book.

>
> But now we're rethinking after reading Maciej's new series.
>Personally I don't think it's a major problem.
>
> Note that we're not changing the design back: what was initially proposed
> was the client submitting an array of multifd objects.  I still don't think
> that's right.
>
> What the change goes so far is we make the union a struct, however that's
> still N+2 objects not 2*N, where 2 came from RAM+VFIO.  I think the
> important bits are still there (from your previous refactor series).
>

You fail to appreciate that before the RFC series, multifd already
allocated N for the pages. The device state adds another client, so that
would be another N anyway. The problem the RFC tried to solve was that
multifd channels owned that 2N, so the array was added to move the
memory into the client's ownership. IOW, it wasn't even the RFC series
that made it 2N, that was the multifd design all along. Now in hindsight
I don't think we should have gone with the memory saving discussion.

>> I already suggested it a couple of times, we shouldn't be doing
>> core refactorings underneath contributors' patches, this is too
>> fragile. Just let people contribute their code and we can change it
>> later.
>
> I sincerely don't think a lot needs changing... only patch 1.  Basically
> patch 1 on top of your previous rework series will be at least what I want,
> but I'm open to comments from you guys.

Don't get me wrong, I'm very much in favor of what you're doing
here. However, I don't think it's ok to be backtracking on our design
while other people have series in flight that depend on it. You
certainly know the feeling of trying to merge a feature and having
maintainers ask you to rewrite a bunch of code just to be able to start
working. That's not ideal.

I tried to quickly insert the RFC series before the device state series
progressed too much, but it's 3 months later and we're still discussing
it, maybe we don't need to do it this way.

And ok, let's consider the current situation a special case. But I would
like to avoid this kind of uncertainty in the future.

>
> Note that patch 2-3 will be on top of Maciej's changes and they're totally
> not relevant to what we merged so far.  Hence, nothing relevant there to
> what you worked.  And this is the diff of patch 1:
>
>  migration/multifd.h  | 16 +++-
>  migration/multifd-device-state.c |  8 ++--
>  migration/multifd-nocomp.c   | 13 ++---
>  migration/multifd.c  | 25 ++---
>  4 files changed, 29 insertions(+), 33 deletions(-)
>
> It's only 33 lines removed (many of which are comments..), it's not a huge
> lot.  I don't know why you feel so bad at this...
>
> It's probably becau

Re: [PATCH] target/ppc: Fix inequality check in do_lstxv_X

2024-09-13 Thread Fabiano Rosas
Harsh Prateek Bora  writes:

> This fix was earlier introduced for the do_lstxv_D form with 2cc0e449d173;
> however, it was missed for the _X form. This patch applies the same fix there.
>
> Cc: qemu-sta...@nongnu.org
> Suggested-by: Fabiano Rosas 
> Fixes: 70426b5bb738 ("target/ppc: moved stxvx and lxvx from legacy to 
> decodtree")
> Signed-off-by: Harsh Prateek Bora 
> ---
>  target/ppc/translate/vsx-impl.c.inc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/ppc/translate/vsx-impl.c.inc 
> b/target/ppc/translate/vsx-impl.c.inc
> index 40a87ddc4a..a869f30e86 100644
> --- a/target/ppc/translate/vsx-impl.c.inc
> +++ b/target/ppc/translate/vsx-impl.c.inc
> @@ -2244,7 +2244,7 @@ static bool do_lstxv_PLS_D(DisasContext *ctx, arg_PLS_D 
> *a,
>  
>  static bool do_lstxv_X(DisasContext *ctx, arg_X *a, bool store, bool paired)
>  {
> -if (paired || a->rt >= 32) {
> +if (paired || a->rt < 32) {
>  REQUIRE_VSX(ctx);
>  } else {
>  REQUIRE_VECTOR(ctx);

Hi Harsh,

Seems I was quicker than you =)

https://lore.kernel.org/r/20240911141651.6914-1-faro...@suse.de

I'll give my RB and leave up to the maintainers which patch to take:

Reviewed-by: Fabiano Rosas 



Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-12 Thread Fabiano Rosas
Fabiano Rosas  writes:

> Peter Xu  writes:
>
>> On Thu, Sep 12, 2024 at 09:13:16AM +0100, Peter Maydell wrote:
>>> On Wed, 11 Sept 2024 at 22:26, Fabiano Rosas  wrote:
>>> > I don't think we're discussing total CI time at this point, so the math
>>> > doesn't really add up. We're not looking into making the CI finish
>>> > faster. We're looking into making migration-test finish faster. That
>>> > would reduce timeouts in CI, speed-up make check and reduce the chance
>>> > of random race conditions* affecting other people/staging runs.
>>> 
>>> Right. The reason migration-test appears on my radar is because
>>> it is very frequently the thing that shows up as "this sometimes
>>> just fails or just times out and if you hit retry it goes away
>>> again". That might not be migration-test's fault specifically,
>>> because those retries tend to be certain CI configs (s390,
>>> the i686-tci one), and I have some theories about what might be
>>> causing it (e.g. build system runs 4 migration-tests in parallel,
>>> which means 8 QEMU processes which is too many for the number
>>> of host CPUs). But right now I look at CI job failures and my reaction
>>> is "oh, it's the migration-test failing yet again" :-(
>>> 
>>> For some examples from this week:
>>> 
>>> https://gitlab.com/qemu-project/qemu/-/jobs/7802183144
>>> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373  <[1]
>>> https://gitlab.com/qemu-project/qemu/-/jobs/7786579152  <[2]
>>> https://gitlab.com/qemu-project/qemu/-/jobs/7786579155
>>
>> Ah right, the TIMEOUT is unfortunate, especially if tests can be run in
>> parallel.  It indeed sounds like no good way to finally solve.. I don't
>> also see how speeding up / reducing tests in migration test would help, as
>> that's (from some degree..) is the same as tuning the timeout value bigger.
>> When the tests are less it'll fit into 480s window, but maybe it's too
>> quick now we wonder whether we should shrink it to e.g. 90s, but then it
>> can timeout again when on a busy host with less capability of concurrency.
>>
>> But indeed there're two ERRORs ([1,2] above)..  I collected some more info
>> here before the log expires:
>>
>> =8<
>>
>> *** /i386/migration/multifd/tcp/plain/cancel, qtest-i386 on s390 host
>>
>> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373
>>
>> 101/953 qemu:qtest+qtest-i386 / qtest-i386/migration-test
>>  ERROR  144.32s   killed by signal 6 SIGABRT
>>>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
>>>>> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
>>>>>  
>>>>> PYTHON=/home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/build/pyvenv/bin/python3
>>>>>  QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=144 
>>>>> QTEST_QEMU_BINARY=./qemu-system-i386 
>>>>> /home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/build/tests/qtest/migration-test
>>>>>  --tap -k
>> ― ✀  
>> ―
>> stderr:
>> warning: fd: migration to a file is deprecated. Use file: instead.
>> warning: fd: migration to a file is deprecated. Use file: instead.
>> ../tests/qtest/libqtest.c:205: kill_qemu() detected QEMU death from signal 
>> 11 (Segmentation fault) (core dumped)
>> (test program exited with status code -6)
>> TAP parsing error: Too few tests run (expected 53, got 39)
>> ――
>>
>> # Start of plain tests
>> # Running /i386/migration/multifd/tcp/plain/cancel
>> # Using machine type: pc-i440fx-9.2
>> # starting QEMU: exec ./qemu-system-i386 -qtest unix:/tmp/qtest-3273509.sock 
>> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-3273509.qmp,id=char0 
>> -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel 
>> tcg -machine pc-i440fx-9.2, -name source,debug-threads=on -m 150M -serial 
>> file:/tmp/migration-test-4112T2/src_serial -drive 
>> if=none,id=d0,file=/tmp/migration-test-4112T2/bootsect,format=raw -device 
>> ide-hd,drive=d0,secs=1,cyls=1,heads=12>/dev/null -accel qtest
>> # starting QEMU: exec ./qemu-system-i386 -qtest unix:/tmp/qtes

Re: [PATCH v2 12/17] migration/multifd: Device state transfer support - send side

2024-09-12 Thread Fabiano Rosas
Peter Xu  writes:

Hi Peter, sorry if I'm not very enthusiastic by this, I'm sure you
understand the rework is a little frustrating.

> On Wed, Aug 28, 2024 at 09:41:17PM -0300, Fabiano Rosas wrote:
>> > +size_t multifd_device_state_payload_size(void)
>> > +{
>> > +return sizeof(MultiFDDeviceState_t);
>> > +}
>> 
>> This will not be necessary because the payload size is the same as the
>> data type. We only need it for the special case where the MultiFDPages_t
>> is smaller than the total ram payload size.
>
> Today I was thinking maybe we should really clean this up, as the current
> multifd_send_data_alloc() is indeed too tricky (blame me.. who requested
> that more or less).  Knowing that VFIO can use dynamic buffers with ->idstr
> and ->buf (I was thinking it could be buf[1M].. but I was wrong...) made
> that feeling stronger.

If we're going to commit bad code and then rewrite it a week later, we
could have just let the original series from Maciej merge without any of
this. I already suggested it a couple of times, we shouldn't be doing
core refactorings underneath contributors' patches, this is too
fragile. Just let people contribute their code and we can change it
later.

This is also why I've been trying hard to separate core multifd
functionality from the migration code that uses multifd to transmit its
data.

My original RFC plus the suggestion to extend multifd_ops for device
state would have (almost) made it so that no client code would be left
in multifd. We could have been turning this thing upside down and it
wouldn't affect anyone in terms of code conflicts.

The ship has already sailed, so your patches below are fine, I have just
some small comments.

>
> I think we should change it now perhaps, otherwise we'll need to introduce
> other helpers to e.g. reset the device buffers, and that's not only slow
> but also not good looking, IMO.

I agree that part is kind of a sore thumb.

>
> So I went ahead with the idea from the previous discussion and managed to
> change the SendData union into a struct; the memory consumption is not super
> important yet, IMHO, but we should still stick with the object model where
> the multifd enqueue thread switches buffers with multifd, as that still
> sounds like a sane way to do it.
>
> Then when that patch is ready, I further tried to make VFIO reuse multifd
> buffers just like what we do with MultiFDPages_t->offset[]: in RAM code we
> don't allocate it every time we enqueue.
>
> I hope it'll also work for VFIO.  VFIO is special in that it also needs to
> dump the config space, so it's more complex (and I noticed Maciej's current
> design requires the final chunk of VFIO config data to be migrated in one
> packet.. that is also part of the complexity there).  So I allowed that
> part to allocate a buffer, but only that.  IOW, I made some API (see below)
> that can either reuse a preallocated buffer, or use a separate one only for
> the final bulk.
>
> In short, could both of you have a look at what I came up with below?  I
> did that in patches because I think it's too much to comment, so patches
> may work better.  No concern if any of below could be good changes to you,
> then either Maciej can squash whatever into existing patches (and I feel
> like some existing patches in this series can go away with below design),
> or I can post pre-requisite patch but only if any of you prefer that.
>
> Anyway, let me know, the patches apply on top of this whole series applied
> first.
>
> I also wonder whether there can be any perf difference already (I tested
> all multifd qtest with below, but no VFIO I can run), perhaps not that
> much, but just to mention below should avoid both buffer allocations and
> one round of copy (so VFIO read() directly writes to the multifd buffers
> now).
>
> Thanks,
>
> ==8<==
> From a6cbcf692b2376e72cc053219d67bb32eabfb7a6 Mon Sep 17 00:00:00 2001
> From: Peter Xu 
> Date: Tue, 10 Sep 2024 12:10:59 -0400
> Subject: [PATCH 1/3] migration/multifd: Make MultiFDSendData a struct
>
> The newly introduced device state buffer can be used to store either
> VFIO's raw read() data or generic device state.  After noticing that
> device states may not easily provide a max buffer size (and the fact
> that the RAM MultiFDPages_t also wants flexibility in managing its
> offset[] array), it may not be a good idea to stick with a union for
> MultiFDSendData, as it won't play well with such flexibility.
>
> Switch MultiFDSendData to a struct.
>
> It won't consume a lot more space in reality, after all the real buffers
> were already dynamically allocated, so it&#
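
To make the shape of the change concrete, here is a rough sketch of what
the struct could look like, inferred from the commit message and the
discussion above. The field and enum names are assumptions for
illustration, not the actual patch:

typedef enum {
    MULTIFD_PAYLOAD_NONE,
    MULTIFD_PAYLOAD_RAM,
    MULTIFD_PAYLOAD_DEVICE_STATE,
} MultiFDPayloadType;

struct MultiFDSendData {
    MultiFDPayloadType type;
    MultiFDDeviceState_t device_state;
    /*
     * Keep the RAM payload last: MultiFDPages_t ends with a flexible
     * offset[] array, so the allocation helper has to reserve room for
     * the configured page count right behind the struct.
     */
    MultiFDPages_t ram;
};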

Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-12 Thread Fabiano Rosas
Peter Xu  writes:

> On Thu, Sep 12, 2024 at 09:13:16AM +0100, Peter Maydell wrote:
>> On Wed, 11 Sept 2024 at 22:26, Fabiano Rosas  wrote:
>> > I don't think we're discussing total CI time at this point, so the math
>> > doesn't really add up. We're not looking into making the CI finish
>> > faster. We're looking into making migration-test finish faster. That
>> > would reduce timeouts in CI, speed-up make check and reduce the chance
>> > of random race conditions* affecting other people/staging runs.
>> 
>> Right. The reason migration-test appears on my radar is because
>> it is very frequently the thing that shows up as "this sometimes
>> just fails or just times out and if you hit retry it goes away
>> again". That might not be migration-test's fault specifically,
>> because those retries tend to be certain CI configs (s390,
>> the i686-tci one), and I have some theories about what might be
>> causing it (e.g. build system runs 4 migration-tests in parallel,
>> which means 8 QEMU processes which is too many for the number
>> of host CPUs). But right now I look at CI job failures and my reaction
>> is "oh, it's the migration-test failing yet again" :-(
>> 
>> For some examples from this week:
>> 
>> https://gitlab.com/qemu-project/qemu/-/jobs/7802183144
>> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373  <[1]
>> https://gitlab.com/qemu-project/qemu/-/jobs/7786579152  <[2]
>> https://gitlab.com/qemu-project/qemu/-/jobs/7786579155
>
> Ah right, the TIMEOUT is unfortunate, especially if tests can be run in
> parallel.  It indeed sounds like no good way to finally solve.. I don't
> also see how speeding up / reducing tests in migration test would help, as
> that's (from some degree..) is the same as tuning the timeout value bigger.
> When the tests are less it'll fit into 480s window, but maybe it's too
> quick now we wonder whether we should shrink it to e.g. 90s, but then it
> can timeout again when on a busy host with less capability of concurrency.
>
> But indeed there're two ERRORs ([1,2] above)..  I collected some more info
> here before the log expires:
>
> =8<
>
> *** /i386/migration/multifd/tcp/plain/cancel, qtest-i386 on s390 host
>
> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373
>
> 101/953 qemu:qtest+qtest-i386 / qtest-i386/migration-test 
> ERROR  144.32s   killed by signal 6 SIGABRT
>>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
>>>> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
>>>>  
>>>> PYTHON=/home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/build/pyvenv/bin/python3
>>>>  QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=144 
>>>> QTEST_QEMU_BINARY=./qemu-system-i386 
>>>> /home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/build/tests/qtest/migration-test
>>>>  --tap -k
> ― ✀  ―
> stderr:
> warning: fd: migration to a file is deprecated. Use file: instead.
> warning: fd: migration to a file is deprecated. Use file: instead.
> ../tests/qtest/libqtest.c:205: kill_qemu() detected QEMU death from signal 11 
> (Segmentation fault) (core dumped)
> (test program exited with status code -6)
> TAP parsing error: Too few tests run (expected 53, got 39)
> ――
>
> # Start of plain tests
> # Running /i386/migration/multifd/tcp/plain/cancel
> # Using machine type: pc-i440fx-9.2
> # starting QEMU: exec ./qemu-system-i386 -qtest unix:/tmp/qtest-3273509.sock 
> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-3273509.qmp,id=char0 
> -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel 
> tcg -machine pc-i440fx-9.2, -name source,debug-threads=on -m 150M -serial 
> file:/tmp/migration-test-4112T2/src_serial -drive 
> if=none,id=d0,file=/tmp/migration-test-4112T2/bootsect,format=raw -device 
> ide-hd,drive=d0,secs=1,cyls=1,heads=12>/dev/null -accel qtest
> # starting QEMU: exec ./qemu-system-i386 -qtest unix:/tmp/qtest-3273509.sock 
> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-3273509.qmp,id=char0 
> -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel 
> tcg -machine pc-i440fx-9.2, -name target,debug-threads=on -m 150M -serial 
> file:/tmp/migration-test-4112T2/dest_serial -incoming defer -drive 
> if=non

Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-12 Thread Fabiano Rosas
Peter Maydell  writes:

> On Thu, 12 Sept 2024 at 14:48, Fabiano Rosas  wrote:
>> Peter Maydell  writes:
>> > For some examples from this week:
>> >
>> > https://gitlab.com/qemu-project/qemu/-/jobs/7802183144
>> > https://gitlab.com/qemu-project/qemu/-/jobs/7799842373
>> > https://gitlab.com/qemu-project/qemu/-/jobs/7786579152
>> > https://gitlab.com/qemu-project/qemu/-/jobs/7786579155
>>
>> About these:
>>
>> There are 2 instances of plain-old-SIGSEGV here. Both happen in
>> non-x86_64 runs and on the /multifd/tcp/plain/cancel test, which means
>> they're either races or memory ordering issues. Having i386 crashing
>> points to the former. So having the CI loaded and causing timeouts is
>> probably what exposed the issue.
>
> They're also both TCI. Would these tests be relying on
> specific atomic-access behaviour in the guest code that's
> running, or is all the avoidance-of-races in the migration
> code in QEMU itself?

I misspoke about memory ordering, this is all just the x86 host and the
multifd threads in QEMU having synchronization issues.

>
> (I don't know of any particular problems with TCI's
> implementation of atomic accesses, so this is just a stab
> in the dark.)
>
> thanks
> -- PMM



Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-12 Thread Fabiano Rosas
Peter Maydell  writes:

> On Wed, 11 Sept 2024 at 22:26, Fabiano Rosas  wrote:
>> I don't think we're discussing total CI time at this point, so the math
>> doesn't really add up. We're not looking into making the CI finish
>> faster. We're looking into making migration-test finish faster. That
>> would reduce timeouts in CI, speed-up make check and reduce the chance
>> of random race conditions* affecting other people/staging runs.
>
> Right. The reason migration-test appears on my radar is because
> it is very frequently the thing that shows up as "this sometimes
> just fails or just times out and if you hit retry it goes away
> again". That might not be migration-test's fault specifically,
> because those retries tend to be certain CI configs (s390,
> the i686-tci one), and I have some theories about what might be
> causing it (e.g. build system runs 4 migration-tests in parallel,
> which means 8 QEMU processes which is too many for the number
> of host CPUs). But right now I look at CI job failures and my reaction
> is "oh, it's the migration-test failing yet again" :-(

And then I go: "oh, people complaining about migration-test again, I
thought we had fixed all the issues this time". It's frustrating for
everyone, as I said previously.

>
> For some examples from this week:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/7802183144
> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373
> https://gitlab.com/qemu-project/qemu/-/jobs/7786579152
> https://gitlab.com/qemu-project/qemu/-/jobs/7786579155

About these:

There are 2 instances of plain-old-SIGSEGV here. Both happen in
non-x86_64 runs and on the /multifd/tcp/plain/cancel test, which means
they're either races or memory ordering issues. Having i386 crashing
points to the former. So having the CI loaded and causing timeouts is
probably what exposed the issue.

The thread is mig/dst/recv_7 and grepping the objdump output shows:
 55 48 89 e5 48 89 7d e8 48 89 75 e0 48 8b 45 e8 83 e0
3f ba 01 00 00 00 89 c1 48 d3 e2 48 89 d0 48 89 45 f0 48 8b 45 e8 48 c1
e8 06 48 8d 14 c5 00 00 00 00 48 8b 45 e0 48 01 d0 48 89 45 f8 48 8b 45
f8 48 8b 55 f0  48 09 10 90 5d c3

I tried a bisect overnight, but it seems the issue has been there since
before 9.0. I'll try to repro with gdb attached or get a core dump.



Re: [PATCH v2 09/17] migration/multifd: Device state transfer support - receive side

2024-09-12 Thread Fabiano Rosas
Avihai Horon  writes:

> On 09/09/2024 21:05, Maciej S. Szmigiero wrote:
>>
>>
>> On 5.09.2024 18:47, Avihai Horon wrote:
>>>
>>> On 27/08/2024 20:54, Maciej S. Szmigiero wrote:


 From: "Maciej S. Szmigiero" 

 Add a basic support for receiving device state via multifd channels -
 channels that are shared with RAM transfers.

 To differentiate between a device state and a RAM packet the packet
 header is read first.

 Depending whether MULTIFD_FLAG_DEVICE_STATE flag is present or not 
 in the
 packet header either device state (MultiFDPacketDeviceState_t) or RAM
 data (existing MultiFDPacket_t) is then read.

 The received device state data is provided to
 qemu_loadvm_load_state_buffer() function for processing in the
 device's load_state_buffer handler.

 Signed-off-by: Maciej S. Szmigiero 
 ---
   migration/multifd.c | 127 
 +---
   migration/multifd.h |  31 ++-
   2 files changed, 138 insertions(+), 20 deletions(-)

 diff --git a/migration/multifd.c b/migration/multifd.c
 index b06a9fab500e..d5a8e5a9c9b5 100644
 --- a/migration/multifd.c
 +++ b/migration/multifd.c
 @@ -21,6 +21,7 @@
   #include "file.h"
   #include "migration.h"
   #include "migration-stats.h"
 +#include "savevm.h"
   #include "socket.h"
   #include "tls.h"
   #include "qemu-file.h"
 @@ -209,10 +210,10 @@ void 
 multifd_send_fill_packet(MultiFDSendParams *p)

   memset(packet, 0, p->packet_len);

 -    packet->magic = cpu_to_be32(MULTIFD_MAGIC);
 -    packet->version = cpu_to_be32(MULTIFD_VERSION);
 +    packet->hdr.magic = cpu_to_be32(MULTIFD_MAGIC);
 +    packet->hdr.version = cpu_to_be32(MULTIFD_VERSION);

 -    packet->flags = cpu_to_be32(p->flags);
 +    packet->hdr.flags = cpu_to_be32(p->flags);
   packet->next_packet_size = cpu_to_be32(p->next_packet_size);

   packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
 @@ -228,31 +229,49 @@ void 
 multifd_send_fill_packet(MultiFDSendParams *p)
   p->flags, p->next_packet_size);
   }

 -static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error 
 **errp)
 +static int multifd_recv_unfill_packet_header(MultiFDRecvParams *p,
 + MultiFDPacketHdr_t *hdr,
 + Error **errp)
   {
 -    MultiFDPacket_t *packet = p->packet;
 -    int ret = 0;
 -
 -    packet->magic = be32_to_cpu(packet->magic);
 -    if (packet->magic != MULTIFD_MAGIC) {
 +    hdr->magic = be32_to_cpu(hdr->magic);
 +    if (hdr->magic != MULTIFD_MAGIC) {
   error_setg(errp, "multifd: received packet "
  "magic %x and expected magic %x",
 -   packet->magic, MULTIFD_MAGIC);
 +   hdr->magic, MULTIFD_MAGIC);
   return -1;
   }

 -    packet->version = be32_to_cpu(packet->version);
 -    if (packet->version != MULTIFD_VERSION) {
 +    hdr->version = be32_to_cpu(hdr->version);
 +    if (hdr->version != MULTIFD_VERSION) {
   error_setg(errp, "multifd: received packet "
  "version %u and expected version %u",
 -   packet->version, MULTIFD_VERSION);
 +   hdr->version, MULTIFD_VERSION);
   return -1;
   }

 -    p->flags = be32_to_cpu(packet->flags);
 +    p->flags = be32_to_cpu(hdr->flags);
 +
 +    return 0;
 +}
 +
 +static int 
 multifd_recv_unfill_packet_device_state(MultiFDRecvParams *p,
 +   Error **errp)
 +{
 +    MultiFDPacketDeviceState_t *packet = p->packet_dev_state;
 +
 +    packet->instance_id = be32_to_cpu(packet->instance_id);
 +    p->next_packet_size = be32_to_cpu(packet->next_packet_size);
 +
 +    return 0;
 +}
 +
 +static int multifd_recv_unfill_packet_ram(MultiFDRecvParams *p, 
 Error **errp)
 +{
 +    MultiFDPacket_t *packet = p->packet;
 +    int ret = 0;
 +
   p->next_packet_size = be32_to_cpu(packet->next_packet_size);
   p->packet_num = be64_to_cpu(packet->packet_num);
 -    p->packets_recved++;

   if (!(p->flags & MULTIFD_FLAG_SYNC)) {
   ret = multifd_ram_unfill_packet(p, errp);
 @@ -264,6 +283,19 @@ static int 
 multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
   return ret;
   }

 +static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error 
 **errp)
 +{
 +    p->packets_recved++;
 +
 +    if (p->fl
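
To summarize the dispatch that the new header split enables: the common
header is read and validated first, then the rest of the packet is parsed
as either device state or RAM data depending on MULTIFD_FLAG_DEVICE_STATE.
A sketch of that control flow (illustration only, not the actual patch):

static int multifd_recv_unfill_packet_sketch(MultiFDRecvParams *p,
                                             Error **errp)
{
    p->packets_recved++;

    if (p->flags & MULTIFD_FLAG_DEVICE_STATE) {
        /* MultiFDPacketDeviceState_t: instance_id + next_packet_size */
        return multifd_recv_unfill_packet_device_state(p, errp);
    }

    /* Otherwise the usual RAM packet (MultiFDPacket_t) */
    return multifd_recv_unfill_packet_ram(p, errp);
}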

Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-11 Thread Fabiano Rosas
Peter Xu  writes:

> On Wed, Sep 11, 2024 at 04:48:21PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Tue, Sep 10, 2024 at 07:23:43PM -0300, Fabiano Rosas wrote:
>> >> Peter Xu  writes:
>> >> 
>> >> > On Mon, Sep 09, 2024 at 06:54:46PM -0300, Fabiano Rosas wrote:
>> >> >> Peter Xu  writes:
>> >> >> 
>> >> >> > On Mon, Sep 09, 2024 at 03:02:57PM +0100, Peter Maydell wrote:
>> >> >> >> On Mon, 9 Sept 2024 at 14:51, Hyman Huang  
>> >> >> >> wrote:
>> >> >> >> >
>> >> >> >> > Despite the fact that the responsive CPU throttle is enabled,
>> >> >> >> > the dirty sync count may not always increase because this is
>> >> >> >> > an optimization that might not happen in any situation.
>> >> >> >> >
>> >> >> >> > This test case just making sure it doesn't interfere with any
>> >> >> >> > current functionality.
>> >> >> >> >
>> >> >> >> > Signed-off-by: Hyman Huang 
>> >> >> >> 
>> >> >> >> tests/qtest/migration-test already runs 75 different
>> >> >> >> subtests, takes up a massive chunk of our "make check"
>> >> >> >> time, and is very commonly a "times out" test on some
>> >> >> >> of our CI jobs. It runs on five different guest CPU
>> >> >> >> architectures, each one of which takes between 2 and
>> >> >> >> 5 minutes to complete the full migration-test.
>> >> >> >> 
>> >> >> >> Do we really need to make it even bigger?
>> >> >> >
>> >> >> > I'll try to find some time in the next few weeks looking into this 
>> >> >> > to see
>> >> >> > whether we can further shrink migration test times after previous 
>> >> >> > attemps
>> >> >> > from Dan.  At least a low hanging fruit is we should indeed put some 
>> >> >> > more
>> >> >> > tests into g_test_slow(), and this new test could also be a 
>> >> >> > candidate (then
>> >> >> > we can run "-m slow" for migration PRs only).
>> >> >> 
>> >> >> I think we could (using -m slow or any other method) separate tests
>> >> >> that are generic enough that every CI run should benefit from them
>> >> >> vs. tests that are only useful once someone starts touching migration
>> >> >> code. I'd say very few in the former category and most of them in the
>> >> >> latter.
>> >> >> 
>> >> >> For an idea of where migration bugs lie, I took a look at what was
>> >> >> fixed since 2022:
>> >> >> 
>> >> >> # bugs | device/subsystem/arch
>> >> >> --
>> >> >> 54 | migration
>> >> >> 10 | vfio
>> >> >>  6 | ppc
>> >> >>  3 | virtio-gpu
>> >> >>  2 | pcie_sriov, tpm_emulator,
>> >> >>   vdpa, virtio-rng-pci
>> >> >>  1 | arm, block, gpio, lasi,
>> >> >>   pci, s390, scsi-disk,
>> >> >>   virtio-mem, TCG
>> >> >
>> >> > Just curious; how did you collect these?
>> >> 
>> >> git log --since=2022 and then squinted at it. I wrote a warning to take
>> >> this with a grain of salt, but it missed the final version.
>> >> 
>> >> >
>> >> >> 
>> >> >> From these, ignoring the migration bugs, the migration-tests cover some
>> >> >> of: arm, ppc, s390, TCG. The device_opts[1] patch hasn't merged yet, 
>> >> >> but
>> >> >> once it is, then virtio-gpu would be covered and we could investigate
>> >> >> adding some of the others.
>> >> >> 
>> >> >> For actual migration code issues:
>> >> >> 
>> >> >> # bugs | (sub)subsystem | kind
>> >> >> --
>> >> >> 13 | multifd| correctness/races
>> >> >>  8

Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-11 Thread Fabiano Rosas
Peter Xu  writes:

> On Tue, Sep 10, 2024 at 07:23:43PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Mon, Sep 09, 2024 at 06:54:46PM -0300, Fabiano Rosas wrote:
>> >> Peter Xu  writes:
>> >> 
>> >> > On Mon, Sep 09, 2024 at 03:02:57PM +0100, Peter Maydell wrote:
>> >> >> On Mon, 9 Sept 2024 at 14:51, Hyman Huang  
>> >> >> wrote:
>> >> >> >
>> >> >> > Despite the fact that the responsive CPU throttle is enabled,
>> >> >> > the dirty sync count may not always increase because this is
>> >> >> > an optimization that might not happen in any situation.
>> >> >> >
>> >> >> > This test case just making sure it doesn't interfere with any
>> >> >> > current functionality.
>> >> >> >
>> >> >> > Signed-off-by: Hyman Huang 
>> >> >> 
>> >> >> tests/qtest/migration-test already runs 75 different
>> >> >> subtests, takes up a massive chunk of our "make check"
>> >> >> time, and is very commonly a "times out" test on some
>> >> >> of our CI jobs. It runs on five different guest CPU
>> >> >> architectures, each one of which takes between 2 and
>> >> >> 5 minutes to complete the full migration-test.
>> >> >> 
>> >> >> Do we really need to make it even bigger?
>> >> >
>> >> > I'll try to find some time in the next few weeks looking into this to 
>> >> > see
>> >> > whether we can further shrink migration test times after previous 
>> >> > attemps
>> >> > from Dan.  At least a low hanging fruit is we should indeed put some 
>> >> > more
>> >> > tests into g_test_slow(), and this new test could also be a candidate 
>> >> > (then
>> >> > we can run "-m slow" for migration PRs only).
>> >> 
>> >> I think we could (using -m slow or any other method) separate tests
>> >> that are generic enough that every CI run should benefit from them
>> >> vs. tests that are only useful once someone starts touching migration
>> >> code. I'd say very few in the former category and most of them in the
>> >> latter.
>> >> 
>> >> For an idea of where migration bugs lie, I took a look at what was
>> >> fixed since 2022:
>> >> 
>> >> # bugs | device/subsystem/arch
>> >> --
>> >> 54 | migration
>> >> 10 | vfio
>> >>  6 | ppc
>> >>  3 | virtio-gpu
>> >>  2 | pcie_sriov, tpm_emulator,
>> >>   vdpa, virtio-rng-pci
>> >>  1 | arm, block, gpio, lasi,
>> >>   pci, s390, scsi-disk,
>> >>   virtio-mem, TCG
>> >
>> > Just curious; how did you collect these?
>> 
>> git log --since=2022 and then squinted at it. I wrote a warning to take
>> this with a grain of salt, but it missed the final version.
>> 
>> >
>> >> 
>> >> From these, ignoring the migration bugs, the migration-tests cover some
>> >> of: arm, ppc, s390, TCG. The device_opts[1] patch hasn't merged yet, but
>> >> once it is, then virtio-gpu would be covered and we could investigate
>> >> adding some of the others.
>> >> 
>> >> For actual migration code issues:
>> >> 
>> >> # bugs | (sub)subsystem | kind
>> >> --
>> >> 13 | multifd| correctness/races
>> >>  8 | ram| correctness
>> >>  8 | rdma:  | general programming
>> >
>> > 8 rdma bugs??? ouch..
>> 
>> Mostly caught by a cleanup from Markus. Silly stuff like mixed signed
>> integer comparisons and bugs in error handling. I don't even want to
>> look too much at it.
>> 
>> ...hopefully this release we'll manage to resolve that situation.
>> 
>> >
>> >>  7 | qmp| new api bugs
>> >>  5 | postcopy   | races
>> >>  4 | file:  | leaks
>> >>  3 | return path| races
>> >>  3 | fd_cleanup | races
>> >>  2 | savevm, aio/coroutines
>> >>  1 | xbzrle, colo, dirtyrate, ex

Re: [PATCH] migration/multifd: Fix build for qatzip

2024-09-11 Thread Fabiano Rosas
Peter Xu  writes:

> On Tue, Sep 10, 2024 at 07:32:19PM -0300, Fabiano Rosas wrote:
>> I'm trying to find a way of having more code compiled by default and
>> only a minimal amount of code put under the CONFIG_FOO options. So if
>> some multifd code depends on a library call, say deflateInit, we make
>> that a multifd_deflate_init and add a stub for when !ZLIB (just an
>> example). I'm not sure it's feasible though, I'm just bouncing the idea
>> off of you.
>
> Not sure how much it helps.  It adds more work, add slightly more code to
> maintain (then we will then need to maintain the shim layer, and that's
> per-compressor), while I am not sure it'll be good enough either..  For
> example, even if it compiles it can still run into constant failure when
> with the real library / hardware underneath.
>
> This not so bad to me yet: do you still remember or aware of the "joke" on
> how people remove a feature in Linux?  One can introduce a bug that can
> directly crash when some feature enabled, then after two years the
> developer can say "see, this feature is not used by anyone, let's remove
> it".
>
> I think it's a joke (which might come from reality..) but it's kind of a
> way that how we should treat these compressors as a start, IMHO.  AFAIU
> many of these compressors start with PoC-type projects where it's used to
> justify the hardware features.  The next step is in production use but that
> requires software vendors to involve, IIUC.  I think that's what we're
> waiting for, on company use it in more serious way that sign these features
> off.
>
> I don't think all such compressors will reach that point.  Meanwhile I
> don't think we (as qemu migration maintainers) can maintain that code well,
> if we don't get sponsored by people with hardwares to test.
>
> I think it means it's not our job to maintain it at 100%, yet so far.  We
> will still try our best, but that's always limited.  As we discussed
> before, we always need to rely on vendors so far for most of them.
>
> If after a few releases we found it's broken so bad, it may mean it
> finished its job as PoC or whatever purpose it services.  It means we could
> choose to move on, with no joking.
>
> That's why I think it's not so urgent, and maybe we don't need extra effort
> to make it harder for us to notice nobody is using it - we keep everything
> we know productions are actively using seriously (like multifd, postcopy,
> etc.).  Either some compressors become part of the serious use case, or we
> move on.  I recently do find more that the only way to make QEMU keep
> living well is to sometimes throw things away..

Ok, that's all fair. I agree we can continue with that policy. Thanks
for sharing your thoughts.



[PATCH] tests/qtest/migration: Move a couple of slow tests under g_test_slow

2024-09-11 Thread Fabiano Rosas
The xbzrle and vcpu_dirty_limit are the two slowest tests from
migration-test. Move them under g_test_slow() to save about 40s per
run.

Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index d6768d5d71..814ec109a6 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -3803,8 +3803,10 @@ int main(int argc, char **argv)
 
 migration_test_add("/migration/precopy/unix/plain",
test_precopy_unix_plain);
-migration_test_add("/migration/precopy/unix/xbzrle",
-   test_precopy_unix_xbzrle);
+if (g_test_slow()) {
+migration_test_add("/migration/precopy/unix/xbzrle",
+   test_precopy_unix_xbzrle);
+}
 migration_test_add("/migration/precopy/file",
test_precopy_file);
 migration_test_add("/migration/precopy/file/offset",
@@ -3979,7 +3981,7 @@ int main(int argc, char **argv)
 if (g_str_equal(arch, "x86_64") && has_kvm && kvm_dirty_ring_supported()) {
 migration_test_add("/migration/dirty_ring",
test_precopy_unix_dirty_ring);
-if (qtest_has_machine("pc")) {
+if (qtest_has_machine("pc") && g_test_slow()) {
 migration_test_add("/migration/vcpu_dirty_limit",
test_vcpu_dirty_limit);
 }
-- 
2.35.3
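For the record, glib only runs g_test_slow() tests when slow/thorough mode is
selected, so after this patch the two tests above are skipped by the default
quick run. Assuming the usual invocations (worth double-checking against your
tree), they can still be exercised with something like:

    QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -m slow

or by passing SPEED=slow to the make check targets.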




[PATCH] target/ppc: Fix lxvx/stxvx facility check

2024-09-11 Thread Fabiano Rosas
The XT check for the lxvx/stxvx instructions is currently
inverted. This was introduced during the move to decodetree.

From the ISA:
  Chapter 7. Vector-Scalar Extension Facility
  Load VSX Vector Indexed X-form

  lxvx XT,RA,RB
  if TX=0 & MSR.VSX=0 then VSX_Unavailable()
  if TX=1 & MSR.VEC=0 then Vector_Unavailable()
  ...
  Let XT be the value 32×TX + T.

The code currently does the opposite:

if (paired || a->rt >= 32) {
REQUIRE_VSX(ctx);
} else {
REQUIRE_VECTOR(ctx);
}
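
Spelling out why the fixed condition has to be "a->rt < 32" (worked from the
ISA excerpt above, nothing beyond it):

    /*
     * XT = 32*TX + T, so the decoded register number already encodes TX:
     *   rt  0..31  ->  TX = 0  ->  MSR.VSX check  -> REQUIRE_VSX
     *   rt 32..63  ->  TX = 1  ->  MSR.VEC check  -> REQUIRE_VECTOR
     * The old check had the two ranges swapped.
     */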

This was already fixed for lxv/stxv at commit "2cc0e449d1 (target/ppc:
Fix lxv/stxv MSR facility check)", but the indexed forms were missed.

Cc: qemu-sta...@nongnu.org
Fixes: 70426b5bb7 ("target/ppc: moved stxvx and lxvx from legacy to decodtree")
Signed-off-by: Fabiano Rosas 
---
 target/ppc/translate/vsx-impl.c.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/translate/vsx-impl.c.inc 
b/target/ppc/translate/vsx-impl.c.inc
index 40a87ddc4a..a869f30e86 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2244,7 +2244,7 @@ static bool do_lstxv_PLS_D(DisasContext *ctx, arg_PLS_D 
*a,
 
 static bool do_lstxv_X(DisasContext *ctx, arg_X *a, bool store, bool paired)
 {
-if (paired || a->rt >= 32) {
+if (paired || a->rt < 32) {
 REQUIRE_VSX(ctx);
 } else {
 REQUIRE_VECTOR(ctx);
-- 
2.35.3




Re: [PATCH] migration/multifd: Fix build for qatzip

2024-09-10 Thread Fabiano Rosas
Peter Xu  writes:

> On Tue, Sep 10, 2024 at 06:35:50PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > The qatzip series was based on an older commit, it applied cleanly even
>> > though it has conflicts.  Neither CI nor myself found the build will break
>> > as it's skipped by default when qatzip library was missing.
>> 
>> It took longer than I expected.
>
> What took longer?

For a change that breaks the build to be committed in one of these parts
of the code that are disabled by default. You might remember I told you
in one of our meetings that I was concerned about that.

>
>> 
>> Do you think it would work if we wrapped all calls to external functions
>> in a header and stubbed them out when there's no accelerator support?
>
> I didn't catch the major benefit v.s. multifd_register_ops().  Any further
> elaborations?

I'm trying to find a way of having more code compiled by default and
only a minimal amount of code put under the CONFIG_FOO options. So if
some multifd code depends on a library call, say deflateInit, we make
that a multifd_deflate_init and add a stub for when !ZLIB (just an
example). I'm not sure it's feasible though, I'm just bouncing the idea
off of you.
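
For what it's worth, a minimal sketch of what I mean, with made-up names
(multifd_deflate_init is hypothetical, nothing like it exists in the tree):

    /* hypothetical wrapper header, only to illustrate the idea */
    #ifdef CONFIG_ZLIB
    #include <zlib.h>

    static inline int multifd_deflate_init(z_stream *zs, int level)
    {
        return deflateInit(zs, level);   /* the real library call */
    }
    #else
    /* stub so callers still compile when the library is absent */
    static inline int multifd_deflate_init(void *zs, int level)
    {
        return -1;
    }
    #endif

The callers would then be built unconditionally and only the thin wrapper
would sit behind the CONFIG option.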



Re: [PATCH 21/39] migration: replace assert(false) with g_assert_not_reached()

2024-09-10 Thread Fabiano Rosas
Pierrick Bouvier  writes:

> Signed-off-by: Pierrick Bouvier 
> ---
>  migration/dirtyrate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> index 1d9db812990..a28c07327e8 100644
> --- a/migration/dirtyrate.c
> +++ b/migration/dirtyrate.c
> @@ -228,7 +228,7 @@ static int time_unit_to_power(TimeUnit time_unit)
>  case TIME_UNIT_MILLISECOND:
>  return -3;
>  default:
> -assert(false); /* unreachable */
> +g_assert_not_reached(); /* unreachable */
>  return 0;
>  }
>  }

You could drop the comment that's now redundant.

Reviewed-by: Fabiano Rosas 



Re: [PATCH 08/39] migration: replace assert(0) with g_assert_not_reached()

2024-09-10 Thread Fabiano Rosas
Pierrick Bouvier  writes:

> Signed-off-by: Pierrick Bouvier 
> ---
>  migration/migration-hmp-cmds.c |  2 +-
>  migration/postcopy-ram.c   | 14 +++---
>  migration/ram.c|  6 +++---
>  3 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 7d608d26e19..e6e96aa6288 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -636,7 +636,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
> *qdict)
>  visit_type_bool(v, param, &p->direct_io, &err);
>  break;
>  default:
> -assert(0);
> +g_assert_not_reached();
>  }
>  
>  if (err) {
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 1c374b7ea1e..f431bbc0d4f 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -1411,40 +1411,40 @@ int postcopy_ram_incoming_init(MigrationIncomingState 
> *mis)
>  
>  int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>  {
> -assert(0);
> +g_assert_not_reached();
>  return -1;
>  }
>  
>  int postcopy_ram_prepare_discard(MigrationIncomingState *mis)
>  {
> -assert(0);
> +g_assert_not_reached();
>  return -1;
>  }
>  
>  int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
>   uint64_t client_addr, uint64_t rb_offset)
>  {
> -assert(0);
> +g_assert_not_reached();
>  return -1;
>  }
>  
>  int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>  {
> -assert(0);
> +g_assert_not_reached();
>  return -1;
>  }
>  
>  int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>  RAMBlock *rb)
>  {
> -assert(0);
> +g_assert_not_reached();
>  return -1;
>  }
>  
>  int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
>  RAMBlock *rb)
>  {
> -assert(0);
> +g_assert_not_reached();
>  return -1;
>  }
>  
> @@ -1452,7 +1452,7 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd,
>   uint64_t client_addr,
>   RAMBlock *rb)
>  {
> -assert(0);
> +g_assert_not_reached();
>  return -1;
>  }
>  #endif
> diff --git a/migration/ram.c b/migration/ram.c
> index 67ca3d5d51a..0aa5d347439 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1765,19 +1765,19 @@ bool ram_write_tracking_available(void)
>  
>  bool ram_write_tracking_compatible(void)
>  {
> -assert(0);
> +g_assert_not_reached();
>  return false;
>  }
>  
>  int ram_write_tracking_start(void)
>  {
> -assert(0);
> +g_assert_not_reached();
>  return -1;
>  }
>  
>  void ram_write_tracking_stop(void)
>  {
> -assert(0);
> +g_assert_not_reached();
>  }
>  #endif /* defined(__linux__) */

Reviewed-by: Fabiano Rosas 



Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-10 Thread Fabiano Rosas
Peter Xu  writes:

> On Mon, Sep 09, 2024 at 06:54:46PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Mon, Sep 09, 2024 at 03:02:57PM +0100, Peter Maydell wrote:
>> >> On Mon, 9 Sept 2024 at 14:51, Hyman Huang  wrote:
>> >> >
>> >> > Despite the fact that the responsive CPU throttle is enabled,
>> >> > the dirty sync count may not always increase because this is
>> >> > an optimization that might not happen in any situation.
>> >> >
>> >> > This test case just making sure it doesn't interfere with any
>> >> > current functionality.
>> >> >
>> >> > Signed-off-by: Hyman Huang 
>> >> 
>> >> tests/qtest/migration-test already runs 75 different
>> >> subtests, takes up a massive chunk of our "make check"
>> >> time, and is very commonly a "times out" test on some
>> >> of our CI jobs. It runs on five different guest CPU
>> >> architectures, each one of which takes between 2 and
>> >> 5 minutes to complete the full migration-test.
>> >> 
>> >> Do we really need to make it even bigger?
>> >
>> > I'll try to find some time in the next few weeks looking into this to see
>> > whether we can further shrink migration test times after previous attemps
>> > from Dan.  At least a low hanging fruit is we should indeed put some more
>> > tests into g_test_slow(), and this new test could also be a candidate (then
>> > we can run "-m slow" for migration PRs only).
>> 
>> I think we could (using -m slow or any other method) separate tests
>> that are generic enough that every CI run should benefit from them
>> vs. tests that are only useful once someone starts touching migration
>> code. I'd say very few in the former category and most of them in the
>> latter.
>> 
>> For an idea of where migration bugs lie, I took a look at what was
>> fixed since 2022:
>> 
>> # bugs | device/subsystem/arch
>> --
>> 54 | migration
>> 10 | vfio
>>  6 | ppc
>>  3 | virtio-gpu
>>  2 | pcie_sriov, tpm_emulator,
>>   vdpa, virtio-rng-pci
>>  1 | arm, block, gpio, lasi,
>>   pci, s390, scsi-disk,
>>   virtio-mem, TCG
>
> Just curious; how did you collect these?

git log --since=2022 and then squinted at it. I wrote a warning to take
this with a grain of salt, but it missed the final version.

>
>> 
>> From these, ignoring the migration bugs, the migration-tests cover some
>> of: arm, ppc, s390, TCG. The device_opts[1] patch hasn't merged yet, but
>> once it is, then virtio-gpu would be covered and we could investigate
>> adding some of the others.
>> 
>> For actual migration code issues:
>> 
>> # bugs | (sub)subsystem | kind
>> --
>> 13 | multifd| correctness/races
>>  8 | ram| correctness
>>  8 | rdma:  | general programming
>
> 8 rdma bugs??? ouch..

Mostly caught by a cleanup from Markus. Silly stuff like mixed signed
integer comparisons and bugs in error handling. I don't even want to
look too much at it.

...hopefully this release we'll manage to resolve that situation.

>
>>  7 | qmp| new api bugs
>>  5 | postcopy   | races
>>  4 | file:  | leaks
>>  3 | return path| races
>>  3 | fd_cleanup | races
>>  2 | savevm, aio/coroutines
>>  1 | xbzrle, colo, dirtyrate, exec:,
>>   windows, iochannel, qemufile,
>>   arch (ppc64le)
>> 
>> Here, the migration-tests cover well: multifd, ram, qmp, postcopy,
>> file, rp, fd_cleanup, iochannel, qemufile, xbzrle.
>> 
>> My suggestion is we run per arch:
>> 
>> "/precopy/tcp/plain"
>> "/precopy/tcp/tls/psk/match",
>> "/postcopy/plain"
>> "/postcopy/preempt/plain"
>> "/postcopy/preempt/recovery/plain"
>> "/multifd/tcp/plain/cancel"
>> "/multifd/tcp/uri/plain/none"
>
> Don't you want to still keep a few multifd / file tests?

Not really, but I won't object if you want to add some more in there. To
be honest, I want to get out of people's way as much as I can because
having to revisit this every couple of months is stressful to me.

My rationale for those is:

"/precopy/tcp/plain":
 Smoke test, the most common migration

&q

Re: [PATCH] migration/multifd: Fix build for qatzip

2024-09-10 Thread Fabiano Rosas
Peter Xu  writes:

> The qatzip series was based on an older commit, it applied cleanly even
> though it has conflicts.  Neither CI nor myself found the build will break
> as it's skipped by default when qatzip library was missing.

It took longer than I expected.

Do you think it would work if we wrapped all calls to external functions
in a header and stubbed them out when there's no accelerator support?



Re: [PATCH] target/ppc: Fix lxv/stxv MSR facility check

2024-09-09 Thread Fabiano Rosas
Nicholas Piggin  writes:

> The move to decodetree flipped the inequality test for the VEC / VSX
> MSR facility check.
>
> This caused application crashes under Linux, where these facility
> unavailable interrupts are used for lazy-switching of VEC/VSX register
> sets. Getting the incorrect interrupt would result in wrong registers
> being loaded, potentially overwriting live values and/or exposing
> stale ones.
>
> Cc: qemu-sta...@nongnu.org
> Reported-by: Joel Stanley 
> Fixes: 70426b5bb738 ("target/ppc: moved stxvx and lxvx from legacy to 
> decodtree")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1769
> Tested-by: Harsh Prateek Bora 
> Signed-off-by: Nicholas Piggin 
> ---
>  target/ppc/translate/vsx-impl.c.inc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/ppc/translate/vsx-impl.c.inc 
> b/target/ppc/translate/vsx-impl.c.inc
> index 6db87ab336..0266f09119 100644
> --- a/target/ppc/translate/vsx-impl.c.inc
> +++ b/target/ppc/translate/vsx-impl.c.inc
> @@ -2268,7 +2268,7 @@ static bool do_lstxv(DisasContext *ctx, int ra, TCGv 
> displ,
>  
>  static bool do_lstxv_D(DisasContext *ctx, arg_D *a, bool store, bool paired)
>  {
> -if (paired || a->rt >= 32) {
> +if (paired || a->rt < 32) {
>  REQUIRE_VSX(ctx);
>  } else {
>  REQUIRE_VECTOR(ctx);

What about the X-form down below?

static bool do_lstxv_X(DisasContext *ctx, arg_X *a, bool store, bool paired)
{
if (paired || a->rt >= 32) {
REQUIRE_VSX(ctx);
} else {
REQUIRE_VECTOR(ctx);
}

return do_lstxv(ctx, a->ra, cpu_gpr[a->rb], a->rt, store, paired);
}



Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle

2024-09-09 Thread Fabiano Rosas
Peter Xu  writes:

> On Mon, Sep 09, 2024 at 03:02:57PM +0100, Peter Maydell wrote:
>> On Mon, 9 Sept 2024 at 14:51, Hyman Huang  wrote:
>> >
>> > Despite the fact that the responsive CPU throttle is enabled,
>> > the dirty sync count may not always increase because this is
>> > an optimization that might not happen in any situation.
>> >
>> > This test case just making sure it doesn't interfere with any
>> > current functionality.
>> >
>> > Signed-off-by: Hyman Huang 
>> 
>> tests/qtest/migration-test already runs 75 different
>> subtests, takes up a massive chunk of our "make check"
>> time, and is very commonly a "times out" test on some
>> of our CI jobs. It runs on five different guest CPU
>> architectures, each one of which takes between 2 and
>> 5 minutes to complete the full migration-test.
>> 
>> Do we really need to make it even bigger?
>
> I'll try to find some time in the next few weeks looking into this to see
> whether we can further shrink migration test times after previous attemps
> from Dan.  At least a low hanging fruit is we should indeed put some more
> tests into g_test_slow(), and this new test could also be a candidate (then
> we can run "-m slow" for migration PRs only).

I think we could (using -m slow or any other method) separate tests
that are generic enough that every CI run should benefit from them
vs. tests that are only useful once someone starts touching migration
code. I'd say very few in the former category and most of them in the
latter.

For an idea of where migration bugs lie, I took a look at what was
fixed since 2022:

# bugs | device/subsystem/arch
--
54 | migration
10 | vfio
 6 | ppc
 3 | virtio-gpu
 2 | pcie_sriov, tpm_emulator,
  vdpa, virtio-rng-pci
 1 | arm, block, gpio, lasi,
  pci, s390, scsi-disk,
  virtio-mem, TCG

From these, ignoring the migration bugs, the migration-tests cover some
of: arm, ppc, s390, TCG. The device_opts[1] patch hasn't merged yet, but
once it is, then virtio-gpu would be covered and we could investigate
adding some of the others.

For actual migration code issues:

# bugs | (sub)subsystem | kind
--
13 | multifd| correctness/races
 8 | ram| correctness
 8 | rdma:  | general programming
 7 | qmp| new api bugs
 5 | postcopy   | races
 4 | file:  | leaks
 3 | return path| races
 3 | fd_cleanup | races
 2 | savevm, aio/coroutines
 1 | xbzrle, colo, dirtyrate, exec:,
  windows, iochannel, qemufile,
  arch (ppc64le)

Here, the migration-tests cover well: multifd, ram, qmp, postcopy,
file, rp, fd_cleanup, iochannel, qemufile, xbzrle.

My suggestion is we run per arch:

"/precopy/tcp/plain"
"/precopy/tcp/tls/psk/match",
"/postcopy/plain"
"/postcopy/preempt/plain"
"/postcopy/preempt/recovery/plain"
"/multifd/tcp/plain/cancel"
"/multifd/tcp/uri/plain/none"

and x86 gets extra:

"/precopy/unix/suspend/live"
"/precopy/unix/suspend/notlive"
"/dirty_ring"

(the other dirty_* tests are too slow)

All the rest go behind a knob that people touching migration code will
enable.

wdyt?

1- allows adding devices to QEMU cmdline for migration-test
https://lore.kernel.org/r/20240523201922.28007-4-faro...@suse.de



Re: [PULL 27/34] migration/multifd: Move nocomp code into multifd-nocomp.c

2024-09-09 Thread Fabiano Rosas
Peter Maydell  writes:

> On Wed, 4 Sept 2024 at 13:48, Fabiano Rosas  wrote:
>>
>> In preparation for adding new payload types to multifd, move most of
>> the no-compression code into multifd-nocomp.c. Let's try to keep a
>> semblance of layering by not mixing general multifd control flow with
>> the details of transmitting pages of ram.
>>
>> There are still some pieces leftover, namely the p->normal, p->zero,
>> etc variables that we use for zero page tracking and the packet
>> allocation which is heavily dependent on the ram code.
>>
>> Reviewed-by: Peter Xu 
>> Signed-off-by: Fabiano Rosas 
>
> I know Coverity has only flagged this up because the
> code has moved, but it seems like a good place to ask
> the question:
>
>> +void multifd_ram_fill_packet(MultiFDSendParams *p)
>> +{
>> +MultiFDPacket_t *packet = p->packet;
>> +MultiFDPages_t *pages = &p->data->u.ram;
>> +uint32_t zero_num = pages->num - pages->normal_num;
>> +
>> +packet->pages_alloc = cpu_to_be32(multifd_ram_page_count());
>> +packet->normal_pages = cpu_to_be32(pages->normal_num);
>> +packet->zero_pages = cpu_to_be32(zero_num);
>> +
>> +if (pages->block) {
>> +strncpy(packet->ramblock, pages->block->idstr, 256);
>
> Coverity points out that when we fill in the RAMBlock::idstr
> here, if packet->ramblock is not NUL terminated then we won't
> NUL-terminate idstr either (CID 1560071).
>
> Is this really what is intended?

This is probably an oversight, although the migration destination
truncates it before reading:

/* make sure that ramblock is 0 terminated */
packet->ramblock[255] = 0;
p->block = qemu_ram_block_by_name(packet->ramblock);

If we ever start reading packet->ramblock on the source side in the
future, then there might be a problem.
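
For context, the strncpy() behaviour that makes this a hazard, in isolation:

    #include <string.h>

    static void strncpy_hazard_demo(void)
    {
        char dst[8];

        /* strncpy() writes at most sizeof(dst) bytes and does NOT append a
         * '\0' when the source is that long or longer: */
        strncpy(dst, "0123456789", sizeof(dst));

        /* dst now holds "01234567" with no terminator; a later strlen(dst)
         * or "%s" print would read past the end of the buffer. */
    }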

>
> Perhaps
>  pstrncpy(packet->ramblock, sizeof(packet->ramblock),
>   pages->block->idstr);
>
> would be better?

Yep, thanks. I'll send a patch.

>
> (pstrncpy will always NUL-terminate, and won't pointlessly
> zero-fill the space after the string in the destination.)
>
>> +}
>> +
>> +for (int i = 0; i < pages->num; i++) {
>> +/* there are architectures where ram_addr_t is 32 bit */
>> +uint64_t temp = pages->offset[i];
>> +
>> +packet->offset[i] = cpu_to_be64(temp);
>> +}
>> +
>> +trace_multifd_send_ram_fill(p->id, pages->normal_num,
>> +zero_num);
>> +}
>
> thanks
> -- PMM



Re: [PATCH v7 2/4] migration/multifd: Fix p->iov leak in multifd-uadk.c

2024-09-06 Thread Fabiano Rosas
Michael Tokarev  writes:

> 28.08.2024 17:56, Fabiano Rosas wrote:
>> The send_cleanup() hook should free the p->iov that was allocated at
>> send_setup(). This was missed because the UADK code is conditional on
>> the presence of the accelerator, so it's not tested by default.
>> 
>> Fixes: 819dd20636 ("migration/multifd: Add UADK initialization")
>> Reported-by: Peter Xu 
>> Signed-off-by: Fabiano Rosas 
>> ---
>>   migration/multifd-uadk.c | 2 ++
>>   1 file changed, 2 insertions(+)
>> 
>> diff --git a/migration/multifd-uadk.c b/migration/multifd-uadk.c
>> index 89f6a72f0e..6e6a290ae9 100644
>> --- a/migration/multifd-uadk.c
>> +++ b/migration/multifd-uadk.c
>> @@ -132,6 +132,8 @@ static void multifd_uadk_send_cleanup(MultiFDSendParams 
>> *p, Error **errp)
>>   
>>   multifd_uadk_uninit_sess(wd);
>>   p->compress_data = NULL;
>> +g_free(p->iov);
>> +p->iov = NULL;
>>   }
>
> This sounds like something for stable-9.1.x, is it not?

Right, it is.

>
> Thanks,
>
> /mjt



Re: [RFC PATCH 0/2] qtest: Log verbosity changes

2024-09-06 Thread Fabiano Rosas
Peter Maydell  writes:

> On Fri, 6 Sept 2024 at 09:14, Daniel P. Berrangé  wrote:
>>
>> On Fri, Sep 06, 2024 at 08:16:31AM +0200, Thomas Huth wrote:
>> > On 05/09/2024 23.03, Fabiano Rosas wrote:
>> > > Hi,
>> > >
>> > > This series silences QEMU stderr unless the QTEST_LOG variable is set
>> > > and silences -qtest-log unless both QTEST_LOG and gtest's --verbose
>> > > flag is passed.
>> > >
>> > > This was motivated by Peter Maydell's ask to suppress deprecation
>> > > warn_report messages from the migration-tests and by my own
>> > > frustration over noisy output from qtest.
>
> This isn't what I want, though -- what I want is that a
> qtest run should not print "warning:" messages for things
> that we expect to happen when we run that test. I *do* want
> warnings for things that we do not expect to happen when
> we run the test.
>
>> > Not sure whether we want to ignore stderr by default... we might also miss
>> > important warnings or error messages that way...?
>>
>> I would prefer if our tests were quiet by default, and just printed
>> clear pass/fail notices without extraneous fluff. Having an opt-in
>> to see full messages from stderr feels good enough for debugging cases
>> where you need more info from a particular test.
>
> I find it is not uncommon that something fails and
> you don't necessarily have the option to re-run it with
> the "give me the error message this time" flag turned on.
> CI is just the most obvious example; other kinds of
> intermittent failure can be similar.
>
>> Probably we should set verbose mode in CI though, since it is tedious
>> to re-run CI on failure to gather more info
>>
>> > If you just want to suppress one certain warning, I think it's maybe best 
>> > to
>> > fence it with "if (!qtest_enabled()) { ... }" on the QEMU side - at least
>> > that's what we did in similar cases a couple of times, IIRC.
>>
>> We've got a surprisingly large number of if(qtest_enabled()) conditions
>> in the code. I can't help feeling this is a bad idea in the long term,
>> as it's making us take different codepaths when testing from production.
>
> What I want from CI and from tests in general:
>  * if something fails, I want all the information
>  * if something unexpected happens I want the warning even
>if the test passes (this is why I grep the logs for
>"warning:" in the first place -- it is to catch the case
>of "something went wrong but it didn't result in QEMU or
>the test case exiting with a failure status")
>  * if something is expected, it should be silent
>
> That means there's a class of messages where we want to warn
> the user that they're doing something that might not be what
> they intended or which is deprecated and will go away soon,
> but where we do not want to "warn" in the test logging because
> the test is deliberately setting up that oddball corner case.
>
> It might be useful to have a look at where we're using
> if (qtest_enabled()) to see if we can make some subcategories
> avoid the explicit if(), e.g. by having a warn_deprecated(...)
> and hide the "don't print if qtest" inside the function.
>

I could add error/warn variants that are noop in case qtest is
enabled. It would, however, lead to this pattern which is discouraged by
the error.h documentation (+Cc Markus for advice):

before:
if (!dinfo && !qtest_enabled()) {
error_report("A flash image must be given with the "
 "'pflash' parameter");
exit(1);
}

after:
if (!dinfo) {
error_report_noqtest(&error_fatal,
 "A flash image must be given with the "
 "'pflash' parameter");
}

For both error/warn, we'd reduce the amount of qtest_enabled() to only
the special cases not related to printing. We'd remove ~35/83 instances,
not counting the 7 printfs.
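
To make the idea concrete, a rough sketch of the warn variant (the name and
shape are hypothetical; it assumes qtest_enabled() and warn_vreport() keep
their current signatures and headers):

    #include "qemu/osdep.h"
    #include "qemu/error-report.h"
    #include "sysemu/qtest.h"

    /* warn_report_noqtest() does not exist in the tree; sketch only */
    void warn_report_noqtest(const char *fmt, ...)
    {
        va_list ap;

        if (qtest_enabled()) {
            /* the warning is expected under qtest, keep the logs quiet */
            return;
        }

        va_start(ap, fmt);
        warn_vreport(fmt, ap);
        va_end(ap);
    }

Callers that today do "if (!qtest_enabled()) { warn_report(...); }" could then
drop the explicit check.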

> Some categories as a starter:
>  * some board models will error-and-exit if the user
>didn't provide any guest code (eg no -kernel option),
>like hw/m68k/an5206.c. When we're running with the
>qtest accelerator it's fine and expected that there's
>no guest code loaded because we'll never run any guest code
>  * in some places (eg target/arm/cpu.c) we treat qtest as
>another accelerator type, so we might say
>if (tcg_enabled() || qtest_enabled()) to mean "not
>hvf or kvm"
>  * sometimes we print a deprecation message only if
>not qtest, eg hw/core/numa.c or hw/core/machine.c
>  * the clock related code needs to be qtest aware because
>under qtest it's the qtest protocol that advances the
>clock
>  * sometimes we warn about possible user error if not
>qtest, eg hw/ppc/pnv.c or target/mips/cpu.c
>
> thanks
> -- PMM



[RFC PATCH 1/2] tests/qtest: Mute QEMU stderr

2024-09-05 Thread Fabiano Rosas
Make QEMU stderr conditional on the QTEST_LOG variable.

For the /x86/cpuid/parsing-plus-minus test, which traps the stderr, to
continue working set the QTEST_LOG variable from inside the
subprocess.

Signed-off-by: Fabiano Rosas 
---
 tests/qtest/libqtest.c  | 6 --
 tests/qtest/test-x86-cpuid-compat.c | 6 ++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 1326e34291..347664cea6 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -476,11 +476,13 @@ static QTestState *qtest_init_internal(const char 
*qemu_bin,
  "-display none "
  "-audio none "
  "%s"
- " -accel qtest",
+ " -accel qtest"
+ " 2> %s",
  socket_path,
  getenv("QTEST_LOG") ? DEV_STDERR : DEV_NULL,
  qmp_socket_path,
- extra_args ?: "");
+ extra_args ?: "",
+ getenv("QTEST_LOG") ? DEV_STDERR : DEV_NULL);
 
 qtest_client_set_rx_handler(s, qtest_client_socket_recv_line);
 qtest_client_set_tx_handler(s, qtest_client_socket_send);
diff --git a/tests/qtest/test-x86-cpuid-compat.c 
b/tests/qtest/test-x86-cpuid-compat.c
index b9e7e5ef7b..641d1f8740 100644
--- a/tests/qtest/test-x86-cpuid-compat.c
+++ b/tests/qtest/test-x86-cpuid-compat.c
@@ -204,6 +204,9 @@ static void test_plus_minus_subprocess(void)
 return;
 }
 
+const char *log = g_getenv("QTEST_LOG");
+g_setenv("QTEST_LOG", "1", true);
+
 /* Rules:
  * 1)"-foo" overrides "+foo"
  * 2) "[+-]foo" overrides "foo=..."
@@ -227,6 +230,9 @@ static void test_plus_minus_subprocess(void)
 g_assert_true(qom_get_bool(path, "sse4-2"));
 g_assert_true(qom_get_bool(path, "sse4.2"));
 
+if (log) {
+g_setenv("QTEST_LOG", log, true);
+}
 qtest_end();
 g_free(path);
 }
-- 
2.35.3




[RFC PATCH 0/2] qtest: Log verbosity changes

2024-09-05 Thread Fabiano Rosas
Hi,

This series silences QEMU stderr unless the QTEST_LOG variable is set
and silences -qtest-log unless both QTEST_LOG and gtest's --verbose
flag is passed.

This was motivated by Peter Maydell's ask to suppress deprecation
warn_report messages from the migration-tests and by my own
frustration over noisy output from qtest.

I'm open to suggestions on how to better implement this. One option
would be to ignore g_test_verbose() and have QTEST_LOG levels
(1,2,3,...), but before I get too deep into that, here are the raw
patches for discussion.

Note that it's not possible to use glib's g_test_trap_assert_stderr()
to silence the warnings because when using any verbose option the
output of QEMU, libqmp and qtest-log gets interleaved on stderr,
failing the match.

Thanks

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1442270209

Fabiano Rosas (2):
  tests/qtest: Mute QEMU stderr
  tests/qtest: Mute -qtest-log

 tests/qtest/libqtest.c  | 8 +---
 tests/qtest/test-x86-cpuid-compat.c | 6 ++
 2 files changed, 11 insertions(+), 3 deletions(-)

-- 
2.35.3




[RFC PATCH 2/2] tests/qtest: Mute -qtest-log

2024-09-05 Thread Fabiano Rosas
Make the -qtest-log be conditional on the --verbose flag, along with
the existing QTEST_LOG to avoid noisy qtest_memread() messages. Those
are particularly annoying for migration-test because all tests read
guest memory at the end and the QMP messages get lost in a flood of:

[R +1.096069] read 0x63ce000 0x1
[S +1.096071] OK 0xb8
[R +1.096077] read 0x63cf000 0x1
[S +1.096079] OK 0xb8
...

Signed-off-by: Fabiano Rosas 
---
 tests/qtest/libqtest.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 347664cea6..9fca9c7260 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -479,7 +479,7 @@ static QTestState *qtest_init_internal(const char *qemu_bin,
  " -accel qtest"
  " 2> %s",
  socket_path,
- getenv("QTEST_LOG") ? DEV_STDERR : DEV_NULL,
+ g_test_verbose() ? DEV_STDERR : DEV_NULL,
  qmp_socket_path,
  extra_args ?: "",
  getenv("QTEST_LOG") ? DEV_STDERR : DEV_NULL);
-- 
2.35.3




Re: [PATCH] ci: migration: Don't run python tests in the compat job

2024-09-05 Thread Fabiano Rosas
Peter Xu  writes:

> On Thu, Sep 05, 2024 at 03:54:45PM -0300, Fabiano Rosas wrote:
>> The vmstate-checker-script test has a bug that makes it flaky. It was
>> also committed by mistake and will be removed.
>> 
>> Since the migration-compat job takes the tests from the build-previous
>> job instead of the current HEAD, neither a fix nor a removal of the
>> test will take effect for this release.
>> 
>> Disable the faulty/undesirable test by taking advantage of the fact that it only
>> runs if the PYTHON environment variable is set. This also disables the
>> analyze-migration-script test, but this is fine because that test
>> doesn't have migration compatibility implications.
>> 
>> Signed-off-by: Fabiano Rosas 
>
> Reviewed-by: Peter Xu 
>
> We should still merge your previous pull, right?  Looks like that's the
> easiest indeed.

As I mentioned there, that pull is not to blame for this situation, so
my recommendation is to merge. However, there is still the suppression
of the deprecation messages that Peter asked about. I'll send a series
for that in a moment, but it requires qtest changes and probably a lot
of discussion.

>
> But still, just to double check with both you and Peter on the merge plan.
> If that's the case, I can send the 1st 9.2 pull earlier so we can have this
> in.
>
> Thanks,



[PATCH] ci: migration: Don't run python tests in the compat job

2024-09-05 Thread Fabiano Rosas
The vmstate-checker-script test has a bug that makes it flaky. It was
also committed by mistake and will be removed.

Since the migration-compat job takes the tests from the build-previous
job instead of the current HEAD, neither a fix nor a removal of the
test will take effect for this release.

Disable the faulty/undesirable test by taking advantage of the fact that it only
runs if the PYTHON environment variable is set. This also disables the
analyze-migration-script test, but this is fine because that test
doesn't have migration compatibility implications.

Signed-off-by: Fabiano Rosas 
---
 .gitlab-ci.d/buildtest.yml | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index aa32782405..e52456c371 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -212,6 +212,12 @@ build-previous-qemu:
 # testing an old QEMU against new features/tests that it is not
 # compatible with.
 - cd build-previous
+# Don't allow python-based tests to run. The
+# vmstate-checker-script test has a race that causes it to fail
+# sometimes. It cannot be fixed because this job runs the test
+# from the old QEMU version. The test will be removed on master,
+# but this job will only see the change in the next release.
+- unset PYTHON
 # old to new
 - QTEST_QEMU_BINARY_SRC=./qemu-system-${TARGET}
   QTEST_QEMU_BINARY=../build/qemu-system-${TARGET} 
./tests/qtest/migration-test
-- 
2.35.3




Re: [PULL 00/34] Migration patches for 2024-09-04

2024-09-05 Thread Fabiano Rosas
Peter Maydell  writes:

> On Wed, 4 Sept 2024 at 13:49, Fabiano Rosas  wrote:
>>
>> The following changes since commit e638d685ec2a0700fb9529cbd1b2823ac4120c53:
>>
>>   Open 9.2 development tree (2024-09-03 09:18:43 -0700)
>>
>> are available in the Git repository at:
>>
>>   https://gitlab.com/farosas/qemu.git tags/migration-20240904-pull-request
>>
>> for you to fetch changes up to d41c9896f49076d1eaaa32214bd2296bd36d866c:
>>
>>   tests/qtest/migration: Add a check for the availability of the "pc" 
>> machine (2024-09-03 16:24:37 -0300)
>>
>> 
>> Migration pull request
>>
>> - Steve's cleanup of unused variable
>> - Peter Maydell's fixes for several leaks in migration-test
>> - Fabiano's flexibilization of multifd data structures for device
>>   state migration
>> - Arman Nabiev's fix for ppc e500 migration
>> - Thomas' fix for migration-test vs. --without-default-devices
>
> Hi. This generates a bunch of new warning messages when running
> "make check":
>
> 105/845 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
>OK 256.17s   51
>  subtests passed
> ― ✀  ―
> stderr:
> warning: fd: migration to a file is deprecated. Use file: instead.
> warning: fd: migration to a file is deprecated. Use file: instead.
> ――
>
> Can you investigate or suppress these, please?

We did deprecate the feature. Not sure if qtest has a proper way of
silencing these. I'll take a look.

>
> I also see a complaint from the migration-compat-x86_64 job:
> https://gitlab.com/qemu-project/qemu/-/jobs/7752621835
>
> Traceback (most recent call last):
> File 
> "/builds/qemu-project/qemu/build-previous/scripts/vmstate-static-checker.py",
> line 438, in 
> sys.exit(main())
> ^^
> File 
> "/builds/qemu-project/qemu/build-previous/scripts/vmstate-static-checker.py",
> line 395, in main
> dest_data = json.load(args.dest)
> 
> File "/usr/lib64/python3.11/json/__init__.py", line 293, in load
> return loads(fp.read(),
> 
> File "/usr/lib64/python3.11/json/__init__.py", line 346, in loads
> return _default_decoder.decode(s)
> ^^
> File "/usr/lib64/python3.11/json/decoder.py", line 337, in decode
> obj, end = self.raw_decode(s, idx=_w(s, 0).end())
> ^^
> File "/usr/lib64/python3.11/json/decoder.py", line 353, in raw_decode
> obj, end = self.scan_once(s, idx)
> ^^
> json.decoder.JSONDecodeError: Unterminated string starting at: line
> 5085 column 7 (char 131064)
> # Failed to run vmstate-static-checker.py
> not ok 3 /x86_64/migration/vmstate-checker-script
> Bail out!

This is a test that was committed by mistake. I removed it in this PR,
but the migration-compat job uses the previous QEMU version of the code,
so the test won't go away until the next release.

This test should not have been picked up as part of the migration-compat
job because we don't set the PYTHON variable there. The test has
something like:

const char *python = g_getenv("PYTHON");
if (!python) {
g_test_skip("PYTHON variable not set");
return;
}

In my fork the CI is green:
https://gitlab.com/farosas/qemu/-/pipelines/1438640697

I'll probably have to unset PYTHON for that job.

>
> I think this is probably a pre-existing failure, as
> I also saw it on the previous pullreq:
> https://gitlab.com/qemu-project/qemu/-/jobs/7751785881
>
> But since this is a migration pullreq, could you have a look?

Yes, the problem is not with this pull request. We'd be better off
merging this because it removes the faulty test.

>
> thanks
> -- PMM


Re: [PATCH v9 3/5] migration: Add migration parameters for QATzip

2024-09-04 Thread Fabiano Rosas
Yichen Wang  writes:

> From: Bryan Zhang 
>
> Adds support for migration parameters to control QATzip compression
> level.
>
> Acked-by: Markus Armbruster 
> Signed-off-by: Bryan Zhang 
> Signed-off-by: Hao Xiang 
> Signed-off-by: Yichen Wang 

Reviewed-by: Fabiano Rosas 



[PULL 15/34] migration/multifd: Pass in MultiFDPages_t to file_write_ramblock_iov

2024-09-04 Thread Fabiano Rosas
We want to stop dereferencing 'pages' so it can be replaced by an
opaque pointer in the next patches.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/file.c| 3 ++-
 migration/file.h| 2 +-
 migration/multifd.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index 6451a21c86..7f11e26f5c 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -196,12 +196,13 @@ void file_start_incoming_migration(FileMigrationArgs 
*file_args, Error **errp)
 }
 
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
-int niov, RAMBlock *block, Error **errp)
+int niov, MultiFDPages_t *pages, Error **errp)
 {
 ssize_t ret = 0;
 int i, slice_idx, slice_num;
 uintptr_t base, next, offset;
 size_t len;
+RAMBlock *block = pages->block;
 
 slice_idx = 0;
 slice_num = 1;
diff --git a/migration/file.h b/migration/file.h
index 9f71e87f74..1a1115f7f1 100644
--- a/migration/file.h
+++ b/migration/file.h
@@ -21,6 +21,6 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, 
Error **errp);
 void file_cleanup_outgoing_migration(void);
 bool file_send_channel_create(gpointer opaque, Error **errp);
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
-int niov, RAMBlock *block, Error **errp);
+int niov, MultiFDPages_t *pages, Error **errp);
 int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp);
 #endif
diff --git a/migration/multifd.c b/migration/multifd.c
index 30e5c687d3..640e4450ff 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -956,7 +956,7 @@ static void *multifd_send_thread(void *opaque)
 
 if (migrate_mapped_ram()) {
 ret = file_write_ramblock_iov(p->c, p->iov, p->iovs_num,
-  pages->block, &local_err);
+  pages, &local_err);
 } else {
 ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num,
   NULL, 0, p->write_flags,
-- 
2.35.3




[PULL 19/34] migration/multifd: Move pages accounting into multifd_send_zero_page_detect()

2024-09-04 Thread Fabiano Rosas
All references to pages are being removed from the multifd worker
threads in order to allow multifd to deal with different payload
types.

multifd_send_zero_page_detect() is called by all multifd migration
paths that deal with pages and is the last spot where zero pages and
normal page amounts are adjusted. Move the pages accounting into that
function.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd-zero-page.c | 7 ++-
 migration/multifd.c   | 2 --
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
index 6506a4aa89..f1e988a959 100644
--- a/migration/multifd-zero-page.c
+++ b/migration/multifd-zero-page.c
@@ -14,6 +14,7 @@
 #include "qemu/cutils.h"
 #include "exec/ramblock.h"
 #include "migration.h"
+#include "migration-stats.h"
 #include "multifd.h"
 #include "options.h"
 #include "ram.h"
@@ -53,7 +54,7 @@ void multifd_send_zero_page_detect(MultiFDSendParams *p)
 
 if (!multifd_zero_page_enabled()) {
 pages->normal_num = pages->num;
-return;
+goto out;
 }
 
 /*
@@ -74,6 +75,10 @@ void multifd_send_zero_page_detect(MultiFDSendParams *p)
 }
 
 pages->normal_num = i;
+
+out:
+stat64_add(&mig_stats.normal_pages, pages->normal_num);
+stat64_add(&mig_stats.zero_pages, pages->num - pages->normal_num);
 }
 
 void multifd_recv_zero_page_process(MultiFDRecvParams *p)
diff --git a/migration/multifd.c b/migration/multifd.c
index c310d28532..410b7e12cc 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -989,8 +989,6 @@ static void *multifd_send_thread(void *opaque)
 
 stat64_add(&mig_stats.multifd_bytes,
p->next_packet_size + p->packet_len);
-stat64_add(&mig_stats.normal_pages, pages->normal_num);
-stat64_add(&mig_stats.zero_pages, pages->num - pages->normal_num);
 
 multifd_pages_reset(pages);
 p->next_packet_size = 0;
-- 
2.35.3




[PULL 00/34] Migration patches for 2024-09-04

2024-09-04 Thread Fabiano Rosas
The following changes since commit e638d685ec2a0700fb9529cbd1b2823ac4120c53:

  Open 9.2 development tree (2024-09-03 09:18:43 -0700)

are available in the Git repository at:

  https://gitlab.com/farosas/qemu.git tags/migration-20240904-pull-request

for you to fetch changes up to d41c9896f49076d1eaaa32214bd2296bd36d866c:

  tests/qtest/migration: Add a check for the availability of the "pc" machine 
(2024-09-03 16:24:37 -0300)


Migration pull request

- Steve's cleanup of unused variable
- Peter Maydell's fixes for several leaks in migration-test
- Fabiano's flexibilization of multifd data structures for device
  state migration
- Arman Nabiev's fix for ppc e500 migration
- Thomas' fix for migration-test vs. --without-default-devices



Arman Nabiev (1):
  target/ppc: Fix migration of CPUs with TLB_EMB TLB type

Fabiano Rosas (22):
  tests/qtest/migration: Remove vmstate-static-checker test
  migration/multifd: Reduce access to p->pages
  migration/multifd: Inline page_size and page_count
  migration/multifd: Remove pages->allocated
  migration/multifd: Pass in MultiFDPages_t to file_write_ramblock_iov
  migration/multifd: Introduce MultiFDSendData
  migration/multifd: Make MultiFDPages_t:offset a flexible array member
  migration/multifd: Replace p->pages with an union pointer
  migration/multifd: Move pages accounting into
multifd_send_zero_page_detect()
  migration/multifd: Remove total pages tracing
  migration/multifd: Isolate ram pages packet data
  migration/multifd: Don't send ram data during SYNC
  migration/multifd: Replace multifd_send_state->pages with client data
  migration/multifd: Allow multifd sync without flush
  migration/multifd: Standardize on multifd ops names
  migration/multifd: Register nocomp ops dynamically
  migration/multifd: Move nocomp code into multifd-nocomp.c
  migration/multifd: Make MultiFDMethods const
  migration/multifd: Stop changing the packet on recv side
  migration/multifd: Fix p->iov leak in multifd-uadk.c
  migration/multifd: Add a couple of asserts for p->iov
  migration/multifd: Add documentation for multifd methods

Peter Maydell (9):
  tests/qtest/migration-test: Fix bootfile cleanup handling
  tests/qtest/migration-test: Don't leak resp in
multifd_mapped_ram_fdset_end()
  tests/qtest/migration-test: Fix leaks in calc_dirtyrate_ready()
  tests/qtest/migration-helpers: Fix migrate_get_socket_address() leak
  tests/qtest/migration-test: Free QCRyptoTLSTestCertReq objects
  tests/unit/crypto-tls-x509-helpers: deinit privkey in test_tls_cleanup
  tests/qtest/migration-helpers: Don't dup argument to qdict_put_str()
  tests/qtest/migration-test: Don't strdup in get_dirty_rate()
  tests/qtest/migration-test: Don't leak QTestState in
test_multifd_tcp_cancel()

Steve Sistare (1):
  migration: delete unused parameter mis

Thomas Huth (1):
  tests/qtest/migration: Add a check for the availability of the "pc"
machine

 migration/file.c |   3 +-
 migration/file.h |   2 +-
 migration/meson.build|   1 +
 migration/multifd-nocomp.c   | 389 +++
 migration/multifd-qpl.c  |  79 +---
 migration/multifd-uadk.c | 104 ++---
 migration/multifd-zero-page.c|  13 +-
 migration/multifd-zlib.c |  99 ++---
 migration/multifd-zstd.c |  98 +
 migration/multifd.c  | 559 +--
 migration/multifd.h  | 152 ++--
 migration/ram.c  |  10 +-
 migration/savevm.c   |  10 +-
 migration/trace-events   |   9 +-
 target/ppc/machine.c |   2 +-
 tests/qtest/libqtest.c   |  17 +-
 tests/qtest/libqtest.h   |   2 -
 tests/qtest/migration-helpers.c  |  20 +-
 tests/qtest/migration-test.c | 114 +-
 tests/unit/crypto-tls-x509-helpers.c |  13 +-
 tests/unit/crypto-tls-x509-helpers.h |   6 +
 21 files changed, 772 insertions(+), 930 deletions(-)
 create mode 100644 migration/multifd-nocomp.c

-- 
2.35.3




[PULL 17/34] migration/multifd: Make MultiFDPages_t:offset a flexible array member

2024-09-04 Thread Fabiano Rosas
We're about to use MultiFDPages_t from inside the MultiFDSendData
payload union, which means we cannot have pointers to allocated data
inside the pages structure, otherwise we'd lose the reference to that
memory once another payload type touches the union. Move the offset
array into the end of the structure and turn it into a flexible array
member, so it is allocated along with the rest of MultiFDSendData in
the next patches.

Note that other pointers, such as the ramblock pointer are still fine
as long as the storage for them is not owned by the migration code and
can be correctly released at some point.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd.c | 19 ---
 migration/multifd.h |  4 ++--
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 640e4450ff..717e71f539 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -98,6 +98,17 @@ struct {
 MultiFDMethods *ops;
 } *multifd_recv_state;
 
+static size_t multifd_ram_payload_size(void)
+{
+uint32_t n = multifd_ram_page_count();
+
+/*
+ * We keep an array of page offsets at the end of MultiFDPages_t,
+ * add space for it in the allocation.
+ */
+return sizeof(MultiFDPages_t) + n * sizeof(ram_addr_t);
+}
+
 static bool multifd_use_packets(void)
 {
 return !migrate_mapped_ram();
@@ -394,18 +405,12 @@ static int multifd_recv_initial_packet(QIOChannel *c, 
Error **errp)
 
 static MultiFDPages_t *multifd_pages_init(uint32_t n)
 {
-MultiFDPages_t *pages = g_new0(MultiFDPages_t, 1);
-
-pages->offset = g_new0(ram_addr_t, n);
-
-return pages;
+return g_malloc0(multifd_ram_payload_size());
 }
 
 static void multifd_pages_clear(MultiFDPages_t *pages)
 {
 multifd_pages_reset(pages);
-g_free(pages->offset);
-pages->offset = NULL;
 g_free(pages);
 }
 
diff --git a/migration/multifd.h b/migration/multifd.h
index 7bb4a2cbc4..a7fdd97f70 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -77,9 +77,9 @@ typedef struct {
 uint32_t num;
 /* number of normal pages */
 uint32_t normal_num;
+RAMBlock *block;
 /* offset of each page */
-ram_addr_t *offset;
-RAMBlock *block;
+ram_addr_t offset[];
 } MultiFDPages_t;
 
 struct MultiFDRecvData {
-- 
2.35.3
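As a side note on the layout, a toy illustration (made-up names, relying on
the same GCC/Clang extension QEMU already uses for a flexible array member
inside another aggregate) of why the offsets must live inside the allocation
rather than behind a pointer:

    #include <glib.h>
    #include <stdint.h>

    typedef struct {
        uint32_t num;
        uint64_t offset[];      /* storage shares the payload allocation */
    } ToyPages;

    typedef union {
        ToyPages ram;
        /* a future device-state payload would go here */
    } ToyPayload;

    static ToyPayload *toy_payload_new(size_t n)
    {
        /* one allocation covers the union plus n trailing offsets */
        return g_malloc0(sizeof(ToyPayload) + n * sizeof(uint64_t));
    }

Had ToyPages kept a "uint64_t *offset" pointer instead, writing any other
union member would clobber that pointer and leak the array it pointed to.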




[PULL 34/34] tests/qtest/migration: Add a check for the availability of the "pc" machine

2024-09-04 Thread Fabiano Rosas
From: Thomas Huth 

The test_vcpu_dirty_limit is the only test that does not check for the
availability of the machine before starting the test, so it fails when
QEMU has been configured with --without-default-devices. Add a check for
the "pc" machine type to fix it.

Signed-off-by: Thomas Huth 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 6aca6760ef..9d08101643 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -3952,8 +3952,10 @@ int main(int argc, char **argv)
 if (g_str_equal(arch, "x86_64") && has_kvm && kvm_dirty_ring_supported()) {
 migration_test_add("/migration/dirty_ring",
test_precopy_unix_dirty_ring);
-migration_test_add("/migration/vcpu_dirty_limit",
-   test_vcpu_dirty_limit);
+if (qtest_has_machine("pc")) {
+migration_test_add("/migration/vcpu_dirty_limit",
+   test_vcpu_dirty_limit);
+}
 }
 
 ret = g_test_run();
-- 
2.35.3




[PULL 09/34] tests/qtest/migration-helpers: Don't dup argument to qdict_put_str()

2024-09-04 Thread Fabiano Rosas
From: Peter Maydell 

In migrate_set_ports() we call qdict_put_str() with a value string
that we g_strdup(). However, qdict_put_str() takes a copy of the
value string; it doesn't take ownership of it, so the g_strdup()
only results in a leak:

Direct leak of 6 byte(s) in 1 object(s) allocated from:
#0 0x56298023713e in malloc 
(/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f13e)
 (BuildId: b2b9174a5a54707a7f76bca51cdc95d2aa08bac1)
#1 0x7fba0ad39738 in g_malloc debian/build/deb/../../../glib/gmem.c:128:13
#2 0x7fba0ad4e583 in g_strdup 
debian/build/deb/../../../glib/gstrfuncs.c:361:17
#3 0x56298036b16e in migrate_set_ports 
tests/qtest/migration-helpers.c:145:49
#4 0x56298036ad1c in migrate_qmp tests/qtest/migration-helpers.c:228:9
#5 0x56298035b3dd in test_precopy_common tests/qtest/migration-test.c:1820:5
#6 0x5629803549dc in test_multifd_tcp_channels_none 
tests/qtest/migration-test.c:3077:5
#7 0x56298036d427 in migration_test_wrapper 
tests/qtest/migration-helpers.c:456:5

Drop the unnecessary g_strdup() call.
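
For clarity (not part of the patch), the ownership rule the fix relies
on is that qdict_put_str() copies the value string, so the caller
keeps owning, and must free, anything it heap-allocated. A small
sketch assuming only the QDict behaviour described above:

    /* Illustrative sketch of qdict_put_str() value ownership. */
    static void example_port_qdict(void)
    {
        QDict *dict = qdict_new();
        g_autofree char *port = g_strdup("4444");   /* hypothetical value */

        qdict_put_str(dict, "port", port);  /* the dict stores its own copy */
        /* 'port' is still owned here and is released by g_autofree */

        qobject_unref(dict);
    }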

Signed-off-by: Peter Maydell 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-helpers.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 7cbb9831e7..a43d180c80 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -142,7 +142,7 @@ static void migrate_set_ports(QTestState *to, QList 
*channel_list)
 qdict_haskey(addr, "port") &&
 (strcmp(qdict_get_str(addrdict, "port"), "0") == 0)) {
 addr_port = qdict_get_str(addr, "port");
-qdict_put_str(addrdict, "port", g_strdup(addr_port));
+qdict_put_str(addrdict, "port", addr_port);
 }
 }
 
-- 
2.35.3




[PULL 14/34] migration/multifd: Remove pages->allocated

2024-09-04 Thread Fabiano Rosas
This value never changes and is always the same as page_count. We
don't need a per-channel copy of it, plus one more in the extra slot. Remove
it.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd.c | 6 ++
 migration/multifd.h | 2 --
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 3dfed8a005..30e5c687d3 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -396,7 +396,6 @@ static MultiFDPages_t *multifd_pages_init(uint32_t n)
 {
 MultiFDPages_t *pages = g_new0(MultiFDPages_t, 1);
 
-pages->allocated = n;
 pages->offset = g_new0(ram_addr_t, n);
 
 return pages;
@@ -405,7 +404,6 @@ static MultiFDPages_t *multifd_pages_init(uint32_t n)
 static void multifd_pages_clear(MultiFDPages_t *pages)
 {
 multifd_pages_reset(pages);
-pages->allocated = 0;
 g_free(pages->offset);
 pages->offset = NULL;
 g_free(pages);
@@ -420,7 +418,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
 int i;
 
 packet->flags = cpu_to_be32(p->flags);
-packet->pages_alloc = cpu_to_be32(pages->allocated);
+packet->pages_alloc = cpu_to_be32(multifd_ram_page_count());
 packet->normal_pages = cpu_to_be32(pages->normal_num);
 packet->zero_pages = cpu_to_be32(zero_num);
 packet->next_packet_size = cpu_to_be32(p->next_packet_size);
@@ -651,7 +649,7 @@ static inline bool multifd_queue_empty(MultiFDPages_t 
*pages)
 
 static inline bool multifd_queue_full(MultiFDPages_t *pages)
 {
-return pages->num == pages->allocated;
+return pages->num == multifd_ram_page_count();
 }
 
 static inline void multifd_enqueue(MultiFDPages_t *pages, ram_addr_t offset)
diff --git a/migration/multifd.h b/migration/multifd.h
index a2bba23af9..660a9882c2 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -76,8 +76,6 @@ typedef struct {
 uint32_t num;
 /* number of normal pages */
 uint32_t normal_num;
-/* number of allocated pages */
-uint32_t allocated;
 /* offset of each page */
 ram_addr_t *offset;
 RAMBlock *block;
-- 
2.35.3




[PULL 13/34] migration/multifd: Inline page_size and page_count

2024-09-04 Thread Fabiano Rosas
The MultiFD*Params structures are for per-channel data. Constant
values should not be there because that needlessly wastes cycles and
storage. The page_size and page_count fields fall into this category,
so move them into inline helpers in multifd.h.
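
A plausible shape for such helpers is sketched below; the exact
definitions added to multifd.h may differ, so treat the names and the
MULTIFD_PACKET_SIZE constant here as assumptions rather than the
patch's literal code:

    /* Sketch only; assumes qemu_target_page_size() and a packet-size
     * constant are available, as elsewhere in the migration code. */
    static inline uint32_t example_ram_page_size(void)
    {
        return qemu_target_page_size();
    }

    static inline uint32_t example_ram_page_count(void)
    {
        return MULTIFD_PACKET_SIZE / qemu_target_page_size();
    }

Every channel then calls the helpers instead of carrying its own copy
of the two values.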

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd-qpl.c   | 10 +++---
 migration/multifd-uadk.c  | 36 ---
 migration/multifd-zero-page.c |  4 ++--
 migration/multifd-zlib.c  | 14 --
 migration/multifd-zstd.c  | 11 ++-
 migration/multifd.c   | 33 
 migration/multifd.h   | 18 ++
 7 files changed, 71 insertions(+), 55 deletions(-)

diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
index f8c84c52cf..db60c05795 100644
--- a/migration/multifd-qpl.c
+++ b/migration/multifd-qpl.c
@@ -233,8 +233,10 @@ static void multifd_qpl_deinit(QplData *qpl)
 static int multifd_qpl_send_setup(MultiFDSendParams *p, Error **errp)
 {
 QplData *qpl;
+uint32_t page_size = multifd_ram_page_size();
+uint32_t page_count = multifd_ram_page_count();
 
-qpl = multifd_qpl_init(p->page_count, p->page_size, errp);
+qpl = multifd_qpl_init(page_count, page_size, errp);
 if (!qpl) {
 return -1;
 }
@@ -245,7 +247,7 @@ static int multifd_qpl_send_setup(MultiFDSendParams *p, 
Error **errp)
  * additional two IOVs are used to store packet header and compressed data
  * length
  */
-p->iov = g_new0(struct iovec, p->page_count + 2);
+p->iov = g_new0(struct iovec, page_count + 2);
 return 0;
 }
 
@@ -534,8 +536,10 @@ out:
 static int multifd_qpl_recv_setup(MultiFDRecvParams *p, Error **errp)
 {
 QplData *qpl;
+uint32_t page_size = multifd_ram_page_size();
+uint32_t page_count = multifd_ram_page_count();
 
-qpl = multifd_qpl_init(p->page_count, p->page_size, errp);
+qpl = multifd_qpl_init(page_count, page_size, errp);
 if (!qpl) {
 return -1;
 }
diff --git a/migration/multifd-uadk.c b/migration/multifd-uadk.c
index b8ba3cd9c1..1ed1c6afe6 100644
--- a/migration/multifd-uadk.c
+++ b/migration/multifd-uadk.c
@@ -114,8 +114,10 @@ static void multifd_uadk_uninit_sess(struct wd_data *wd)
 static int multifd_uadk_send_setup(MultiFDSendParams *p, Error **errp)
 {
 struct wd_data *wd;
+uint32_t page_size = multifd_ram_page_size();
+uint32_t page_count = multifd_ram_page_count();
 
-wd = multifd_uadk_init_sess(p->page_count, p->page_size, true, errp);
+wd = multifd_uadk_init_sess(page_count, page_size, true, errp);
 if (!wd) {
 return -1;
 }
@@ -128,7 +130,7 @@ static int multifd_uadk_send_setup(MultiFDSendParams *p, 
Error **errp)
  * length
  */
 
-p->iov = g_new0(struct iovec, p->page_count + 2);
+p->iov = g_new0(struct iovec, page_count + 2);
 return 0;
 }
 
@@ -172,6 +174,7 @@ static int multifd_uadk_send_prepare(MultiFDSendParams *p, 
Error **errp)
 {
 struct wd_data *uadk_data = p->compress_data;
 uint32_t hdr_size;
+uint32_t page_size = multifd_ram_page_size();
 uint8_t *buf = uadk_data->buf;
 int ret = 0;
 MultiFDPages_t *pages = p->pages;
@@ -188,7 +191,7 @@ static int multifd_uadk_send_prepare(MultiFDSendParams *p, 
Error **errp)
 struct wd_comp_req creq = {
 .op_type = WD_DIR_COMPRESS,
 .src = pages->block->host + pages->offset[i],
-.src_len = p->page_size,
+.src_len = page_size,
 .dst = buf,
 /* Set dst_len to double the src in case compressed out >= 
page_size */
 .dst_len = p->page_size * 2,
@@ -201,7 +204,7 @@ static int multifd_uadk_send_prepare(MultiFDSendParams *p, 
Error **errp)
p->id, ret, creq.status);
 return -1;
 }
-if (creq.dst_len < p->page_size) {
+if (creq.dst_len < page_size) {
 uadk_data->buf_hdr[i] = cpu_to_be32(creq.dst_len);
 prepare_next_iov(p, buf, creq.dst_len);
 buf += creq.dst_len;
@@ -213,11 +216,11 @@ static int multifd_uadk_send_prepare(MultiFDSendParams 
*p, Error **errp)
  * than page_size as well because at the receive end we can skip the
  * decompression. But it is tricky to find the right number here.
  */
-if (!uadk_data->handle || creq.dst_len >= p->page_size) {
-uadk_data->buf_hdr[i] = cpu_to_be32(p->page_size);
+if (!uadk_data->handle || creq.dst_len >= page_size) {
+uadk_data->buf_hdr[i] = cpu_to_be32(page_size);
 prepare_next_iov(p, pages->block->host + pages->offset[i],
- p->page_size);
-buf += p->page_size;
+ page_size);
+buf += page_size;
   

[PULL 25/34] migration/multifd: Standardize on multifd ops names

2024-09-04 Thread Fabiano Rosas
Add the multifd_ prefix to all functions and remove the useless
docstrings.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd-qpl.c  | 57 
 migration/multifd-uadk.c | 55 ---
 migration/multifd-zlib.c | 81 ++--
 migration/multifd-zstd.c | 81 ++--
 migration/multifd.c  | 78 ++
 5 files changed, 36 insertions(+), 316 deletions(-)

diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
index 21153f1987..75041a4c4d 100644
--- a/migration/multifd-qpl.c
+++ b/migration/multifd-qpl.c
@@ -220,16 +220,6 @@ static void multifd_qpl_deinit(QplData *qpl)
 }
 }
 
-/**
- * multifd_qpl_send_setup: set up send side
- *
- * Set up the channel with QPL compression.
- *
- * Returns 0 on success or -1 on error
- *
- * @p: Params for the channel being used
- * @errp: pointer to an error
- */
 static int multifd_qpl_send_setup(MultiFDSendParams *p, Error **errp)
 {
 QplData *qpl;
@@ -251,14 +241,6 @@ static int multifd_qpl_send_setup(MultiFDSendParams *p, 
Error **errp)
 return 0;
 }
 
-/**
- * multifd_qpl_send_cleanup: clean up send side
- *
- * Close the channel and free memory.
- *
- * @p: Params for the channel being used
- * @errp: pointer to an error
- */
 static void multifd_qpl_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
 multifd_qpl_deinit(p->compress_data);
@@ -487,17 +469,6 @@ static void multifd_qpl_compress_pages(MultiFDSendParams 
*p)
 }
 }
 
-/**
- * multifd_qpl_send_prepare: prepare data to be able to send
- *
- * Create a compressed buffer with all the pages that we are going to
- * send.
- *
- * Returns 0 on success or -1 on error
- *
- * @p: Params for the channel being used
- * @errp: pointer to an error
- */
 static int multifd_qpl_send_prepare(MultiFDSendParams *p, Error **errp)
 {
 QplData *qpl = p->compress_data;
@@ -523,16 +494,6 @@ out:
 return 0;
 }
 
-/**
- * multifd_qpl_recv_setup: set up receive side
- *
- * Create the compressed channel and buffer.
- *
- * Returns 0 on success or -1 on error
- *
- * @p: Params for the channel being used
- * @errp: pointer to an error
- */
 static int multifd_qpl_recv_setup(MultiFDRecvParams *p, Error **errp)
 {
 QplData *qpl;
@@ -547,13 +508,6 @@ static int multifd_qpl_recv_setup(MultiFDRecvParams *p, 
Error **errp)
 return 0;
 }
 
-/**
- * multifd_qpl_recv_cleanup: set up receive side
- *
- * Close the channel and free memory.
- *
- * @p: Params for the channel being used
- */
 static void multifd_qpl_recv_cleanup(MultiFDRecvParams *p)
 {
 multifd_qpl_deinit(p->compress_data);
@@ -694,17 +648,6 @@ static int multifd_qpl_decompress_pages(MultiFDRecvParams 
*p, Error **errp)
 }
 return 0;
 }
-/**
- * multifd_qpl_recv: read the data from the channel into actual pages
- *
- * Read the compressed buffer, and uncompress it into the actual
- * pages.
- *
- * Returns 0 on success or -1 on error
- *
- * @p: Params for the channel being used
- * @errp: pointer to an error
- */
 static int multifd_qpl_recv(MultiFDRecvParams *p, Error **errp)
 {
 QplData *qpl = p->compress_data;
diff --git a/migration/multifd-uadk.c b/migration/multifd-uadk.c
index 9d99807af5..db2549f59b 100644
--- a/migration/multifd-uadk.c
+++ b/migration/multifd-uadk.c
@@ -103,14 +103,6 @@ static void multifd_uadk_uninit_sess(struct wd_data *wd)
 g_free(wd);
 }
 
-/**
- * multifd_uadk_send_setup: setup send side
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @errp: pointer to an error
- */
 static int multifd_uadk_send_setup(MultiFDSendParams *p, Error **errp)
 {
 struct wd_data *wd;
@@ -134,14 +126,6 @@ static int multifd_uadk_send_setup(MultiFDSendParams *p, 
Error **errp)
 return 0;
 }
 
-/**
- * multifd_uadk_send_cleanup: cleanup send side
- *
- * Close the channel and return memory.
- *
- * @p: Params for the channel that we are using
- * @errp: pointer to an error
- */
 static void multifd_uadk_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
 struct wd_data *wd = p->compress_data;
@@ -159,17 +143,6 @@ static inline void prepare_next_iov(MultiFDSendParams *p, 
void *base,
 p->iovs_num++;
 }
 
-/**
- * multifd_uadk_send_prepare: prepare data to be able to send
- *
- * Create a compressed buffer with all the pages that we are going to
- * send.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @errp: pointer to an error
- */
 static int multifd_uadk_send_prepare(MultiFDSendParams *p, Error **errp)
 {
 struct wd_data *uadk_data = p->compress_data;
@@ -229,16 +202,6 @@ out:
 return 0;
 }
 
-/**
- * multifd_uadk_recv_setup: setup receive side
- *
- * Create the compressed channel and buffer.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- *

[PULL 06/34] tests/qtest/migration-helpers: Fix migrate_get_socket_address() leak

2024-09-04 Thread Fabiano Rosas
From: Peter Maydell 

In migrate_get_socket_address() we leak the SocketAddressList:
 (cd build/asan && \
  
ASAN_OPTIONS="fast_unwind_on_malloc=0:strip_path_prefix=/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../"
  QTEST_QEMU_BINARY=./qemu-system-x86_64 \
  ./tests/qtest/migration-test --tap -k -p 
/x86_64/migration/multifd/tcp/tls/psk/match )

[...]
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x563d7f22f318 in __interceptor_calloc 
(/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f318)
 (BuildId: 2ad6282fb5d076c863ab87f41a345d46dc965ded)
#1 0x7f9de3b39c50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13
#2 0x563d7f3a119c in qobject_input_start_list 
qapi/qobject-input-visitor.c:336:17
#3 0x563d7f390fbf in visit_start_list qapi/qapi-visit-core.c:80:10
#4 0x563d7f3882ef in visit_type_SocketAddressList 
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qapi/qapi-visit-sockets.c:519:10
#5 0x563d7f3658c9 in migrate_get_socket_address 
tests/qtest/migration-helpers.c:97:5
#6 0x563d7f362e24 in migrate_get_connect_uri 
tests/qtest/migration-helpers.c:111:13
#7 0x563d7f362bb2 in migrate_qmp tests/qtest/migration-helpers.c:222:23
#8 0x563d7f3533cd in test_precopy_common tests/qtest/migration-test.c:1817:5
#9 0x563d7f34dc1c in test_multifd_tcp_tls_psk_match 
tests/qtest/migration-test.c:3185:5
#10 0x563d7f365337 in migration_test_wrapper 
tests/qtest/migration-helpers.c:458:5

The code fishes out the SocketAddress from the list to return it, and the
callers are freeing that, but nothing frees the list.

Since this function is called in only two places, the simple fix is to
make it return the SocketAddressList rather than just a SocketAddress,
and then the callers can easily access the SocketAddress, and free
the whole SocketAddressList when they're done.

Signed-off-by: Peter Maydell 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-helpers.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 84f49db85e..7cbb9831e7 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -82,11 +82,10 @@ static QDict *SocketAddress_to_qdict(SocketAddress *addr)
 return dict;
 }
 
-static SocketAddress *migrate_get_socket_address(QTestState *who)
+static SocketAddressList *migrate_get_socket_address(QTestState *who)
 {
 QDict *rsp;
 SocketAddressList *addrs;
-SocketAddress *addr;
 Visitor *iv = NULL;
 QObject *object;
 
@@ -95,36 +94,35 @@ static SocketAddress *migrate_get_socket_address(QTestState 
*who)
 
 iv = qobject_input_visitor_new(object);
 visit_type_SocketAddressList(iv, NULL, &addrs, &error_abort);
-addr = addrs->value;
 visit_free(iv);
 
 qobject_unref(rsp);
-return addr;
+return addrs;
 }
 
 static char *
 migrate_get_connect_uri(QTestState *who)
 {
-SocketAddress *addrs;
+SocketAddressList *addrs;
 char *connect_uri;
 
 addrs = migrate_get_socket_address(who);
-connect_uri = SocketAddress_to_str(addrs);
+connect_uri = SocketAddress_to_str(addrs->value);
 
-qapi_free_SocketAddress(addrs);
+qapi_free_SocketAddressList(addrs);
 return connect_uri;
 }
 
 static QDict *
 migrate_get_connect_qdict(QTestState *who)
 {
-SocketAddress *addrs;
+SocketAddressList *addrs;
 QDict *connect_qdict;
 
 addrs = migrate_get_socket_address(who);
-connect_qdict = SocketAddress_to_qdict(addrs);
+connect_qdict = SocketAddress_to_qdict(addrs->value);
 
-qapi_free_SocketAddress(addrs);
+qapi_free_SocketAddressList(addrs);
 return connect_qdict;
 }
 
-- 
2.35.3




[PULL 31/34] migration/multifd: Add a couple of asserts for p->iov

2024-09-04 Thread Fabiano Rosas
Check that p->iov is indeed always allocated and freed by the
MultiFDMethods hooks.

Suggested-by: Peter Xu 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index 2a8cd9174c..9b200f4ad9 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -466,6 +466,7 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams 
*p, Error **errp)
 g_free(p->packet);
 p->packet = NULL;
 multifd_send_state->ops->send_cleanup(p, errp);
+assert(!p->iov);
 
 return *errp == NULL;
 }
@@ -871,6 +872,7 @@ bool multifd_send_setup(void)
 migrate_set_error(s, local_err);
 goto err;
 }
+assert(p->iov);
 }
 
 return true;
-- 
2.35.3




[PULL 02/34] tests/qtest/migration: Remove vmstate-static-checker test

2024-09-04 Thread Fabiano Rosas
I fumbled one of my last pull requests when fixing in-tree an issue
with commit 87d67fadb9 ("monitor: Stop removing non-duplicated
fds"). Basically, I mixed up my `git add -p` and `git checkout -p` and
committed a piece of test infra that had not been reviewed yet.

This has not caused any bad symptoms because the test is not enabled
by default anywhere: make check doesn't use two qemu binaries and the
CI doesn't have PYTHON set for the compat tests. Besides, the test
works fine anyway; it would not break anything.

Remove this because it was never intended to be merged.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/libqtest.c   | 17 +++-
 tests/qtest/libqtest.h   |  2 -
 tests/qtest/migration-test.c | 82 
 3 files changed, 6 insertions(+), 95 deletions(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 1326e34291..9d07de1fbd 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -514,7 +514,12 @@ static QTestState *qtest_init_internal(const char 
*qemu_bin,
 kill(s->qemu_pid, SIGSTOP);
 }
 #endif
-return s;
+
+/* ask endianness of the target */
+
+s->big_endian = qtest_query_target_endianness(s);
+
+   return s;
 }
 
 QTestState *qtest_init_without_qmp_handshake(const char *extra_args)
@@ -522,21 +527,11 @@ QTestState *qtest_init_without_qmp_handshake(const char 
*extra_args)
 return qtest_init_internal(qtest_qemu_binary(NULL), extra_args);
 }
 
-QTestState *qtest_init_with_env_no_handshake(const char *var,
- const char *extra_args)
-{
-return qtest_init_internal(qtest_qemu_binary(var), extra_args);
-}
-
 QTestState *qtest_init_with_env(const char *var, const char *extra_args)
 {
 QTestState *s = qtest_init_internal(qtest_qemu_binary(var), extra_args);
 QDict *greeting;
 
-/* ask endianness of the target */
-
-s->big_endian = qtest_query_target_endianness(s);
-
 /* Read the QMP greeting and then do the handshake */
 greeting = qtest_qmp_receive(s);
 qobject_unref(greeting);
diff --git a/tests/qtest/libqtest.h b/tests/qtest/libqtest.h
index c261b7e0b3..beb96b18eb 100644
--- a/tests/qtest/libqtest.h
+++ b/tests/qtest/libqtest.h
@@ -68,8 +68,6 @@ QTestState *qtest_init(const char *extra_args);
  */
 QTestState *qtest_init_with_env(const char *var, const char *extra_args);
 
-QTestState *qtest_init_with_env_no_handshake(const char *var,
- const char *extra_args);
 /**
  * qtest_init_without_qmp_handshake:
  * @extra_args: other arguments to pass to QEMU.  CAUTION: these
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 6c06100d91..334b63cbaa 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -64,7 +64,6 @@ static QTestMigrationState dst_state;
 #define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
 
 #define ANALYZE_SCRIPT "scripts/analyze-migration.py"
-#define VMSTATE_CHECKER_SCRIPT "scripts/vmstate-static-checker.py"
 
 #define QEMU_VM_FILE_MAGIC 0x5145564d
 #define FILE_TEST_FILENAME "migfile"
@@ -1692,85 +1691,6 @@ static void test_analyze_script(void)
 test_migrate_end(from, to, false);
 cleanup("migfile");
 }
-
-static void test_vmstate_checker_script(void)
-{
-g_autofree gchar *cmd_src = NULL;
-g_autofree gchar *cmd_dst = NULL;
-g_autofree gchar *vmstate_src = NULL;
-g_autofree gchar *vmstate_dst = NULL;
-const char *machine_alias, *machine_opts = "";
-g_autofree char *machine = NULL;
-const char *arch = qtest_get_arch();
-int pid, wstatus;
-const char *python = g_getenv("PYTHON");
-
-if (!getenv(QEMU_ENV_SRC) && !getenv(QEMU_ENV_DST)) {
-g_test_skip("Test needs two different QEMU versions");
-return;
-}
-
-if (!python) {
-g_test_skip("PYTHON variable not set");
-return;
-}
-
-if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
-if (g_str_equal(arch, "i386")) {
-machine_alias = "pc";
-} else {
-machine_alias = "q35";
-}
-} else if (g_str_equal(arch, "s390x")) {
-machine_alias = "s390-ccw-virtio";
-} else if (strcmp(arch, "ppc64") == 0) {
-machine_alias = "pseries";
-} else if (strcmp(arch, "aarch64") == 0) {
-machine_alias = "virt";
-} else {
-g_assert_not_reached();
-}
-
-if (!qtest_has_machine(machine_alias)) {
-g_autofree char *msg = g_strdup_printf("machine %s not supported", 
machine_alias);
-g_test_skip(msg);
-return;
-}
-
-machine = resolve_machine_version(machine_alias, QEMU_ENV_SRC,
- 

[PULL 11/34] tests/qtest/migration-test: Don't leak QTestState in test_multifd_tcp_cancel()

2024-09-04 Thread Fabiano Rosas
From: Peter Maydell 

In test_multifd_tcp_cancel() we create three QEMU processes: 'from',
'to' and 'to2'.  We clean up (via qtest_quit()) 'from' and 'to2' when
we call test_migrate_end(), but never clean up 'to', which results in
this leak:

Direct leak of 336 byte(s) in 1 object(s) allocated from:
#0 0x55e984fcd328 in __interceptor_calloc 
(/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f328)
 (BuildId: 710d409b68bb04427009e9ca6e1b63ff8af785d3)
#1 0x7f0878b39c50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13
#2 0x55e98503a172 in qtest_spawn_qemu tests/qtest/libqtest.c:397:21
#3 0x55e98502bc4a in qtest_init_internal tests/qtest/libqtest.c:471:9
#4 0x55e98502c5b7 in qtest_init_with_env tests/qtest/libqtest.c:533:21
#5 0x55e9850eef0f in test_migrate_start tests/qtest/migration-test.c:857:11
#6 0x55e9850eb01d in test_multifd_tcp_cancel 
tests/qtest/migration-test.c:3297:9
#7 0x55e985103407 in migration_test_wrapper 
tests/qtest/migration-helpers.c:456:5

Call qtest_quit() on 'to' to clean it up once it has exited.

Signed-off-by: Peter Maydell 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 3818595040..6aca6760ef 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -3242,6 +3242,7 @@ static void test_multifd_tcp_cancel(void)
 /* Make sure QEMU process "to" exited */
 qtest_set_expected_status(to, EXIT_FAILURE);
 qtest_wait_qemu(to);
+qtest_quit(to);
 
 args = (MigrateStart){
 .only_target = true,
-- 
2.35.3




Re: [PATCH] target/i386: Expose IBPB-BRTYPE and SBPB CPUID bits to the guest

2024-09-04 Thread Fabiano Rosas
Fabiano Rosas  writes:

> According to AMD's Speculative Return Stack Overflow whitepaper (link
> below), the hypervisor should synthesize the value of IBPB_BRTYPE and
> SBPB CPUID bits to the guest.
>
> Support for this is already present in the kernel with commit
> e47d86083c66 ("KVM: x86: Add SBPB support") and commit 6f0f23ef76be
> ("KVM: x86: Add IBPB_BRTYPE support").
>
> Add support in QEMU to expose the bits to the guest OS.
>
> host:
>   # cat /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow
>   Mitigation: Safe RET
>
> before (guest):
>   $ cpuid -l 0x80000021 -1 -r
>   0x80000021 0x00: eax=0x00000045 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
> ^
>   $ cat /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow
>   Vulnerable: Safe RET, no microcode
>
> after (guest):
>   $ cpuid -l 0x80000021 -1 -r
>   0x80000021 0x00: eax=0x18000045 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
> ^
>   $ cat /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow
>   Mitigation: Safe RET
>
> Reported-by: Fabian Vogt 
> Link: 
> https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf
> Signed-off-by: Fabiano Rosas 
> ---
> More info on this thread:
> https://lore.kernel.org/r/68f8b8b1ca1bf58b059f52afbd1c9c51108a074a.ca...@suse.com
> ---
>  target/i386/cpu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 85ef7452c0..d33401c922 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -1221,8 +1221,8 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>  NULL, NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL,
> -NULL, NULL, NULL, NULL,
> -NULL, NULL, NULL, NULL,
> +NULL, NULL, NULL, "sbpb",
> +"ibpb-brtype", NULL, NULL, NULL,
>  },
>  .cpuid = { .eax = 0x80000021, .reg = R_EAX, },
>  .tcg_features = 0,

Ping, any thoughts on this one?



[PULL 18/34] migration/multifd: Replace p->pages with an union pointer

2024-09-04 Thread Fabiano Rosas
We want multifd to be able to handle more types of data than just ram
pages. To start decoupling multifd from pages, replace p->pages
(MultiFDPages_t) with the new type MultiFDSendData that hides the
client payload inside a union.

The general idea here is to isolate functions that *need* to handle
MultiFDPages_t and move them in the future to multifd-ram.c, while
multifd.c will stay with only the core functions that handle
MultiFDSendData/MultiFDRecvData.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd-qpl.c   |  6 +--
 migration/multifd-uadk.c  |  2 +-
 migration/multifd-zero-page.c |  2 +-
 migration/multifd-zlib.c  |  2 +-
 migration/multifd-zstd.c  |  2 +-
 migration/multifd.c   | 83 +--
 migration/multifd.h   |  7 +--
 7 files changed, 57 insertions(+), 47 deletions(-)

diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
index db60c05795..21153f1987 100644
--- a/migration/multifd-qpl.c
+++ b/migration/multifd-qpl.c
@@ -406,7 +406,7 @@ retry:
 static void multifd_qpl_compress_pages_slow_path(MultiFDSendParams *p)
 {
 QplData *qpl = p->compress_data;
-MultiFDPages_t *pages = p->pages;
+MultiFDPages_t *pages = &p->data->u.ram;
 uint32_t size = p->page_size;
 qpl_job *job = qpl->sw_job;
 uint8_t *zbuf = qpl->zbuf;
@@ -437,7 +437,7 @@ static void 
multifd_qpl_compress_pages_slow_path(MultiFDSendParams *p)
 static void multifd_qpl_compress_pages(MultiFDSendParams *p)
 {
 QplData *qpl = p->compress_data;
-MultiFDPages_t *pages = p->pages;
+MultiFDPages_t *pages = &p->data->u.ram;
 uint32_t size = p->page_size;
 QplHwJob *hw_job;
 uint8_t *buf;
@@ -501,7 +501,7 @@ static void multifd_qpl_compress_pages(MultiFDSendParams *p)
 static int multifd_qpl_send_prepare(MultiFDSendParams *p, Error **errp)
 {
 QplData *qpl = p->compress_data;
-MultiFDPages_t *pages = p->pages;
+MultiFDPages_t *pages = &p->data->u.ram;
 uint32_t len = 0;
 
 if (!multifd_send_prepare_common(p)) {
diff --git a/migration/multifd-uadk.c b/migration/multifd-uadk.c
index 1ed1c6afe6..9d99807af5 100644
--- a/migration/multifd-uadk.c
+++ b/migration/multifd-uadk.c
@@ -177,7 +177,7 @@ static int multifd_uadk_send_prepare(MultiFDSendParams *p, 
Error **errp)
 uint32_t page_size = multifd_ram_page_size();
 uint8_t *buf = uadk_data->buf;
 int ret = 0;
-MultiFDPages_t *pages = p->pages;
+MultiFDPages_t *pages = &p->data->u.ram;
 
 if (!multifd_send_prepare_common(p)) {
 goto out;
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
index cc624e36b3..6506a4aa89 100644
--- a/migration/multifd-zero-page.c
+++ b/migration/multifd-zero-page.c
@@ -46,7 +46,7 @@ static void swap_page_offset(ram_addr_t *pages_offset, int a, 
int b)
  */
 void multifd_send_zero_page_detect(MultiFDSendParams *p)
 {
-MultiFDPages_t *pages = p->pages;
+MultiFDPages_t *pages = &p->data->u.ram;
 RAMBlock *rb = pages->block;
 int i = 0;
 int j = pages->num - 1;
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index e47d7f70dc..66517c1067 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -123,7 +123,7 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error 
**errp)
  */
 static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 {
-MultiFDPages_t *pages = p->pages;
+MultiFDPages_t *pages = &p->data->u.ram;
 struct zlib_data *z = p->compress_data;
 z_stream *zs = &z->zs;
 uint32_t out_size = 0;
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 1812fd1b48..04ac711cf4 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -119,7 +119,7 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error 
**errp)
  */
 static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
 {
-MultiFDPages_t *pages = p->pages;
+MultiFDPages_t *pages = &p->data->u.ram;
 struct zstd_data *z = p->compress_data;
 int ret;
 uint32_t i;
diff --git a/migration/multifd.c b/migration/multifd.c
index 717e71f539..c310d28532 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -49,8 +49,7 @@ typedef struct {
 
 struct {
 MultiFDSendParams *params;
-/* array of pages to sent */
-MultiFDPages_t *pages;
+MultiFDSendData *data;
 /*
  * Global number of generated multifd packets.
  *
@@ -109,6 +108,28 @@ static size_t multifd_ram_payload_size(void)
 return sizeof(MultiFDPages_t) + n * sizeof(ram_addr_t);
 }
 
+static MultiFDSendData *multifd_send_data_alloc(void)
+{
+size_t max_payload_size, size_minus_payload;
+
+/*
+ * MultiFDPages_t has a flexible array at the end, account for it
+ * when allocating MultiFDSendData. Use max() in case other t

[PULL 22/34] migration/multifd: Don't send ram data during SYNC

2024-09-04 Thread Fabiano Rosas
Skip saving and loading any ram data in the packet in the case of a
SYNC. This fixes a shortcoming of the current code, which requires a
reset of the MultiFDPages_t fields right after the previous
pending_job finishes; otherwise the very next job might be a SYNC and
multifd_send_fill_packet() will put the stale values in the packet.

By not calling multifd_ram_fill_packet(), we can stop resetting
MultiFDPages_t in the multifd core and leave that to the client code.

Actually moving the reset function is not yet done because
pages->num==0 is used by the client code to determine whether the
MultiFDPages_t needs to be flushed. The subsequent patches will
replace that with a generic flag that is not dependent on
MultiFDPages_t.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index d64fcdf4ac..3a164c124d 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -452,6 +452,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
 {
 MultiFDPacket_t *packet = p->packet;
 uint64_t packet_num;
+bool sync_packet = p->flags & MULTIFD_FLAG_SYNC;
 
 memset(packet, 0, p->packet_len);
 
@@ -466,7 +467,9 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
 
 p->packets_sent++;
 
-multifd_ram_fill_packet(p);
+if (!sync_packet) {
+multifd_ram_fill_packet(p);
+}
 
 trace_multifd_send_fill(p->id, packet_num,
 p->flags, p->next_packet_size);
@@ -574,7 +577,9 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, 
Error **errp)
 p->packet_num = be64_to_cpu(packet->packet_num);
 p->packets_recved++;
 
-ret = multifd_ram_unfill_packet(p, errp);
+if (!(p->flags & MULTIFD_FLAG_SYNC)) {
+ret = multifd_ram_unfill_packet(p, errp);
+}
 
 trace_multifd_recv_unfill(p->id, p->packet_num, p->flags,
   p->next_packet_size);
@@ -1536,7 +1541,9 @@ static void *multifd_recv_thread(void *opaque)
 flags = p->flags;
 /* recv methods don't know how to handle the SYNC flag */
 p->flags &= ~MULTIFD_FLAG_SYNC;
-has_data = p->normal_num || p->zero_num;
+if (!(flags & MULTIFD_FLAG_SYNC)) {
+has_data = p->normal_num || p->zero_num;
+}
 qemu_mutex_unlock(&p->mutex);
 } else {
 /*
-- 
2.35.3




[PULL 23/34] migration/multifd: Replace multifd_send_state->pages with client data

2024-09-04 Thread Fabiano Rosas
Multifd currently has a simple scheduling mechanism that distributes
work to the various channels by keeping storage space within each
channel and an extra space that is given to the client. Each time the
client fills the space with data and calls into multifd, that space is
given to the next idle channel and a free storage space is taken from
the channel and given to the client for the next iteration.

This means we always need (#multifd_channels + 1) memory slots to
operate multifd.

This is fine, except that the presence of this one extra memory slot
doesn't allow different types of payloads to be processed at the same
time in different channels, i.e. the data type of
multifd_send_state->pages needs to be the same as that of p->pages.

For each new data type different from MultiFDPages_t that is to be
handled, this logic would need to be duplicated by adding new fields
to multifd_send_state, to the channels and to multifd_send_pages().

Fix this situation by moving the extra slot into the client and using
only the generic type MultiFDSendData in the multifd core.
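
To make the exchange concrete, here is a heavily simplified sketch of
the slot swap; the real multifd_send() in the diff below also picks an
idle channel, handles termination and uses memory barriers:

    /* Illustrative sketch of the slot exchange, not the real multifd_send(). */
    static void example_exchange(MultiFDSendParams *p,
                                 MultiFDSendData **send_data)
    {
        MultiFDSendData *tmp;

        assert(multifd_payload_empty(p->data));

        /* the idle channel takes the caller's filled slot... */
        tmp = *send_data;
        *send_data = p->data;   /* ...and the caller gets the empty one back */
        p->data = tmp;

        /* the real code then sets pending_job and wakes the channel thread */
    }

After the swap the channel owns the filled slot until it finishes
transmitting, and the client owns the empty slot until it fills it
again, so no locking is needed.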

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd.c | 79 ++---
 migration/multifd.h |  3 ++
 migration/ram.c |  2 ++
 3 files changed, 50 insertions(+), 34 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 3a164c124d..cb7a121eb0 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -49,7 +49,6 @@ typedef struct {
 
 struct {
 MultiFDSendParams *params;
-MultiFDSendData *data;
 /*
  * Global number of generated multifd packets.
  *
@@ -97,6 +96,8 @@ struct {
 MultiFDMethods *ops;
 } *multifd_recv_state;
 
+static MultiFDSendData *multifd_ram_send;
+
 static size_t multifd_ram_payload_size(void)
 {
 uint32_t n = multifd_ram_page_count();
@@ -130,6 +131,17 @@ static MultiFDSendData *multifd_send_data_alloc(void)
 return g_malloc0(size_minus_payload + max_payload_size);
 }
 
+void multifd_ram_save_setup(void)
+{
+multifd_ram_send = multifd_send_data_alloc();
+}
+
+void multifd_ram_save_cleanup(void)
+{
+g_free(multifd_ram_send);
+multifd_ram_send = NULL;
+}
+
 static bool multifd_use_packets(void)
 {
 return !migrate_mapped_ram();
@@ -610,25 +622,20 @@ static void multifd_send_kick_main(MultiFDSendParams *p)
 }
 
 /*
- * How we use multifd_send_state->pages and channel->pages?
+ * multifd_send() works by exchanging the MultiFDSendData object
+ * provided by the caller with an unused MultiFDSendData object from
+ * the next channel that is found to be idle.
  *
- * We create a pages for each channel, and a main one.  Each time that
- * we need to send a batch of pages we interchange the ones between
- * multifd_send_state and the channel that is sending it.  There are
- * two reasons for that:
- *- to not have to do so many mallocs during migration
- *- to make easier to know what to free at the end of migration
+ * The channel owns the data until it finishes transmitting and the
+ * caller owns the empty object until it fills it with data and calls
+ * this function again. No locking necessary.
  *
- * This way we always know who is the owner of each "pages" struct,
- * and we don't need any locking.  It belongs to the migration thread
- * or to the channel thread.  Switching is safe because the migration
- * thread is using the channel mutex when changing it, and the channel
- * have to had finish with its own, otherwise pending_job can't be
- * false.
+ * Switching is safe because both the migration thread and the channel
+ * thread have barriers in place to serialize access.
  *
  * Returns true if succeed, false otherwise.
  */
-static bool multifd_send_pages(void)
+static bool multifd_send(MultiFDSendData **send_data)
 {
 int i;
 static int next_channel;
@@ -669,11 +676,16 @@ static bool multifd_send_pages(void)
  */
 smp_mb_acquire();
 
-assert(!p->data->u.ram.num);
+assert(multifd_payload_empty(p->data));
 
-tmp = multifd_send_state->data;
-multifd_send_state->data = p->data;
+/*
+ * Swap the pointers. The channel gets the client data for
+ * transferring and the client gets back an unused data slot.
+ */
+tmp = *send_data;
+*send_data = p->data;
 p->data = tmp;
+
 /*
  * Making sure p->data is setup before marking pending_job=true. Pairs
  * with the qatomic_load_acquire() in multifd_send_thread().
@@ -705,7 +717,12 @@ bool multifd_queue_page(RAMBlock *block, ram_addr_t offset)
 MultiFDPages_t *pages;
 
 retry:
-pages = &multifd_send_state->data->u.ram;
+pages = &multifd_ram_send->u.ram;
+
+if (multifd_payload_empty(multifd_ram_send)) {
+multifd_pages_reset(pages);
+multifd_set_payload_type(multifd_ram_send, MULTIFD_PAYLOAD_RAM);
+}
 
 /* If the queue is empty, we can already enqueue now */
 if (mul

[PULL 16/34] migration/multifd: Introduce MultiFDSendData

2024-09-04 Thread Fabiano Rosas
Add a new data structure to replace p->pages in the multifd
channel. This new structure will hide the multifd payload type behind
a union, so we don't need to add a new field to the channel each time
we want to handle a different data type.

This also allows us to keep multifd_send_pages() as is, without needing
to complicate the pointer switching.
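
As a usage sketch (illustrative, not part of the patch), a client is
expected to check that the slot is free, tag it with its payload type
and then fill the union member it owns:

    /* Sketch of the intended usage; simplified from the ram client code. */
    static void example_queue_page(MultiFDSendData *data, RAMBlock *block,
                                   ram_addr_t offset)
    {
        MultiFDPages_t *pages = &data->u.ram;

        if (multifd_payload_empty(data)) {
            multifd_set_payload_type(data, MULTIFD_PAYLOAD_RAM);
        }

        pages->block = block;
        pages->offset[pages->num++] = offset;
    }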

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd.h | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/migration/multifd.h b/migration/multifd.h
index 660a9882c2..7bb4a2cbc4 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -17,6 +17,7 @@
 #include "ram.h"
 
 typedef struct MultiFDRecvData MultiFDRecvData;
+typedef struct MultiFDSendData MultiFDSendData;
 
 bool multifd_send_setup(void);
 void multifd_send_shutdown(void);
@@ -88,6 +89,31 @@ struct MultiFDRecvData {
 off_t file_offset;
 };
 
+typedef enum {
+MULTIFD_PAYLOAD_NONE,
+MULTIFD_PAYLOAD_RAM,
+} MultiFDPayloadType;
+
+typedef union MultiFDPayload {
+MultiFDPages_t ram;
+} MultiFDPayload;
+
+struct MultiFDSendData {
+MultiFDPayloadType type;
+MultiFDPayload u;
+};
+
+static inline bool multifd_payload_empty(MultiFDSendData *data)
+{
+return data->type == MULTIFD_PAYLOAD_NONE;
+}
+
+static inline void multifd_set_payload_type(MultiFDSendData *data,
+MultiFDPayloadType type)
+{
+data->type = type;
+}
+
 typedef struct {
 /* Fields are only written at creating/deletion time */
 /* No lock required for them, they are read only */
-- 
2.35.3




[PULL 21/34] migration/multifd: Isolate ram pages packet data

2024-09-04 Thread Fabiano Rosas
While we cannot yet disentangle the multifd packet from page data, we
can make the code a bit cleaner by setting the page-related fields in
a separate function.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd.c| 99 +-
 migration/trace-events |  5 ++-
 2 files changed, 63 insertions(+), 41 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index df8dfcc98f..d64fcdf4ac 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -424,65 +424,61 @@ static int multifd_recv_initial_packet(QIOChannel *c, 
Error **errp)
 return msg.id;
 }
 
-void multifd_send_fill_packet(MultiFDSendParams *p)
+static void multifd_ram_fill_packet(MultiFDSendParams *p)
 {
 MultiFDPacket_t *packet = p->packet;
 MultiFDPages_t *pages = &p->data->u.ram;
-uint64_t packet_num;
 uint32_t zero_num = pages->num - pages->normal_num;
-int i;
 
-packet->flags = cpu_to_be32(p->flags);
 packet->pages_alloc = cpu_to_be32(multifd_ram_page_count());
 packet->normal_pages = cpu_to_be32(pages->normal_num);
 packet->zero_pages = cpu_to_be32(zero_num);
-packet->next_packet_size = cpu_to_be32(p->next_packet_size);
-
-packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
-packet->packet_num = cpu_to_be64(packet_num);
 
 if (pages->block) {
 strncpy(packet->ramblock, pages->block->idstr, 256);
 }
 
-for (i = 0; i < pages->num; i++) {
+for (int i = 0; i < pages->num; i++) {
 /* there are architectures where ram_addr_t is 32 bit */
 uint64_t temp = pages->offset[i];
 
 packet->offset[i] = cpu_to_be64(temp);
 }
 
+trace_multifd_send_ram_fill(p->id, pages->normal_num, zero_num);
+}
+
+void multifd_send_fill_packet(MultiFDSendParams *p)
+{
+MultiFDPacket_t *packet = p->packet;
+uint64_t packet_num;
+
+memset(packet, 0, p->packet_len);
+
+packet->magic = cpu_to_be32(MULTIFD_MAGIC);
+packet->version = cpu_to_be32(MULTIFD_VERSION);
+
+packet->flags = cpu_to_be32(p->flags);
+packet->next_packet_size = cpu_to_be32(p->next_packet_size);
+
+packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
+packet->packet_num = cpu_to_be64(packet_num);
+
 p->packets_sent++;
 
-trace_multifd_send(p->id, packet_num, pages->normal_num, zero_num,
-   p->flags, p->next_packet_size);
+multifd_ram_fill_packet(p);
+
+trace_multifd_send_fill(p->id, packet_num,
+p->flags, p->next_packet_size);
 }
 
-static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
+static int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
 {
 MultiFDPacket_t *packet = p->packet;
 uint32_t page_count = multifd_ram_page_count();
 uint32_t page_size = multifd_ram_page_size();
 int i;
 
-packet->magic = be32_to_cpu(packet->magic);
-if (packet->magic != MULTIFD_MAGIC) {
-error_setg(errp, "multifd: received packet "
-   "magic %x and expected magic %x",
-   packet->magic, MULTIFD_MAGIC);
-return -1;
-}
-
-packet->version = be32_to_cpu(packet->version);
-if (packet->version != MULTIFD_VERSION) {
-error_setg(errp, "multifd: received packet "
-   "version %u and expected version %u",
-   packet->version, MULTIFD_VERSION);
-return -1;
-}
-
-p->flags = be32_to_cpu(packet->flags);
-
 packet->pages_alloc = be32_to_cpu(packet->pages_alloc);
 /*
  * If we received a packet that is 100 times bigger than expected
@@ -511,13 +507,6 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams 
*p, Error **errp)
 return -1;
 }
 
-p->next_packet_size = be32_to_cpu(packet->next_packet_size);
-p->packet_num = be64_to_cpu(packet->packet_num);
-p->packets_recved++;
-
-trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
-   p->flags, p->next_packet_size);
-
 if (p->normal_num == 0 && p->zero_num == 0) {
 return 0;
 }
@@ -559,6 +548,40 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams 
*p, Error **errp)
 return 0;
 }
 
+static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
+{
+MultiFDPacket_t *packet = p->packet;
+int ret = 0;
+
+packet->magic = be32_to_cpu(packet->magic);
+if (packet->magic != MULTIFD_MAGIC) {
+error_setg(errp, "multifd: received packet "
+   "magic %x and expected magic %x",
+   packet->magic, MULTIFD_MAGIC);
+ 

[PULL 12/34] migration/multifd: Reduce access to p->pages

2024-09-04 Thread Fabiano Rosas
I'm about to replace the p->pages pointer with an opaque pointer, so
do a cleanup now to reduce direct accesses to p->pages, which makes the
next diffs cleaner.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd-qpl.c  |  8 +---
 migration/multifd-uadk.c |  9 +
 migration/multifd-zlib.c |  2 +-
 migration/multifd-zstd.c |  2 +-
 migration/multifd.c  | 13 +++--
 5 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
index 9265098ee7..f8c84c52cf 100644
--- a/migration/multifd-qpl.c
+++ b/migration/multifd-qpl.c
@@ -404,13 +404,14 @@ retry:
 static void multifd_qpl_compress_pages_slow_path(MultiFDSendParams *p)
 {
 QplData *qpl = p->compress_data;
+MultiFDPages_t *pages = p->pages;
 uint32_t size = p->page_size;
 qpl_job *job = qpl->sw_job;
 uint8_t *zbuf = qpl->zbuf;
 uint8_t *buf;
 
-for (int i = 0; i < p->pages->normal_num; i++) {
-buf = p->pages->block->host + p->pages->offset[i];
+for (int i = 0; i < pages->normal_num; i++) {
+buf = pages->block->host + pages->offset[i];
 multifd_qpl_prepare_comp_job(job, buf, zbuf, size);
 if (qpl_execute_job(job) == QPL_STS_OK) {
 multifd_qpl_fill_packet(i, p, zbuf, job->total_out);
@@ -498,6 +499,7 @@ static void multifd_qpl_compress_pages(MultiFDSendParams *p)
 static int multifd_qpl_send_prepare(MultiFDSendParams *p, Error **errp)
 {
 QplData *qpl = p->compress_data;
+MultiFDPages_t *pages = p->pages;
 uint32_t len = 0;
 
 if (!multifd_send_prepare_common(p)) {
@@ -505,7 +507,7 @@ static int multifd_qpl_send_prepare(MultiFDSendParams *p, 
Error **errp)
 }
 
 /* The first IOV is used to store the compressed page lengths */
-len = p->pages->normal_num * sizeof(uint32_t);
+len = pages->normal_num * sizeof(uint32_t);
 multifd_qpl_fill_iov(p, (uint8_t *) qpl->zlen, len);
 if (qpl->hw_avail) {
 multifd_qpl_compress_pages(p);
diff --git a/migration/multifd-uadk.c b/migration/multifd-uadk.c
index d12353fb21..b8ba3cd9c1 100644
--- a/migration/multifd-uadk.c
+++ b/migration/multifd-uadk.c
@@ -174,19 +174,20 @@ static int multifd_uadk_send_prepare(MultiFDSendParams 
*p, Error **errp)
 uint32_t hdr_size;
 uint8_t *buf = uadk_data->buf;
 int ret = 0;
+MultiFDPages_t *pages = p->pages;
 
 if (!multifd_send_prepare_common(p)) {
 goto out;
 }
 
-hdr_size = p->pages->normal_num * sizeof(uint32_t);
+hdr_size = pages->normal_num * sizeof(uint32_t);
 /* prepare the header that stores the lengths of all compressed data */
 prepare_next_iov(p, uadk_data->buf_hdr, hdr_size);
 
-for (int i = 0; i < p->pages->normal_num; i++) {
+for (int i = 0; i < pages->normal_num; i++) {
 struct wd_comp_req creq = {
 .op_type = WD_DIR_COMPRESS,
-.src = p->pages->block->host + p->pages->offset[i],
+.src = pages->block->host + pages->offset[i],
 .src_len = p->page_size,
 .dst = buf,
 /* Set dst_len to double the src in case compressed out >= 
page_size */
@@ -214,7 +215,7 @@ static int multifd_uadk_send_prepare(MultiFDSendParams *p, 
Error **errp)
  */
 if (!uadk_data->handle || creq.dst_len >= p->page_size) {
 uadk_data->buf_hdr[i] = cpu_to_be32(p->page_size);
-prepare_next_iov(p, p->pages->block->host + p->pages->offset[i],
+prepare_next_iov(p, pages->block->host + pages->offset[i],
  p->page_size);
 buf += p->page_size;
 }
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 2ced69487e..65f8aba5c8 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -147,7 +147,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
**errp)
  * with compression. zlib does not guarantee that this is safe,
  * therefore copy the page before calling deflate().
  */
-memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
+memcpy(z->buf, pages->block->host + pages->offset[i], p->page_size);
 zs->avail_in = p->page_size;
 zs->next_in = z->buf;
 
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index ca17b7e310..cb6075a9a5 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -138,7 +138,7 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error 
**errp)
 if (i == pages->normal_num - 1) {
 flush = ZSTD_e_flush;
 }
-z->in.src = p->pages->block->host + pages->offset[i];
+z->in.src = pages->bloc

[PULL 01/34] migration: delete unused parameter mis

2024-09-04 Thread Fabiano Rosas
From: Steve Sistare 

Signed-off-by: Steve Sistare 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/savevm.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 6bb404b9c8..d500eae979 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2578,8 +2578,7 @@ static bool check_section_footer(QEMUFile *f, 
SaveStateEntry *se)
 }
 
 static int
-qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis,
-   uint8_t type)
+qemu_loadvm_section_start_full(QEMUFile *f, uint8_t type)
 {
 bool trace_downtime = (type == QEMU_VM_SECTION_FULL);
 uint32_t instance_id, version_id, section_id;
@@ -2657,8 +2656,7 @@ qemu_loadvm_section_start_full(QEMUFile *f, 
MigrationIncomingState *mis,
 }
 
 static int
-qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis,
- uint8_t type)
+qemu_loadvm_section_part_end(QEMUFile *f, uint8_t type)
 {
 bool trace_downtime = (type == QEMU_VM_SECTION_END);
 int64_t start_ts, end_ts;
@@ -2893,14 +2891,14 @@ retry:
 switch (section_type) {
 case QEMU_VM_SECTION_START:
 case QEMU_VM_SECTION_FULL:
-ret = qemu_loadvm_section_start_full(f, mis, section_type);
+ret = qemu_loadvm_section_start_full(f, section_type);
 if (ret < 0) {
 goto out;
 }
 break;
 case QEMU_VM_SECTION_PART:
 case QEMU_VM_SECTION_END:
-ret = qemu_loadvm_section_part_end(f, mis, section_type);
+ret = qemu_loadvm_section_part_end(f, section_type);
 if (ret < 0) {
 goto out;
 }
-- 
2.35.3




[PULL 28/34] migration/multifd: Make MultiFDMethods const

2024-09-04 Thread Fabiano Rosas
The methods are defined at module_init time and don't ever
change. Make them const.

Suggested-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd-nocomp.c | 2 +-
 migration/multifd-qpl.c| 2 +-
 migration/multifd-uadk.c   | 2 +-
 migration/multifd-zlib.c   | 2 +-
 migration/multifd-zstd.c   | 2 +-
 migration/multifd.c| 8 
 migration/multifd.h| 2 +-
 7 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index 53ea9f9c83..f294d1b0b2 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -377,7 +377,7 @@ bool multifd_send_prepare_common(MultiFDSendParams *p)
 return true;
 }
 
-static MultiFDMethods multifd_nocomp_ops = {
+static const MultiFDMethods multifd_nocomp_ops = {
 .send_setup = multifd_nocomp_send_setup,
 .send_cleanup = multifd_nocomp_send_cleanup,
 .send_prepare = multifd_nocomp_send_prepare,
diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
index 75041a4c4d..b0f1e2ba46 100644
--- a/migration/multifd-qpl.c
+++ b/migration/multifd-qpl.c
@@ -694,7 +694,7 @@ static int multifd_qpl_recv(MultiFDRecvParams *p, Error 
**errp)
 return multifd_qpl_decompress_pages_slow_path(p, errp);
 }
 
-static MultiFDMethods multifd_qpl_ops = {
+static const MultiFDMethods multifd_qpl_ops = {
 .send_setup = multifd_qpl_send_setup,
 .send_cleanup = multifd_qpl_send_cleanup,
 .send_prepare = multifd_qpl_send_prepare,
diff --git a/migration/multifd-uadk.c b/migration/multifd-uadk.c
index db2549f59b..89f6a72f0e 100644
--- a/migration/multifd-uadk.c
+++ b/migration/multifd-uadk.c
@@ -305,7 +305,7 @@ static int multifd_uadk_recv(MultiFDRecvParams *p, Error 
**errp)
 return 0;
 }
 
-static MultiFDMethods multifd_uadk_ops = {
+static const MultiFDMethods multifd_uadk_ops = {
 .send_setup = multifd_uadk_send_setup,
 .send_cleanup = multifd_uadk_send_cleanup,
 .send_prepare = multifd_uadk_send_prepare,
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 6787538762..8cf8a26bb4 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -277,7 +277,7 @@ static int multifd_zlib_recv(MultiFDRecvParams *p, Error 
**errp)
 return 0;
 }
 
-static MultiFDMethods multifd_zlib_ops = {
+static const MultiFDMethods multifd_zlib_ops = {
 .send_setup = multifd_zlib_send_setup,
 .send_cleanup = multifd_zlib_send_cleanup,
 .send_prepare = multifd_zlib_send_prepare,
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 1576b1e2ad..53da33e048 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -265,7 +265,7 @@ static int multifd_zstd_recv(MultiFDRecvParams *p, Error 
**errp)
 return 0;
 }
 
-static MultiFDMethods multifd_zstd_ops = {
+static const MultiFDMethods multifd_zstd_ops = {
 .send_setup = multifd_zstd_send_setup,
 .send_cleanup = multifd_zstd_send_cleanup,
 .send_prepare = multifd_zstd_send_prepare,
diff --git a/migration/multifd.c b/migration/multifd.c
index 0c07a2040b..b89715fdc2 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -76,7 +76,7 @@ struct {
  */
 int exiting;
 /* multifd ops */
-MultiFDMethods *ops;
+const MultiFDMethods *ops;
 } *multifd_send_state;
 
 struct {
@@ -93,7 +93,7 @@ struct {
 uint64_t packet_num;
 int exiting;
 /* multifd ops */
-MultiFDMethods *ops;
+const MultiFDMethods *ops;
 } *multifd_recv_state;
 
 MultiFDSendData *multifd_send_data_alloc(void)
@@ -128,9 +128,9 @@ void multifd_send_channel_created(void)
 qemu_sem_post(&multifd_send_state->channels_created);
 }
 
-static MultiFDMethods *multifd_ops[MULTIFD_COMPRESSION__MAX] = {};
+static const MultiFDMethods *multifd_ops[MULTIFD_COMPRESSION__MAX] = {};
 
-void multifd_register_ops(int method, MultiFDMethods *ops)
+void multifd_register_ops(int method, const MultiFDMethods *ops)
 {
 assert(0 <= method && method < MULTIFD_COMPRESSION__MAX);
 assert(!multifd_ops[method]);
diff --git a/migration/multifd.h b/migration/multifd.h
index a3e35196d1..13e7a88c01 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -243,7 +243,7 @@ typedef struct {
 int (*recv)(MultiFDRecvParams *p, Error **errp);
 } MultiFDMethods;
 
-void multifd_register_ops(int method, MultiFDMethods *ops);
+void multifd_register_ops(int method, const MultiFDMethods *ops);
 void multifd_send_fill_packet(MultiFDSendParams *p);
 bool multifd_send_prepare_common(MultiFDSendParams *p);
 void multifd_send_zero_page_detect(MultiFDSendParams *p);
-- 
2.35.3




[PULL 33/34] target/ppc: Fix migration of CPUs with TLB_EMB TLB type

2024-09-04 Thread Fabiano Rosas
From: Arman Nabiev 

In vmstate_tlbemb a cut-and-paste error meant we gave
this vmstate subsection the same "cpu/tlb6xx" name as
the vmstate_tlb6xx subsection. This breaks migration load
for any CPU using the TLB_EMB CPU type, because when we
see the "tlb6xx" name in the incoming data we try to
interpret it as a vmstate_tlb6xx subsection, which it
isn't the right format for:

 $ qemu-system-ppc -drive
 if=none,format=qcow2,file=/home/petmay01/test-images/virt/dummy.qcow2
 -monitor stdio -M bamboo
 QEMU 9.0.92 monitor - type 'help' for more information
 (qemu) savevm foo
 (qemu) loadvm foo
 Missing section footer for cpu
 Error: Error -22 while loading VM state

Correct the incorrect vmstate section name. Since migration
for these CPU types was completely broken before, we don't
need to care that this is a migration compatibility break.

This affects the PPC 405, 440, 460 and e200 CPU families.
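
For context, the subsection name is the key the destination uses to
find the matching VMStateDescription, so every subsection of a device
needs a distinct name. A minimal sketch with the fields trimmed (not
the actual target/ppc definition):

    /* Sketch: two subsections of the same device must not share a .name. */
    static const VMStateDescription example_tlbemb = {
        .name = "cpu/tlbemb",       /* must not collide with "cpu/tlb6xx" */
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (const VMStateField[]) {
            VMSTATE_END_OF_LIST()
        },
    };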

Cc: qemu-sta...@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2522
Reviewed-by: Peter Maydell 
Signed-off-by: Arman Nabiev 
Signed-off-by: Fabiano Rosas 
---
 target/ppc/machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index 731dd8df35..d433fd45fc 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -621,7 +621,7 @@ static bool tlbemb_needed(void *opaque)
 }
 
 static const VMStateDescription vmstate_tlbemb = {
-.name = "cpu/tlb6xx",
+.name = "cpu/tlbemb",
 .version_id = 1,
 .minimum_version_id = 1,
 .needed = tlbemb_needed,
-- 
2.35.3




[PULL 20/34] migration/multifd: Remove total pages tracing

2024-09-04 Thread Fabiano Rosas
The total_normal_pages and total_zero_pages elements are used only for
the end tracepoints of the multifd threads. These are not super useful
since they record per-channel numbers and are just the sum of all the
pages that are transmitted per-packet, for which we already have
tracepoints. Remove the totals from the tracing.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd.c| 12 ++--
 migration/multifd.h|  8 
 migration/trace-events |  4 ++--
 3 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 410b7e12cc..df8dfcc98f 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -453,8 +453,6 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
 }
 
 p->packets_sent++;
-p->total_normal_pages += pages->normal_num;
-p->total_zero_pages += zero_num;
 
 trace_multifd_send(p->id, packet_num, pages->normal_num, zero_num,
p->flags, p->next_packet_size);
@@ -516,8 +514,6 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, 
Error **errp)
 p->next_packet_size = be32_to_cpu(packet->next_packet_size);
 p->packet_num = be64_to_cpu(packet->packet_num);
 p->packets_recved++;
-p->total_normal_pages += p->normal_num;
-p->total_zero_pages += p->zero_num;
 
 trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
p->flags, p->next_packet_size);
@@ -1036,8 +1032,7 @@ out:
 
 rcu_unregister_thread();
 migration_threads_remove(thread);
-trace_multifd_send_thread_end(p->id, p->packets_sent, 
p->total_normal_pages,
-  p->total_zero_pages);
+trace_multifd_send_thread_end(p->id, p->packets_sent);
 
 return NULL;
 }
@@ -1561,7 +1556,6 @@ static void *multifd_recv_thread(void *opaque)
 qemu_sem_wait(&p->sem_sync);
 }
 } else {
-p->total_normal_pages += p->data->size / qemu_target_page_size();
 p->data->size = 0;
 /*
  * Order data->size update before clearing
@@ -1578,9 +1572,7 @@ static void *multifd_recv_thread(void *opaque)
 }
 
 rcu_unregister_thread();
-trace_multifd_recv_thread_end(p->id, p->packets_recved,
-  p->total_normal_pages,
-  p->total_zero_pages);
+trace_multifd_recv_thread_end(p->id, p->packets_recved);
 
 return NULL;
 }
diff --git a/migration/multifd.h b/migration/multifd.h
index c2ba4cad13..9175104aea 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -162,10 +162,6 @@ typedef struct {
 uint32_t next_packet_size;
 /* packets sent through this channel */
 uint64_t packets_sent;
-/* non zero pages sent through this channel */
-uint64_t total_normal_pages;
-/* zero pages sent through this channel */
-uint64_t total_zero_pages;
 /* buffers to send */
 struct iovec *iov;
 /* number of iovs used */
@@ -218,10 +214,6 @@ typedef struct {
 RAMBlock *block;
 /* ramblock host address */
 uint8_t *host;
-/* non zero pages recv through this channel */
-uint64_t total_normal_pages;
-/* zero pages recv through this channel */
-uint64_t total_zero_pages;
 /* buffers to recv */
 struct iovec *iov;
 /* Pages that are not zero */
diff --git a/migration/trace-events b/migration/trace-events
index 0b7c3324fb..0887cef912 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -134,7 +134,7 @@ multifd_recv_sync_main(long packet_num) "packet num %ld"
 multifd_recv_sync_main_signal(uint8_t id) "channel %u"
 multifd_recv_sync_main_wait(uint8_t id) "iter %u"
 multifd_recv_terminate_threads(bool error) "error %d"
-multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages, 
uint64_t zero_pages) "channel %u packets %" PRIu64 " normal pages %" PRIu64 " 
zero pages %" PRIu64
+multifd_recv_thread_end(uint8_t id, uint64_t packets) "channel %u packets %" 
PRIu64
 multifd_recv_thread_start(uint8_t id) "%u"
 multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal_pages, uint32_t 
zero_pages, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num 
%" PRIu64 " normal pages %u zero pages %u flags 0x%x next packet size %u"
 multifd_send_error(uint8_t id) "channel %u"
@@ -142,7 +142,7 @@ multifd_send_sync_main(long packet_num) "packet num %ld"
 multifd_send_sync_main_signal(uint8_t id) "channel %u"
 multifd_send_sync_main_wait(uint8_t id) "channel %u"
 multifd_send_terminate_threads(void) ""
-multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages, 
uint64_t zero_pages) "channel %u packets %" PRIu64 " normal pages %" PRIu64 " 
zero pages %" PRIu64
+multifd_send_thread_end(uint8_t id, uint64_t packets) "channel %u packets %" 
PRIu64

[PULL 05/34] tests/qtest/migration-test: Fix leaks in calc_dirtyrate_ready()

2024-09-04 Thread Fabiano Rosas
From: Peter Maydell 

In calc_dirtyrate_ready() we g_strdup() a string but then never free it:

Direct leak of 19 byte(s) in 2 object(s) allocated from:
#0 0x55ead613413e in malloc 
(/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f13e)
 (BuildId: e7cd5c37b2987a1af682b43ee5240b98bb316737)
#1 0x7f7a13d39738 in g_malloc debian/build/deb/../../../glib/gmem.c:128:13
#2 0x7f7a13d4e583 in g_strdup 
debian/build/deb/../../../glib/gstrfuncs.c:361:17
#3 0x55ead6266f48 in calc_dirtyrate_ready 
tests/qtest/migration-test.c:3409:14
#4 0x55ead62669fe in wait_for_calc_dirtyrate_complete 
tests/qtest/migration-test.c:3422:13
#5 0x55ead6253df7 in test_vcpu_dirty_limit 
tests/qtest/migration-test.c:3562:9
#6 0x55ead626a407 in migration_test_wrapper 
tests/qtest/migration-helpers.c:456:5

We also fail to unref the QMP rsp_return, so we leak that also.

Rather than duplicating the string, use the in-place value from
the qdict, and then unref the qdict.

Signed-off-by: Peter Maydell 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b775ffed81..97f99c1316 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -3318,15 +3318,18 @@ static QDict *query_vcpu_dirty_limit(QTestState *who)
 static bool calc_dirtyrate_ready(QTestState *who)
 {
 QDict *rsp_return;
-gchar *status;
+const char *status;
+bool ready;
 
 rsp_return = query_dirty_rate(who);
 g_assert(rsp_return);
 
-status = g_strdup(qdict_get_str(rsp_return, "status"));
+status = qdict_get_str(rsp_return, "status");
 g_assert(status);
+ready = g_strcmp0(status, "measuring");
+qobject_unref(rsp_return);
 
-return g_strcmp0(status, "measuring");
+return ready;
 }
 
 static void wait_for_calc_dirtyrate_complete(QTestState *who,
-- 
2.35.3




[PULL 08/34] tests/unit/crypto-tls-x509-helpers: deinit privkey in test_tls_cleanup

2024-09-04 Thread Fabiano Rosas
From: Peter Maydell 

We create a gnutls_x509_privkey_t in test_tls_init(), but forget
to deinit it in test_tls_cleanup(), resulting in leaks
reported in the migration test such as:

Indirect leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x55fa6d11c12e in malloc 
(/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f12e)
 (BuildId: 852a267993587f557f50e5715f352f43720077ba)
#1 0x7f073982685d in __gmp_default_allocate 
(/lib/x86_64-linux-gnu/libgmp.so.10+0xa85d) (BuildId: 
f110719303ddbea25a5e89ff730fec520eed67b0)
#2 0x7f0739836193 in __gmpz_realloc 
(/lib/x86_64-linux-gnu/libgmp.so.10+0x1a193) (BuildId: 
f110719303ddbea25a5e89ff730fec520eed67b0)
#3 0x7f0739836594 in __gmpz_import 
(/lib/x86_64-linux-gnu/libgmp.so.10+0x1a594) (BuildId: 
f110719303ddbea25a5e89ff730fec520eed67b0)
#4 0x7f07398a91ed in nettle_mpz_set_str_256_u 
(/lib/x86_64-linux-gnu/libhogweed.so.6+0xb1ed) (BuildId: 
3cc4a3474de72db89e9dcc93bfb95fe377f48c37)
#5 0x7f073a146a5a  (/lib/x86_64-linux-gnu/libgnutls.so.30+0x131a5a) 
(BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b)
#6 0x7f073a07192c  (/lib/x86_64-linux-gnu/libgnutls.so.30+0x5c92c) 
(BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b)
#7 0x7f073a078333  (/lib/x86_64-linux-gnu/libgnutls.so.30+0x6) 
(BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b)
#8 0x7f073a0e8353  (/lib/x86_64-linux-gnu/libgnutls.so.30+0xd3353) 
(BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b)
#9 0x7f073a0ef0ac in gnutls_x509_privkey_import 
(/lib/x86_64-linux-gnu/libgnutls.so.30+0xda0ac) (BuildId: 
97b8f99f392f1fd37b969a7164bcea884e23649b)
#10 0x55fa6d2547e3 in test_tls_load_key 
tests/unit/crypto-tls-x509-helpers.c:99:11
#11 0x55fa6d25460c in test_tls_init 
tests/unit/crypto-tls-x509-helpers.c:128:15
#12 0x55fa6d2495c4 in test_migrate_tls_x509_start_common 
tests/qtest/migration-test.c:1044:5
#13 0x55fa6d24c23a in test_migrate_tls_x509_start_reject_anon_client 
tests/qtest/migration-test.c:1216:12
#14 0x55fa6d23fb40 in test_precopy_common 
tests/qtest/migration-test.c:1789:21
#15 0x55fa6d236b7c in test_precopy_tcp_tls_x509_reject_anon_client 
tests/qtest/migration-test.c:2614:5

(Oddly, there is no reported leak in the x509 unit tests, even though
those also use test_tls_init() and test_tls_cleanup().)

Deinit the privkey in test_tls_cleanup().

Signed-off-by: Peter Maydell 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Fabiano Rosas 
---
 tests/unit/crypto-tls-x509-helpers.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/unit/crypto-tls-x509-helpers.c 
b/tests/unit/crypto-tls-x509-helpers.c
index b316155d6a..2daecc416c 100644
--- a/tests/unit/crypto-tls-x509-helpers.c
+++ b/tests/unit/crypto-tls-x509-helpers.c
@@ -135,6 +135,7 @@ void test_tls_init(const char *keyfile)
 void test_tls_cleanup(const char *keyfile)
 {
 asn1_delete_structure(&pkix_asn1);
+gnutls_x509_privkey_deinit(privkey);
 unlink(keyfile);
 }
 
-- 
2.35.3




[PULL 27/34] migration/multifd: Move nocomp code into multifd-nocomp.c

2024-09-04 Thread Fabiano Rosas
In preparation for adding new payload types to multifd, move most of
the no-compression code into multifd-nocomp.c. Let's try to keep a
semblance of layering by not mixing general multifd control flow with
the details of transmitting pages of ram.

There are still some pieces leftover, namely the p->normal, p->zero,
etc variables that we use for zero page tracking and the packet
allocation which is heavily dependent on the ram code.
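
For orientation, the payload container this split prepares for has
roughly the shape sketched below; the union member for future payload
types is hypothetical here, see multifd.h in this series for the
authoritative definition:

    typedef enum {
        MULTIFD_PAYLOAD_NONE,
        MULTIFD_PAYLOAD_RAM,
        /* future payload types, e.g. device state, would go here */
    } MultiFDPayloadType;

    struct MultiFDSendData {
        MultiFDPayloadType type;
        union {
            MultiFDPages_t ram;
            /* hypothetical: MultiFDDeviceState_t device_state; */
        } u;
    };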

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/meson.build  |   1 +
 migration/multifd-nocomp.c | 394 +
 migration/multifd.c| 377 +--
 migration/multifd.h|   5 +
 4 files changed, 402 insertions(+), 375 deletions(-)
 create mode 100644 migration/multifd-nocomp.c

diff --git a/migration/meson.build b/migration/meson.build
index 5ce2acb41e..77f3abf08e 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -21,6 +21,7 @@ system_ss.add(files(
   'migration-hmp-cmds.c',
   'migration.c',
   'multifd.c',
+  'multifd-nocomp.c',
   'multifd-zlib.c',
   'multifd-zero-page.c',
   'options.c',
diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
new file mode 100644
index 00..53ea9f9c83
--- /dev/null
+++ b/migration/multifd-nocomp.c
@@ -0,0 +1,394 @@
+/*
+ * Multifd RAM migration without compression
+ *
+ * Copyright (c) 2019-2020 Red Hat Inc
+ *
+ * Authors:
+ *  Juan Quintela 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "exec/ramblock.h"
+#include "exec/target_page.h"
+#include "file.h"
+#include "multifd.h"
+#include "options.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+
+static MultiFDSendData *multifd_ram_send;
+
+size_t multifd_ram_payload_size(void)
+{
+uint32_t n = multifd_ram_page_count();
+
+/*
+ * We keep an array of page offsets at the end of MultiFDPages_t,
+ * add space for it in the allocation.
+ */
+return sizeof(MultiFDPages_t) + n * sizeof(ram_addr_t);
+}
+
+void multifd_ram_save_setup(void)
+{
+multifd_ram_send = multifd_send_data_alloc();
+}
+
+void multifd_ram_save_cleanup(void)
+{
+g_free(multifd_ram_send);
+multifd_ram_send = NULL;
+}
+
+static void multifd_set_file_bitmap(MultiFDSendParams *p)
+{
+MultiFDPages_t *pages = &p->data->u.ram;
+
+assert(pages->block);
+
+for (int i = 0; i < pages->normal_num; i++) {
+ramblock_set_file_bmap_atomic(pages->block, pages->offset[i], true);
+}
+
+for (int i = pages->normal_num; i < pages->num; i++) {
+ramblock_set_file_bmap_atomic(pages->block, pages->offset[i], false);
+}
+}
+
+static int multifd_nocomp_send_setup(MultiFDSendParams *p, Error **errp)
+{
+uint32_t page_count = multifd_ram_page_count();
+
+if (migrate_zero_copy_send()) {
+p->write_flags |= QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
+}
+
+if (!migrate_mapped_ram()) {
+/* We need one extra place for the packet header */
+p->iov = g_new0(struct iovec, page_count + 1);
+} else {
+p->iov = g_new0(struct iovec, page_count);
+}
+
+return 0;
+}
+
+static void multifd_nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
+{
+g_free(p->iov);
+p->iov = NULL;
+return;
+}
+
+static void multifd_send_prepare_iovs(MultiFDSendParams *p)
+{
+MultiFDPages_t *pages = &p->data->u.ram;
+uint32_t page_size = multifd_ram_page_size();
+
+for (int i = 0; i < pages->normal_num; i++) {
+p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
+p->iov[p->iovs_num].iov_len = page_size;
+p->iovs_num++;
+}
+
+p->next_packet_size = pages->normal_num * page_size;
+}
+
+static int multifd_nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
+{
+bool use_zero_copy_send = migrate_zero_copy_send();
+int ret;
+
+multifd_send_zero_page_detect(p);
+
+if (migrate_mapped_ram()) {
+multifd_send_prepare_iovs(p);
+multifd_set_file_bitmap(p);
+
+return 0;
+}
+
+if (!use_zero_copy_send) {
+/*
+ * Only !zerocopy needs the header in IOV; zerocopy will
+ * send it separately.
+ */
+multifd_send_prepare_header(p);
+}
+
+multifd_send_prepare_iovs(p);
+p->flags |= MULTIFD_FLAG_NOCOMP;
+
+multifd_send_fill_packet(p);
+
+if (use_zero_copy_send) {
+/* Send header first, without zerocopy */
+ret = qio_channel_write_all(p->c, (void *)p->packet,
+p->packet_len, errp);
+   

[PULL 32/34] migration/multifd: Add documentation for multifd methods

2024-09-04 Thread Fabiano Rosas
Add documentation clarifying the usage of the multifd methods. The
general idea is that the client code calls into multifd to trigger
send/recv of data and multifd then calls these hooks back from the
worker threads at opportune moments so the client can process a
portion of the data.
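
As a minimal sketch of the contract being documented (a hypothetical
method; the names below are made up and not part of this patch):

    static int my_send_setup(MultiFDSendParams *p, Error **errp)
    {
        /* one extra iovec for the packet header, as documented below */
        p->iov = g_new0(struct iovec, multifd_ram_page_count() + 1);
        return 0;
    }

    static int my_send_prepare(MultiFDSendParams *p, Error **errp)
    {
        multifd_send_prepare_header(p);
        /* fill p->iov, bump p->iovs_num, set p->next_packet_size, p->flags */
        multifd_send_fill_packet(p);
        return 0;
    }

    static MultiFDMethods my_ops = {
        .send_setup = my_send_setup,
        .send_prepare = my_send_prepare,
        /* .send_cleanup, .recv_setup, .recv, .recv_cleanup as needed */
    };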

Suggested-by: Peter Xu 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd.h | 76 +
 1 file changed, 70 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 13e7a88c01..3bb96e9558 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -229,17 +229,81 @@ typedef struct {
 } MultiFDRecvParams;
 
 typedef struct {
-/* Setup for sending side */
+/*
+ * The send_setup, send_cleanup, send_prepare are only called on
+ * the QEMU instance at the migration source.
+ */
+
+/*
+ * Setup for sending side. Called once per channel during channel
+ * setup phase.
+ *
+ * Must allocate p->iov. If packets are in use (default), one
+ * extra iovec must be allocated for the packet header. Any memory
+ * allocated in this hook must be released at send_cleanup.
+ *
+ * p->write_flags may be used for passing flags to the QIOChannel.
+ *
+ * p->compression_data may be used by compression methods to store
+ * compression data.
+ */
 int (*send_setup)(MultiFDSendParams *p, Error **errp);
-/* Cleanup for sending side */
+
+/*
+ * Cleanup for sending side. Called once per channel during
+ * channel cleanup phase.
+ */
 void (*send_cleanup)(MultiFDSendParams *p, Error **errp);
-/* Prepare the send packet */
+
+/*
+ * Prepare the send packet. Called as a result of multifd_send()
+ * on the client side, with p pointing to the MultiFDSendParams of
+ * a channel that is currently idle.
+ *
+ * Must populate p->iov with the data to be sent, increment
+ * p->iovs_num to match the amount of iovecs used and set
+ * p->next_packet_size with the amount of data currently present
+ * in p->iov.
+ *
+ * Must indicate whether this is a compression packet by setting
+ * p->flags.
+ *
+ * As a last step, if packets are in use (default), must prepare
+ * the packet by calling multifd_send_fill_packet().
+ */
 int (*send_prepare)(MultiFDSendParams *p, Error **errp);
-/* Setup for receiving side */
+
+/*
+ * The recv_setup, recv_cleanup, recv are only called on the QEMU
+ * instance at the migration destination.
+ */
+
+/*
+ * Setup for receiving side. Called once per channel during
+ * channel setup phase. May be empty.
+ *
+ * May allocate data structures for the receiving of data. May use
+ * p->iov. Compression methods may use p->compress_data.
+ */
 int (*recv_setup)(MultiFDRecvParams *p, Error **errp);
-/* Cleanup for receiving side */
+
+/*
+ * Cleanup for receiving side. Called once per channel during
+ * channel cleanup phase. May be empty.
+ */
 void (*recv_cleanup)(MultiFDRecvParams *p);
-/* Read all data */
+
+/*
+ * Data receive method. Called as a result of multifd_recv() on
+ * the client side, with p pointing to the MultiFDRecvParams of a
+ * channel that is currently idle. Only called if there is data
+ * available to receive.
+ *
+ * Must validate p->flags according to what was set at
+ * send_prepare.
+ *
+ * Must read the data from the QIOChannel p->c.
+ */
 int (*recv)(MultiFDRecvParams *p, Error **errp);
 } MultiFDMethods;
 
-- 
2.35.3




[PULL 10/34] tests/qtest/migration-test: Don't strdup in get_dirty_rate()

2024-09-04 Thread Fabiano Rosas
From: Peter Maydell 

We g_strdup() the "status" string we get out of the qdict in
get_dirty_rate(), but we never free it.  Since we only use this
string while the dictionary is still valid, we don't need to strdup
at all; drop the unnecessary call to avoid this leak:

Direct leak of 18 byte(s) in 2 object(s) allocated from:
#0 0x564b3e01913e in malloc 
(/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f13e)
 (BuildId: d6403a811332fcc846f93c45e23abfd06d1e67c4)
#1 0x7f2f278ff738 in g_malloc debian/build/deb/../../../glib/gmem.c:128:13
#2 0x7f2f27914583 in g_strdup 
debian/build/deb/../../../glib/gstrfuncs.c:361:17
#3 0x564b3e14bb5b in get_dirty_rate tests/qtest/migration-test.c:3447:14
#4 0x564b3e138e00 in test_vcpu_dirty_limit 
tests/qtest/migration-test.c:3565:16
#5 0x564b3e14f417 in migration_test_wrapper 
tests/qtest/migration-helpers.c:456:5

Signed-off-by: Peter Maydell 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index f0f0335c6b..3818595040 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -3355,7 +3355,7 @@ static void wait_for_calc_dirtyrate_complete(QTestState 
*who,
 static int64_t get_dirty_rate(QTestState *who)
 {
 QDict *rsp_return;
-gchar *status;
+const char *status;
 QList *rates;
 const QListEntry *entry;
 QDict *rate;
@@ -3364,7 +3364,7 @@ static int64_t get_dirty_rate(QTestState *who)
 rsp_return = query_dirty_rate(who);
 g_assert(rsp_return);
 
-status = g_strdup(qdict_get_str(rsp_return, "status"));
+status = qdict_get_str(rsp_return, "status");
 g_assert(status);
 g_assert_cmpstr(status, ==, "measured");
 
-- 
2.35.3




[PULL 24/34] migration/multifd: Allow multifd sync without flush

2024-09-04 Thread Fabiano Rosas
Separate the multifd sync from flushing the client data to the
channels. These two operations are closely related but not strictly
necessary to be executed together.

The multifd sync is intrinsic to how multifd works. The multiple
channels operate independently and may finish IO out of order in
relation to each other. This applies also between the source and
destination QEMU.

Flushing the data that is left in the client-owned data structures
(e.g. MultiFDPages_t) prior to sync is usually the right thing to do,
but that is particular to how the ram migration is implemented with
several passes over dirty data.

Make these two routines separate, allowing future code to call the
sync by itself if needed. This also allows the usage of
multifd_ram_send to be isolated to ram code.
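
The resulting call pattern is roughly (illustrative; the hunks below
are authoritative):

    /* ram.c: flush the pages buffered in multifd_ram_send, then sync */
    ret = multifd_ram_flush_and_sync();

    /* a future payload type with nothing buffered can sync directly */
    ret = multifd_send_sync_main();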

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd.c | 13 +
 migration/multifd.h |  1 +
 migration/ram.c |  8 
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index cb7a121eb0..ce08257706 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -914,11 +914,8 @@ static int multifd_zero_copy_flush(QIOChannel *c)
 return ret;
 }
 
-int multifd_send_sync_main(void)
+int multifd_ram_flush_and_sync(void)
 {
-int i;
-bool flush_zero_copy;
-
 if (!migrate_multifd()) {
 return 0;
 }
@@ -930,6 +927,14 @@ int multifd_send_sync_main(void)
 }
 }
 
+return multifd_send_sync_main();
+}
+
+int multifd_send_sync_main(void)
+{
+int i;
+bool flush_zero_copy;
+
 flush_zero_copy = migrate_zero_copy_send();
 
 for (i = 0; i < migrate_multifd_channels(); i++) {
diff --git a/migration/multifd.h b/migration/multifd.h
index 5fa384d9af..00c872dfda 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -270,4 +270,5 @@ static inline uint32_t multifd_ram_page_count(void)
 
 void multifd_ram_save_setup(void);
 void multifd_ram_save_cleanup(void);
+int multifd_ram_flush_and_sync(void);
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index 1815b2557b..67ca3d5d51 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1326,7 +1326,7 @@ static int find_dirty_block(RAMState *rs, 
PageSearchStatus *pss)
 (!migrate_multifd_flush_after_each_section() ||
  migrate_mapped_ram())) {
 QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
-int ret = multifd_send_sync_main();
+int ret = multifd_ram_flush_and_sync();
 if (ret < 0) {
 return ret;
 }
@@ -3066,7 +3066,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque, 
Error **errp)
 }
 
 bql_unlock();
-ret = multifd_send_sync_main();
+ret = multifd_ram_flush_and_sync();
 bql_lock();
 if (ret < 0) {
 error_setg(errp, "%s: multifd synchronization failed", __func__);
@@ -3213,7 +3213,7 @@ out:
 && migration_is_setup_or_active()) {
 if (migrate_multifd() && migrate_multifd_flush_after_each_section() &&
 !migrate_mapped_ram()) {
-ret = multifd_send_sync_main();
+ret = multifd_ram_flush_and_sync();
 if (ret < 0) {
 return ret;
 }
@@ -3285,7 +3285,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 }
 }
 
-ret = multifd_send_sync_main();
+ret = multifd_ram_flush_and_sync();
 if (ret < 0) {
 return ret;
 }
-- 
2.35.3




[PULL 03/34] tests/qtest/migration-test: Fix bootfile cleanup handling

2024-09-04 Thread Fabiano Rosas
From: Peter Maydell 

If you invoke the migration-test binary in such a way that it doesn't run
any tests, then we never call bootfile_create(), and at the end of
main() bootfile_delete() will try to unlink(NULL), which is not valid.
This can happen if for instance you tell the test binary to run a
subset of tests that turns out to be empty, like this:

 (cd build/asan && QTEST_QEMU_BINARY=./qemu-system-x86_64 
./tests/qtest/migration-test --tap -k -p bang)
 # random seed: R02S6501b289ff8ced4231ba452c3a87bc6f
 # Skipping test: userfaultfd not available
 1..0
 ../../tests/qtest/migration-test.c:182:12: runtime error: null pointer passed 
as argument 1, which is declared to never be null
 /usr/include/unistd.h:858:48: note: nonnull attribute specified here

Handle this by making bootfile_delete() not needing to do anything
because bootfile_create() was never called.

Signed-off-by: Peter Maydell 
Reviewed-by: Fabiano Rosas 
[fixed conflict with aee07f2563]
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 334b63cbaa..37ef99c980 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -145,6 +145,9 @@ static char *bootpath;
 
 static void bootfile_delete(void)
 {
+if (!bootpath) {
+return;
+}
 unlink(bootpath);
 g_free(bootpath);
 bootpath = NULL;
@@ -156,10 +159,7 @@ static void bootfile_create(char *dir, bool suspend_me)
 unsigned char *content;
 size_t len;
 
-if (bootpath) {
-bootfile_delete();
-}
-
+bootfile_delete();
 bootpath = g_strdup_printf("%s/bootsect", dir);
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
 /* the assembled x86 boot sector should be exactly one sector large */
-- 
2.35.3




[PULL 30/34] migration/multifd: Fix p->iov leak in multifd-uadk.c

2024-09-04 Thread Fabiano Rosas
The send_cleanup() hook should free the p->iov that was allocated at
send_setup(). This was missed because the UADK code is conditional on
the presence of the accelerator, so it's not tested by default.

Fixes: 819dd20636 ("migration/multifd: Add UADK initialization")
Reported-by: Peter Xu 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd-uadk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/multifd-uadk.c b/migration/multifd-uadk.c
index 89f6a72f0e..6e6a290ae9 100644
--- a/migration/multifd-uadk.c
+++ b/migration/multifd-uadk.c
@@ -132,6 +132,8 @@ static void multifd_uadk_send_cleanup(MultiFDSendParams *p, 
Error **errp)
 
 multifd_uadk_uninit_sess(wd);
 p->compress_data = NULL;
+g_free(p->iov);
+p->iov = NULL;
 }
 
 static inline void prepare_next_iov(MultiFDSendParams *p, void *base,
-- 
2.35.3




[PULL 29/34] migration/multifd: Stop changing the packet on recv side

2024-09-04 Thread Fabiano Rosas
As observed by Philippe, the multifd_ram_unfill_packet() function
currently leaves the MultiFDPacket structure with mixed
endianness. This is harmless, but ultimately not very clean.

Stop touching the received packet and do the necessary work using
stack variables instead.

While here tweak the error strings and fix the space before
semicolons. Also remove the "100 times bigger" comment because it's
just one possible explanation for a size mismatch and it doesn't even
match the code.

CC: Philippe Mathieu-Daudé 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd-nocomp.c | 29 -
 migration/multifd.c| 20 +---
 2 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index f294d1b0b2..07c63f4a72 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -220,33 +220,28 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error 
**errp)
 MultiFDPacket_t *packet = p->packet;
 uint32_t page_count = multifd_ram_page_count();
 uint32_t page_size = multifd_ram_page_size();
+uint32_t pages_per_packet = be32_to_cpu(packet->pages_alloc);
 int i;
 
-packet->pages_alloc = be32_to_cpu(packet->pages_alloc);
-/*
- * If we received a packet that is 100 times bigger than expected
- * just stop migration.  It is a magic number.
- */
-if (packet->pages_alloc > page_count) {
-error_setg(errp, "multifd: received packet "
-   "with size %u and expected a size of %u",
-   packet->pages_alloc, page_count) ;
+if (pages_per_packet > page_count) {
+error_setg(errp, "multifd: received packet with %u pages, expected %u",
+   pages_per_packet, page_count);
 return -1;
 }
 
 p->normal_num = be32_to_cpu(packet->normal_pages);
-if (p->normal_num > packet->pages_alloc) {
-error_setg(errp, "multifd: received packet "
-   "with %u normal pages and expected maximum pages are %u",
-   p->normal_num, packet->pages_alloc) ;
+if (p->normal_num > pages_per_packet) {
+error_setg(errp, "multifd: received packet with %u non-zero pages, "
+   "which exceeds maximum expected pages %u",
+   p->normal_num, pages_per_packet);
 return -1;
 }
 
 p->zero_num = be32_to_cpu(packet->zero_pages);
-if (p->zero_num > packet->pages_alloc - p->normal_num) {
-error_setg(errp, "multifd: received packet "
-   "with %u zero pages and expected maximum zero pages are %u",
-   p->zero_num, packet->pages_alloc - p->normal_num) ;
+if (p->zero_num > pages_per_packet - p->normal_num) {
+error_setg(errp,
+   "multifd: received packet with %u zero pages, expected 
maximum %u",
+   p->zero_num, pages_per_packet - p->normal_num);
 return -1;
 }
 
diff --git a/migration/multifd.c b/migration/multifd.c
index b89715fdc2..2a8cd9174c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -230,22 +230,20 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
 
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
 {
-MultiFDPacket_t *packet = p->packet;
+const MultiFDPacket_t *packet = p->packet;
+uint32_t magic = be32_to_cpu(packet->magic);
+uint32_t version = be32_to_cpu(packet->version);
 int ret = 0;
 
-packet->magic = be32_to_cpu(packet->magic);
-if (packet->magic != MULTIFD_MAGIC) {
-error_setg(errp, "multifd: received packet "
-   "magic %x and expected magic %x",
-   packet->magic, MULTIFD_MAGIC);
+if (magic != MULTIFD_MAGIC) {
+error_setg(errp, "multifd: received packet magic %x, expected %x",
+   magic, MULTIFD_MAGIC);
 return -1;
 }
 
-packet->version = be32_to_cpu(packet->version);
-if (packet->version != MULTIFD_VERSION) {
-error_setg(errp, "multifd: received packet "
-   "version %u and expected version %u",
-   packet->version, MULTIFD_VERSION);
+if (version != MULTIFD_VERSION) {
+error_setg(errp, "multifd: received packet version %u, expected %u",
+   version, MULTIFD_VERSION);
 return -1;
 }
 
-- 
2.35.3




[PULL 04/34] tests/qtest/migration-test: Don't leak resp in multifd_mapped_ram_fdset_end()

2024-09-04 Thread Fabiano Rosas
From: Peter Maydell 

In multifd_mapped_ram_fdset_end() we call qtest_qmp() but forgot
to unref the response QDict we get back, which means it is leaked:

Indirect leak of 4120 byte(s) in 1 object(s) allocated from:
#0 0x55c0c095d318 in __interceptor_calloc 
(/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f318)
 (BuildI
d: 07f667506452d6c467dbc06fd95191966d3e91b4)
#1 0x7f186f939c50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13
#2 0x55c0c0ae9b01 in qdict_new qobject/qdict.c:30:13
#3 0x55c0c0afc16c in parse_object qobject/json-parser.c:317:12
#4 0x55c0c0afb90f in parse_value qobject/json-parser.c:545:16
#5 0x55c0c0afb579 in json_parser_parse qobject/json-parser.c:579:14
#6 0x55c0c0afa21d in json_message_process_token 
qobject/json-streamer.c:92:12
#7 0x55c0c0bca2e5 in json_lexer_feed_char qobject/json-lexer.c:313:13
#8 0x55c0c0bc97ce in json_lexer_feed qobject/json-lexer.c:350:9
#9 0x55c0c0afabbc in json_message_parser_feed qobject/json-streamer.c:121:5
#10 0x55c0c09cbd52 in qmp_fd_receive tests/qtest/libqmp.c:86:9
#11 0x55c0c09be69b in qtest_qmp_receive_dict tests/qtest/libqtest.c:760:12
#12 0x55c0c09bca77 in qtest_qmp_receive tests/qtest/libqtest.c:741:27
#13 0x55c0c09bee9d in qtest_vqmp tests/qtest/libqtest.c:812:12
#14 0x55c0c09bd257 in qtest_qmp tests/qtest/libqtest.c:835:16
#15 0x55c0c0a87747 in multifd_mapped_ram_fdset_end 
tests/qtest/migration-test.c:2393:12
#16 0x55c0c0a85eb3 in test_file_common tests/qtest/migration-test.c:1978:9
#17 0x55c0c0a746a3 in test_multifd_file_mapped_ram_fdset 
tests/qtest/migration-test.c:2437:5
#18 0x55c0c0a93237 in migration_test_wrapper 
tests/qtest/migration-helpers.c:458:5
#19 0x7f186f958aed in test_case_run 
debian/build/deb/../../../glib/gtestutils.c:2930:15
#20 0x7f186f958aed in g_test_run_suite_internal 
debian/build/deb/../../../glib/gtestutils.c:3018:16
#21 0x7f186f95880a in g_test_run_suite_internal 
debian/build/deb/../../../glib/gtestutils.c:3035:18
#22 0x7f186f95880a in g_test_run_suite_internal 
debian/build/deb/../../../glib/gtestutils.c:3035:18
#23 0x7f186f95880a in g_test_run_suite_internal 
debian/build/deb/../../../glib/gtestutils.c:3035:18
#24 0x7f186f95880a in g_test_run_suite_internal 
debian/build/deb/../../../glib/gtestutils.c:3035:18
#25 0x7f186f95880a in g_test_run_suite_internal 
debian/build/deb/../../../glib/gtestutils.c:3035:18
#26 0x7f186f958faa in g_test_run_suite 
debian/build/deb/../../../glib/gtestutils.c:3109:18
#27 0x7f186f959055 in g_test_run 
debian/build/deb/../../../glib/gtestutils.c:2231:7
#28 0x7f186f959055 in g_test_run 
debian/build/deb/../../../glib/gtestutils.c:2218:1
#29 0x55c0c0a6e427 in main tests/qtest/migration-test.c:4033:11

Unref the object after we've confirmed that it is what we expect.

Signed-off-by: Peter Maydell 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 37ef99c980..b775ffed81 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2315,6 +2315,7 @@ static void multifd_mapped_ram_fdset_end(QTestState 
*from, QTestState *to,
 g_assert(qdict_haskey(resp, "return"));
 fdsets = qdict_get_qlist(resp, "return");
 g_assert(fdsets && qlist_empty(fdsets));
+qobject_unref(resp);
 }
 
 static void *multifd_mapped_ram_fdset_dio(QTestState *from, QTestState *to)
-- 
2.35.3




[PULL 26/34] migration/multifd: Register nocomp ops dynamically

2024-09-04 Thread Fabiano Rosas
Prior to moving the ram code into multifd-nocomp.c, change the code to
register the nocomp ops dynamically so we don't need to have the ops
structure defined in multifd.c.

While here, move the ops struct initialization to the end of the file
to make the next diff cleaner.
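
The compression backends already register themselves this way; the
shape is roughly (sketch, see e.g. multifd-zlib.c for the real code):

    static void multifd_zlib_register(void)
    {
        multifd_register_ops(MULTIFD_COMPRESSION_ZLIB, &multifd_zlib_ops);
    }
    migration_init(multifd_zlib_register);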

Reviewed-by: Prasad Pandit 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd.c | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 9f40bb2f16..e100836cbe 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -287,22 +287,12 @@ static int multifd_nocomp_recv(MultiFDRecvParams *p, 
Error **errp)
 return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
 }
 
-static MultiFDMethods multifd_nocomp_ops = {
-.send_setup = multifd_nocomp_send_setup,
-.send_cleanup = multifd_nocomp_send_cleanup,
-.send_prepare = multifd_nocomp_send_prepare,
-.recv_setup = multifd_nocomp_recv_setup,
-.recv_cleanup = multifd_nocomp_recv_cleanup,
-.recv = multifd_nocomp_recv
-};
-
-static MultiFDMethods *multifd_ops[MULTIFD_COMPRESSION__MAX] = {
-[MULTIFD_COMPRESSION_NONE] = &multifd_nocomp_ops,
-};
+static MultiFDMethods *multifd_ops[MULTIFD_COMPRESSION__MAX] = {};
 
 void multifd_register_ops(int method, MultiFDMethods *ops)
 {
-assert(0 < method && method < MULTIFD_COMPRESSION__MAX);
+assert(0 <= method && method < MULTIFD_COMPRESSION__MAX);
+assert(!multifd_ops[method]);
 multifd_ops[method] = ops;
 }
 
@@ -1701,3 +1691,19 @@ bool multifd_send_prepare_common(MultiFDSendParams *p)
 
 return true;
 }
+
+static MultiFDMethods multifd_nocomp_ops = {
+.send_setup = multifd_nocomp_send_setup,
+.send_cleanup = multifd_nocomp_send_cleanup,
+.send_prepare = multifd_nocomp_send_prepare,
+.recv_setup = multifd_nocomp_recv_setup,
+.recv_cleanup = multifd_nocomp_recv_cleanup,
+.recv = multifd_nocomp_recv
+};
+
+static void multifd_nocomp_register(void)
+{
+multifd_register_ops(MULTIFD_COMPRESSION_NONE, &multifd_nocomp_ops);
+}
+
+migration_init(multifd_nocomp_register);
-- 
2.35.3




[PULL 07/34] tests/qtest/migration-test: Free QCRyptoTLSTestCertReq objects

2024-09-04 Thread Fabiano Rosas
From: Peter Maydell 

In the migration test we create several TLS certificates with
the TLS_* macros from crypto-tls-x509-helpers.h. These macros
create both a QCryptoTLSCertReq object which must be deinitialized
and also an on-disk certificate file. The migration test currently
removes the on-disk file in test_migrate_tls_x509_finish() but
never deinitializes the QCryptoTLSCertReq, which means that memory
allocated as part of it is leaked:

Indirect leak of 2 byte(s) in 1 object(s) allocated from:
#0 0x5558ba33712e in malloc 
(/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f12e)
 (BuildId: 4c8618f663e538538cad19d35233124cea161491)
#1 0x7f64afc131f4  (/lib/x86_64-linux-gnu/libtasn1.so.6+0x81f4) (BuildId: 
2fde6ecb43c586fe4077118f771077aa1298e7ea)
#2 0x7f64afc18d58 in asn1_write_value 
(/lib/x86_64-linux-gnu/libtasn1.so.6+0xdd58) (BuildId: 
2fde6ecb43c586fe4077118f771077aa1298e7ea)
#3 0x7f64af8fc678 in gnutls_x509_crt_set_version 
(/lib/x86_64-linux-gnu/libgnutls.so.30+0xe7678) (BuildId: 
97b8f99f392f1fd37b969a7164bcea884e23649b)
#4 0x5558ba470035 in test_tls_generate_cert 
tests/unit/crypto-tls-x509-helpers.c:234:5
#5 0x5558ba464e4a in test_migrate_tls_x509_start_common 
tests/qtest/migration-test.c:1058:5
#6 0x5558ba462c8a in test_migrate_tls_x509_start_default_host 
tests/qtest/migration-test.c:1123:12
#7 0x5558ba45ab40 in test_precopy_common 
tests/qtest/migration-test.c:1786:21
#8 0x5558ba450015 in test_precopy_unix_tls_x509_default_host 
tests/qtest/migration-test.c:2077:5
#9 0x5558ba46d3c7 in migration_test_wrapper 
tests/qtest/migration-helpers.c:456:5

(and similar reports).

The only function currently provided to deinit a QCryptoTLSCertReq is
test_tls_discard_cert(), which also removes the on-disk certificate
file.  For the migration tests we need to retain the on-disk files
until we've finished running the test, so the simplest fix is to
provide a new function test_tls_deinit_cert() which does only the
cleanup of the QCryptoTLSCertReq, and call it in the right places.

Signed-off-by: Peter Maydell 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c |  3 +++
 tests/unit/crypto-tls-x509-helpers.c | 12 ++--
 tests/unit/crypto-tls-x509-helpers.h |  6 ++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 97f99c1316..f0f0335c6b 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1061,12 +1061,15 @@ test_migrate_tls_x509_start_common(QTestState *from,
QCRYPTO_TLS_TEST_CLIENT_HOSTILE_NAME :
QCRYPTO_TLS_TEST_CLIENT_NAME,
data->clientcert);
+test_tls_deinit_cert(&servercertreq);
 }
 
 TLS_CERT_REQ_SIMPLE_SERVER(clientcertreq, cacertreq,
data->servercert,
args->certhostname,
args->certipaddr);
+test_tls_deinit_cert(&clientcertreq);
+test_tls_deinit_cert(&cacertreq);
 
 qtest_qmp_assert_success(from,
  "{ 'execute': 'object-add',"
diff --git a/tests/unit/crypto-tls-x509-helpers.c 
b/tests/unit/crypto-tls-x509-helpers.c
index 3e74ec5b5d..b316155d6a 100644
--- a/tests/unit/crypto-tls-x509-helpers.c
+++ b/tests/unit/crypto-tls-x509-helpers.c
@@ -502,8 +502,7 @@ void test_tls_write_cert_chain(const char *filename,
 g_free(buffer);
 }
 
-
-void test_tls_discard_cert(QCryptoTLSTestCertReq *req)
+void test_tls_deinit_cert(QCryptoTLSTestCertReq *req)
 {
 if (!req->crt) {
 return;
@@ -511,6 +510,15 @@ void test_tls_discard_cert(QCryptoTLSTestCertReq *req)
 
 gnutls_x509_crt_deinit(req->crt);
 req->crt = NULL;
+}
+
+void test_tls_discard_cert(QCryptoTLSTestCertReq *req)
+{
+if (!req->crt) {
+return;
+}
+
+test_tls_deinit_cert(req);
 
 if (getenv("QEMU_TEST_DEBUG_CERTS") == NULL) {
 unlink(req->filename);
diff --git a/tests/unit/crypto-tls-x509-helpers.h 
b/tests/unit/crypto-tls-x509-helpers.h
index 562c160653..2a0f7c04fd 100644
--- a/tests/unit/crypto-tls-x509-helpers.h
+++ b/tests/unit/crypto-tls-x509-helpers.h
@@ -73,6 +73,12 @@ void test_tls_generate_cert(QCryptoTLSTestCertReq *req,
 void test_tls_write_cert_chain(const char *filename,
gnutls_x509_crt_t *certs,
size_t ncerts);
+/*
+ * Deinitialize the QCryptoTLSTestCertReq, but don't delete the certificate
+ * file on disk. (The caller is then responsible for doing that themselves.)
+ */
+void test_tls_deinit_cert(QCryptoTLSTestCertReq *req);
+/* Deinit the QCryptoTLSTestCertReq, and delete the certificate file */
 void test_tls_discard_cert(QCryptoTLSTestCertReq *req);
 
 void test_tls_init(const char *keyfile);
-- 
2.35.3




Re: [RFC V1 4/6] migration: cpr_get_fd_param helper

2024-09-03 Thread Fabiano Rosas
Steve Sistare  writes:

> Add the helper function cpr_get_fd_param, for use by tap and vdpa.
>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas 



Re: [PATCH v2 13/17] migration/multifd: Add migration_has_device_state_support()

2024-09-03 Thread Fabiano Rosas
"Maciej S. Szmigiero"  writes:

> On 30.08.2024 20:55, Fabiano Rosas wrote:
>> "Maciej S. Szmigiero"  writes:
>> 
>>> From: "Maciej S. Szmigiero" 
>>>
>>> Since device state transfer via multifd channels requires multifd
>>> channels with packets and is currently not compatible with multifd
>>> compression add an appropriate query function so device can learn
>>> whether it can actually make use of it.
>>>
>>> Signed-off-by: Maciej S. Szmigiero 
>> 
>> Reviewed-by: Fabiano Rosas 
>> 
>> Out of curiosity, what do you see as a blocker for migrating to a file?
>> 
>> We would just need to figure out a mapping from some unit of data to a
>> file offset, to be able to write in parallel like with ram (where the
>> page offset is mapped to the file offset).
>
> I'm not sure whether there's a point in that since VFIO devices
> just provide a raw device state stream - there's no way to know
> that some buffer is no longer needed because it consisted of
> dirty data that was completely overwritten by a later buffer.
>
> Also, the device type that the code was developed against - a (smart)
> NIC - has so large device state because (more or less) it keeps a lot
> of data about network connections passing / made through it.
>
> It doesn't really make sense to make snapshot of such device for later
> reload since these connections will be long dropped by their remote
> peers by this point.
>
> Such snapshotting might make more sense with GPU VFIO devices though.
>
> If such file migration support is desired at some later point then for
> sure the whole code would need to be carefully re-checked for implicit
> assumptions.

Thanks, let's keep those arguments in mind, maybe we put them in a doc
somewhere so we remember this in the future.

>
> Thanks,
> Maciej



Re: [PATCH v2 10/17] migration/multifd: Convert multifd_send()::next_channel to atomic

2024-09-03 Thread Fabiano Rosas
"Maciej S. Szmigiero"  writes:

> On 30.08.2024 20:13, Fabiano Rosas wrote:
>> "Maciej S. Szmigiero"  writes:
>> 
>>> From: "Maciej S. Szmigiero" 
>>>
>>> This is necessary for multifd_send() to be able to be called
>>> from multiple threads.
>>>
>>> Signed-off-by: Maciej S. Szmigiero 
>>> ---
>>>   migration/multifd.c | 24 ++--
>>>   1 file changed, 18 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>> index d5a8e5a9c9b5..b25789dde0b3 100644
>>> --- a/migration/multifd.c
>>> +++ b/migration/multifd.c
>>> @@ -343,26 +343,38 @@ bool multifd_send(MultiFDSendData **send_data)
>>>   return false;
>>>   }
>>>   
>>> -/* We wait here, until at least one channel is ready */
>>> -qemu_sem_wait(&multifd_send_state->channels_ready);
>>> -
>>>   /*
>>>* next_channel can remain from a previous migration that was
>>>* using more channels, so ensure it doesn't overflow if the
>>>* limit is lower now.
>>>*/
>>> -next_channel %= migrate_multifd_channels();
>>> -for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
>>> +i = qatomic_load_acquire(&next_channel);
>>> +if (unlikely(i >= migrate_multifd_channels())) {
>>> +qatomic_cmpxchg(&next_channel, i, 0);
>>> +}
>> 
>> Do we still need this? It seems not, because the mod down below would
>> already truncate to a value less than the number of channels. We don't
>> need it to start at 0 always, the channels are equivalent.
>
> The "modulo" operation below forces i_next to be in the proper range,
> not i.
>
> If the qatomic_cmpxchg() ends up succeeding then we use the (now out of
> bounds) i value to index multifd_send_state->params[].

Indeed.

>
>>> +
>>> +/* We wait here, until at least one channel is ready */
>>> +qemu_sem_wait(&multifd_send_state->channels_ready);
>>> +
>>> +while (true) {
>>> +int i_next;
>>> +
>>>   if (multifd_send_should_exit()) {
>>>   return false;
>>>   }
>>> +
>>> +i = qatomic_load_acquire(&next_channel);
>>> +i_next = (i + 1) % migrate_multifd_channels();
>>> +if (qatomic_cmpxchg(&next_channel, i, i_next) != i) {
>>> +continue;
>>> +}
>> 
>> Say channel 'i' is the only one that's idle. What's stopping the other
>> thread(s) to race at this point and loop around to the same index?
>
> See the reply below.
>
>>> +
>>>   p = &multifd_send_state->params[i];
>>>   /*
>>>* Lockless read to p->pending_job is safe, because only multifd
>>>* sender thread can clear it.
>>>*/
>>>   if (qatomic_read(&p->pending_job) == false) {
>> 
>> With the cmpxchg your other patch adds here, then the race I mentioned
>> above should be harmless. But we'd need to bring that code into this
>> patch.
>> 
>
> You're right - the sender code with this patch alone isn't thread safe
> yet but this commit is only literally about "converting
> multifd_send()::next_channel to atomic".
>
> At the time of this patch there aren't any multifd_send() calls from
> multiple threads, and the commit that introduces such possible call
> site (multifd_queue_device_state()) also modifies multifd_send()
> to be fully thread safe by introducing p->pending_job_preparing.

In general this would be a bad practice because this commit can end up
being moved around due to backporting or bisecting. It would be better
if it were complete from the start. It also affects backporting due to
ambiguity on where the Fixes tag should point to if someone eventually
finds a bug.

I already asked you to extract the other code into a separate patch, so
it's not that bad. If you prefer to keep both changes separate for
clarity, please note on the commit message that the next patch is
necessary for correctness.
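
For reference, the complete thread-safe claim pattern being discussed
looks roughly like this (illustrative sketch; the pending_job_preparing
part comes from the next patch in the series and may differ in its
final form):

    while (true) {
        i = qatomic_load_acquire(&next_channel);
        i_next = (i + 1) % migrate_multifd_channels();
        if (qatomic_cmpxchg(&next_channel, i, i_next) != i) {
            continue;               /* lost the race for the index */
        }
        p = &multifd_send_state->params[i];
        /* claiming the channel must also be atomic, otherwise two
         * callers could still pick the same idle channel */
        if (qatomic_cmpxchg(&p->pending_job_preparing, false, true) == false) {
            break;                  /* channel claimed */
        }
    }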

>
> Thanks,
> Maciej



Re: [PATCH v2 09/17] migration/multifd: Device state transfer support - receive side

2024-09-03 Thread Fabiano Rosas
"Maciej S. Szmigiero"  writes:

> On 30.08.2024 22:22, Fabiano Rosas wrote:
>> "Maciej S. Szmigiero"  writes:
>> 
>>> From: "Maciej S. Szmigiero" 
>>>
>>> Add basic support for receiving device state via multifd channels -
>>> channels that are shared with RAM transfers.
>>>
>>> To differentiate between a device state and a RAM packet the packet
>>> header is read first.
>>>
>>> Depending on whether the MULTIFD_FLAG_DEVICE_STATE flag is present or not in the
>>> packet header either device state (MultiFDPacketDeviceState_t) or RAM
>>> data (existing MultiFDPacket_t) is then read.
>>>
>>> The received device state data is provided to
>>> qemu_loadvm_load_state_buffer() function for processing in the
>>> device's load_state_buffer handler.
>>>
>>> Signed-off-by: Maciej S. Szmigiero 
>>> ---
>>>   migration/multifd.c | 127 +---
>>>   migration/multifd.h |  31 ++-
>>>   2 files changed, 138 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>> index b06a9fab500e..d5a8e5a9c9b5 100644
>>> --- a/migration/multifd.c
>>> +++ b/migration/multifd.c
> (..)
>>>   g_free(p->zero);
>>> @@ -1126,8 +1159,13 @@ static void *multifd_recv_thread(void *opaque)
>>>   rcu_register_thread();
>>>   
>>>   while (true) {
>>> +MultiFDPacketHdr_t hdr;
>>>   uint32_t flags = 0;
>>> +bool is_device_state = false;
>>>   bool has_data = false;
>>> +uint8_t *pkt_buf;
>>> +size_t pkt_len;
>>> +
>>>   p->normal_num = 0;
>>>   
>>>   if (use_packets) {
>>> @@ -1135,8 +1173,28 @@ static void *multifd_recv_thread(void *opaque)
>>>   break;
>>>   }
>>>   
>>> -ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
>>> -   p->packet_len, &local_err);
>>> +ret = qio_channel_read_all_eof(p->c, (void *)&hdr,
>>> +   sizeof(hdr), &local_err);
>>> +if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
>>> +break;
>>> +}
>>> +
>>> +ret = multifd_recv_unfill_packet_header(p, &hdr, &local_err);
>>> +if (ret) {
>>> +break;
>>> +}
>>> +
>>> +is_device_state = p->flags & MULTIFD_FLAG_DEVICE_STATE;
>>> +if (is_device_state) {
>>> +pkt_buf = (uint8_t *)p->packet_dev_state + sizeof(hdr);
>>> +pkt_len = sizeof(*p->packet_dev_state) - sizeof(hdr);
>>> +} else {
>>> +pkt_buf = (uint8_t *)p->packet + sizeof(hdr);
>>> +pkt_len = p->packet_len - sizeof(hdr);
>>> +}
>> 
>> Should we have made the packet an union as well? Would simplify these
>> sorts of operations. Not sure I want to start messing with that at this
>> point to be honest. But OTOH, look at this...
>
> RAM packet length is not constant (at least from the viewpoint of the
> migration code) so the union allocation would need some kind of a
> "multifd_ram_packet_size()" runtime size determination.
>
> Also, since the RAM and device state packet body sizes differ, all that
> the extra complexity introduced by such a union would buy us is getting
> rid of that single pkt_buf assignment.
>
>>> +
>>> +ret = qio_channel_read_all_eof(p->c, (char *)pkt_buf, pkt_len,
>>> +   &local_err);
>>>   if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
>>>   break;
>>>   }
>>> @@ -1181,8 +1239,33 @@ static void *multifd_recv_thread(void *opaque)
>>>   has_data = !!p->data->size;
>>>   }
>>>   
>>> -if (has_data) {
>>> -ret = multifd_recv_state->ops->recv(p, &local_err);
>>> +if (!is_device_state) {
>>> +if (has_data) {
>>> +ret = multifd_recv_state->ops->recv(p, &local_err);
>>> +if (ret != 0) {
>>> +

Re: [PATCH v2 05/17] thread-pool: Implement non-AIO (generic) pool support

2024-09-03 Thread Fabiano Rosas
"Maciej S. Szmigiero"  writes:

> On 3.09.2024 00:07, Fabiano Rosas wrote:
>> "Maciej S. Szmigiero"  writes:
>> 
>>> From: "Maciej S. Szmigiero" 
>>>
>>> Migration code wants to manage device data sending threads in one place.
>>>
>>> QEMU has an existing thread pool implementation, however it was limited
>>> to queuing AIO operations only and essentially had a 1:1 mapping between
>>> the current AioContext and the ThreadPool in use.
>>>
>>> Implement what is necessary to queue generic (non-AIO) work on a ThreadPool
>>> too.
>>>
>>> This brings a few new operations on a pool:
>>> * thread_pool_set_minmax_threads() explicitly sets the minimum and maximum
>>> thread count in the pool.
>>>
>>> * thread_pool_join() operation waits until all the submitted work requests
>>> have finished.
>>>
>>> * thread_pool_poll() lets the new thread and / or thread completion bottom
>>> halves run (if they are indeed scheduled to be run).
>>> It is useful for thread pool users that need to launch or terminate new
>>> threads without returning to the QEMU main loop.
>>>
>>> Signed-off-by: Maciej S. Szmigiero 
>>> ---
>>>   include/block/thread-pool.h   | 10 -
>>>   tests/unit/test-thread-pool.c |  2 +-
>>>   util/thread-pool.c| 77 ++-
>>>   3 files changed, 76 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/include/block/thread-pool.h b/include/block/thread-pool.h
>>> index b484c4780ea6..1769496056cd 100644
>>> --- a/include/block/thread-pool.h
>>> +++ b/include/block/thread-pool.h
>>> @@ -37,9 +37,15 @@ BlockAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func,
>>>  void *arg, GDestroyNotify arg_destroy,
>>>  BlockCompletionFunc *cb, void *opaque);
>>>   int coroutine_fn thread_pool_submit_co(ThreadPoolFunc *func, void *arg);
>>> -void thread_pool_submit(ThreadPoolFunc *func,
>>> -void *arg, GDestroyNotify arg_destroy);
>>> +BlockAIOCB *thread_pool_submit(ThreadPool *pool, ThreadPoolFunc *func,
>>> +   void *arg, GDestroyNotify arg_destroy,
>>> +   BlockCompletionFunc *cb, void *opaque);
>> 
>> These kinds of changes (create wrappers, change signatures, etc), could
>> be in their own patch as it's just code motion that should not have
>> functional impact. The "no_requests" stuff would be better discussed in
>> a separate patch.
>
> These changes *all* should have no functional impact on existing callers.
>
> But I get your overall point, will try to separate these really trivial
> parts.

Yeah, I guess I meant that one set of changes has a larger potential for
introducing a bug while the other is clearly harmless.

>
>>>   
>>> +void thread_pool_join(ThreadPool *pool);
>>> +void thread_pool_poll(ThreadPool *pool);
>>> +
>>> +void thread_pool_set_minmax_threads(ThreadPool *pool,
>>> +int min_threads, int max_threads);
>>>   void thread_pool_update_params(ThreadPool *pool, struct AioContext *ctx);
>>>   
>>>   #endif
>>> diff --git a/tests/unit/test-thread-pool.c b/tests/unit/test-thread-pool.c
>>> index e4afb9e36292..469c0f7057b6 100644
>>> --- a/tests/unit/test-thread-pool.c
>>> +++ b/tests/unit/test-thread-pool.c
>>> @@ -46,7 +46,7 @@ static void done_cb(void *opaque, int ret)
>>>   static void test_submit(void)
>>>   {
>>>   WorkerTestData data = { .n = 0 };
>>> -thread_pool_submit(worker_cb, &data, NULL);
>>> +thread_pool_submit(NULL, worker_cb, &data, NULL, NULL, NULL);
>>>   while (data.n == 0) {
>>>   aio_poll(ctx, true);
>>>   }
>>> diff --git a/util/thread-pool.c b/util/thread-pool.c
>>> index 69a87ee79252..2bf3be875a51 100644
>>> --- a/util/thread-pool.c
>>> +++ b/util/thread-pool.c
>>> @@ -60,6 +60,7 @@ struct ThreadPool {
>>>   QemuMutex lock;
>>>   QemuCond worker_stopped;
>>>   QemuCond request_cond;
>>> +QemuCond no_requests_cond;
>>>   QEMUBH *new_thread_bh;
>>>   
>>>   /* The following variables are only accessed from one AioContext. */
>>> @@ -73,6 +74,7 @@ struct ThreadPool {
>>>   int pending_threads; /*
