[PATCH v3] i386/cpu: fixup number of addressable IDs for processor cores in the physical package

2024-06-10 Thread Chuang Xu
When QEMU is started with:
-cpu host,host-cache-info=on,l3-cache=off \
-smp 2,sockets=1,dies=1,cores=1,threads=2
The guest can't acquire the maximum number of addressable IDs for processor
cores in the physical package from CPUID[04H].

When creating a CPU topology of 1 core per package, host-cache-info only
passes through the host's addressable core IDs field (CPUID.04H.EAX[bits
31-26]), so on a multi-core host the guest core topology reported in this
field conflicts with the guest's actual number of cores.

Fix it by removing the unnecessary condition so that the 1 core per package
case is also covered. This is safe because cores_per_pkg is never 0 and is
always at least 1.
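
For reference, a guest decodes these CPUID.04H.EAX fields roughly as below; a
minimal user-space sketch using GCC's <cpuid.h> (illustrative only, field
layout per the SDM; this is not QEMU code):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 04H, subleaf 0: parameters of the first cache level. */
        __cpuid_count(4, 0, eax, ebx, ecx, edx);

        if (eax & 31) { /* cache type != 0: the subleaf is valid */
            /* EAX[31:26]: max addressable core IDs per package, minus 1. */
            unsigned int cores = ((eax >> 26) & 0x3f) + 1;
            /* EAX[25:14]: max addressable logical CPUs sharing this cache,
             * minus 1. */
            unsigned int sharing = ((eax >> 14) & 0xfff) + 1;

            printf("addressable cores/pkg: %u, threads sharing cache: %u\n",
                   cores, sharing);
        }
        return 0;
    }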

Fixes: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")
Signed-off-by: Guixiong Wei 
Signed-off-by: Yipeng Yin 
Signed-off-by: Chuang Xu 
---
 target/i386/cpu.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bc2dceb647..b68f7460db 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6426,10 +6426,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 if (*eax & 31) {
 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
 
-if (cores_per_pkg > 1) {
-*eax &= ~0xFC000000;
-*eax |= max_core_ids_in_package(&topo_info) << 26;
-}
+*eax &= ~0xFC000000;
+*eax |= max_core_ids_in_package(&topo_info) << 26;
 if (host_vcpus_per_cache > threads_per_pkg) {
 *eax &= ~0x3FFC000;
 
-- 
2.20.1




[PATCH v2] i386/cpu: fixup number of addressable IDs for processor cores in the physical package

2024-06-03 Thread Chuang Xu
When QEMU is started with:
-cpu host,host-cache-info=on,l3-cache=off \
-smp 2,sockets=1,dies=1,cores=1,threads=2
The guest can't acquire the maximum number of addressable IDs for processor
cores in the physical package from CPUID[04H].

When testing Intel TDX, the guest attempts to acquire the extended topology
from CPUID[0BH], but because the TDX module doesn't provide the emulation of
CPUID[0BH], the guest instead acquires the extended topology from CPUID[04H].
However, due to QEMU's inaccurate emulation of CPUID[04H], one of the vCPUs
in a 2-vCPU TDX guest would be offline.
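
Schematically, the guest-side fallback looks like this (an illustrative
sketch, not actual guest kernel code):

    unsigned int eax, ebx, ecx, edx;

    /* Prefer the extended topology from leaf 0BH... */
    __cpuid_count(0x0b, 0, eax, ebx, ecx, edx);
    if (ebx != 0) {
        /* ...leaf 0BH is implemented: use its level type/width fields. */
    } else {
        /* ...otherwise fall back to leaf 04H, where EAX[31:26] must be
         * accurate for the derived core count to be right. */
        __cpuid_count(4, 0, eax, ebx, ecx, edx);
    }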

Fix it by removing the unnecessary condition.

Fixes: d7caf13b5fcf742e5680c1d3448ba070fc811644 ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")

Signed-off-by: Guixiong Wei 
Signed-off-by: Yipeng Yin 
Signed-off-by: Chuang Xu 
---
 target/i386/cpu.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bc2dceb647..b68f7460db 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6426,10 +6426,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 if (*eax & 31) {
 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
 
-if (cores_per_pkg > 1) {
-*eax &= ~0xFC000000;
-*eax |= max_core_ids_in_package(&topo_info) << 26;
-}
+*eax &= ~0xFC000000;
+*eax |= max_core_ids_in_package(&topo_info) << 26;
 if (host_vcpus_per_cache > threads_per_pkg) {
 *eax &= ~0x3FFC000;
 
-- 
2.20.1




Re: [PATCH] x86: cpu: fixup number of addressable IDs for processor cores in the physical package

2024-05-27 Thread Chuang Xu

Hi Zhao,

On 2024/5/28 10:31 AM, Zhao Liu wrote:

Hi Chuang,

On Mon, May 27, 2024 at 11:13:33AM +0800, Chuang Xu wrote:

Date: Mon, 27 May 2024 11:13:33 +0800
From: Chuang Xu 
Subject: [PATCH] x86: cpu: fixup number of addressable IDs for processor
  cores in the physical package

According to the usual practice of QEMU commits, people tend to use
"i386/cpu" as the subject prefix, which indicates the code path.


X-Mailer: git-send-email 2.24.3 (Apple Git-128)

When QEMU is started with:
-cpu host,host-cache-info=on,l3-cache=off \

Just a discussion: "l3-cache=off" doesn't work in the host cache passthrough
case. Do you have a specific need not to expose the L3 cache?


No specific need, just generated by libvirt.

-smp 2,sockets=1,dies=1,cores=1,threads=2
The guest can't acquire the maximum number of addressable IDs for processor
cores in the physical package from CPUID[04H].

This bug was introduced in commit d7caf13b5fcf742e5680c1d3448ba070fc811644.
Fix it by changing the condition to cs->nr_cores >= 1.

Pls add a "Fixes" tag like:

Fixes: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")

Since this is a historical issue that deserves to be ported to the
stable branch, you can cc stable list by:

Cc: qemu-sta...@nongnu.org


Signed-off-by: Chuang Xu 

As the patch sender, it's better to put your signature on the last line.
;-)


Signed-off-by: Guixiong Wei 
Signed-off-by: Yipeng Yin 
---
  target/i386/cpu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index cd16cb893d..0369c01153 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6097,7 +6097,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
  if (*eax & 31) {
  int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
  int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
-if (cs->nr_cores > 1) {
+if (cs->nr_cores >= 1) {

Like Igor suggested, this condition could be removed since cs->nr_cores can't
be 0.


  *eax &= ~0xFC000000;
  *eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
  }

...the code is outdated, pls rebase on the latest master branch.

My fault, sorry for forgetting to pull the latest code..

Regards,
Zhao


Thanks for all your suggestions!

Chuang




[PATCH] x86: cpu: fixup number of addressable IDs for processor cores in the physical package

2024-05-26 Thread Chuang Xu
When QEMU is started with:
-cpu host,host-cache-info=on,l3-cache=off \
-smp 2,sockets=1,dies=1,cores=1,threads=2
The guest can't acquire the maximum number of addressable IDs for processor
cores in the physical package from CPUID[04H].

This bug was introduced in commit d7caf13b5fcf742e5680c1d3448ba070fc811644.
Fix it by changing the condition to cs->nr_cores >= 1.

Signed-off-by: Chuang Xu 
Signed-off-by: Guixiong Wei 
Signed-off-by: Yipeng Yin 
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index cd16cb893d..0369c01153 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6097,7 +6097,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 if (*eax & 31) {
 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
 int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
-if (cs->nr_cores > 1) {
+if (cs->nr_cores >= 1) {
 *eax &= ~0xFC000000;
 *eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
 }
-- 
2.20.1




Re: [PATCH v2 04/11] multifd: Count the number of bytes sent correctly

2023-06-16 Thread Chuang Xu
Hi, Juan,

On 2023/1/30 4:09 PM, Juan Quintela wrote:
> Current code assumes that all pages are whole.  That is not true for
> example for compression already.  Fix it by creating a new field
> ->sent_bytes that includes it.
>
> All ram_counters are used only from the migration thread, so we have
> two options:
> - put a mutex and fill everything when we sent it (not only
>ram_counters, also qemu_file->xfer_bytes).
> - Create a local variable that implements how much has been sent
>through each channel.  And when we push another packet, we "add" the
>previous stats.
>
> I chose the second option due to fewer changes overall.  On the previous
> code we increased transferred and then we sent.  Current code goes the
> other way around.  It sends the data, and after the fact, it updates the
> counters.  Notice that each channel can have a maximum of half a
> megabyte of data without counting, so it is not very important.
>
> Signed-off-by: Juan Quintela 
> ---
>   migration/multifd.h | 2 ++
>   migration/multifd.c | 6 --
>   2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/migration/multifd.h b/migration/multifd.h
> index e2802a9ce2..36f899c56f 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -102,6 +102,8 @@ typedef struct {
>   uint32_t flags;
>   /* global number of generated multifd packets */
>   uint64_t packet_num;
> +/* How many bytes have we sent on the last packet */
> +uint64_t sent_bytes;
>   /* thread has work to do */
>   int pending_job;
>   /* array of pages to sent.
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 61cafe4c76..cd26b2fda9 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -394,7 +394,6 @@ static int multifd_send_pages(QEMUFile *f)
>   static int next_channel;
>   MultiFDSendParams *p = NULL; /* make happy gcc */
>   MultiFDPages_t *pages = multifd_send_state->pages;
> -uint64_t transferred;
>
>   if (qatomic_read(&multifd_send_state->exiting)) {
>   return -1;
> @@ -429,7 +428,8 @@ static int multifd_send_pages(QEMUFile *f)
>   p->packet_num = multifd_send_state->packet_num++;
>   multifd_send_state->pages = p->pages;
>   p->pages = pages;
> -transferred = ((uint64_t) pages->num) * p->page_size + p->packet_len;
> +uint64_t transferred = p->sent_bytes;
> +p->sent_bytes = 0;
>   qemu_file_acct_rate_limit(f, transferred);
> qemu_mutex_unlock(&p->mutex);
> stat64_add(&ram_atomic_counters.multifd_bytes, transferred);
> @@ -719,6 +719,8 @@ static void *multifd_send_thread(void *opaque)
>   }
>
>   qemu_mutex_lock(&p->mutex);
> +p->sent_bytes += p->packet_len;
> +p->sent_bytes += p->next_packet_size;

Consider a scenario where some normal pages are transmitted in the first
round, followed by several consecutive rounds of zero pages. When the zero
pages are transmitted, the next_packet_size of the first round is still
incorrectly added to sent_bytes. If we set a rate limit for dirty page
transmission, the transmission performance of the multifd zero-page check
will degrade.

Maybe we should set next_packet_size to 0 in multifd_send_pages()? Something
like the sketch below:
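
(A hypothetical illustration of that idea, reusing this patch's field names;
untested:)

    --- a/migration/multifd.c
    +++ b/migration/multifd.c
    @@ (illustrative hunk, not a real submission)
         uint64_t transferred = p->sent_bytes;
         p->sent_bytes = 0;
    +    /* Drop the stale size so later zero-page rounds don't re-account it. */
    +    p->next_packet_size = 0;
         qemu_file_acct_rate_limit(f, transferred);
         qemu_mutex_unlock(&p->mutex);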

>   p->pending_job--;
>   qemu_mutex_unlock(&p->mutex);
>



Re: [PATCH v8 0/6] migration: reduce time of loading non-iterable vmstate

2023-06-16 Thread Chuang Xu

Hi, Paolo,

A few months ago, Juan told me that this series needs feedback from you or
someone familiar with the memory API.


Could you please review it and provide some suggestions?

On 2023/3/17 4:18 PM, Chuang Xu wrote:

In this version:

- delete useless line change.
- update comments and commit messages.

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the memory region changes made while loading
non-iterable vmstate into a single memory transaction. As the number of
devices grows, this patch will greatly improve performance.
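
In outline, the batching amounts to one transaction around the whole load
(schematic use of the existing transaction API; error handling omitted):

    /* Schematic: batch all memory region updates made during vmstate load. */
    memory_region_transaction_begin();
    ret = qemu_loadvm_state_main(f, mis);  /* nested commits no longer rebuild
                                            * flatviews while depth > 0 */
    memory_region_transaction_commit();    /* one rebuild when depth hits 0 */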

Here are the test1 results:
test info:
- Host
   - Intel(R) Xeon(R) Platinum 8362 CPU
   - Mellanox Technologies MT28841
- VM
   - 32 CPUs 128GB RAM VM
   - 8 16-queue vhost-net device
   - 16 4-queue vhost-user-blk device.

         time of loading non-iterable vmstate   downtime
before   112 ms                                 285 ms
after    20 ms                                  194 ms


In test2, we keep the number of devices the same as in test1 and reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
   - Intel(R) Xeon(R) Platinum 8362 CPU
   - Mellanox Technologies MT28841
- VM
   - 32 CPUs 128GB RAM VM
   - 8 1-queue vhost-net device
   - 16 1-queue vhost-user-blk device.

         time of loading non-iterable vmstate   downtime
before   65 ms                                  151 ms
after    19 ms                                  100 ms


In test3, we keep the number of queues per device the same as in test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
   - Intel(R) Xeon(R) Platinum 8362 CPU
   - Mellanox Technologies MT28841
- VM
   - 32 CPUs 128GB RAM VM
   - 1 16-queue vhost-net device
   - 1 4-queue vhost-user-blk device.

         time of loading non-iterable vmstate   downtime
before   24 ms                                  51 ms
after    9 ms                                   36 ms


As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading non-iterable
vmstate. The growth of the number of devices and queues will lead to more
mr commits, and the time consumption caused by the flatview reconstruction
will also increase.

Please review,
Chuang


Thanks!



[PATCH v8 3/6] memory: Introduce memory_region_transaction_do_commit()

2023-03-17 Thread Chuang Xu
Split memory_region_transaction_do_commit() from
memory_region_transaction_commit().

We'll call do_commit() in address_space_to_flatview() in a later patch.

Signed-off-by: Chuang Xu 
---
 softmmu/memory.c | 47 +++
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index a992a365d9..33ecc62ee9 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1093,34 +1093,41 @@ void memory_region_transaction_begin(void)
 ++memory_region_transaction_depth;
 }
 
-void memory_region_transaction_commit(void)
+void memory_region_transaction_do_commit(void)
 {
 AddressSpace *as;
 
-assert(memory_region_transaction_depth);
 assert(qemu_mutex_iothread_locked());
 
---memory_region_transaction_depth;
-if (!memory_region_transaction_depth) {
-if (memory_region_update_pending) {
-flatviews_reset();
+if (memory_region_update_pending) {
+flatviews_reset();
 
-MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
+MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
 
-QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-address_space_set_flatview(as);
-address_space_update_ioeventfds(as);
-}
-memory_region_update_pending = false;
-ioeventfd_update_pending = false;
-MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
-} else if (ioeventfd_update_pending) {
-QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-address_space_update_ioeventfds(as);
-}
-ioeventfd_update_pending = false;
+QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
+address_space_set_flatview(as);
+address_space_update_ioeventfds(as);
+}
+memory_region_update_pending = false;
+ioeventfd_update_pending = false;
+MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
+} else if (ioeventfd_update_pending) {
+QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
+address_space_update_ioeventfds(as);
 }
-   }
+ioeventfd_update_pending = false;
+}
+}
+
+void memory_region_transaction_commit(void)
+{
+assert(memory_region_transaction_depth);
+assert(qemu_mutex_iothread_locked());
+
+--memory_region_transaction_depth;
+if (!memory_region_transaction_depth) {
+memory_region_transaction_do_commit();
+}
 }
 
 static void memory_region_destructor_none(MemoryRegion *mr)
-- 
2.20.1




[PATCH v8 0/6] migration: reduce time of loading non-iterable vmstate

2023-03-17 Thread Chuang Xu


In this version:

- delete useless line change.
- update comments and commit messages.

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the memory region changes made while loading
non-iterable vmstate into a single memory transaction. As the number of
devices grows, this patch will greatly improve performance.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

         time of loading non-iterable vmstate   downtime
before   112 ms                                 285 ms
after    20 ms                                  194 ms


In test2, we keep the number of devices the same as in test1 and reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

         time of loading non-iterable vmstate   downtime
before   65 ms                                  151 ms
after    19 ms                                  100 ms


In test3, we keep the number of queues per device the same as in test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

         time of loading non-iterable vmstate   downtime
before   24 ms                                  51 ms
after    9 ms                                   36 ms


As we can see from the test results above, both the number of queues and 
the number of devices have a great impact on the time of loading non-iterable 
vmstate. The growth of the number of devices and queues will lead to more 
mr commits, and the time consumption caused by the flatview reconstruction 
will also increase.

Please review,
Chuang

[v7]

- introduce address_space_to_flatview_rcu().
- squash peter's fix into patch 1.
- rebase to latest upstream.
- update test results.

[v6]

- add peter's patch.
- split mr_do_commit() from mr_commit().
- adjust the sanity check in address_space_to_flatview().
- rebase to latest upstream.
- replace 8260 with 8362 as testing host.
- update the latest test results.

[v5]

- rename rcu_read_locked() to rcu_read_is_locked().
- adjust the sanity check in address_space_to_flatview().
- improve some comments.

[v4]

- attach more information in the cover letter.
- remove changes on virtio_load.
- add rcu_read_locked() to detect holding of rcu lock.

[v3]

- move virtio_load_check_delay() from virtio_memory_listener_commit() to 
  virtio_vmstate_change().
- add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
  will be called when delay_check is true.

[v2]

- rebase to latest upstream.
- add sanity check to address_space_to_flatview().
- postpone the init of the vring cache until migration's loading completes. 

[v1]

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the memory region changes made while loading
non-iterable vmstate into a single memory transaction. As the number of
devices grows, this patch will greatly improve performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms






[PATCH v8 2/6] rcu: Introduce rcu_read_is_locked()

2023-03-17 Thread Chuang Xu
Add rcu_read_is_locked() to detect holding of rcu lock.
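
A typical use is as a sanity check before dereferencing RCU-protected data;
an illustrative sketch (mirroring how a later patch in this series uses it):

    /* Sketch: only valid inside rcu_read_lock()/rcu_read_unlock(). */
    static inline FlatView *flatview_lookup_rcu(AddressSpace *as)
    {
        assert(rcu_read_is_locked());
        return qatomic_rcu_read(&as->current_map);
    }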

Signed-off-by: Chuang Xu 
---
 include/qemu/rcu.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index 313fc414bc..7bf45602e1 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -115,6 +115,13 @@ static inline void rcu_read_unlock(void)
 }
 }
 
+static inline bool rcu_read_is_locked(void)
+{
+struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();
+
+return p_rcu_reader->depth > 0;
+}
+
 extern void synchronize_rcu(void);
 
 /*
-- 
2.20.1




[PATCH v8 4/6] memory: Add do_commit() and sanity check in address_space_to_flatview

2023-03-17 Thread Chuang Xu
In the next patch, we wrap vm_load in a single memory region transaction to
speed it up. This brings a problem: old flatviews may still be referenced
during vm_load. vm_load performs far more memory updates than flatview
lookups, so we introduce do_commit() to make sure address_space_to_flatview()
returns the newest flatview; it should logically be triggered in only a few
spots during vm_load.

Other than that, sanity-check that the BQL or the RCU read lock is held
before using any flatview.

Signed-off-by: Chuang Xu 
---
 include/exec/memory.h | 23 +++
 softmmu/memory.c  |  5 +
 2 files changed, 28 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 6fa0b071f0..d6fd89db64 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -27,6 +27,7 @@
 #include "qemu/notify.h"
 #include "qom/object.h"
 #include "qemu/rcu.h"
+#include "qemu/main-loop.h"
 
 #define RAM_ADDR_INVALID (~(ram_addr_t)0)
 
@@ -1095,8 +1096,30 @@ struct FlatView {
 MemoryRegion *root;
 };
 
+bool memory_region_transaction_in_progress(void);
+
+void memory_region_transaction_do_commit(void);
+
 static inline FlatView *address_space_to_flatview(AddressSpace *as)
 {
+if (qemu_mutex_iothread_locked()) {
+/* We exclusively own the flatview now.. */
+if (memory_region_transaction_in_progress()) {
+/*
+ * Fetch the flatview within a transaction in-progress, it
+ * means current_map may not be the latest, we need to update
+ * immediately to make sure the caller won't see obsolete
+ * mapping.
+ */
+memory_region_transaction_do_commit();
+}
+
+/* No further protection needed to access current_map */
+return as->current_map;
+}
+
+/* Otherwise we must have had the RCU lock or something went wrong */
+assert(rcu_read_is_locked());
 return qatomic_rcu_read(&as->current_map);
 }
 
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 33ecc62ee9..6a8e8b4e71 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1130,6 +1130,11 @@ void memory_region_transaction_commit(void)
 }
 }
 
+bool memory_region_transaction_in_progress(void)
+{
+return memory_region_transaction_depth != 0;
+}
+
 static void memory_region_destructor_none(MemoryRegion *mr)
 {
 }
-- 
2.20.1




[PATCH v8 1/6] memory: Reference as->current_map directly in memory commit

2023-03-17 Thread Chuang Xu
From: Peter Xu 

Calling RCU variants of address_space_get|to_flatview() during memory
commit (flatview updates, triggering memory listeners, or updating
ioeventfds, etc.) is not 100% accurate, because commit() requires BQL
rather than RCU read lock, so the context exclusively owns current_map and
can be directly referenced.

Neither does it need a refcount to current_map because it cannot be freed
from under the caller.

Add address_space_get_flatview_raw() for the case where the context holds
BQL rather than RCU read lock and use it across the core memory updates,
Drop the extra refcounts on FlatView*.

Signed-off-by: Peter Xu 
---
 softmmu/memory.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 4699ba55ec..a992a365d9 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -61,6 +61,13 @@ struct AddrRange {
 Int128 size;
 };
 
+/* Called with BQL held */
+static inline FlatView *address_space_to_flatview_raw(AddressSpace *as)
+{
+assert(qemu_mutex_iothread_locked());
+return as->current_map;
+}
+
 static AddrRange addrrange_make(Int128 start, Int128 size)
 {
 return (AddrRange) { start, size };
@@ -155,7 +162,7 @@ enum ListenerDirection { Forward, Reverse };
 #define MEMORY_LISTENER_UPDATE_REGION(fr, as, dir, callback, _args...)  \
 do {\
 MemoryRegionSection mrs = section_from_flat_range(fr,   \
-address_space_to_flatview(as)); \
+address_space_to_flatview_raw(as)); \
 MEMORY_LISTENER_CALL(as, callback, dir, &mrs, ##_args); \
 } while(0)
 
@@ -753,6 +760,7 @@ static FlatView *generate_memory_topology(MemoryRegion *mr)
 }
 
 static void address_space_add_del_ioeventfds(AddressSpace *as,
+ FlatView *view,
  MemoryRegionIoeventfd *fds_new,
  unsigned fds_new_nb,
  MemoryRegionIoeventfd *fds_old,
@@ -774,7 +782,7 @@ static void address_space_add_del_ioeventfds(AddressSpace *as,
   &fds_new[inew]))) {
 fd = &fds_old[iold];
 section = (MemoryRegionSection) {
-.fv = address_space_to_flatview(as),
+.fv = view,
 .offset_within_address_space = int128_get64(fd->addr.start),
 .size = fd->addr.size,
 };
@@ -787,7 +795,7 @@ static void address_space_add_del_ioeventfds(AddressSpace *as,
  &fds_old[iold]))) {
 fd = &fds_new[inew];
 section = (MemoryRegionSection) {
-.fv = address_space_to_flatview(as),
+.fv = view,
 .offset_within_address_space = int128_get64(fd->addr.start),
 .size = fd->addr.size,
 };
@@ -833,7 +841,7 @@ static void address_space_update_ioeventfds(AddressSpace *as)
 ioeventfd_max = QEMU_ALIGN_UP(as->ioeventfd_nb, 4);
 ioeventfds = g_new(MemoryRegionIoeventfd, ioeventfd_max);
 
-view = address_space_get_flatview(as);
+view = address_space_to_flatview_raw(as);
 FOR_EACH_FLAT_RANGE(fr, view) {
 for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
 tmp = addrrange_shift(fr->mr->ioeventfds[i].addr,
@@ -852,13 +860,12 @@ static void address_space_update_ioeventfds(AddressSpace *as)
 }
 }
 
-address_space_add_del_ioeventfds(as, ioeventfds, ioeventfd_nb,
+address_space_add_del_ioeventfds(as, view, ioeventfds, ioeventfd_nb,
  as->ioeventfds, as->ioeventfd_nb);
 
 g_free(as->ioeventfds);
 as->ioeventfds = ioeventfds;
 as->ioeventfd_nb = ioeventfd_nb;
-flatview_unref(view);
 }
 
 /*
@@ -1026,7 +1033,7 @@ static void flatviews_reset(void)
 
 static void address_space_set_flatview(AddressSpace *as)
 {
-FlatView *old_view = address_space_to_flatview(as);
+FlatView *old_view = address_space_to_flatview_raw(as);
 MemoryRegion *physmr = memory_region_get_flatview_root(as->root);
 FlatView *new_view = g_hash_table_lookup(flat_views, physmr);
 
@@ -2979,8 +2986,7 @@ static void listener_add_address_space(MemoryListener *listener,
 listener->log_global_start(listener);
 }
 }
-
-view = address_space_get_flatview(as);
+view = address_space_to_flatview_raw(as);
 FOR_EACH_FLAT_RANGE(fr, view) {
 MemoryRegionSection section = section_from_flat_range(fr, view);
 
@@ -2994,7 +3000,6 @@ static void listener_add_address_space(MemoryListener *listener,
 if (listener->commit) {
 listener->commit(listener);
 }
-flatview_unref(view);
 }
 
 static void listener_del_address_space(MemoryListener *listener,
@@ 

[PATCH v8 6/6] memory: Introduce address_space_to_flatview_rcu()

2023-03-17 Thread Chuang Xu
In the last patch, we wrapped vm_load with begin/commit; here we introduce
address_space_to_flatview_rcu() to avoid unnecessary enforced commits
during vm_load.

Signed-off-by: Chuang Xu 
---
 include/exec/memory.h | 17 +
 softmmu/memory.c  |  2 +-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index d6fd89db64..2bf702dc94 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1123,6 +1123,23 @@ static inline FlatView *address_space_to_flatview(AddressSpace *as)
 return qatomic_rcu_read(&as->current_map);
 }
 
+/*
+ * We recommend using address_space_to_flatview() rather than this one.
+ * Note that if we use this during a memory region transaction, we may
+ * see obsolete flatviews. We must bear with an obsolete map until commit.
+ * And outside a memory region transaction, this is basically the same as
+ * address_space_to_flatview().
+ */
+static inline FlatView *address_space_to_flatview_rcu(AddressSpace *as)
+{
+/*
+ * Before using any flatview, sanity check BQL or RCU is held.
+ */
+assert(qemu_mutex_iothread_locked() || rcu_read_is_locked());
+
+return qatomic_rcu_read(&as->current_map);
+}
+
 /**
  * typedef flatview_cb: callback for flatview_for_each_range()
  *
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 6a8e8b4e71..33d14e967d 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -815,7 +815,7 @@ FlatView *address_space_get_flatview(AddressSpace *as)
 
 RCU_READ_LOCK_GUARD();
 do {
-view = address_space_to_flatview(as);
+view = address_space_to_flatview_rcu(as);
 /* If somebody has replaced as->current_map concurrently,
  * flatview_ref returns false.
  */
-- 
2.20.1




[PATCH v8 5/6] migration: Reduce time of loading non-iterable vmstate

2023-03-17 Thread Chuang Xu
The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the memory region changes made while loading
non-iterable vmstate into a single memory transaction. As the number of
devices grows, this patch will greatly improve performance.

Note that the following test results are based on the application of the
next patch. Without the next patch, the improvement will be reduced.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate downtime
before  about 112 ms  285 ms
after   about 20 ms   194 ms

In test2, we keep the number of devices the same as in test1 and reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 65 ms                            about 151 ms
after   about 19 ms                            about 100 ms

In test3, we keep the number of queues per device the same as in test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

time of loading non-iterable vmstate downtime
before  about 24 ms  about 51 ms
after   about 9 ms   about 36 ms

As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading non-iterable
vmstate. The growth of the number of devices and queues will lead to more
mr commits, and the time consumption caused by the flatview reconstruction
will also increase.

Signed-off-by: Chuang Xu 
---
 migration/savevm.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index aa54a67fda..ecf7b27000 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2787,7 +2787,25 @@ int qemu_loadvm_state(QEMUFile *f)
 
 cpu_synchronize_all_pre_loadvm();
 
+/*
+ * Call memory_region_transaction_begin() before loading vmstate.
+ * This call is paired with memory_region_transaction_commit() at
+ * the end of qemu_loadvm_state_main(), in order to pack all the
+ * changes to memory region during the period of loading
+ * non-iterable vmstate in a single memory transaction.
+ * This operation will reduce time of loading non-iterable vmstate
+ */
+memory_region_transaction_begin();
+
 ret = qemu_loadvm_state_main(f, mis);
+
+/*
+ * Call memory_region_transaction_commit() after loading vmstate.
+ * At this point, qemu actually completes all the previous memory
+ * region transactions.
+ */
+memory_region_transaction_commit();
+
 qemu_event_set(&mis->main_thread_load_event);
 
 trace_qemu_loadvm_state_post_main(ret);
-- 
2.20.1




Re: [PATCH v7 6/6] memory: Introduce address_space_to_flatview_rcu()

2023-03-13 Thread Chuang Xu

Hi, Peter,

On 2023/3/10 11:08 PM, Peter Xu wrote:

On Fri, Mar 10, 2023 at 10:24:25AM +0800, Chuang Xu wrote:

In the last patch, we wrapped vm_load with begin/commit; here we introduce
address_space_to_flatview_rcu() to avoid unnecessary enforced commits
during vm_load.

Signed-off-by: Chuang Xu 
---
  include/exec/memory-internal.h |  2 +-
  include/exec/memory.h  | 20 
  softmmu/memory.c   |  2 +-
  3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
index 100c1237ac..1432240449 100644
--- a/include/exec/memory-internal.h
+++ b/include/exec/memory-internal.h
@@ -30,7 +30,7 @@ static inline AddressSpaceDispatch *flatview_to_dispatch(FlatView *fv)
  
  static inline AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as)

  {
-return flatview_to_dispatch(address_space_to_flatview(as));
+return flatview_to_dispatch(address_space_to_flatview_rcu(as));
  }

I'm not sure whether this one is always safe.


Previously I considered that there was no address_space_translate_iommu()
traced during vm_load, so I took it as safe. But my tracing may not cover
all the cases that trigger do_commit() during vm_load..



tcg_commit() seems to be safe, but maybe address_space_translate_iommu() is
not?  Maybe easier to leave this untouched?


Yes, I'll fix it in v8 later.

  
  FlatView *address_space_get_flatview(AddressSpace *as);

diff --git a/include/exec/memory.h b/include/exec/memory.h
index d6fd89db64..235e3017bc 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1100,6 +1100,9 @@ bool memory_region_transaction_in_progress(void);
  
  void memory_region_transaction_do_commit(void);
  
+/*

+ * We recommend using this by default.
+ */

I think this comment doesn't really help.. drop it?


  static inline FlatView *address_space_to_flatview(AddressSpace *as)


Thanks!




Re: [PATCH v7 3/6] memory: Introduce memory_region_transaction_do_commit()

2023-03-12 Thread Chuang Xu

Hi, Peter,

On 2023/3/10 10:51 PM, Peter Xu wrote:

On Fri, Mar 10, 2023 at 10:24:22AM +0800, Chuang Xu wrote:

Split memory_region_transaction_do_commit() from
memory_region_transaction_commit().

We'll call do_commit() in address_space_to_flatview() in a later patch.

Signed-off-by: Chuang Xu 

[...]


+void memory_region_transaction_commit(void)
+{
+assert(memory_region_transaction_depth);
+assert(qemu_mutex_iothread_locked());

This context has nothing to assert BQL, so this can be dropped I think (you
have one in do_commit).


do_commit() will be triggered only when depth is 0. Before do_commit() is
triggered, we must ensure that the BQL is held when the depth is modified.


+
+--memory_region_transaction_depth;
+if (!memory_region_transaction_depth) {
+memory_region_transaction_do_commit();
+}
  }

With above dropped:

Reviewed-by: Peter Xu 


Thanks!




[PATCH v7 6/6] memory: Introduce address_space_to_flatview_rcu()

2023-03-09 Thread Chuang Xu
In the last patch, we wrapped vm_load with begin/commit; here we introduce
address_space_to_flatview_rcu() to avoid unnecessary enforced commits
during vm_load.

Signed-off-by: Chuang Xu 
---
 include/exec/memory-internal.h |  2 +-
 include/exec/memory.h  | 20 
 softmmu/memory.c   |  2 +-
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
index 100c1237ac..1432240449 100644
--- a/include/exec/memory-internal.h
+++ b/include/exec/memory-internal.h
@@ -30,7 +30,7 @@ static inline AddressSpaceDispatch *flatview_to_dispatch(FlatView *fv)
 
 static inline AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as)
 {
-return flatview_to_dispatch(address_space_to_flatview(as));
+return flatview_to_dispatch(address_space_to_flatview_rcu(as));
 }
 
 FlatView *address_space_get_flatview(AddressSpace *as);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index d6fd89db64..235e3017bc 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1100,6 +1100,9 @@ bool memory_region_transaction_in_progress(void);
 
 void memory_region_transaction_do_commit(void);
 
+/*
+ * We recommend using this by default.
+ */
 static inline FlatView *address_space_to_flatview(AddressSpace *as)
 {
 if (qemu_mutex_iothread_locked()) {
@@ -1123,6 +1126,23 @@ static inline FlatView *address_space_to_flatview(AddressSpace *as)
 return qatomic_rcu_read(&as->current_map);
 }
 
+/*
+ * We recommend using address_space_to_flatview() rather than this one.
+ * Note that if we use this during a memory region transaction, we may
+ * see obsolete flatviews. We must bear with an obsolete map until commit.
+ * And outside a memory region transaction, this is basically the same as
+ * address_space_to_flatview().
+ */
+static inline FlatView *address_space_to_flatview_rcu(AddressSpace *as)
+{
+/*
+ * Before using any flatview, sanity check BQL or RCU is held.
+ */
+assert(qemu_mutex_iothread_locked() || rcu_read_is_locked());
+
+return qatomic_rcu_read(&as->current_map);
+}
+
 /**
  * typedef flatview_cb: callback for flatview_for_each_range()
  *
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 6a8e8b4e71..33d14e967d 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -815,7 +815,7 @@ FlatView *address_space_get_flatview(AddressSpace *as)
 
 RCU_READ_LOCK_GUARD();
 do {
-view = address_space_to_flatview(as);
+view = address_space_to_flatview_rcu(as);
 /* If somebody has replaced as->current_map concurrently,
  * flatview_ref returns false.
  */
-- 
2.20.1




[PATCH v7 4/6] memory: Add sanity check in address_space_to_flatview

2023-03-09 Thread Chuang Xu
Before using any flatview, sanity-check whether the BQL or the RCU read
lock is held. And if we're inside a memory region transaction, immediately
update the mappings, or the map can be stale.

Signed-off-by: Chuang Xu 
---
 include/exec/memory.h | 23 +++
 softmmu/memory.c  |  5 +
 2 files changed, 28 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 6fa0b071f0..d6fd89db64 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -27,6 +27,7 @@
 #include "qemu/notify.h"
 #include "qom/object.h"
 #include "qemu/rcu.h"
+#include "qemu/main-loop.h"
 
 #define RAM_ADDR_INVALID (~(ram_addr_t)0)
 
@@ -1095,8 +1096,30 @@ struct FlatView {
 MemoryRegion *root;
 };
 
+bool memory_region_transaction_in_progress(void);
+
+void memory_region_transaction_do_commit(void);
+
 static inline FlatView *address_space_to_flatview(AddressSpace *as)
 {
+if (qemu_mutex_iothread_locked()) {
+/* We exclusively own the flatview now.. */
+if (memory_region_transaction_in_progress()) {
+/*
+ * Fetch the flatview within a transaction in-progress, it
+ * means current_map may not be the latest, we need to update
+ * immediately to make sure the caller won't see obsolete
+ * mapping.
+ */
+memory_region_transaction_do_commit();
+}
+
+/* No further protection needed to access current_map */
+return as->current_map;
+}
+
+/* Otherwise we must have had the RCU lock or something went wrong */
+assert(rcu_read_is_locked());
 return qatomic_rcu_read(&as->current_map);
 }
 
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 33ecc62ee9..6a8e8b4e71 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1130,6 +1130,11 @@ void memory_region_transaction_commit(void)
 }
 }
 
+bool memory_region_transaction_in_progress(void)
+{
+return memory_region_transaction_depth != 0;
+}
+
 static void memory_region_destructor_none(MemoryRegion *mr)
 {
 }
-- 
2.20.1




[PATCH v7 2/6] rcu: Introduce rcu_read_is_locked()

2023-03-09 Thread Chuang Xu
Add rcu_read_is_locked() to detect holding of rcu lock.

Signed-off-by: Chuang Xu 
---
 include/qemu/rcu.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index 313fc414bc..7bf45602e1 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -115,6 +115,13 @@ static inline void rcu_read_unlock(void)
 }
 }
 
+static inline bool rcu_read_is_locked(void)
+{
+struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();
+
+return p_rcu_reader->depth > 0;
+}
+
 extern void synchronize_rcu(void);
 
 /*
-- 
2.20.1




[PATCH v7 3/6] memory: Introduce memory_region_transaction_do_commit()

2023-03-09 Thread Chuang Xu
Split memory_region_transaction_do_commit() from
memory_region_transaction_commit().

We'll call do_commit() in address_space_to_flatview() in a later patch.

Signed-off-by: Chuang Xu 
---
 softmmu/memory.c | 47 +++
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index a992a365d9..33ecc62ee9 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1093,34 +1093,41 @@ void memory_region_transaction_begin(void)
 ++memory_region_transaction_depth;
 }
 
-void memory_region_transaction_commit(void)
+void memory_region_transaction_do_commit(void)
 {
 AddressSpace *as;
 
-assert(memory_region_transaction_depth);
 assert(qemu_mutex_iothread_locked());
 
---memory_region_transaction_depth;
-if (!memory_region_transaction_depth) {
-if (memory_region_update_pending) {
-flatviews_reset();
+if (memory_region_update_pending) {
+flatviews_reset();
 
-MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
+MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
 
-QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-address_space_set_flatview(as);
-address_space_update_ioeventfds(as);
-}
-memory_region_update_pending = false;
-ioeventfd_update_pending = false;
-MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
-} else if (ioeventfd_update_pending) {
-QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-address_space_update_ioeventfds(as);
-}
-ioeventfd_update_pending = false;
+QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
+address_space_set_flatview(as);
+address_space_update_ioeventfds(as);
+}
+memory_region_update_pending = false;
+ioeventfd_update_pending = false;
+MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
+} else if (ioeventfd_update_pending) {
+QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
+address_space_update_ioeventfds(as);
 }
-   }
+ioeventfd_update_pending = false;
+}
+}
+
+void memory_region_transaction_commit(void)
+{
+assert(memory_region_transaction_depth);
+assert(qemu_mutex_iothread_locked());
+
+--memory_region_transaction_depth;
+if (!memory_region_transaction_depth) {
+memory_region_transaction_do_commit();
+}
 }
 
 static void memory_region_destructor_none(MemoryRegion *mr)
-- 
2.20.1




[PATCH v7 1/6] memory: Reference as->current_map directly in memory commit

2023-03-09 Thread Chuang Xu
From: Peter Xu 

Calling RCU variants of address_space_get|to_flatview() during memory
commit (flatview updates, triggering memory listeners, or updating
ioeventfds, etc.) is not 100% accurate, because commit() requires BQL
rather than RCU read lock, so the context exclusively owns current_map and
can be directly referenced.

Neither does it need a refcount to current_map because it cannot be freed
from under the caller.

Add address_space_get_flatview_raw() for the case where the context holds
BQL rather than RCU read lock and use it across the core memory updates,
Drop the extra refcounts on FlatView*.

Signed-off-by: Peter Xu 
---
 softmmu/memory.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 4699ba55ec..a992a365d9 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -61,6 +61,13 @@ struct AddrRange {
 Int128 size;
 };
 
+/* Called with BQL held */
+static inline FlatView *address_space_to_flatview_raw(AddressSpace *as)
+{
+assert(qemu_mutex_iothread_locked());
+return as->current_map;
+}
+
 static AddrRange addrrange_make(Int128 start, Int128 size)
 {
 return (AddrRange) { start, size };
@@ -155,7 +162,7 @@ enum ListenerDirection { Forward, Reverse };
 #define MEMORY_LISTENER_UPDATE_REGION(fr, as, dir, callback, _args...)  \
 do {\
 MemoryRegionSection mrs = section_from_flat_range(fr,   \
-address_space_to_flatview(as)); \
+address_space_to_flatview_raw(as)); \
 MEMORY_LISTENER_CALL(as, callback, dir, &mrs, ##_args); \
 } while(0)
 
@@ -753,6 +760,7 @@ static FlatView *generate_memory_topology(MemoryRegion *mr)
 }
 
 static void address_space_add_del_ioeventfds(AddressSpace *as,
+ FlatView *view,
  MemoryRegionIoeventfd *fds_new,
  unsigned fds_new_nb,
  MemoryRegionIoeventfd *fds_old,
@@ -774,7 +782,7 @@ static void address_space_add_del_ioeventfds(AddressSpace *as,
   &fds_new[inew]))) {
 fd = &fds_old[iold];
 section = (MemoryRegionSection) {
-.fv = address_space_to_flatview(as),
+.fv = view,
 .offset_within_address_space = int128_get64(fd->addr.start),
 .size = fd->addr.size,
 };
@@ -787,7 +795,7 @@ static void address_space_add_del_ioeventfds(AddressSpace *as,
  &fds_old[iold]))) {
 fd = &fds_new[inew];
 section = (MemoryRegionSection) {
-.fv = address_space_to_flatview(as),
+.fv = view,
 .offset_within_address_space = int128_get64(fd->addr.start),
 .size = fd->addr.size,
 };
@@ -833,7 +841,7 @@ static void address_space_update_ioeventfds(AddressSpace *as)
 ioeventfd_max = QEMU_ALIGN_UP(as->ioeventfd_nb, 4);
 ioeventfds = g_new(MemoryRegionIoeventfd, ioeventfd_max);
 
-view = address_space_get_flatview(as);
+view = address_space_to_flatview_raw(as);
 FOR_EACH_FLAT_RANGE(fr, view) {
 for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
 tmp = addrrange_shift(fr->mr->ioeventfds[i].addr,
@@ -852,13 +860,12 @@ static void address_space_update_ioeventfds(AddressSpace *as)
 }
 }
 
-address_space_add_del_ioeventfds(as, ioeventfds, ioeventfd_nb,
+address_space_add_del_ioeventfds(as, view, ioeventfds, ioeventfd_nb,
  as->ioeventfds, as->ioeventfd_nb);
 
 g_free(as->ioeventfds);
 as->ioeventfds = ioeventfds;
 as->ioeventfd_nb = ioeventfd_nb;
-flatview_unref(view);
 }
 
 /*
@@ -1026,7 +1033,7 @@ static void flatviews_reset(void)
 
 static void address_space_set_flatview(AddressSpace *as)
 {
-FlatView *old_view = address_space_to_flatview(as);
+FlatView *old_view = address_space_to_flatview_raw(as);
 MemoryRegion *physmr = memory_region_get_flatview_root(as->root);
 FlatView *new_view = g_hash_table_lookup(flat_views, physmr);
 
@@ -2979,8 +2986,7 @@ static void listener_add_address_space(MemoryListener *listener,
 listener->log_global_start(listener);
 }
 }
-
-view = address_space_get_flatview(as);
+view = address_space_to_flatview_raw(as);
 FOR_EACH_FLAT_RANGE(fr, view) {
 MemoryRegionSection section = section_from_flat_range(fr, view);
 
@@ -2994,7 +3000,6 @@ static void listener_add_address_space(MemoryListener *listener,
 if (listener->commit) {
 listener->commit(listener);
 }
-flatview_unref(view);
 }
 
 static void listener_del_address_space(MemoryListener *listener,
@@ 

[PATCH v7 5/6] migration: Reduce time of loading non-iterable vmstate

2023-03-09 Thread Chuang Xu
The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the memory region changes made while loading
non-iterable vmstate into a single memory transaction. As the number of
devices grows, this patch will greatly improve performance.

Note that the following test results are based on the application of the
next patch. Without the next patch, the improvement will be reduced.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate downtime
before  about 112 ms  285 ms
after   about 20 ms   194 ms

In test2, we keep the number of devices the same as in test1 and reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 65 ms                            about 151 ms
after   about 19 ms                            about 100 ms

In test3, we keep the number of queues per device the same as in test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

time of loading non-iterable vmstate downtime
before  about 24 ms  about 51 ms
after   about 9 ms   about 36 ms

As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading non-iterable
vmstate. The growth of the number of devices and queues will lead to more
mr commits, and the time consumption caused by the flatview reconstruction
will also increase.

Signed-off-by: Chuang Xu 
---
 migration/savevm.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index aa54a67fda..9a7d3e40d6 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2762,6 +2762,7 @@ out:
 goto retry;
 }
 }
+
 return ret;
 }
 
@@ -2787,7 +2788,25 @@ int qemu_loadvm_state(QEMUFile *f)
 
 cpu_synchronize_all_pre_loadvm();
 
+/*
+ * Call memory_region_transaction_begin() before loading vmstate.
+ * This call is paired with memory_region_transaction_commit() at
+ * the end of qemu_loadvm_state_main(), in order to pack all the
+ * changes to memory region during the period of loading
+ * non-iterable vmstate in a single memory transaction.
+ * This operation will reduce time of loading non-iterable vmstate
+ */
+memory_region_transaction_begin();
+
 ret = qemu_loadvm_state_main(f, mis);
+
+/*
+ * Call memory_region_transaction_commit() after loading vmstate.
+ * At this point, qemu actually completes all the previous memory
+ * region transactions.
+ */
+memory_region_transaction_commit();
+
 qemu_event_set(&mis->main_thread_load_event);
 
 trace_qemu_loadvm_state_post_main(ret);
-- 
2.20.1




[PATCH v7 0/6] migration: reduce time of loading non-iterable vmstate

2023-03-09 Thread Chuang Xu


In this version:

- introduce address_space_to_flatview_rcu()
- squash peter's fix into patch 1
- rebase to latest upstream
- update test results

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the memory region changes made while loading
non-iterable vmstate into a single memory transaction. As the number of
devices grows, this patch will greatly improve performance.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

         time of loading non-iterable vmstate   downtime
before   112 ms                                 285 ms
after    20 ms                                  194 ms


In test2, we keep the number of devices the same as in test1 and reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

         time of loading non-iterable vmstate   downtime
before   65 ms                                  151 ms
after    19 ms                                  100 ms


In test3, we keep the number of queues per device the same as in test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

         time of loading non-iterable vmstate   downtime
before   24 ms                                  51 ms
after    9 ms                                   36 ms


As we can see from the test results above, both the number of queues and 
the number of devices have a great impact on the time of loading non-iterable 
vmstate. The growth of the number of devices and queues will lead to more 
mr commits, and the time consumption caused by the flatview reconstruction 
will also increase.

Please review,
Chuang

[v6]

- add peter's patch.
- split mr_do_commit() from mr_commit().
- adjust the sanity check in address_space_to_flatview().
- rebase to latest upstream.
- replace 8260 with 8362 as testing host.
- update the latest test results.

[v5]

- rename rcu_read_locked() to rcu_read_is_locked().
- adjust the sanity check in address_space_to_flatview().
- improve some comments.

[v4]

- attach more information in the cover letter.
- remove changes on virtio_load.
- add rcu_read_locked() to detect holding of rcu lock.

[v3]

- move virtio_load_check_delay() from virtio_memory_listener_commit() to 
  virtio_vmstate_change().
- add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
  will be called when delay_check is true.

[v2]

- rebase to latest upstream.
- add sanity check to address_space_to_flatview().
- postpone the init of the vring cache until migration's loading completes. 

[v1]

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms






Re: [PATCH RESEND v6 0/5] migration: reduce time of loading non-iterable vmstate

2023-03-09 Thread Chuang Xu

Hi, Peter,

On 2023/3/8 11:46 PM, Peter Xu wrote:

1. squash fix into patch1 of yours.
2. introduce address_space_to_flatview_rcu()
3. add specific comment to define when to use which as_to_flat()

This can be together with 2).

We should suggest using address_space_to_flatview() by default in the
comment, and only use _rcu() with cautions e.g. we can mention commit()
hooks as example, and also mention the possibility of seeing very old (or
purely empty flatview) if during vm load.  In that sense this can be the
last patch of your set so there's the vm load context to reference.

I hope there'll be no outliers that takes only RCU (no bql) but still
expect a very new flatview then it'll crash easily if called in a vm load.
But let's see..  I assume your test cases are already a much larger set so
covers a lot of code paths already.


4. Does enforce commit() need further modification or keep current status?
Looks like you have some new thoughts on it?

I don't.

PS: I do have some thoughts but I don't think I mentioned them..  My
thoughts were that we can actually avoid calling begin()/commit()/... hooks
during a nested do_commit() at all but only update current_map.  That'll
further avoid the _rcu() patch to be introduced, but I think that needs
more changes and may not be necessary at all.  Ignore this.


Got it.


Are there any other missing points?

No from my side.

Note that 8.0 reached soft freeze.  Sorry to say so, but it seems this work
will only be possible (if no further objections come) for the 8.1 merge
window, so the earliest merge will be after the middle of April.  Thanks for
being consistent with it already so far.


I also want to thank you for your long-term guidance and suggestions for
this series.

To tell the truth, as a newcomer, this is the first patch I have tried to send
to the community. Your active discussion in these emails makes me feel that
I am doing something really meaningful, so I am willing to continuously devote
my energy to participating in the discussion, and at the same time I benefit
a lot from the discussions with you.

Thanks!




Re: [PATCH RESEND v6 0/5] migration: reduce time of loading non-iterable vmstate

2023-03-08 Thread Chuang Xu

Hi, Peter,

On 2023/3/8 10:58 PM, Peter Xu wrote:

On Wed, Mar 08, 2023 at 06:03:45AM -0800, Chuang Xu wrote:

IIUC, do you mean that different ways to get the flatview are tricky?

Yes, and properly define when to use which.


As you said, it's slightly beyond what this series does. Maybe it would be
better if we discussed it in a new series and kept this series at v6?
What's your take?

Quotting your test result:

                      time of loading non-iterable vmstate
before                112 ms
long's patch applied  103 ms
my patch applied      44 ms
both applied          39 ms
add as_to_flat_rcu    19 ms

If introducing address_space_to_flatview_rcu() can further halve the time,
maybe it is still worth it?

The thing is the extra _rcu() doesn't bring the major complexity, IMHO.  It
brings some on identifying which is really safe to not reference a latest
flatview (it seems to me only during a commit() hook..).

The major complexity still comes from the nested enforced commit() during
address_space_to_flatview() but that is already in the patchset.

Thanks,


OK, let me continue to finish v7.

Here I list some TODOs in v7:

1. squash fix into patch1 of yours.
2. introduce address_space_to_flatview_rcu()
3. add specific comment to define when to use which as_to_flat()
4. Does enforce commit() need further modification or keep current status?
   Looks like you have some new thoughts on it?

Are there any other missing points?

Thanks!




Re: [PATCH RESEND v6 0/5] migration: reduce time of loading non-iterable vmstate

2023-03-08 Thread Chuang Xu
Hi, Peter,

On 2023/3/8 1:04 AM, Peter Xu wrote:
> On Tue, Mar 07, 2023 at 09:24:31PM +0800, Chuang Xu wrote:
>>> Why do we need address_space_get_flatview_rcu()? I'm not sure whether you
>> address_space_cache_init() uses address_space_get_flatview() to acquire
>> a ref-ed flatview. If we want to use address_space_to_flatview_rcu() and
>> make the flatview ref-ed, maybe we need to add
>> address_space_get_flatview_rcu()?
>> Or if we use address_space_to_flatview_rcu() directly, then we'll get a
>> unref-ed flatview, I'm not sure wheather it's safe in other cases.. At
least
>> I don't want the assertion to be triggered one day.
> No we can't touch that, afaict. It was a real fix (447b0d0b9e) to a real
> bug.
>
> What I meant is we make address_space_get_flatview() always use
> address_space_to_flatview_rcu().

This time I clearly understand what you mean.

>
> I think it's safe, if you see the current callers of it (after my patch 1
> fixed version applied):
>
> memory_region_sync_dirty_bitmap[2249]      view = address_space_get_flatview(as);
> memory_region_update_coalesced_range[2457] view = address_space_get_flatview(as);
> memory_region_clear_dirty_bitmap[2285]     view = address_space_get_flatview(as);
> mtree_info_flatview[3418]                  view = address_space_get_flatview(as);
> address_space_cache_init[3350]             cache->fv = address_space_get_flatview(as);
>
> Case 5 is what we're targeting, which I think is fine. Logically any
> commit() hook should just use address_space_get_flatview_raw() to reference
> the flatview, but at least address_space_cache_init() is also called in
> non-BQL sections, so _rcu is the only thing we can use here, iiuc..
>
> Cases 1-3 are never called during vm load, so I think they're safe.
>
> Case 4 can be accessing a stale flatview but I assume that's fine since it's just
> for debugging. E.g. if someone tries "info mtree -f" during vm loading
> after your patch is applied, then one can see stale info. I don't think it
> can even happen because so far "info mtree" should also take BQL.. so it'll
> be blocked until vm load completes.
>
> The whole thing is still tricky. I haven't yet made up my mind on how

IIUC, do you mean that different ways to get flatview are tricky?

> to make it very clean, I think it's slightly beyond what this series does
> and I need some more thinking (or suggestions from others).
>
As you said, it's slightly beyond what this series does. Maybe it would be
better if we discuss it in a new series and keep this series at v6?
What's your take?

Thanks!


Re: [PATCH RESEND v6 0/5] migration: reduce time of loading non-iterable vmstate

2023-03-07 Thread Chuang Xu

Hi, Peter,

On 2023/3/7 4:48 AM, Peter Xu wrote:

On Mon, Mar 06, 2023 at 08:48:05PM +0800, Chuang Xu wrote:

Hi, Peter,

On 2023/3/6 6:05 AM, Peter Xu wrote:

1.virtio_load->virtio_init_region_cache
2.virtio_load->virtio_set_features_nocheck

What is this one specifically?  I failed to see quickly why this needs to
reference the flatview.

0x560f3bb26510 : memory_region_transaction_do_commit+0x0/0x1c0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bb26e45 : address_space_get_flatview+0x95/0x100 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bb296b6 : memory_listener_register+0xf6/0x300 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3baec59f : virtio_device_realize+0x12f/0x1a0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb3b1f : device_set_realized+0x2ff/0x660 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb6ec6 : property_set_bool+0x46/0x70 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb9173 : object_property_set+0x73/0x100 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbbc1ef : object_property_set_qobject+0x2f/0x50 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb93e4 : object_property_set_bool+0x34/0xa0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3b9830d9 : virtio_pci_realize+0x299/0x4e0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3b901204 : pci_qdev_realize+0x7e4/0x1090 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb3b1f : device_set_realized+0x2ff/0x660 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb6ec6 : property_set_bool+0x46/0x70 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb9173 : object_property_set+0x73/0x100 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbbc1ef : object_property_set_qobject+0x2f/0x50 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb93e4 : object_property_set_bool+0x34/0xa0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3b99e4a0 : qdev_device_add_from_qdict+0xb00/0xc40 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bac0c75 : virtio_net_set_features+0x385/0x490 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3baec238 : virtio_set_features_nocheck+0x58/0x70 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3baf1e9e : virtio_load+0x33e/0x820 
[/data00/migration/qemu-open/build/qemu-system-x86_64]

I think this one can also be avoided.  Basically any memory listener
related op can avoid referencing the latest flatview because even if it's
during depth>0 it'll be synced again when depth==0.
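For illustration, a minimal sketch of a listener-side accessor in that spirit
(hypothetical name; not part of the posted series):

/* Under BQL a memory-listener-related op can take the current map
 * directly; a possibly-stale view is fine because a commit at
 * depth==0 will re-sync all listeners anyway. */
static FlatView *listener_op_get_view(AddressSpace *as)
{
    assert(qemu_mutex_iothread_locked());
    return as->current_map;
}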


3.vapic_post_load

Same confusion to this one..

0x557a57b0e510 : memory_region_transaction_do_commit+0x0/0x1c0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x557a57b0e9b5 : memory_region_find_rcu+0x2e5/0x310 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x557a57b11165 : memory_region_find+0x55/0xf0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x557a57a07638 : vapic_prepare+0x58/0x250 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x557a57a07a7b : vapic_post_load+0x1b/0x80 
[/data00/migration/qemu-open/build/qemu-system-x86_64]

AFAIU this one is one of the two that truly need to reference the latest
flatview; the other one is (5) on AHCI.  It's a pity that it uses
memory_region_find_rcu() even though it must already hold the BQL, so it's kind of
unclear (and an enforced commit should never need to happen with RCU
logically..).


4.tcg_commit

This is not enabled if kvm is on, right?

Yes.

I obtained these results from qtests. And some qtests enabled tcg.


5.ahci_state_post_load

During my test, virtio_init_region_cache() will frequently trigger
do_commit() in address_space_to_flatview(), which will reduce the
optimization effect of v6 compared with v1.

IIUC, the above 1 & 4 could leverage one address_space_to_flatview_rcu() which
can keep the old semantics of address_space_to_flatview() by just asserting the
RCU read lock is held and doing qatomic_rcu_read() (e.g., tcg_commit() will run again
at the last stage of vm load).  Not sure how much it'll help.
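As an illustrative sketch, such a helper could look roughly like this
(hypothetical at this point in the thread; rcu_read_is_locked() is the helper
added in patch 2/5 of the series):

static inline FlatView *address_space_to_flatview_rcu(AddressSpace *as)
{
    /* Keep the old semantics: no nested commit, RCU read lock only. */
    assert(rcu_read_is_locked());
    return qatomic_rcu_read(&as->current_map);
}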

This can really improve the performance of the existing patch, making it
basically the same as v1, but it needs address_space_to_flatview_rcu()
and address_space_get_flatview_rcu() to be added. I have also considered this,
but will others find it confusing? Because there will be to_flatview(),
to_flatview_raw() and to_flatview_rcu()..

Why do we need address_space_get_flatview_rcu()?  I'm not sure whether you


address_space_cache_init() uses address_space_get_flatview() to acquire
a ref-ed flatview. If we want to use address_space_to_flatview_rcu() and
make the flatview ref-ed, maybe we need to add address_space_get_flatview_rcu()?
Or if we use address_space_to_flatview_rcu() directly, then we'll get an
unref-ed flatview, I'm not sure whether it's safe in other cases.. At least
I don't want 

Re: [PATCH RESEND v6 0/5] migration: reduce time of loading non-iterable vmstate

2023-03-06 Thread Chuang Xu

Hi, Peter,

On 2023/3/6 6:05 AM, Peter Xu wrote:

Hi, Chuang,

On Fri, Mar 03, 2023 at 06:56:50PM +0800, Chuang Xu wrote:

Sorry, I forgot to update the test results in the last patch of v6.

In this version:

- add peter's patch.
- split mr_do_commit() from mr_commit().
- adjust the sanity check in address_space_to_flatview().
- rebase to latest upstream.
- replace 8260 with 8362 as testing host.
- update the latest test results.

Here I list some cases which will trigger do_commit() in 
address_space_to_flatview():


I ran the qtest cases and used SystemTap to trace those do_commit() calls.



1.virtio_load->virtio_init_region_cache
2.virtio_load->virtio_set_features_nocheck

What is this one specifically?  I failed to see quickly why this needs to
reference the flatview.


0x560f3bb26510 : memory_region_transaction_do_commit+0x0/0x1c0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bb26e45 : address_space_get_flatview+0x95/0x100 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bb296b6 : memory_listener_register+0xf6/0x300 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3baec59f : virtio_device_realize+0x12f/0x1a0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb3b1f : device_set_realized+0x2ff/0x660 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb6ec6 : property_set_bool+0x46/0x70 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb9173 : object_property_set+0x73/0x100 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbbc1ef : object_property_set_qobject+0x2f/0x50 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb93e4 : object_property_set_bool+0x34/0xa0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3b9830d9 : virtio_pci_realize+0x299/0x4e0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3b901204 : pci_qdev_realize+0x7e4/0x1090 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb3b1f : device_set_realized+0x2ff/0x660 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb6ec6 : property_set_bool+0x46/0x70 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb9173 : object_property_set+0x73/0x100 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbbc1ef : object_property_set_qobject+0x2f/0x50 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bbb93e4 : object_property_set_bool+0x34/0xa0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3b99e4a0 : qdev_device_add_from_qdict+0xb00/0xc40 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3bac0c75 : virtio_net_set_features+0x385/0x490 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3baec238 : virtio_set_features_nocheck+0x58/0x70 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x560f3baf1e9e : virtio_load+0x33e/0x820 
[/data00/migration/qemu-open/build/qemu-system-x86_64]




3.vapic_post_load

Same confusion to this one..


0x557a57b0e510 : memory_region_transaction_do_commit+0x0/0x1c0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x557a57b0e9b5 : memory_region_find_rcu+0x2e5/0x310 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x557a57b11165 : memory_region_find+0x55/0xf0 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x557a57a07638 : vapic_prepare+0x58/0x250 
[/data00/migration/qemu-open/build/qemu-system-x86_64]
0x557a57a07a7b : vapic_post_load+0x1b/0x80 
[/data00/migration/qemu-open/build/qemu-system-x86_64]




4.tcg_commit

This is not enabled if kvm is on, right?


Yes.

I obtained these results from qtests. And some qtests enabled tcg.




5.ahci_state_post_load

During my test, virtio_init_region_cache() will frequently trigger
do_commit() in address_space_to_flatview(), which will reduce the
optimization effect of v6 compared with v1.

IIUC, the above 1 & 4 could leverage one address_space_to_flatview_rcu() which
can keep the old semantics of address_space_to_flatview() by just asserting the
RCU read lock is held and doing qatomic_rcu_read() (e.g., tcg_commit() will run again
at the last stage of vm load).  Not sure how much it'll help.


This can really improve the performance of the existing patch, making it
basically the same as v1, but it needs address_space_to_flatview_rcu()
and address_space_get_flatview_rcu() to be added. I have also considered this,
but will others find it confusing? Because there will be to_flatview(),
to_flatview_raw() and to_flatview_rcu()..



You may also want to have a look at the other patch to optimize ioeventfd
commit here; I think that'll also speed up vm load but not sure how much
your series can further do upon:

https://lore.kernel.org/all/20230228142514.2582-1-longpe...@huawei.com/

Thanks,


Here are the latest test results:

test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

time of l

[PATCH RESEND v6 5/5] migration: Reduce time of loading non-iterable vmstate

2023-03-03 Thread Chuang Xu
The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 112 ms                           285 ms
after   about 44 ms                            208 ms

In test2, we keep the number of the device the same as test1, reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 65 ms                            about 151 ms
after   about 30 ms                            about 110 ms

In test3, we keep the number of queues per device the same as test1, reduce
the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 24 ms                            about 51 ms
after   about 12 ms                            about 38 ms

As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading non-iterable
vmstate. The growth of the number of devices and queues will lead to more
mr commits, and the time consumption caused by the flatview reconstruction
will also increase.

Signed-off-by: Chuang Xu 
---
 migration/savevm.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index b5e6962bb6..3dd9daabd8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2770,6 +2770,7 @@ out:
 goto retry;
 }
 }
+
 return ret;
 }
 
@@ -2795,7 +2796,25 @@ int qemu_loadvm_state(QEMUFile *f)
 
 cpu_synchronize_all_pre_loadvm();
 
+/*
+ * Call memory_region_transaction_begin() before loading vmstate.
+ * This call is paired with memory_region_transaction_commit() after
+ * qemu_loadvm_state_main() returns, in order to pack all the
+ * changes to memory region during the period of loading
+ * non-iterable vmstate in a single memory transaction.
+ * This operation will reduce the time of loading non-iterable vmstate.
+ */
+memory_region_transaction_begin();
+
 ret = qemu_loadvm_state_main(f, mis);
+
+/*
+ * Call memory_region_transaction_commit() after loading vmstate.
+ * At this point, qemu actually completes all the previous memory
+ * region transactions.
+ */
+memory_region_transaction_commit();
+
 qemu_event_set(&mis->main_thread_load_event);
 
 trace_qemu_loadvm_state_post_main(ret);
-- 
2.20.1




[PATCH RESEND v6 3/5] memory: Introduce memory_region_transaction_do_commit()

2023-03-03 Thread Chuang Xu
Split memory_region_transaction_do_commit() from
memory_region_transaction_commit().

We'll call do_commit() in address_space_to_flatview() in the later patch.

Signed-off-by: Chuang Xu 
---
 softmmu/memory.c | 47 +++
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 213496802b..b89abf400e 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1093,34 +1093,41 @@ void memory_region_transaction_begin(void)
 ++memory_region_transaction_depth;
 }
 
-void memory_region_transaction_commit(void)
+void memory_region_transaction_do_commit(void)
 {
 AddressSpace *as;
 
-assert(memory_region_transaction_depth);
 assert(qemu_mutex_iothread_locked());
 
---memory_region_transaction_depth;
-if (!memory_region_transaction_depth) {
-if (memory_region_update_pending) {
-flatviews_reset();
+if (memory_region_update_pending) {
+flatviews_reset();
 
-MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
+MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
 
-QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-address_space_set_flatview(as);
-address_space_update_ioeventfds(as);
-}
-memory_region_update_pending = false;
-ioeventfd_update_pending = false;
-MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
-} else if (ioeventfd_update_pending) {
-QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-address_space_update_ioeventfds(as);
-}
-ioeventfd_update_pending = false;
+QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
+address_space_set_flatview(as);
+address_space_update_ioeventfds(as);
+}
+memory_region_update_pending = false;
+ioeventfd_update_pending = false;
+MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
+} else if (ioeventfd_update_pending) {
+QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
+address_space_update_ioeventfds(as);
 }
-   }
+ioeventfd_update_pending = false;
+}
+}
+
+void memory_region_transaction_commit(void)
+{
+assert(memory_region_transaction_depth);
+assert(qemu_mutex_iothread_locked());
+
+--memory_region_transaction_depth;
+if (!memory_region_transaction_depth) {
+memory_region_transaction_do_commit();
+}
 }
 
 static void memory_region_destructor_none(MemoryRegion *mr)
-- 
2.20.1




[PATCH RESEND v6 1/5] memory: Reference as->current_map directly in memory commit

2023-03-03 Thread Chuang Xu
From: Peter Xu 

Calling the RCU variants of address_space_get|to_flatview() during memory
commit (flatview updates, triggering memory listeners, or updating
ioeventfds, etc.) is not 100% accurate, because commit() requires the BQL
rather than the RCU read lock, so the context exclusively owns current_map and
can reference it directly.

Neither does it need a refcount on current_map, because it cannot be freed
from under the caller.

Add address_space_to_flatview_raw() for the case where the context holds
the BQL rather than the RCU read lock, use it across the core memory updates,
and drop the extra refcounts on FlatView*.

Signed-off-by: Peter Xu 
---
 softmmu/memory.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 9d64efca26..213496802b 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -61,6 +61,13 @@ struct AddrRange {
 Int128 size;
 };
 
+/* Called with BQL held */
+static inline FlatView *address_space_to_flatview_raw(AddressSpace *as)
+{
+assert(qemu_mutex_iothread_locked());
+return as->current_map;
+}
+
 static AddrRange addrrange_make(Int128 start, Int128 size)
 {
 return (AddrRange) { start, size };
@@ -155,7 +162,7 @@ enum ListenerDirection { Forward, Reverse };
 #define MEMORY_LISTENER_UPDATE_REGION(fr, as, dir, callback, _args...)  \
 do {\
 MemoryRegionSection mrs = section_from_flat_range(fr,   \
-address_space_to_flatview(as)); \
+address_space_to_flatview_raw(as)); \
 MEMORY_LISTENER_CALL(as, callback, dir, &mrs, ##_args); \
 } while(0)
 
@@ -753,6 +760,7 @@ static FlatView *generate_memory_topology(MemoryRegion *mr)
 }
 
 static void address_space_add_del_ioeventfds(AddressSpace *as,
+ FlatView *view,
  MemoryRegionIoeventfd *fds_new,
  unsigned fds_new_nb,
  MemoryRegionIoeventfd *fds_old,
@@ -774,7 +782,7 @@ static void address_space_add_del_ioeventfds(AddressSpace 
*as,
  &fds_new[inew]))) {
 fd = &fds_old[iold];
 section = (MemoryRegionSection) {
-.fv = address_space_to_flatview(as),
+.fv = view,
 .offset_within_address_space = int128_get64(fd->addr.start),
 .size = fd->addr.size,
 };
@@ -787,7 +795,7 @@ static void address_space_add_del_ioeventfds(AddressSpace 
*as,
 &fds_old[iold]))) {
 fd = &fds_new[inew];
 section = (MemoryRegionSection) {
-.fv = address_space_to_flatview(as),
+.fv = view,
 .offset_within_address_space = int128_get64(fd->addr.start),
 .size = fd->addr.size,
 };
@@ -833,7 +841,7 @@ static void address_space_update_ioeventfds(AddressSpace 
*as)
 ioeventfd_max = QEMU_ALIGN_UP(as->ioeventfd_nb, 4);
 ioeventfds = g_new(MemoryRegionIoeventfd, ioeventfd_max);
 
-view = address_space_get_flatview(as);
+view = address_space_to_flatview_raw(as);
 FOR_EACH_FLAT_RANGE(fr, view) {
 for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
 tmp = addrrange_shift(fr->mr->ioeventfds[i].addr,
@@ -852,13 +860,12 @@ static void address_space_update_ioeventfds(AddressSpace 
*as)
 }
 }
 
-address_space_add_del_ioeventfds(as, ioeventfds, ioeventfd_nb,
+address_space_add_del_ioeventfds(as, view, ioeventfds, ioeventfd_nb,
  as->ioeventfds, as->ioeventfd_nb);
 
 g_free(as->ioeventfds);
 as->ioeventfds = ioeventfds;
 as->ioeventfd_nb = ioeventfd_nb;
-flatview_unref(view);
 }
 
 /*
@@ -1026,7 +1033,7 @@ static void flatviews_reset(void)
 
 static void address_space_set_flatview(AddressSpace *as)
 {
-FlatView *old_view = address_space_to_flatview(as);
+FlatView *old_view = address_space_to_flatview_raw(as);
 MemoryRegion *physmr = memory_region_get_flatview_root(as->root);
 FlatView *new_view = g_hash_table_lookup(flat_views, physmr);
 
-- 
2.20.1




[PATCH RESEND v6 0/5] migration: reduce time of loading non-iterable vmstate

2023-03-03 Thread Chuang Xu



Sorry, I forgot to update the test results in the last patch of v6.

In this version:

- add peter's patch.
- split mr_do_commit() from mr_commit().
- adjust the sanity check in address_space_to_flatview().
- rebase to latest upstream.
- replace 8260 with 8362 as testing host.
- update the latest test results.

Here I list some cases which will trigger do_commit() in 
address_space_to_flatview():

1.virtio_load->virtio_init_region_cache
2.virtio_load->virtio_set_features_nocheck
3.vapic_post_load
4.tcg_commit
5.ahci_state_post_load

During my test, virtio_init_region_cache() will frequently trigger
do_commit() in address_space_to_flatview(), which will reduce the
optimization effect of v6 compared with v1.



The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of  
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.
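In essence, the series brackets the whole load phase in one transaction. A
distilled sketch of patch 5/5 (the full diff appears later in this series):

memory_region_transaction_begin();
/* every device load in between only queues memory region changes */
ret = qemu_loadvm_state_main(f, mis);
/* a single commit rebuilds the flatviews once instead of per device */
memory_region_transaction_commit();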

This time I replaced the 8260 with an 8362 as the testing host and used the
latest SPDK as the vhost-user-blk backend. The downtime results are different
from the previous ones, but this doesn't affect the comparison of the
improvement in loading vmstate.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  112 ms                                 285 ms
after   44 ms                                  208 ms


In test2, we keep the number of the device the same as test1, reduce the 
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  65 ms                                  151 ms
after   30 ms                                  110 ms


In test3, we keep the number of queues per device the same as test1, reduce 
the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  24 ms                                  51 ms
after   12 ms                                  38 ms


As we can see from the test results above, both the number of queues and 
the number of devices have a great impact on the time of loading non-iterable 
vmstate. The growth of the number of devices and queues will lead to more 
mr commits, and the time consumption caused by the flatview reconstruction 
will also increase.

Please review, Chuang

[v5]

- rename rcu_read_locked() to rcu_read_is_locked().
- adjust the sanity check in address_space_to_flatview().
- improve some comments.

[v4]

- attach more information in the cover letter.
- remove changes on virtio_load.
- add rcu_read_locked() to detect holding of rcu lock.

[v3]

- move virtio_load_check_delay() from virtio_memory_listener_commit() to 
  virtio_vmstate_change().
- add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
  will be called when delay_check is true.

[v2]

- rebase to latest upstream.
- add sanity check to address_space_to_flatview().
- postpone the init of the vring cache until migration's loading completes. 

[v1]

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms






[PATCH RESEND v6 2/5] rcu: Introduce rcu_read_is_locked()

2023-03-03 Thread Chuang Xu
Add rcu_read_is_locked() to detect whether the RCU read lock is held.

Signed-off-by: Chuang Xu 
---
 include/qemu/rcu.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index b063c6fde8..719916d9d3 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -119,6 +119,13 @@ static inline void rcu_read_unlock(void)
 }
 }
 
+static inline bool rcu_read_is_locked(void)
+{
+struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();
+
+return p_rcu_reader->depth > 0;
+}
+
 extern void synchronize_rcu(void);
 
 /*
-- 
2.20.1




[PATCH RESEND v6 4/5] memory: Add sanity check in address_space_to_flatview

2023-03-03 Thread Chuang Xu
Before using any flatview, sanity-check whether the BQL or the RCU read
lock is held. And if we're in the middle of a memory region transaction,
update the mappings immediately, or the map can be invalid.

Signed-off-by: Chuang Xu 
---
 include/exec/memory.h | 23 +++
 softmmu/memory.c  |  5 +
 2 files changed, 28 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2e602a2fad..84b531c6ff 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -27,6 +27,7 @@
 #include "qemu/notify.h"
 #include "qom/object.h"
 #include "qemu/rcu.h"
+#include "qemu/main-loop.h"
 
 #define RAM_ADDR_INVALID (~(ram_addr_t)0)
 
@@ -1095,8 +1096,30 @@ struct FlatView {
 MemoryRegion *root;
 };
 
+bool memory_region_transaction_in_progress(void);
+
+void memory_region_transaction_do_commit(void);
+
 static inline FlatView *address_space_to_flatview(AddressSpace *as)
 {
+if (qemu_mutex_iothread_locked()) {
+/* We exclusively own the flatview now.. */
+if (memory_region_transaction_in_progress()) {
+/*
+ * Fetch the flatview within a transaction in-progress, it
+ * means current_map may not be the latest, we need to update
+ * immediately to make sure the caller won't see obsolete
+ * mapping.
+ */
+memory_region_transaction_do_commit();
+}
+
+/* No further protection needed to access current_map */
+return as->current_map;
+}
+
+/* Otherwise we must have had the RCU lock or something went wrong */
+assert(rcu_read_is_locked());
 return qatomic_rcu_read(&as->current_map);
 }
 
diff --git a/softmmu/memory.c b/softmmu/memory.c
index b89abf400e..1834e14cc8 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1130,6 +1130,11 @@ void memory_region_transaction_commit(void)
 }
 }
 
+bool memory_region_transaction_in_progress(void)
+{
+return memory_region_transaction_depth != 0;
+}
+
 static void memory_region_destructor_none(MemoryRegion *mr)
 {
 }
-- 
2.20.1




[PATCH v6 1/5] memory: Reference as->current_map directly in memory commit

2023-03-03 Thread Chuang Xu
From: Peter Xu 

Calling the RCU variants of address_space_get|to_flatview() during memory
commit (flatview updates, triggering memory listeners, or updating
ioeventfds, etc.) is not 100% accurate, because commit() requires the BQL
rather than the RCU read lock, so the context exclusively owns current_map and
can reference it directly.

Neither does it need a refcount on current_map, because it cannot be freed
from under the caller.

Add address_space_to_flatview_raw() for the case where the context holds
the BQL rather than the RCU read lock, use it across the core memory updates,
and drop the extra refcounts on FlatView*.

Signed-off-by: Peter Xu 
---
 softmmu/memory.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 9d64efca26..213496802b 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -61,6 +61,13 @@ struct AddrRange {
 Int128 size;
 };
 
+/* Called with BQL held */
+static inline FlatView *address_space_to_flatview_raw(AddressSpace *as)
+{
+assert(qemu_mutex_iothread_locked());
+return as->current_map;
+}
+
 static AddrRange addrrange_make(Int128 start, Int128 size)
 {
 return (AddrRange) { start, size };
@@ -155,7 +162,7 @@ enum ListenerDirection { Forward, Reverse };
 #define MEMORY_LISTENER_UPDATE_REGION(fr, as, dir, callback, _args...)  \
 do {\
 MemoryRegionSection mrs = section_from_flat_range(fr,   \
-address_space_to_flatview(as)); \
+address_space_to_flatview_raw(as)); \
 MEMORY_LISTENER_CALL(as, callback, dir, &mrs, ##_args); \
 } while(0)
 
@@ -753,6 +760,7 @@ static FlatView *generate_memory_topology(MemoryRegion *mr)
 }
 
 static void address_space_add_del_ioeventfds(AddressSpace *as,
+ FlatView *view,
  MemoryRegionIoeventfd *fds_new,
  unsigned fds_new_nb,
  MemoryRegionIoeventfd *fds_old,
@@ -774,7 +782,7 @@ static void address_space_add_del_ioeventfds(AddressSpace 
*as,
  &fds_new[inew]))) {
 fd = &fds_old[iold];
 section = (MemoryRegionSection) {
-.fv = address_space_to_flatview(as),
+.fv = view,
 .offset_within_address_space = int128_get64(fd->addr.start),
 .size = fd->addr.size,
 };
@@ -787,7 +795,7 @@ static void address_space_add_del_ioeventfds(AddressSpace 
*as,
 &fds_old[iold]))) {
 fd = &fds_new[inew];
 section = (MemoryRegionSection) {
-.fv = address_space_to_flatview(as),
+.fv = view,
 .offset_within_address_space = int128_get64(fd->addr.start),
 .size = fd->addr.size,
 };
@@ -833,7 +841,7 @@ static void address_space_update_ioeventfds(AddressSpace 
*as)
 ioeventfd_max = QEMU_ALIGN_UP(as->ioeventfd_nb, 4);
 ioeventfds = g_new(MemoryRegionIoeventfd, ioeventfd_max);
 
-view = address_space_get_flatview(as);
+view = address_space_to_flatview_raw(as);
 FOR_EACH_FLAT_RANGE(fr, view) {
 for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
 tmp = addrrange_shift(fr->mr->ioeventfds[i].addr,
@@ -852,13 +860,12 @@ static void address_space_update_ioeventfds(AddressSpace 
*as)
 }
 }
 
-address_space_add_del_ioeventfds(as, ioeventfds, ioeventfd_nb,
+address_space_add_del_ioeventfds(as, view, ioeventfds, ioeventfd_nb,
  as->ioeventfds, as->ioeventfd_nb);
 
 g_free(as->ioeventfds);
 as->ioeventfds = ioeventfds;
 as->ioeventfd_nb = ioeventfd_nb;
-flatview_unref(view);
 }
 
 /*
@@ -1026,7 +1033,7 @@ static void flatviews_reset(void)
 
 static void address_space_set_flatview(AddressSpace *as)
 {
-FlatView *old_view = address_space_to_flatview(as);
+FlatView *old_view = address_space_to_flatview_raw(as);
 MemoryRegion *physmr = memory_region_get_flatview_root(as->root);
 FlatView *new_view = g_hash_table_lookup(flat_views, physmr);
 
-- 
2.20.1




[PATCH v6 2/5] rcu: Introduce rcu_read_is_locked()

2023-03-03 Thread Chuang Xu
Add rcu_read_is_locked() to detect whether the RCU read lock is held.

Signed-off-by: Chuang Xu 
---
 include/qemu/rcu.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index b063c6fde8..719916d9d3 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -119,6 +119,13 @@ static inline void rcu_read_unlock(void)
 }
 }
 
+static inline bool rcu_read_is_locked(void)
+{
+struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();
+
+return p_rcu_reader->depth > 0;
+}
+
 extern void synchronize_rcu(void);
 
 /*
-- 
2.20.1




[PATCH v6 0/5] migration: reduce time of loading non-iterable vmstate

2023-03-03 Thread Chuang Xu



In this version:

- add peter's patch.
- split mr_do_commit() from mr_commit().
- adjust the sanity check in address_space_to_flatview().
- rebase to latest upstream.
- replace 8260 with 8362 as testing host.
- update the latest test results.

Here I list some cases which will trigger do_commit() in 
address_space_to_flatview():

1.virtio_load->virtio_init_region_cache
2.virtio_load->virtio_set_features_nocheck
3.vapic_post_load
4.tcg_commit
5.ahci_state_post_load

During my test, virtio_init_region_cache() will frequently trigger
do_commit() in address_space_to_flatview(), which will reduce the
optimization effect of v6 compared with v1.



The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of  
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

This time I replaced the 8260 with an 8362 as the testing host and used the
latest SPDK as the vhost-user-blk backend. The downtime results are different
from the previous ones, but this doesn't affect the comparison of the
improvement in loading vmstate.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  112 ms                                 285 ms
after   44 ms                                  208 ms


In test2, we keep the number of the device the same as test1, reduce the 
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  65 ms                                  151 ms
after   30 ms                                  110 ms


In test3, we keep the number of queues per device the same as test1, reduce 
the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8362 CPU
  - Mellanox Technologies MT28841
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  24 ms                                  51 ms
after   12 ms                                  38 ms


As we can see from the test results above, both the number of queues and 
the number of devices have a great impact on the time of loading non-iterable 
vmstate. The growth of the number of devices and queues will lead to more 
mr commits, and the time consumption caused by the flatview reconstruction 
will also increase.

Please review, Chuang

[v5]

- rename rcu_read_locked() to rcu_read_is_locked().
- adjust the sanity check in address_space_to_flatview().
- improve some comments.

[v4]

- attach more information in the cover letter.
- remove changes on virtio_load.
- add rcu_read_locked() to detect holding of rcu lock.

[v3]

- move virtio_load_check_delay() from virtio_memory_listener_commit() to 
  virtio_vmstate_change().
- add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
  will be called when delay_check is true.

[v2]

- rebase to latest upstream.
- add sanity check to address_space_to_flatview().
- postpone the init of the vring cache until migration's loading completes. 

[v1]

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms






[PATCH v6 5/5] migration: Reduce time of loading non-iterable vmstate

2023-03-03 Thread Chuang Xu
The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 150 ms                           740+ ms
after   about 30 ms                            630+ ms

In test2, we keep the number of the device the same as test1, reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 90 ms                            about 250 ms
after   about 25 ms                            about 160 ms

In test3, we keep the number of queues per device the same as test1, reduce
the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 20 ms                            about 70 ms
after   about 11 ms                            about 60 ms

As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading non-iterable
vmstate. The growth of the number of devices and queues will lead to more
mr commits, and the time consumption caused by the flatview reconstruction
will also increase.

Signed-off-by: Chuang Xu 
---
 migration/savevm.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index b5e6962bb6..3dd9daabd8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2770,6 +2770,7 @@ out:
 goto retry;
 }
 }
+
 return ret;
 }
 
@@ -2795,7 +2796,25 @@ int qemu_loadvm_state(QEMUFile *f)
 
 cpu_synchronize_all_pre_loadvm();
 
+/*
+ * Call memory_region_transaction_begin() before loading vmstate.
+ * This call is paired with memory_region_transaction_commit() after
+ * qemu_loadvm_state_main() returns, in order to pack all the
+ * changes to memory region during the period of loading
+ * non-iterable vmstate in a single memory transaction.
+ * This operation will reduce the time of loading non-iterable vmstate.
+ */
+memory_region_transaction_begin();
+
 ret = qemu_loadvm_state_main(f, mis);
+
+/*
+ * Call memory_region_transaction_commit() after loading vmstate.
+ * At this point, qemu actually completes all the previous memory
+ * region transactions.
+ */
+memory_region_transaction_commit();
+
 qemu_event_set(&mis->main_thread_load_event);
 
 trace_qemu_loadvm_state_post_main(ret);
-- 
2.20.1




[PATCH v6 4/5] memory: Add sanity check in address_space_to_flatview

2023-03-03 Thread Chuang Xu
Before using any flatview, sanity-check whether the BQL or the RCU read
lock is held. And if we're in the middle of a memory region transaction,
update the mappings immediately, or the map can be invalid.

Signed-off-by: Chuang Xu 
---
 include/exec/memory.h | 23 +++
 softmmu/memory.c  |  5 +
 2 files changed, 28 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2e602a2fad..84b531c6ff 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -27,6 +27,7 @@
 #include "qemu/notify.h"
 #include "qom/object.h"
 #include "qemu/rcu.h"
+#include "qemu/main-loop.h"
 
 #define RAM_ADDR_INVALID (~(ram_addr_t)0)
 
@@ -1095,8 +1096,30 @@ struct FlatView {
 MemoryRegion *root;
 };
 
+bool memory_region_transaction_in_progress(void);
+
+void memory_region_transaction_do_commit(void);
+
 static inline FlatView *address_space_to_flatview(AddressSpace *as)
 {
+if (qemu_mutex_iothread_locked()) {
+/* We exclusively own the flatview now.. */
+if (memory_region_transaction_in_progress()) {
+/*
+ * Fetch the flatview within a transaction in-progress, it
+ * means current_map may not be the latest, we need to update
+ * immediately to make sure the caller won't see obsolete
+ * mapping.
+ */
+memory_region_transaction_do_commit();
+}
+
+/* No further protection needed to access current_map */
+return as->current_map;
+}
+
+/* Otherwise we must have had the RCU lock or something went wrong */
+assert(rcu_read_is_locked());
 return qatomic_rcu_read(&as->current_map);
 }
 
diff --git a/softmmu/memory.c b/softmmu/memory.c
index b89abf400e..1834e14cc8 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1130,6 +1130,11 @@ void memory_region_transaction_commit(void)
 }
 }
 
+bool memory_region_transaction_in_progress(void)
+{
+return memory_region_transaction_depth != 0;
+}
+
 static void memory_region_destructor_none(MemoryRegion *mr)
 {
 }
-- 
2.20.1




[PATCH v6 3/5] memory: Introduce memory_region_transaction_do_commit()

2023-03-03 Thread Chuang Xu
Split memory_region_transaction_do_commit() from
memory_region_transaction_commit().

We'll call do_commit() in address_space_to_flatview() in the later patch.

Signed-off-by: Chuang Xu 
---
 softmmu/memory.c | 47 +++
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 213496802b..b89abf400e 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1093,34 +1093,41 @@ void memory_region_transaction_begin(void)
 ++memory_region_transaction_depth;
 }
 
-void memory_region_transaction_commit(void)
+void memory_region_transaction_do_commit(void)
 {
 AddressSpace *as;
 
-assert(memory_region_transaction_depth);
 assert(qemu_mutex_iothread_locked());
 
---memory_region_transaction_depth;
-if (!memory_region_transaction_depth) {
-if (memory_region_update_pending) {
-flatviews_reset();
+if (memory_region_update_pending) {
+flatviews_reset();
 
-MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
+MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
 
-QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-address_space_set_flatview(as);
-address_space_update_ioeventfds(as);
-}
-memory_region_update_pending = false;
-ioeventfd_update_pending = false;
-MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
-} else if (ioeventfd_update_pending) {
-QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-address_space_update_ioeventfds(as);
-}
-ioeventfd_update_pending = false;
+QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
+address_space_set_flatview(as);
+address_space_update_ioeventfds(as);
+}
+memory_region_update_pending = false;
+ioeventfd_update_pending = false;
+MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
+} else if (ioeventfd_update_pending) {
+QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
+address_space_update_ioeventfds(as);
 }
-   }
+ioeventfd_update_pending = false;
+}
+}
+
+void memory_region_transaction_commit(void)
+{
+assert(memory_region_transaction_depth);
+assert(qemu_mutex_iothread_locked());
+
+--memory_region_transaction_depth;
+if (!memory_region_transaction_depth) {
+memory_region_transaction_do_commit();
+}
 }
 
 static void memory_region_destructor_none(MemoryRegion *mr)
-- 
2.20.1




Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate

2023-02-27 Thread Chuang Xu

Hi, Peter

On 2023/2/25 11:32 PM, Peter Xu wrote:

On Thu, Feb 23, 2023 at 11:28:46AM +0800, Chuang Xu wrote:

Hi, Peter

Hi, Chuang,


On 2023/2/22 下午11:57, Peter Xu wrote:

I don't see why it's wrong, and that's exactly what I wanted to guarantee,
that if memory_region_update_pending==true when updating ioeventfd, we may
have some serious problem.

For this, I split my patch into two parts and I put this change into the
last patch.  See the attachment.  If you worry about this, you can e.g. try
applying patch 1 only later, but I still don't see why it could.


Sorry, I made some mistakes in my previous understanding of the code. However,
if this assertion is added, it will indeed trigger some coredumps in qtest with
my patches. Here is the coredump (this is the only one I found):

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7fa5e7b17535 in __GI_abort () at abort.c:79
#2  0x7fa5e7b1740f in __assert_fail_base (fmt=0x7fa5e7c78ef0 "%s%s%s:%u: 
%s%sAssertion `%s' failed.\n%n",
assertion=0x56305fc02d60 "qemu_mutex_iothread_locked() && !memory_region_update_pending", 
file=0x56305fc028cb "../softmmu/memory.c", line=855, function=) at assert.c:92
#3  0x7fa5e7b251a2 in __GI___assert_fail (assertion=assertion@entry=0x56305fc02d60 
"qemu_mutex_iothread_locked() && !memory_region_update_pending",
file=file@entry=0x56305fc028cb "../softmmu/memory.c", line=line@entry=855, 
function=function@entry=0x56305fc03cc0 <__PRETTY_FUNCTION__.38596> 
"address_space_update_ioeventfds")
at assert.c:101
#4  0x56305f8a9194 in address_space_update_ioeventfds 
(as=as@entry=0x563061293648) at ../softmmu/memory.c:855
#5  0x56305f8afe4f in address_space_init (as=as@entry=0x563061293648, 
root=root@entry=0x5630612936a0, name=name@entry=0x5630612934f0 
"virtio-net-pci") at ../softmmu/memory.c:3172
#6  0x56305f686e43 in do_pci_register_device (errp=0x7fa5e4f39850, devfn=, name=0x563061011c40 "virtio-net-pci", pci_dev=0x563061293410) at 
../hw/pci/pci.c:1145
#7  pci_qdev_realize (qdev=0x563061293410, errp=0x7fa5e4f39850) at 
../hw/pci/pci.c:2036
#8  0x56305f939daf in device_set_realized (obj=, value=true, 
errp=0x7fa5e4f39ae0) at ../hw/core/qdev.c:510
#9  0x56305f93d156 in property_set_bool (obj=0x563061293410, v=, 
name=, opaque=0x5630610221d0, errp=0x7fa5e4f39ae0) at 
../qom/object.c:2285
#10 0x56305f93f403 in object_property_set (obj=obj@entry=0x563061293410, 
name=name@entry=0x56305fba6bc3 "realized", v=v@entry=0x563061e52a00, 
errp=errp@entry=0x7fa5e4f39ae0)
at ../qom/object.c:1420
#11 0x56305f94247f in object_property_set_qobject (obj=obj@entry=0x563061293410, 
name=name@entry=0x56305fba6bc3 "realized", value=value@entry=0x563061e61cb0,
errp=errp@entry=0x7fa5e4f39ae0) at ../qom/qom-qobject.c:28
#12 0x56305f93f674 in object_property_set_bool (obj=0x563061293410, 
name=name@entry=0x56305fba6bc3 "realized", value=value@entry=true, 
errp=errp@entry=0x7fa5e4f39ae0)
at ../qom/object.c:1489
#13 0x56305f93959c in qdev_realize (dev=, 
bus=bus@entry=0x563061c9c400, errp=errp@entry=0x7fa5e4f39ae0) at ../hw/core/qdev.c:292
#14 0x56305f7244a0 in qdev_device_add_from_qdict (opts=0x563061e64c00, 
from_json=, errp=, errp@entry=0x7fa5e4f39ae0)
at /data00/migration/qemu-open/include/hw/qdev-core.h:17
#15 0x56305f846c75 in failover_add_primary (errp=0x7fa5e4f39ad8, 
n=0x563062043530) at ../hw/net/virtio-net.c:933
#16 virtio_net_set_features (vdev=, 
features=4611687122246533156) at ../hw/net/virtio-net.c:1004
#17 0x56305f872238 in virtio_set_features_nocheck 
(vdev=vdev@entry=0x563062043530, val=val@entry=4611687122246533156) at 
../hw/virtio/virtio.c:2851
#18 0x56305f877e9e in virtio_load (vdev=0x563062043530, f=0x56306125bde0, 
version_id=11) at ../hw/virtio/virtio.c:3027
#19 0x56305f73c601 in vmstate_load_state (f=f@entry=0x56306125bde0, 
vmsd=0x56305fef16c0 , opaque=0x563062043530, version_id=11) 
at ../migration/vmstate.c:137
#20 0x56305f757672 in vmstate_load (f=0x56306125bde0, se=0x563062176700) at 
../migration/savevm.c:919
#21 0x56305f757905 in qemu_loadvm_section_start_full 
(f=f@entry=0x56306125bde0, mis=0x56306101d3e0) at ../migration/savevm.c:2503
#22 0x56305f75aca8 in qemu_loadvm_state_main (f=f@entry=0x56306125bde0, 
mis=mis@entry=0x56306101d3e0) at ../migration/savevm.c:2719
#23 0x56305f75c17a in qemu_loadvm_state (f=0x56306125bde0) at 
../migration/savevm.c:2809
#24 0x56305f74980e in process_incoming_migration_co (opaque=) at ../migration/migration.c:606
#25 0x56305fab25cb in coroutine_trampoline (i0=, i1=) at ../util/coroutine-ucontext.c:177


I really think changing depth is a hack... :(

Here IMHO the problem is we have other missing calls to
address_space_to_flatview() during commit() and that caused the issue.
There aren't a lot of those, and sorry to 

Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate

2023-02-22 Thread Chuang Xu

Hi, Peter

On 2023/2/22 11:57 PM, Peter Xu wrote:

On Wed, Feb 22, 2023 at 02:27:55PM +0800, Chuang Xu wrote:

Hi, Peter

Hi, Chuang,


Note that as I mentioned in the comment, we temporarily replace this value
to prevent commit() and address_space_to_flatview() from calling each other
recursively and eventually overflowing the stack.

Sorry to have overlooked that part.  IMHO here it's not about the depth,
but rather that we don't even need any RCU protection when updating
ioeventfds because we exclusively own the FlatView* too there.

I wanted to describe what I had in mind but instead I figured a patch may
be needed to be accurate (with some cleanups alongside), hence attached.
IIUC it can work with what I suggested before without fiddling with depth.
Please have a look.  The patch should apply cleanly to master branch so if
it works it can be your 1st patch too.

PS: Paolo - I know I asked this before, but it'll be good to have your
review comment on anything above.

Thanks,


Here are two problems I can see:

1. It is inappropriate to use assert(qemu_mutex_iothread_locked()
&& !memory_region_update_pending) in update_ioeventfds().

For example, when entering commit(), if memory_region_update_pending
is true, the assertion will be triggered immediately when update_ioeventfds
is called.
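To illustrate the ordering (a sketch of the do_commit() path from the patches
in this thread, annotated editorially):

if (memory_region_update_pending) {            /* still true here */
    flatviews_reset();
    MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
    QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
        address_space_set_flatview(as);
        address_space_update_ioeventfds(as);   /* the assert would fire here */
    }
    memory_region_update_pending = false;      /* only cleared afterwards */
    ioeventfd_update_pending = false;
    MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
}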

2. The problem of stack overflow has not been solved. There are
too many places where address_space_to_flatview() may be called.

Here is another coredump stack:

#8  0x55a3a769ed85 in memory_region_transaction_commit_force () at 
../softmmu/memory.c:1154
#9  0x55a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 
) at 
/data00/migration/qemu-open/include/exec/memory.h:1118
#10 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 
, old_view=old_view@entry=0x55a3a9d44990, 
new_view=new_view@entry=0x55a3d6837390,
adding=adding@entry=false) at ../softmmu/memory.c:955
#11 0x55a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 
) at ../softmmu/memory.c:1062
#12 0x55a3a769e870 in address_space_update_flatview_all () at 
../softmmu/memory.c:1107
#13 0x55a3a769ed85 in memory_region_transaction_commit_force () at 
../softmmu/memory.c:1154
#14 0x55a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 
) at 
/data00/migration/qemu-open/include/exec/memory.h:1118
#15 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 
, old_view=old_view@entry=0x55a3a9d44990, 
new_view=new_view@entry=0x55a3d67f8d90,
adding=adding@entry=false) at ../softmmu/memory.c:955
#16 0x55a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 
) at ../softmmu/memory.c:1062
#17 0x55a3a769e870 in address_space_update_flatview_all () at 
../softmmu/memory.c:1107
#18 0x55a3a769ed85 in memory_region_transaction_commit_force () at 
../softmmu/memory.c:1154
#19 0x55a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 
) at 
/data00/migration/qemu-open/include/exec/memory.h:1118
#20 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 
, old_view=old_view@entry=0x55a3a9d44990, 
new_view=new_view@entry=0x55a3d67ba790,
adding=adding@entry=false) at ../softmmu/memory.c:955
#21 0x55a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 
) at ../softmmu/memory.c:1062
#22 0x55a3a769e870 in address_space_update_flatview_all () at 
../softmmu/memory.c:1107
#23 0x55a3a769ed85 in memory_region_transaction_commit_force () at 
../softmmu/memory.c:1154

And this may not be the only case where stack overflow occurs.
Thus, changing the depth value is the safest way, I think (see the sketch below).
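For clarity, this is the depth-swap trick being referred to, as it appears in
memory_region_transaction_commit_force() quoted elsewhere in this thread:

unsigned int depth_copy = memory_region_transaction_depth;

memory_region_transaction_depth = 0;          /* break the mutual recursion */
/* ... do the actual commit work ... */
memory_region_transaction_depth = depth_copy; /* restore the nesting depth */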

Thanks!




Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate

2023-02-21 Thread Chuang Xu

Hi, Peter

On 2023/2/22 4:36 AM, Peter Xu wrote:

On 2023/2/21 11:38 AM, Chuang Xu wrote:

I think we need a memory_region_transaction_commit_force() to force-commit
some transactions when loading vmstate. This function is designed like this:

/*
  * memory_region_transaction_commit_force() is designed to
  * force the mr transaction to be committed in the process
  * of loading vmstate.
  */
void memory_region_transaction_commit_force(void)

I would call this memory_region_transaction_do_commit(), and I don't think
the manipulation of memory_region_transaction_depth is needed here since we
don't release BQL during the whole process, so changing that depth isn't
needed at all to me.

So, I think we can...


{
    AddressSpace *as;
    unsigned int memory_region_transaction_depth_copy =
        memory_region_transaction_depth;

    /*
     * Temporarily replace memory_region_transaction_depth with 0 to prevent
     * memory_region_transaction_commit_force() and address_space_to_flatview()
     * from calling each other recursively.
     */
    memory_region_transaction_depth = 0;

... drop here ...


Note that as I mentioned in the comment, we temporarily replace this value
to prevent commit() and address_space_to_flatview() from calling each other
recursively and eventually overflowing the stack.

Part of the coredump call stack is attached here:

#8  0x558de5a998b5 in memory_region_transaction_do_commit () at 
../softmmu/memory.c:1131
#9  0x558de5a99dfd in address_space_to_flatview (as=0x558de6516060 
) at 
/data00/migration/qemu-open/include/exec/memory.h:1130
#10 address_space_get_flatview (as=as@entry=0x558de6516060 
) at ../softmmu/memory.c:810
#11 0x558de5a9a199 in address_space_update_ioeventfds (as=as@entry=0x558de6516060 
) at ../softmmu/memory.c:836
#12 0x558de5a99900 in memory_region_transaction_do_commit () at 
../softmmu/memory.c:1137
#13 memory_region_transaction_do_commit () at ../softmmu/memory.c:1125
#14 0x558de5a99dfd in address_space_to_flatview (as=0x558de6516060 
) at 
/data00/migration/qemu-open/include/exec/memory.h:1130
#15 address_space_get_flatview (as=as@entry=0x558de6516060 
) at ../softmmu/memory.c:810
#16 0x558de5a9a199 in address_space_update_ioeventfds (as=as@entry=0x558de6516060 
) at ../softmmu/memory.c:836
#17 0x558de5a99900 in memory_region_transaction_do_commit () at 
../softmmu/memory.c:1137
#18 memory_region_transaction_do_commit () at ../softmmu/memory.c:1125
#19 0x558de5a99dfd in address_space_to_flatview (as=0x558de6516060 
) at 
/data00/migration/qemu-open/include/exec/memory.h:1130
#20 address_space_get_flatview (as=as@entry=0x558de6516060 
) at ../softmmu/memory.c:810
#21 0x558de5a9a199 in address_space_update_ioeventfds (as=as@entry=0x558de6516060 
) at ../softmmu/memory.c:836
#22 0x558de5a99900 in memory_region_transaction_do_commit () at 
../softmmu/memory.c:1137
#23 memory_region_transaction_do_commit () at ../softmmu/memory.c:1125
#24 0x558de5a99dfd in address_space_to_flatview (as=0x558de6516060 
) at 
/data00/migration/qemu-open/include/exec/memory.h:1130

So I think we need to change the depth here.




    assert(qemu_mutex_iothread_locked());

    if (memory_region_update_pending) {
        flatviews_reset();

        MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);

        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_set_flatview(as);
            address_space_update_ioeventfds(as);
        }
        memory_region_update_pending = false;
        ioeventfd_update_pending = false;
        MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
    } else if (ioeventfd_update_pending) {
        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_update_ioeventfds(as);
        }
        ioeventfd_update_pending = false;
    }

    /* recover memory_region_transaction_depth */
    memory_region_transaction_depth = memory_region_transaction_depth_copy;

... drop here ...


}

... then call this new memory_region_transaction_do_commit() in
memory_region_transaction_commit().

void memory_region_transaction_commit(void)
{
    assert(memory_region_transaction_depth);
    --memory_region_transaction_depth;
    memory_region_transaction_do_commit();
}

Then...


Now there are two options to use this function:
1. call it in address_space_to_flatview():

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * Before using any flatview, check whether we're during a memory
     * region transaction. If so, force commit the memory region
     * transaction.
     */
    if (memory_region_transaction_in_progress())

Here we need to add a BQL-held condition, otherwise threads that run here
without holding the BQL will trigger the assertion in
memory_region_transaction_commit_force().

I'm not sure whether this condition is sufficient; at least for the mr access
in the load thread it is sufficient (because the load thread will hold the BQL
when accessing mr).
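
Putting the pieces together, a minimal sketch of what the check could look
like with the do_commit() split and the BQL condition (an illustration only,
not the final patch):

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * Only force-commit when this thread holds the BQL: the commit path
     * touches global state, and threads that run here without the BQL
     * rely on RCU and can tolerate an obsolete map.
     */
    if (qemu_mutex_iothread_locked() &&
        memory_region_transaction_in_progress()) {
        memory_region_transaction_do_commit();
    }
    return qatomic_rcu_read(&as->current_map);
}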

Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate

2023-02-21 Thread Chuang Xu

Hi, Peter

This email is a supplement to the previous one.

On 2023/2/21 11:38 AM, Chuang Xu wrote:


I think we need a memory_region_transaction_commit_force() to force commit
some transactions when loading vmstate. This function is designed like this:

/*
 * memory_region_transaction_commit_force() is designed to
 * force the mr transaction to be committed in the process
 * of loading vmstate.
 */
void memory_region_transaction_commit_force(void)
{
    AddressSpace *as;
    unsigned int memory_region_transaction_depth_copy =
        memory_region_transaction_depth;

    /*
     * Temporarily replace memory_region_transaction_depth with 0 to prevent
     * memory_region_transaction_commit_force() and address_space_to_flatview()
     * from calling each other recursively.
     */
    memory_region_transaction_depth = 0;

    assert(qemu_mutex_iothread_locked());

    if (memory_region_update_pending) {
        flatviews_reset();

        MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);

        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_set_flatview(as);
            address_space_update_ioeventfds(as);
        }
        memory_region_update_pending = false;
        ioeventfd_update_pending = false;
        MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
    } else if (ioeventfd_update_pending) {
        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_update_ioeventfds(as);
        }
        ioeventfd_update_pending = false;
    }

    /* recover memory_region_transaction_depth */
    memory_region_transaction_depth = memory_region_transaction_depth_copy;
}

Now there are two options to use this function:
1. call it in address_space_to_flatview():

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * Before using any flatview, check whether we're during a memory
     * region transaction. If so, force commit the memory region
     * transaction.
     */
    if (memory_region_transaction_in_progress())


Here we need to add a BQL-held condition, otherwise threads that run here
without holding the BQL will trigger the assertion in
memory_region_transaction_commit_force().

I'm not sure whether this condition is sufficient; at least for the mr access
in the load thread it is sufficient (because the load thread will hold the BQL
when accessing mr). But for other cases, it seems that we will return to
our discussion on the sanity check..

Another point I worry about is whether the number of mr transaction commits
has increased in some other scenarios because of this force commit. Although
so far, I haven't seen a simple virtual machine lifecycle trigger this force
commit. I did a little test: replace commit_force() with abort() and run qtest.
Almost all the errors I can see are related to migration..


        memory_region_transaction_commit_force();
    return qatomic_rcu_read(&as->current_map);
}

2. call it before each post_load()
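
A rough illustration of option 2 (hypothetical hook placement inside
vmstate_load_state(); an untested sketch, not a patch):

if (vmsd->post_load) {
    /*
     * Force the pending transaction through right before a device's
     * post_load hook runs, so hooks such as ahci_state_post_load()
     * see an up-to-date FlatView.
     */
    memory_region_transaction_commit_force();
    ret = vmsd->post_load(opaque, version_id);
}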

I prefer the former one; it is more general than the latter.
And with this function, the sanity check is not necessary any more.
Although we may inevitably call memory_region_transaction_commit_force()
several times, in my actual test the number of calls to
memory_region_transaction_commit_force() is still insignificant compared
with the number of calls to memory_region_transaction_commit() before
optimization. As a result, this code won't affect the optimization effect,
but it can ensure reliability.


Looking forward to your opinion, Thanks!




Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate

2023-02-20 Thread Chuang Xu

Hi, Peter

It seems that there was a problem with the code format in my last email.
I adjusted the format and resent this to you. Hope the format of this
email won't be wrong again..  :)

On 2023/2/17 11:52 PM, Peter Xu wrote:

Hello, Chuang,

On Fri, Feb 17, 2023 at 04:11:19PM +0800, Chuang Xu wrote:

Error 1 was triggered by our sanity check. I try to add RCU_READ_LOCK_GUARD()
in address_space_init() and it works. But I'm not sure if this code change is
appropriate. If this change is not appropriate, we may need to consider other
sanity check.

I'd suggest not adding RCU locks without a good reason.

address_space_init() is definitely a special context because the AS is
exclusively owned by the caller before it returns.  It means no RCU
protection needed at all because no one else is touching it; neither do we
need qatomic_rcu_read() when read.

So I suggest we directly reference current_map, even though that'll need a
rich comment:

  static void address_space_set_flatview(AddressSpace *as)
  {
-FlatView *old_view = address_space_to_flatview(as);
+/*
+ * NOTE: we don't use the RCU-flavoured address_space_to_flatview()
+ * because we exclusively own as->current_map here: it's either during
+ * init of an address space, or during commit() with BQL held.
+ */
+FlatView *old_view = as->current_map;

We can have address_space_to_flatview_raw() but since we'll directly modify
as->current_map very soon in the same function, so may not even bother.

I agree with you.

But now I am facing a new problem. Our sanity check is not as reliable
as we think.

Although my current code can pass all the tests that Juan told me in the
email, if I configure with nothing and run 'make check', my patch triggers
an error in the ahci migration test:

G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh
QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img
MALLOC_PERTURB_=1
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
/data00/migration/qemu-open/build/tests/qtest/ahci-test --tap -k
――― ✀ ―――
stderr:
qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS
receive buffer address
qemu-system-x86_64: Failed to load ich9_ahci:ahci
qemu-system-x86_64: error while loading state for instance 0x0 of device
':00:1f.2/ich9_ahci'
qemu-system-x86_64: load of migration failed: Operation not permitted
Migration did not complete, status: failed

It seems that ahci_state_post_load() will call memory_access_is_direct()
and access mr->ram. Due to the mr transaction delay, this value doesn't meet
expectations. And here is the call chain for you to read the code:
->ahci_state_post_load
->ahci_cond_start_engines
->ahci_map_fis_address
->map_page
->dma_memory_map
->address_space_map
->memory_access_is_direct


I think we need a memory_region_transaction_commit_force() to force commit
some transactions when loading vmstate. This function is designed like this:

/*
 * memory_region_transaction_commit_force() is designed to
 * force the mr transaction to be committed in the process
 * of loading vmstate.
 */
void memory_region_transaction_commit_force(void)
{
    AddressSpace *as;
    unsigned int memory_region_transaction_depth_copy =
        memory_region_transaction_depth;

    /*
     * Temporarily replace memory_region_transaction_depth with 0 to prevent
     * memory_region_transaction_commit_force() and address_space_to_flatview()
     * from calling each other recursively.
     */
    memory_region_transaction_depth = 0;

    assert(qemu_mutex_iothread_locked());

    if (memory_region_update_pending) {
        flatviews_reset();

        MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);

        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_set_flatview(as);
            address_space_update_ioeventfds(as);
        }
        memory_region_update_pending = false;
        ioeventfd_update_pending = false;
        MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
    } else if (ioeventfd_update_pending) {
        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_update_ioeventfds(as);
        }
        ioeventfd_update_pending = false;
    }

    /* recover memory_region_transaction_depth */
    memory_region_transaction_depth = memory_region_transaction_depth_copy;
}

Now there are two options to use this function:
1. call it in address_space_to_flatview():

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * Before using any flatview, check whether we're during a memory
     * region transaction. If so, force commit the memory region
     * transaction.
     */
    if (memory_region_transaction_in_progress())
        memory_region_transaction_commit_force();
    return qatomic_rcu_read(&as->current_map);
}

Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate

2023-02-20 Thread Chuang Xu

Hi, Peter

On 2023/2/17 11:52 PM, Peter Xu wrote:

Hello, Chuang,

On Fri, Feb 17, 2023 at 04:11:19PM +0800, Chuang Xu wrote:

Error 1 was triggered by our sanity check. I try to add RCU_READ_LOCK_GUARD()
in address_space_init() and it works. But I'm not sure if this code change is
appropriate. If this change is not appropriate, we may need to consider other
sanity check.

I'd suggest not adding RCU locks without a good reason.

address_space_init() is definitely a special context because the AS is
exclusively owned by the caller before it returns.  It means no RCU
protection needed at all because no one else is touching it; neither do we
need qatomic_rcu_read() when read.

So I suggest we directly reference current_map, even though that'll need a
rich comment:

  static void address_space_set_flatview(AddressSpace *as)
  {
-FlatView *old_view = address_space_to_flatview(as);
+/*
+ * NOTE: we don't use the RCU-flavoured address_space_to_flatview()
+ * because we exclusively own as->current_map here: it's either during
+ * init of an address space, or during commit() with BQL held.
+ */
+FlatView *old_view = as->current_map;

We can have address_space_to_flatview_raw() but since we'll directly modify
as->current_map very soon in the same function, so may not even bother.


I agree with you.

But now I am facing a new problem. Our sanity check is not as reliable as we
think.

Although my current code can pass all the tests that Juan told me in the
email, if I configure with nothing and run 'make check', my patch triggers
an error in the ahci migration test:

G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh 
QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img 
MALLOC_PERTURB_=1 
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
/data00/migration/qemu-open/build/tests/qtest/ahci-test --tap -k

 ✀  

stderr:
qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive 
buffer address
qemu-system-x86_64: Failed to load ich9_ahci:ahci
qemu-system-x86_64: error while loading state for instance 0x0 of device 
':00:1f.2/ich9_ahci'
qemu-system-x86_64: load of migration failed: Operation not permitted
Migration did not complete, status: failed

It seems that ahci_state_post_load() will call memory_access_is_direct() and
access mr->ram. Due to the mr transaction delay, this value doesn't meet
expectations. And here is the call chain for you to read the code:
->ahci_state_post_load
->ahci_cond_start_engines
->ahci_map_fis_address
->map_page
->dma_memory_map
->address_space_map
->memory_access_is_direct

I think we need a memory_region_transaction_commit_force() to force commit
some transactions when load vmstate. This function is designed like this:

/*
 * memory_region_transaction_commit_force() is designed to
 * force the mr transaction to be committed in the process
 * of loading vmstate.
 */
void memory_region_transaction_commit_force(void)
{
    AddressSpace *as;
    unsigned int memory_region_transaction_depth_copy =
        memory_region_transaction_depth;

    /*
     * Temporarily replace memory_region_transaction_depth with 0 to prevent
     * memory_region_transaction_commit_force() and address_space_to_flatview()
     * from calling each other recursively.
     */
    memory_region_transaction_depth = 0;

    assert(qemu_mutex_iothread_locked());

    if (memory_region_update_pending) {
        flatviews_reset();
        MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_set_flatview(as);
            address_space_update_ioeventfds(as);
        }
        memory_region_update_pending = false;
        ioeventfd_update_pending = false;
        MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
    } else if (ioeventfd_update_pending) {
        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_update_ioeventfds(as);
        }
        ioeventfd_update_pending = false;
    }

    /* recover memory_region_transaction_depth */
    memory_region_transaction_depth = memory_region_transaction_depth_copy;
}

Now there are two options to use this function:
1. call it in address_space_to_flatview():

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * Before using any flatview, check whether we're during a memory
     * region transaction. If so, force commit the memory transaction.
     */
    if (memory_region_transaction_in_progress())
        memory_region_transaction_commit_force();

    return qatomic_rcu_read(&as->current_map);
}
2. call it before each post_load()

I prefer the former one; it is more general than the latter.
And with this function, the sanity check is not necessary any more.
Although we may inevitably call memory_region_transaction_commit_force()
several times, in my actual test the number of calls to
memory_region_transaction_commit_force() is still insignificant compared
with the number of calls to memory_region_transaction_commit() before
optimization.

Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate

2023-02-20 Thread Chuang Xu

Hi, Juan

On 2023/2/16 3:10 AM, Juan Quintela wrote:

Chuang Xu  wrote:

In this version:

Hi

I had to drop this.  It breaks migration of dbus-vmstate.

144/179 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover
  ERROR   5.66s   killed by signal 6 SIGABRT

G_TEST_DBUS_DAEMON=/mnt/code/qemu/multifd/tests/dbus-vmstate-daemon.sh 
QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=145 
/scratch/qemu/multifd/x64/tests/qtest/virtio-net-failover --tap -k

――
 ✀  
――
stderr:
qemu-system-x86_64: /mnt/code/qemu/multifd/include/exec/memory.h:1112: 
address_space_to_flatview: Assertion `(!memory_region_transaction_in_progress() 
&& qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed.
Broken pipe
../../../../mnt/code/qemu/multifd/tests/qtest/libqtest.c:190: kill_qemu() 
detected QEMU death from signal 6 (Aborted) (core dumped)

(test program exited with status code -6)

TAP parsing error: Too few tests run (expected 23, got 12)


Can you take a look at this?

I reproduced it with "make check" and qemu compiled with the configure
line attached.

Later, Juan.

configure --enable-trace-backends=log
--prefix=/usr
--sysconfdir=/etc/sysconfig/
--audio-drv-list=pa
--target-list=x86_64-softmmu
--with-coroutine=ucontext
--with-git-submodules=validate
--enable-fdt=system
--enable-alsa
--enable-attr
--enable-auth-pam
--enable-avx2
--enable-avx512f
--enable-bochs
--enable-bpf
--enable-brlapi
--disable-bsd-user
--enable-bzip2
--enable-cap-ng
--disable-capstone
--disable-cfi
--disable-cfi-debug
--enable-cloop
--disable-cocoa
--enable-containers
--disable-coreaudio
--enable-coroutine-pool
--enable-crypto-afalg
--enable-curl
--enable-curses
--enable-dbus-display
--enable-debug-info
--disable-debug-mutex
--disable-debug-stack-usage
--disable-debug-tcg
--enable-dmg
--disable-docs
--disable-dsound
--enable-fdt
--disable-fuse
--disable-fuse-lseek
--disable-fuzzing
--disable-gcov
--enable-gettext
--enable-gio
--enable-glusterfs
--enable-gnutls
--disable-gprof
--enable-gtk
--disable-guest-agent
--disable-guest-agent-msi
--disable-hax
--disable-hvf
--enable-iconv
--enable-install-blobs
--enable-jack
--enable-keyring
--enable-kvm
--enable-l2tpv3
--enable-libdaxctl
--enable-libiscsi
--enable-libnfs
--enable-libpmem
--enable-libssh
--enable-libudev
--enable-libusb
--enable-linux-aio
--enable-linux-io-uring
--disable-linux-user
--enable-live-block-migration
--disable-lto
--disable-lzfse
--enable-lzo
--disable-malloc-trim
--enable-membarrier
--enable-module-upgrades
--enable-modules
--enable-mpath
--enable-multiprocess
--disable-netmap
--enable-nettle
--enable-numa
--disable-nvmm
--enable-opengl
--enable-oss
--enable-pa
--enable-parallels
--enable-pie
--enable-plugins
--enable-png
--disable-profiler
--enable-pvrdma
--enable-qcow1
--enable-qed
--disable-qom-cast-debug
--enable-rbd
--enable-rdma
--enable-replication
--enable-rng-none
--disable-safe-stack
--disable-sanitizers
--enable-stack-protector
--enable-sdl
--enable-sdl-image
--enable-seccomp
--enable-selinux
--enable-slirp
--enable-slirp-smbd
--enable-smartcard
--enable-snappy
--enable-sparse
--enable-spice
--enable-spice-protocol
--enable-system
--enable-tcg
--disable-tcg-interpreter
--disable-tools
--enable-tpm
--disable-tsan
--disable-u2f
--enable-usb-redir
--disable-user
--disable-vde
--enable-vdi
--enable-vhost-crypto
--enable-vhost-kernel
--enable-vhost-net
--enable-vhost-user
--enable-vhost-user-blk-server
--enable-vhost-vdpa
--enable-virglrenderer
--enable-virtfs
--enable-virtiofsd
  

Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate

2023-02-17 Thread Chuang Xu

Hi, Peter!

In my last email to Juan, I mentioned two errors.
Now I want to discuss them with you.

On 2023/2/16 11:41 PM, Chuang Xu wrote:
I ran qtest with reference to your environment, and it finally reported
two errors.


Error 1 (the same as yours):

 QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=87 
G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh 
/data00/migration/qemu-open/build/tests/qtest/virtio-net-failover 
--tap -k
――― ✀ ―――

stderr:
qemu-system-x86_64: /data00/migration/qemu-open/include/exec/memory.h:1114:
address_space_to_flatview: Assertion `(!memory_region_transaction_in_progress()
&& qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed.

Broken pipe
../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from 
signal 6 (Aborted) (core dumped)


(test program exited with status code -6)

TAP parsing error: Too few tests run (expected 23, got 12)

Coredump backtrace:
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7f3af64a8535 in __GI_abort () at abort.c:79
#2  0x7f3af64a840f in __assert_fail_base (fmt=0x7f3af6609ef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55d9425f48a8 "(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()", file=0x55d9425f4870 "/data00/migration/qemu-open/include/exec/memory.h", line=1114, function=) at assert.c:92
#3  0x7f3af64b61a2 in __GI___assert_fail (assertion=assertion@entry=0x55d9425f48a8 "(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()", file=file@entry=0x55d9425f4870 "/data00/migration/qemu-open/include/exec/memory.h", line=line@entry=1114, function=function@entry=0x55d9426cdce0 <__PRETTY_FUNCTION__.20039> "address_space_to_flatview") at assert.c:101
#4  0x55d942373853 in address_space_to_flatview (as=0x55d944738648) at /data00/migration/qemu-open/include/exec/memory.h:1112
#5  0x55d9423746f5 in address_space_to_flatview (as=0x55d944738648) at /data00/migration/qemu-open/include/qemu/rcu.h:126
#6  address_space_set_flatview (as=as@entry=0x55d944738648) at ../softmmu/memory.c:1029
#7  0x55d94237ace3 in address_space_update_topology (as=0x55d944738648) at ../softmmu/memory.c:1080
#8  address_space_init (as=as@entry=0x55d944738648, root=root@entry=0x55d9447386a0, name=name@entry=0x55d9447384f0 "virtio-net-pci") at ../softmmu/memory.c:3082
#9  0x55d942151e43 in do_pci_register_device (errp=0x7f3aef7fe850, devfn=, name=0x55d9444b6c40 "virtio-net-pci", pci_dev=0x55d944738410) at ../hw/pci/pci.c:1145
#10 pci_qdev_realize (qdev=0x55d944738410, errp=0x7f3aef7fe850) at ../hw/pci/pci.c:2036
#11 0x55d942404a8f in device_set_realized (obj=, value=true, errp=0x7f3aef7feae0) at ../hw/core/qdev.c:510
#12 0x55d942407e36 in property_set_bool (obj=0x55d944738410, v=, name=, opaque=0x55d9444c71d0, errp=0x7f3aef7feae0) at ../qom/object.c:2285
#13 0x55d94240a0e3 in object_property_set (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 "realized", v=v@entry=0x55d9452f7a00, errp=errp@entry=0x7f3aef7feae0) at ../qom/object.c:1420
#14 0x55d94240d15f in object_property_set_qobject (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 "realized", value=value@entry=0x55d945306cb0, errp=errp@entry=0x7f3aef7feae0) at ../qom/qom-qobject.c:28
#15 0x55d94240a354 in object_property_set_bool (obj=0x55d944738410, name=name@entry=0x55d942670c23 "realized", value=value@entry=true, errp=errp@entry=0x7f3aef7feae0) at ../qom/object.c:1489
#16 0x55d94240427c in qdev_realize (dev=, bus=bus@entry=0x55d945141400, errp=errp@entry=0x7f3aef7feae0) at ../hw/core/qdev.c:292
#17 0x55d9421ef4a0 in qdev_device_add_from_qdict (opts=0x55d945309c00, from_json=, errp=, errp@entry=0x7f3aef7feae0) at /data00/migration/qemu-open/include/hw/qdev-core.h:17
#18 0x55d942311c85 in failover_add_primary (errp=0x7f3aef7fead8, n=0x55d9454e8530) at ../hw/net/virtio-net.c:933
#19 virtio_net_set_features (vdev=, features=4611687122246533156) at ../hw/net/virtio-net.c:1004
#20 0x55d94233d248 in virtio_set_features_nocheck (vdev=vdev@entry=0x55d9454e8530, val=val@entry=4611687122246533156) at ../hw/virtio/virtio.c:2851
#21 0x55d942342eae in virtio_load (vdev=0x55d9454e8530, f=0x55d944700de0, version_id=11) at ../hw/virtio/virtio.c:3027
#22 0x55d942207601 in vmstate_load_state (f=f@entry=0x55d944700de0, vmsd=0x55d9429baba0 , opaque=0x55d9454e8530, version_id=11) at ../migration/vms

Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate

2023-02-16 Thread Chuang Xu
Hi, Juan

Thanks for your test results!

On 2023/2/16 3:10 AM, Juan Quintela wrote:
> Chuang Xu wrote:
>> In this version:
> Hi
>
> I had to drop this. It breaks migration of dbus-vmstate.

Previously, I only focused on the precopy migration test
in the normal scenario, but did not run qtest. So I need to
apologize for my inexperience..

>
> 144/179 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover
> ERROR 5.66s killed by signal 6 SIGABRT
>>>> G_TEST_DBUS_DAEMON=/mnt/code/qemu/multifd/tests/dbus-vmstate-daemon.sh
QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=145
/scratch/qemu/multifd/x64/tests/qtest/virtio-net-failover --tap -k
>
――
✀
――

> stderr:
> qemu-system-x86_64: /mnt/code/qemu/multifd/include/exec/memory.h:1112:
address_space_to_flatview: Assertion
`(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked())
|| rcu_read_is_locked()' failed.
> Broken pipe
> ../../../../mnt/code/qemu/multifd/tests/qtest/libqtest.c:190: kill_qemu()
detected QEMU death from signal 6 (Aborted) (core dumped)
>
> (test program exited with status code -6)
>
> TAP parsing error: Too few tests run (expected 23, got 12)
>


>
> Can you take a look at this?
>
> I reproduced it with "make check" and qemu compiled with the configure
> line attached.
>
> Later, Juan.
>
> configure --enable-trace-backends=log
> --prefix=/usr
> --sysconfdir=/etc/sysconfig/
> --audio-drv-list=pa
> --target-list=x86_64-softmmu
> --with-coroutine=ucontext
> --with-git-submodules=validate
> --enable-fdt=system
> --enable-alsa
> --enable-attr
> --enable-auth-pam
> --enable-avx2
> --enable-avx512f
> --enable-bochs
> --enable-bpf
> --enable-brlapi
> --disable-bsd-user
> --enable-bzip2
> --enable-cap-ng
> --disable-capstone
> --disable-cfi
> --disable-cfi-debug
> --enable-cloop
> --disable-cocoa
> --enable-containers
> --disable-coreaudio
> --enable-coroutine-pool
> --enable-crypto-afalg
> --enable-curl
> --enable-curses
> --enable-dbus-display
> --enable-debug-info
> --disable-debug-mutex
> --disable-debug-stack-usage
> --disable-debug-tcg
> --enable-dmg
> --disable-docs
> --disable-dsound
> --enable-fdt
> --disable-fuse
> --disable-fuse-lseek
> --disable-fuzzing
> --disable-gcov
> --enable-gettext
> --enable-gio
> --enable-glusterfs
> --enable-gnutls
> --disable-gprof
> --enable-gtk
> --disable-guest-agent
> --disable-guest-agent-msi
> --disable-hax
> --disable-hvf
> --enable-iconv
> --enable-install-blobs
> --enable-jack
> --enable-keyring
> --enable-kvm
> --enable-l2tpv3
> --enable-libdaxctl
> --enable-libiscsi
> --enable-libnfs
> --enable-libpmem
> --enable-libssh
> --enable-libudev
> --enable-libusb
> --enable-linux-aio
> --enable-linux-io-uring
> --disable-linux-user
> --enable-live-block-migration
> --disable-lto
> --disable-lzfse
> --enable-lzo
> --disable-malloc-trim
> --enable-membarrier
> --enable-module-upgrades
> --enable-modules
> --enable-mpath
> --enable-multiprocess
> --disable-netmap
> --enable-nettle
> --enable-numa
> --disable-nvmm
> --enable-opengl
> --enable-oss
> --enable-pa
> --enable-parallels
> --enable-pie
> --enable-plugins
> --enable-png
> --disable-profiler
> --enable-pvrdma
> --enable-qcow1
> --enable-qed
> --disable-qom-cast-debug
> --enable-rbd
> --enable-rdma
> --enable-replication
> --enable-rng-none
> --disable-safe-stack
> --disable-sanitizers
> --enable-stack-protector
> --enable-sdl
> --enable-sdl-image
> --enable-seccomp
> --enable-selinux
> --enable-slirp
> --enable-slirp-smbd
> --enable-smartcard
> --enable-snappy
> --enable-sparse
> --enable-spice
> --enable-spice-protocol
> --enable-system
> --enable-tcg
> --disable-tcg-interpreter
> --disable-tools
> --enable-tpm
> --disable-tsan
> --disable-u2f
> --enable-usb-redir
> --disable-user
> --disable-vde
> --enable-vdi
> --enable-vhost-crypto
> --enable-vhost-kernel
> --enable-vhost-net
> --enable-vhost-user
> --enable-vhost-user-blk-server
> --enable-vhost-vdpa
> --enable-virglrenderer
> --enable-virtfs
> --enable-virtiofsd
> --enable-vnc
> --enable-vnc-j

Re: [RFC v5 1/3] rcu: introduce rcu_read_is_locked()

2023-02-14 Thread Chuang Xu

Hi, Paolo!

On 2023/2/2 6:59 PM, Juan Quintela wrote:

Chuang Xu  wrote:

add rcu_read_is_locked() to detect holding of rcu lock.

Signed-off-by: Chuang Xu 

Reviewed-by: Juan Quintela 

Although I think that getting a review from Paolo or anyone that knows
more about RCU could be a good idea.


---
  include/qemu/rcu.h | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index b063c6fde8..719916d9d3 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -119,6 +119,13 @@ static inline void rcu_read_unlock(void)
  }
  }
  
+static inline bool rcu_read_is_locked(void)
+{
+    struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();
+
+    return p_rcu_reader->depth > 0;
+}
+
  extern void synchronize_rcu(void);
  
  /*


Do you have any more suggestions about patch1?

Looking forward to your reply.

Thanks!




[RFC v5 0/3] migration: reduce time of loading non-iterable vmstate

2023-01-17 Thread Chuang Xu
In this version:

- rename rcu_read_locked() to rcu_read_is_locked().
- adjust the sanity check in address_space_to_flatview().
- improve some comments.

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of  
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.
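
The mechanism is a single begin/commit bracket around the whole load loop
(see patch 3/3); schematically, something like:

memory_region_transaction_begin();
/*
 * ... load all device vmstate: every memory_region_*() update made here
 * only sets memory_region_update_pending instead of rebuilding the
 * FlatViews ...
 */
memory_region_transaction_commit();   /* one FlatView rebuild at the end */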

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 150 ms                           740+ ms
after   about 30 ms                            630+ ms

(This result is different from that of v1. It may be that someone has 
changed something on my host.., but it does not affect the display of 
the optimization effect.)


In test2, we keep the number of devices the same as test1 and reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 90 ms                            about 250 ms
after   about 25 ms                            about 160 ms



In test3, we keep the number of queues per device the same as test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 20 ms                            about 70 ms
after   about 11 ms                            about 60 ms


As we can see from the test results above, both the number of queues and 
the number of devices have a great impact on the time of loading non-iterable 
vmstate. The growth of the number of devices and queues will lead to more 
mr commits, and the time consumption caused by the flatview reconstruction 
will also increase.

Please review, Chuang.

[v4]

- attach more information in the cover letter.
- remove changes on virtio_load.
- add rcu_read_locked() to detect holding of rcu lock.

[v3]

- move virtio_load_check_delay() from virtio_memory_listener_commit() to 
  virtio_vmstate_change().
- add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
  will be called when delay_check is true.

[v2]

- rebase to latest upstream.
- add sanity check to address_space_to_flatview().
- postpone the init of the vring cache until migration's loading completes. 

[v1]

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms




[RFC v5 3/3] migration: reduce time of loading non-iterable vmstate

2023-01-17 Thread Chuang Xu
The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 150 ms                           740+ ms
after   about 30 ms                            630+ ms

In test2, we keep the number of devices the same as test1 and reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 90 ms                            about 250 ms
after   about 25 ms                            about 160 ms

In test3, we keep the number of queues per device the same as test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 20 ms                            about 70 ms
after   about 11 ms                            about 60 ms

As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading non-iterable
vmstate. The growth of the number of devices and queues will lead to more
mr commits, and the time consumption caused by the flatview reconstruction
will also increase.

Signed-off-by: Chuang Xu 
---
 migration/savevm.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..8ca6d396f4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2617,6 +2617,16 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 uint8_t section_type;
 int ret = 0;
 
+/*
+ * Call memory_region_transaction_begin() before loading vmstate.
+ * This call is paired with memory_region_transaction_commit() at
+ * the end of qemu_loadvm_state_main(), in order to pack all the
+ * changes to memory region during the period of loading
+ * non-iterable vmstate in a single memory transaction.
+ * This operation will reduce the time of loading non-iterable vmstate.
+ */
+memory_region_transaction_begin();
+
 retry:
 while (true) {
 section_type = qemu_get_byte(f);
@@ -2684,6 +2694,14 @@ out:
 goto retry;
 }
 }
+
+/*
+ * Call memory_region_transaction_commit() after loading vmstate.
+ * At this point, qemu actually completes all the previous memory
+ * region transactions.
+ */
+memory_region_transaction_commit();
+
 return ret;
 }
 
-- 
2.20.1




[RFC v5 1/3] rcu: introduce rcu_read_is_locked()

2023-01-17 Thread Chuang Xu
add rcu_read_is_locked() to detect holding of rcu lock.

Signed-off-by: Chuang Xu 
---
 include/qemu/rcu.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index b063c6fde8..719916d9d3 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -119,6 +119,13 @@ static inline void rcu_read_unlock(void)
 }
 }
 
+static inline bool rcu_read_is_locked(void)
+{
+    struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();
+
+    return p_rcu_reader->depth > 0;
+}
+
 extern void synchronize_rcu(void);
 
 /*
-- 
2.20.1




[RFC v5 2/3] memory: add depth assert in address_space_to_flatview

2023-01-17 Thread Chuang Xu
Before using any flatview, sanity check we're not during a memory
region transaction or the map can be invalid.

Signed-off-by: Chuang Xu 
---
 include/exec/memory.h | 15 +++
 softmmu/memory.c  |  5 +
 2 files changed, 20 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..ce13ebb763 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -27,6 +27,7 @@
 #include "qemu/notify.h"
 #include "qom/object.h"
 #include "qemu/rcu.h"
+#include "qemu/main-loop.h"
 
 #define RAM_ADDR_INVALID (~(ram_addr_t)0)
 
@@ -1069,8 +1070,22 @@ struct FlatView {
 MemoryRegion *root;
 };
 
+bool memory_region_transaction_in_progress(void);
+
 static inline FlatView *address_space_to_flatview(AddressSpace *as)
 {
+    /*
+     * Before using any flatview, sanity check we're not during a memory
+     * region transaction or the map can be invalid.  Note that this can
+     * also be called during commit phase of memory transaction, but that
+     * should also only happen when the depth decreases to 0 first.
+     * Meanwhile it's safe to access current_map with RCU read lock held
+     * even if during a memory transaction. It means the user can bear
+     * with an obsolete map.
+     */
+    assert((!memory_region_transaction_in_progress() &&
+            qemu_mutex_iothread_locked()) ||
+           rcu_read_is_locked());
     return qatomic_rcu_read(&as->current_map);
 }
 
diff --git a/softmmu/memory.c b/softmmu/memory.c
index bc0be3f62c..856c37fd0a 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1116,6 +1116,11 @@ void memory_region_transaction_commit(void)
}
 }
 
+bool memory_region_transaction_in_progress(void)
+{
+    return memory_region_transaction_depth != 0;
+}
+
 static void memory_region_destructor_none(MemoryRegion *mr)
 {
 }
-- 
2.20.1




Re: [RFC v4 2/3] memory: add depth assert in address_space_to_flatview

2023-01-13 Thread Chuang Xu

Hi, Peter,

On 2023/1/12 11:13 PM, Peter Xu wrote:

We wanted to capture outliers when you apply the follow up patch to vm load
procedure.

That will make depth>0 for the whole process of vm load during migration,
and we wanted to make sure it's safe, hence this patch, right?


In my perspective, both BQL and RCU can ensure that the flatview will not be
released when the worker thread accesses the flatview, because before the rcu
thread releases the flatview, it will make sure itself holding BQL and the
worker thread not holding RCU. So, whether the depth is 0 or not, as long as
BQL or RCU is held, the worker thread will not access the obsolete flatview
(IIUC 'obsolete' means that flatview is released).


To summarize, the original check didn't consider BQL, and if to consider
BQL I think it should be something like:

/* Guarantees valid access to the flatview, either lock works */
assert(BQL_HELD() || RCU_HELD());

/*
 * Guarantees any BQL holder is not reading obsolete flatview (e.g. when
 * during vm load)
 */
if (BQL_HELD())
assert(depth==0);

IIUC it can be merged into:

assert((BQL_HELD() && depth==0) || RCU_HELD());

IMHO assert(BQL_HELD() || RCU_HELD()) is enough..

Yes, but IMHO this will guarantee safe use of flatview only if _before_
your follow up patch.

Before that patch, the depth==0 should always stand (when BQL_HELD()
stands) I think.

After that patch, since depth will be increased at the entry of vm load
there's risk that we can overlook code that will be referencing flatview
during vm load and that can reference an obsolete flatview.  Since the
whole process of qemu_loadvm_state_main() will have BQL held we won't hit
the assertion if only to check "BQL_HELD() || RCU_HELD()" because BQL_HELD
always satisfies.


Or you think that once a mr transaction is in progress, the old flatview has
been obsolete? If we regard flatview as obsolete when a mr transaction is in
progress, How can RCU ensure that flatview is not obsolete?

AFAIU RCU cannot guarantee that.  So IIUC any RCU lock user need to be able
to tolerant obsolete flatviews being referenced and it should not harm the
system.  If it needs the latest update, it should take care of that
separately.

For example, the virtio code we're looking at in this series uses RCU lock
to build address space cache for the device vrings which is based on the
current flatview of mem.  It's safe to reference obsolete flatview in this
case (it means the flatview can be during an update when virtio is
establishing the address space cache), IMHO that's fine because the address
space cache will be updated again in virtio_memory_listener_commit() so
it'll be consistent at last.  The intermediate phase of inconsistency
should be fine in this case just like any DMA happens during a memory
hotplug.

For this specific patch, IMHO the core is to check depth>0 reference, and
we need RCU_HELD to mask out where the obsolete references are fine.

Thanks,


Thanks a lot for your patience!

I ignored the preconditions for this discussion, so I asked a stupid
question..

Now I'm in favor of checking “(BQL_HELD() && depth==0) || RCU_HELD()” in v5.
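
In QEMU terms (this is what v5 patch 2/3 ends up asserting), the merged
check becomes:

assert((!memory_region_transaction_in_progress() &&
        qemu_mutex_iothread_locked()) ||
       rcu_read_is_locked());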
And what does Paolo think of Peter's solution?

Thanks again!




Re: [RFC v4 2/3] memory: add depth assert in address_space_to_flatview

2023-01-12 Thread Chuang Xu

Hi, Peter, Paolo,

On 2023/1/10 10:45 PM, Peter Xu wrote:

On Tue, Jan 10, 2023 at 12:09:41AM -0800, Chuang Xu wrote:

Hi, Peter and Paolo,

Hi, Chuang, Paolo,


I'm sorry I didn't reply to your email in time. I was infected with
COVID-19 two weeks ago, so I couldn't think about the problems discussed
in your email for a long time.

On 2023/1/4 1:43 AM, Peter Xu wrote:

Hi, Paolo,

On Wed, Dec 28, 2022 at 09:27:50AM +0100, Paolo Bonzini wrote:

On Fri, Dec 23, 2022 at 16:54, Peter Xu wrote:


This is not valid because the transaction could happen in *another* thread.

In that case memory_region_transaction_depth() will be > 0, but RCU is
needed.

Do you mean the code is wrong, or the comment? Note that the code has
checked rcu_read_locked() where introduced in patch 1, but maybe something
else was missed?


The assertion is wrong. It will succeed even if RCU is unlocked in this
thread but a transaction is in progress in another thread.

IIUC this is the case where the context:

(1) doesn't have RCU read lock held, and,
(2) doesn't have BQL held.

Is it safe at all to reference any flatview in such a context? The thing
is I think the flatview pointer can be freed anytime if both locks are not
taken.


Perhaps you can check (memory_region_transaction_depth() > 0 &&
!qemu_mutex_iothread_locked()) || rcu_read_locked() instead?

What if one thread calls address_space_to_flatview() with BQL held but not
RCU read lock held? I assume it's a legal operation, but it seems to be
able to trigger the assert already?

Thanks,


I'm not sure whether I understand the content of your discussion correctly,
so here I want to explain my understanding firstly.

 From my perspective, Paolo thinks that when thread 1 is in a transaction,
thread 2 will trigger the assertion when accessing the flatview without
holding RCU read lock, although sometimes the thread 2's access to flatview
is legal. So Paolo suggests checking (memory_region_transaction_depth() > 0
&& !qemu_mutex_iothread_locked()) || rcu_read_locked() instead.

And Peter thinks that as long as no thread holds the BQL or RCU read lock,
the old flatview can be released (actually executed by the rcu thread with
BQL held). When thread 1 is in a transaction, if thread 2 accesses the
flatview with BQL held but not RCU read lock held, it's a legal operation.
In this legal case, it seems that both my code and Paolo's code will trigger
the assertion.

IIUC your original patch is fine in this case (BQL held, RCU not held), as
long as depth==0.  IMHO what we want to trap here is when BQL held (while
RCU is not) and depth>0 which can cause unpredictable side effect of using
obsolete flatview.


I don't quite understand the side effects of depth>0 when BQL is held (while
RCU is not).

In my perspective, both BQL and RCU can ensure that the flatview will not be
released when the worker thread accesses the flatview, because before the rcu
thread releases the flatview, it will make sure itself holding BQL and the
worker thread not holding RCU. So, whether the depth is 0 or not, as long as
BQL or RCU is held, the worker thread will not access the obsolete flatview
(IIUC 'obsolete' means that flatview is released).



To summarize, the original check didn't consider BQL, and if to consider
BQL I think it should be something like:

   /* Guarantees valid access to the flatview, either lock works */
   assert(BQL_HELD() || RCU_HELD());

   /*
    * Guarantees any BQL holder is not reading obsolete flatview (e.g. when
    * during vm load)
    */
   if (BQL_HELD())
       assert(depth==0);

IIUC it can be merged into:

   assert((BQL_HELD() && depth==0) || RCU_HELD());


IMHO assert(BQL_HELD() || RCU_HELD()) is enough..

Or do you think that once a mr transaction is in progress, the old flatview
is already obsolete? If we regard the flatview as obsolete when a mr
transaction is in progress, how can RCU ensure that the flatview is not
obsolete?

What does Paolo think of this check?

Thanks!


I'm not sure if I have a good understanding of your emails. I think
checking (memory_region_transaction_get_depth() == 0 || rcu_read_locked() ||
qemu_mutex_iothread_locked()) should cover the case you discussed.

This still seems problematic, since the assert can pass even if
neither BQL nor RCU is held (as long as depth==0).

Thanks,





Re: [RFC v4 2/3] memory: add depth assert in address_space_to_flatview

2023-01-10 Thread Chuang Xu
Hi, Peter and Paolo,
I'm sorry I didn't reply to your email in time. I was infected with
COVID-19 two weeks ago, so I couldn't think about the problems discussed
in your email for a long time.

On 2023/1/4 1:43 AM, Peter Xu wrote:
> Hi, Paolo,
>
> On Wed, Dec 28, 2022 at 09:27:50AM +0100, Paolo Bonzini wrote:
>> On Fri, Dec 23, 2022 at 16:54, Peter Xu wrote:
>>
 This is not valid because the transaction could happen in *another*
>>> thread.
 In that case memory_region_transaction_depth() will be > 0, but RCU is
 needed.
>>> Do you mean the code is wrong, or the comment? Note that the code has
>>> checked rcu_read_locked() where introduced in patch 1, but maybe
>>> something else was missed?
>>>
>> The assertion is wrong. It will succeed even if RCU is unlocked in this
>> thread but a transaction is in progress in another thread.
> IIUC this is the case where the context:
>
> (1) doesn't have RCU read lock held, and,
> (2) doesn't have BQL held.
>
> Is it safe at all to reference any flatview in such a context? The thing
> is I think the flatview pointer can be freed anytime if both locks are
> not taken.
>
>> Perhaps you can check (memory_region_transaction_depth() > 0 &&
>> !qemu_mutex_iothread_locked()) || rcu_read_locked() instead?
> What if one thread calls address_space_to_flatview() with BQL held but
> not RCU read lock held? I assume it's a legal operation, but it seems to be
> able to trigger the assert already?
>
> Thanks,
>
I'm not sure whether I understand the content of your discussion correctly,
so here I want to explain my understanding firstly.

From my perspective, Paolo thinks that when thread 1 is in a transaction,
thread 2 will trigger the assertion when accessing the flatview without
holding RCU read lock, although sometimes the thread 2's access to flatview
is legal. So Paolo suggests checking (memory_region_transaction_depth() > 0
&& !qemu_mutex_iothread_locked()) || rcu_read_locked() instead.

And Peter thinks that as long as no thread holds the BQL or RCU read lock,
the old flatview can be released (actually executed by the rcu thread with
BQL held). When thread 1 is in a transaction, if thread 2 accesses the
flatview with BQL held but not RCU read lock held, it's a legal operation.
In this legal case, it seems that both my code and Paolo's code will trigger
the assertion.

I'm not sure if I have a good understanding of your emails. I think
checking (memory_region_transaction_get_depth() == 0 || rcu_read_locked() ||
qemu_mutex_iothread_locked()) should cover the case you discussed.

What's your take?

Thanks!


Re: [RFC v4 1/3] rcu: introduce rcu_read_locked()

2023-01-05 Thread Chuang Xu
On 2023/1/4 10:20 PM, Alex Bennée wrote:
> Chuang Xu writes:
>
>> add rcu_read_locked() to detect holding of rcu lock.
>>
>> Signed-off-by: Chuang Xu
>> ---
>> include/qemu/rcu.h | 7 +++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
>> index b063c6fde8..42cbd0080f 100644
>> --- a/include/qemu/rcu.h
>> +++ b/include/qemu/rcu.h
>> @@ -119,6 +119,13 @@ static inline void rcu_read_unlock(void)
>> }
>> }
>>
>> +static inline bool rcu_read_locked(void)
> We use the locked suffix to indicate functions that should be called
> with a lock held. Perhaps renaming this to rcu_read_is_locked() would
> make the intent of the function clearer?

Yes, rcu_read_is_locked() does make the intent of the function clearer.
I'll rename the function in v5.

Thanks!

>> +{
>> + struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();
>> +
>> + return p_rcu_reader->depth > 0;
>> +}
>> +
>> extern void synchronize_rcu(void);
>>
>> /*
>


Re: [External] Re: [RFC v4 2/3] memory: add depth assert in address_space_to_flatview

2023-01-03 Thread Chuang Xu
On 2022/12/28 6:50 PM, Philippe Mathieu-Daudé wrote:

On 23/12/22 15:23, Chuang Xu wrote:

Before using any flatview, sanity check we're not during a memory
region transaction or the map can be invalid.

Signed-off-by: Chuang Xu 

---
  include/exec/memory.h | 9 +
  softmmu/memory.c  | 5 +
  2 files changed, 14 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..66c43b4862 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1069,8 +1069,17 @@ struct FlatView {
  MemoryRegion *root;
  };
  +int memory_region_transaction_get_depth(void);


Do we want to expose this; isn't the depth internal?

If we need to expose something, can we restrict it to

  bool memory_region_in_transaction(void) or
  bool memory_region_transaction_in_progress(void)?

Yes, we'd better not expose the value of an internal
variable. I'll make changes in v5.

Thanks!


Re: [RFC v4 3/3] migration: reduce time of loading non-iterable vmstate

2023-01-03 Thread Chuang Xu
On 2022/12/24 12:06 AM, David Hildenbrand wrote:

On 23.12.22 15:23, Chuang Xu wrote:

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test1 results:
test info:
- Host
   - Intel(R) Xeon(R) Platinum 8260 CPU
   - NVIDIA Mellanox ConnectX-5
- VM
   - 32 CPUs 128GB RAM VM
   - 8 16-queue vhost-net device
   - 16 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 150 ms                           740+ ms
after   about 30 ms                            630+ ms

In test2, we keep the number of devices the same as test1 and reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
   - Intel(R) Xeon(R) Platinum 8260 CPU
   - NVIDIA Mellanox ConnectX-5
- VM
   - 32 CPUs 128GB RAM VM
   - 8 1-queue vhost-net device
   - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 90 ms                            about 250 ms
after   about 25 ms                            about 160 ms

In test3, we keep the number of queues per device the same as test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
   - Intel(R) Xeon(R) Platinum 8260 CPU
   - NVIDIA Mellanox ConnectX-5
- VM
   - 32 CPUs 128GB RAM VM
   - 1 16-queue vhost-net device
   - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 20 ms                            about 70 ms
after   about 11 ms                            about 60 ms

As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading
non-iterable
vmstate. The growth of the number of devices and queues will lead to more
mr commits, and the time consumption caused by the flatview reconstruction
will also increase.

Signed-off-by: Chuang Xu 

---
  migration/savevm.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..19785e5a54 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2617,6 +2617,9 @@ int qemu_loadvm_state_main(QEMUFile *f,
MigrationIncomingState *mis)
  uint8_t section_type;
  int ret = 0;
  +/* call memory_region_transaction_begin() before loading vmstate */



I'd suggest extending the comment *why* you are doing that, that it's a
pure performance optimization, and how it achieves that.

Thanks! I'll extend the comment in v5.


Re: [RFC v4 0/3] migration: reduce time of loading non-iterable vmstate

2022-12-23 Thread Chuang Xu
On 2022/12/23 11:50 PM, Peter Xu wrote:

Chuang,

On Fri, Dec 23, 2022 at 10:23:04PM +0800, Chuang Xu wrote:

In this version:

- attach more information in the cover letter.
- remove changes on virtio_load().
- add rcu_read_locked() to detect holding of rcu lock.

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 150 ms                           740+ ms
after   about 30 ms                            630+ ms

Have you investigated why multi-queue added so much downtime overhead with
the same environment, compared to below [1]?

I have analyzed the downtime in detail. Both stopping and starting the
devices are time-consuming.

For stopping vhost-net devices, vhost_net_stop_one() will be called once more
for each additional queue, while vhost_virtqueue_stop() will be called twice
in vhost_dev_stop(). For example, we need to call vhost_virtqueue_stop()
32 (= 16 * 2) times to stop a 16-queue vhost-net device. In
vhost_virtqueue_stop(), QEMU needs to negotiate with the vhost user daemon.
The same is true for vhost-net device startup.

For stopping vhost-user-blk devices, vhost_virtqueue_stop() will be called
once more for each additional queue. For example, we need to call
vhost_virtqueue_stop() 4 times to stop a 4-queue vhost-user-blk device. The
same is true for vhost-user-blk device startup.

It seems that the vhost-user-blk device is less affected by the number of
queues than the vhost-net device. However, the vhost-user-blk device needs
to prepare inflight when it is started. The negotiation with spdk in this
step is also time-consuming. I tried to move this step to the startup phase
of the target QEMU before the migration started. In my test, this
optimization can greatly reduce the vhost-user-blk device startup time and
thus reduce the downtime. I'm not sure whether this is hacky. If you are
interested in this, maybe we can discuss it further.
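
For reference, the per-queue stop loop has roughly this shape (simplified
from vhost_dev_stop(); an illustration, not the exact QEMU code):

for (i = 0; i < hdev->nvqs; ++i) {
    /*
     * Each call is a synchronous negotiation round trip with the vhost
     * backend, so the cost scales linearly with the number of queues.
     */
    vhost_virtqueue_stop(hdev, vdev, hdev->vqs + i, hdev->vq_index + i);
}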

(This result is different from that of v1. It may be that someone has
changed something on my host.., but it does not affect the display of
the optimization effect.)


In test2, we keep the number of devices the same as test1 and reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 90 ms                            about 250 ms
after   about 25 ms                            about 160 ms
[1]


In test3, we keep the number of queues per device the same as test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 20 ms                            about 70 ms
after   about 11 ms                            about 60 ms


As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading non-iterable
vmstate. The growth of the number of devices and queues will lead to more
mr commits, and the time consumption caused by the flatview reconstruction
will also increase.

The downtime measured in precopy can be more complicated than postcopy
because the time of switch is calculated by qemu based on the downtime
setup, and also that contains part of RAM migrations.  Postcopy should be
more accurate on that because there's no calculation done, meanwhile
there's no RAM transferred during downtime.

However, postcopy downtime is not accurate either in its implementation in
postcopy_start(), where the downtime is measured right after we flushed the
packed data, and right below it there's some idea of optimizing it:

    if (migrate_postcopy_ram()) {
        /*
         * Although this ping is just for debug, it could potentially be
         * used for getting a better measurement of downtime at the source.
         */
        qemu_savevm_send_ping(ms->to_dst_file, 4);
    }

So maybe I'll have a look there.

The current calculation of downtime is really inaccurate, because the sou

[RFC v4 2/3] memory: add depth assert in address_space_to_flatview

2022-12-23 Thread Chuang Xu
Before using any flatview, sanity check we're not during a memory
region transaction or the map can be invalid.

Signed-off-by: Chuang Xu 
---
 include/exec/memory.h | 9 +
 softmmu/memory.c  | 5 +
 2 files changed, 14 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..66c43b4862 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1069,8 +1069,17 @@ struct FlatView {
 MemoryRegion *root;
 };
 
+int memory_region_transaction_get_depth(void);
+
 static inline FlatView *address_space_to_flatview(AddressSpace *as)
 {
+/*
+ * Before using any flatview, sanity check we're not during a memory
+ * region transaction or the map can be invalid.  Note that this can
+ * also be called during commit phase of memory transaction, but that
+ * should also only happen when the depth decreases to 0 first.
+ */
+assert(memory_region_transaction_get_depth() == 0 || rcu_read_locked());
return qatomic_rcu_read(&as->current_map);
 }
 
diff --git a/softmmu/memory.c b/softmmu/memory.c
index bc0be3f62c..01192e2e5b 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1116,6 +1116,11 @@ void memory_region_transaction_commit(void)
}
 }
 
+int memory_region_transaction_get_depth(void)
+{
+return memory_region_transaction_depth;
+}
+
 static void memory_region_destructor_none(MemoryRegion *mr)
 {
 }
-- 
2.20.1
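
For illustration, a reader like the one that originally tripped the
depth-only assert stays legal under this combined check because it enters an
RCU critical section first (a sketch, not code from the series):

static void peek_flatview(AddressSpace *as)
{
    RCU_READ_LOCK_GUARD();
    /*
     * rcu_read_locked() is true here, so the assert passes even while a
     * memory region transaction is open (cf. virtqueue_split_pop()).
     */
    FlatView *view = address_space_to_flatview(as);
    (void)view;
}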




[RFC v4 1/3] rcu: introduce rcu_read_locked()

2022-12-23 Thread Chuang Xu
add rcu_read_locked() to detect holding of rcu lock.

Signed-off-by: Chuang Xu 
---
 include/qemu/rcu.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index b063c6fde8..42cbd0080f 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -119,6 +119,13 @@ static inline void rcu_read_unlock(void)
 }
 }
 
+static inline bool rcu_read_locked(void)
+{
+struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();
+
+return p_rcu_reader->depth > 0;
+}
+
 extern void synchronize_rcu(void);
 
 /*
-- 
2.20.1
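
A trivial self-check of the new predicate (a sketch assuming qemu/rcu.h and
assert.h are in scope):

static void rcu_read_locked_demo(void)
{
    assert(!rcu_read_locked());    /* depth == 0 outside a critical section */
    rcu_read_lock();
    assert(rcu_read_locked());     /* depth > 0 inside */
    rcu_read_unlock();
    assert(!rcu_read_locked());
}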




[RFC v4 0/3] migration: reduce time of loading non-iterable vmstate

2022-12-23 Thread Chuang Xu
In this version:

- attach more information in the cover letter.
- remove changes on virtio_load().
- add rcu_read_locked() to detect holding of rcu lock.

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 150 ms                           740+ ms
after   about 30 ms                            630+ ms

(This result is different from that of v1. It may be that someone has
changed something on my host, but it does not affect the demonstration of
the optimization effect.)


In test2, we keep the number of devices the same as in test1 and reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 90 ms                            about 250 ms
after   about 25 ms                            about 160 ms



In test3, we keep the number of queues per device the same as in test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 20 ms                            about 70 ms
after   about 11 ms                            about 60 ms


As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading non-iterable
vmstate. Growth in the number of devices and queues leads to more mr commits,
and the time spent on flatview reconstruction also increases.

Please review, Chuang.

[v3]

- move virtio_load_check_delay() from virtio_memory_listener_commit() to 
  virtio_vmstate_change().
- add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
  will be called when delay_check is true.

[v2]

- rebase to latest upstream.
- add sanity check to address_space_to_flatview().
- postpone the init of the vring cache until migration's loading completes. 

[v1]

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms




[RFC v4 3/3] migration: reduce time of loading non-iterable vmstate

2022-12-23 Thread Chuang Xu
The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net device
  - 16 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 150 ms                           740+ ms
after   about 30 ms                            630+ ms

In test2, we keep the number of devices the same as in test1 and reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net device
  - 16 1-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 90 ms                            about 250 ms
after   about 25 ms                            about 160 ms

In test3, we keep the number of queues per device the same as in test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

        time of loading non-iterable vmstate   downtime
before  about 20 ms                            about 70 ms
after   about 11 ms                            about 60 ms

As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading non-iterable
vmstate. Growth in the number of devices and queues leads to more mr commits,
and the time spent on flatview reconstruction also increases.

Signed-off-by: Chuang Xu 
---
 migration/savevm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..19785e5a54 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2617,6 +2617,9 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 uint8_t section_type;
 int ret = 0;
 
+/* call memory_region_transaction_begin() before loading vmstate */
+memory_region_transaction_begin();
+
 retry:
 while (true) {
 section_type = qemu_get_byte(f);
@@ -2684,6 +2687,10 @@ out:
 goto retry;
 }
 }
+
+/* call memory_region_transaction_commit() after loading vmstate */
+memory_region_transaction_commit();
+
 return ret;
 }
 
-- 
2.20.1




Re: [RFC v3 1/3] memory: add depth assert in address_space_to_flatview

2022-12-20 Thread Chuang Xu
On 2022/12/16 12:04 AM, Peter Xu wrote:

On Wed, Dec 14, 2022 at 04:38:52PM -0500, Peter Xu wrote:

On Wed, Dec 14, 2022 at 08:03:38AM -0800, Chuang Xu wrote:

On 2022/12/13 9:35 PM, Chuang Xu wrote:

Before using any flatview, sanity check we're not during a memory
region transaction or the map can be invalid.

Signed-off-by: Chuang Xu 


---
 include/exec/memory.h | 9 +
 softmmu/memory.c  | 1 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..b43cd46084 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1069,8 +1069,17 @@ struct FlatView {
 MemoryRegion *root;
 };

+static unsigned memory_region_transaction_depth;
+
 static inline FlatView *address_space_to_flatview(AddressSpace *as)
 {
+/*
+ * Before using any flatview, sanity check we're not during a memory
+ * region transaction or the map can be invalid.  Note that this can
+ * also be called during commit phase of memory transaction, but that
+ * should also only happen when the depth decreases to 0 first.
+ */
+assert(memory_region_transaction_depth == 0);
return qatomic_rcu_read(&as->current_map);
 }

diff --git a/softmmu/memory.c b/softmmu/memory.c
index bc0be3f62c..f177c40cd8 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -37,7 +37,6 @@

 //#define DEBUG_UNASSIGNED

-static unsigned memory_region_transaction_depth;
 static bool memory_region_update_pending;
 static bool ioeventfd_update_pending;
 unsigned int global_dirty_tracking;

Here are some new situations to sync on.

I found that there is a chance of triggering the assert during the QEMU startup phase.

Here is the coredump backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7f7825e33535 in __GI_abort () at abort.c:79
#2  0x7f7825e3340f in __assert_fail_base (fmt=0x7f7825f94ef0
"%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5653c643add8
"memory_region_transaction_depth == 0",
file=0x5653c63dad78
"/data00/migration/qemu-open/include/exec/memory.h", line=1082,
function=) at assert.c:92
#3  0x7f7825e411a2 in __GI___assert_fail
(assertion=assertion@entry=0x5653c643add8
"memory_region_transaction_depth == 0",
file=file@entry=0x5653c63dad78
"/data00/migration/qemu-open/include/exec/memory.h",
line=line@entry=1082,
function=function@entry=0x5653c643bd00 <__PRETTY_FUNCTION__.18101>
"address_space_to_flatview") at assert.c:101
#4  0x5653c60f0383 in address_space_to_flatview (as=0x5653c6af2340
) at
/data00/migration/qemu-open/include/exec/memory.h:1082
#5  address_space_to_flatview (as=0x5653c6af2340
) at
/data00/migration/qemu-open/include/exec/memory.h:1074
#6  address_space_get_flatview (as=0x5653c6af2340
) at ../softmmu/memory.c:809
#7  0x5653c60fef04 in address_space_cache_init
(cache=cache@entry=0x7f781fff8420, as=,
addr=63310635776, len=48, is_write=is_write@entry=false)
at ../softmmu/physmem.c:3352
#8  0x5653c60c08c5 in virtqueue_split_pop (vq=0x7f781c576270,
sz=264) at ../hw/virtio/virtio.c:1959
#9  0x5653c60c0b7d in virtqueue_pop (vq=vq@entry=0x7f781c576270,
sz=) at ../hw/virtio/virtio.c:2177
#10 0x5653c609f14f in virtio_scsi_pop_req
(s=s@entry=0x5653c9034300, vq=vq@entry=0x7f781c576270) at
../hw/scsi/virtio-scsi.c:219
#11 0x5653c60a04a3 in virtio_scsi_handle_cmd_vq
(vq=0x7f781c576270, s=0x5653c9034300) at ../hw/scsi/virtio-scsi.c:735
#12 virtio_scsi_handle_cmd (vdev=0x5653c9034300, vq=0x7f781c576270) at
../hw/scsi/virtio-scsi.c:776
#13 0x5653c60ba72f in virtio_queue_notify_vq (vq=0x7f781c576270)
at ../hw/virtio/virtio.c:2847
#14 0x5653c62d9706 in aio_dispatch_handler
(ctx=ctx@entry=0x5653c84909e0, node=0x7f68e4007840) at
../util/aio-posix.c:369
#15 0x5653c62da254 in aio_dispatch_ready_handlers
(ready_list=0x7f781fffe6a8, ctx=0x5653c84909e0) at
../util/aio-posix.c:399
#16 aio_poll (ctx=0x5653c84909e0, blocking=blocking@entry=true) at
../util/aio-posix.c:713
#17 0x5653c61b2296 in iothread_run
(opaque=opaque@entry=0x5653c822c390) at ../iothread.c:67
#18 0x5653c62dcd8a in qemu_thread_start (args=) at
../util/qemu-thread-posix.c:505
#19 0x7f7825fd8fa3 in start_thread (arg=) at
pthread_create.c:486
#20 0x7f7825f0a06f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This does look like a bug to me.

Paolo/Michael?

Hmm, I found that virtqueue_split_pop() took the rcu lock... then I think
it's fine.

Chuang, I think what you can try next is to add a helper to detect holding of
the rcu lock, then assert with "depth==0 || rcu_read_locked()".  I think that's:

static inline bool rcu_read_locked(void)
{
struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();

return p_rcu_reader->depth > 0;
}

Then IIUC you can even drop patch 2 because virtio_load() also takes the
rcu lock.


Good idea! Later I'll try this way in my v4 patches.

Thanks very much!


Re: [RFC v3 1/3] memory: add depth assert in address_space_to_flatview

2022-12-20 Thread Chuang Xu
On 2022/12/16 12:51 AM, Peter Maydell wrote:

On Tue, 13 Dec 2022 at 13:36, Chuang Xu wrote:

Before using any flatview, sanity check we're not during a memory
region transaction or the map can be invalid.

Signed-off-by: Chuang Xu 

---
 include/exec/memory.h | 9 +
 softmmu/memory.c  | 1 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..b43cd46084 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1069,8 +1069,17 @@ struct FlatView {
 MemoryRegion *root;
 };

+static unsigned memory_region_transaction_depth;

This looks odd. If you define a static variable in a
header file then each .c file which directly or indirectly
includes the header will get its own private copy of the
variable. This probably isn't what you want...

thanks
-- PMM

Yes, maybe we should add a function to acquire the value.

I'll add this part to v4. Thanks!
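
A minimal standalone illustration of the pitfall (hypothetical demo files,
not QEMU code): each translation unit that includes the header gets its own
private copy of the static variable, which is why v4 switches to an accessor
function defined in softmmu/memory.c.

/* depth.h */
static unsigned depth;                          /* one copy per .c file */
static inline unsigned get_depth(void) { return depth; }

/* memory.c */
#include "depth.h"
void begin(void) { depth++; }       /* bumps memory.c's private copy only */

/* caller.c */
#include "depth.h"
extern void begin(void);
int check(void) { begin(); return get_depth(); }   /* returns 0, not 1 */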


Re: [RFC v3 1/3] memory: add depth assert in address_space_to_flatview

2022-12-14 Thread Chuang Xu
On 2022/12/13 9:35 PM, Chuang Xu wrote:

Before using any flatview, sanity check we're not during a memory
region transaction or the map can be invalid.

Signed-off-by: Chuang Xu 

---
 include/exec/memory.h | 9 +
 softmmu/memory.c  | 1 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..b43cd46084 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1069,8 +1069,17 @@ struct FlatView {
 MemoryRegion *root;
 };

+static unsigned memory_region_transaction_depth;
+
 static inline FlatView *address_space_to_flatview(AddressSpace *as)
 {
+/*
+ * Before using any flatview, sanity check we're not during a memory
+ * region transaction or the map can be invalid.  Note that this can
+ * also be called during commit phase of memory transaction, but that
+ * should also only happen when the depth decreases to 0 first.
+ */
+assert(memory_region_transaction_depth == 0);
return qatomic_rcu_read(&as->current_map);
 }

diff --git a/softmmu/memory.c b/softmmu/memory.c
index bc0be3f62c..f177c40cd8 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -37,7 +37,6 @@

 //#define DEBUG_UNASSIGNED

-static unsigned memory_region_transaction_depth;
 static bool memory_region_update_pending;
 static bool ioeventfd_update_pending;
 unsigned int global_dirty_tracking;

Here are some new situations to sync on.

I found that there is a chance of triggering the assert during the QEMU startup phase.

Here is the coredump backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7f7825e33535 in __GI_abort () at abort.c:79
#2  0x7f7825e3340f in __assert_fail_base (fmt=0x7f7825f94ef0
"%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5653c643add8
"memory_region_transaction_depth == 0",
file=0x5653c63dad78
"/data00/migration/qemu-open/include/exec/memory.h", line=1082,
function=) at assert.c:92
#3  0x7f7825e411a2 in __GI___assert_fail
(assertion=assertion@entry=0x5653c643add8
"memory_region_transaction_depth == 0",
file=file@entry=0x5653c63dad78
"/data00/migration/qemu-open/include/exec/memory.h",
line=line@entry=1082,
function=function@entry=0x5653c643bd00 <__PRETTY_FUNCTION__.18101>
"address_space_to_flatview") at assert.c:101
#4  0x5653c60f0383 in address_space_to_flatview (as=0x5653c6af2340
) at
/data00/migration/qemu-open/include/exec/memory.h:1082
#5  address_space_to_flatview (as=0x5653c6af2340
) at
/data00/migration/qemu-open/include/exec/memory.h:1074
#6  address_space_get_flatview (as=0x5653c6af2340
) at ../softmmu/memory.c:809
#7  0x5653c60fef04 in address_space_cache_init
(cache=cache@entry=0x7f781fff8420, as=,
addr=63310635776, len=48, is_write=is_write@entry=false)
at ../softmmu/physmem.c:3352
#8  0x5653c60c08c5 in virtqueue_split_pop (vq=0x7f781c576270,
sz=264) at ../hw/virtio/virtio.c:1959
#9  0x5653c60c0b7d in virtqueue_pop (vq=vq@entry=0x7f781c576270,
sz=) at ../hw/virtio/virtio.c:2177
#10 0x5653c609f14f in virtio_scsi_pop_req
(s=s@entry=0x5653c9034300, vq=vq@entry=0x7f781c576270) at
../hw/scsi/virtio-scsi.c:219
#11 0x5653c60a04a3 in virtio_scsi_handle_cmd_vq
(vq=0x7f781c576270, s=0x5653c9034300) at ../hw/scsi/virtio-scsi.c:735
#12 virtio_scsi_handle_cmd (vdev=0x5653c9034300, vq=0x7f781c576270) at
../hw/scsi/virtio-scsi.c:776
#13 0x5653c60ba72f in virtio_queue_notify_vq (vq=0x7f781c576270)
at ../hw/virtio/virtio.c:2847
#14 0x5653c62d9706 in aio_dispatch_handler
(ctx=ctx@entry=0x5653c84909e0, node=0x7f68e4007840) at
../util/aio-posix.c:369
#15 0x5653c62da254 in aio_dispatch_ready_handlers
(ready_list=0x7f781fffe6a8, ctx=0x5653c84909e0) at
../util/aio-posix.c:399
#16 aio_poll (ctx=0x5653c84909e0, blocking=blocking@entry=true) at
../util/aio-posix.c:713
#17 0x5653c61b2296 in iothread_run
(opaque=opaque@entry=0x5653c822c390) at ../iothread.c:67
#18 0x5653c62dcd8a in qemu_thread_start (args=) at
../util/qemu-thread-posix.c:505
#19 0x7f7825fd8fa3 in start_thread (arg=) at
pthread_create.c:486
#20 0x7f7825f0a06f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I find that some functions don't take the bql before calling
address_space_to_flatview(), as shown in the backtrace. I
also find other similar situations in the code. I find that
prepare_mmio_access() takes the bql to provide some protection,
but I don't think it's sufficient. I think that if we want to
ensure that reads/writes of an mr and modifications of the mr
don't occur at the same time, maybe we need to take the bql
in address_space_to_flatview() like this:


static FlatView *address_space_to_flatview(AddressSpace *as)
{
bool release_lock = false;
FlatView *ret;

if (!qemu_mutex_iothread_locked()) {
qemu_mutex_lock_iothread();
release_lock = true;
}

/*
 * Before using any flatview, sanity

Re: [RFC v3 2/3] virtio: support delay of checks in virtio_load()

2022-12-14 Thread Chuang Xu
On 2022/12/14 12:31 AM, Peter Xu wrote:

On Tue, Dec 13, 2022 at 09:35:09PM +0800, Chuang Xu wrote:

Delay checks in virtio_load() to avoid possible address_space_to_flatview() call
during memory region's begin/commit.

I didn't notice virtio has the vm change handler already, looks good to
reuse it. :) A few more comments though (before some real virtio developers
chime in).


Signed-off-by: Chuang Xu 

---
 hw/virtio/virtio.c | 37 +++--
 include/hw/virtio/virtio.h |  2 ++
 2 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index eb6347ab5d..f556e565c6 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3642,8 +3642,26 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
 vdev->start_on_kick = true;
 }

+vdev->delay_check = true;
+
+if (vdc->post_load) {
+ret = vdc->post_load(vdev);
+if (ret) {
+return ret;
+}
+}
+
+return 0;
+}
+
+static void virtio_load_check_delay(VirtIODevice *vdev)
+{
 RCU_READ_LOCK_GUARD();
-for (i = 0; i < num; i++) {
+for (int i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+if (vdev->vq[i].vring.num == 0) {
+break;
+}
+
 if (vdev->vq[i].vring.desc) {
 uint16_t nheads;

@@ -3696,19 +3714,12 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
  i, vdev->vq[i].vring.num,
  vdev->vq[i].last_avail_idx,
  vdev->vq[i].used_idx);
-return -1;
+abort();

This is when the switchover has finished.  I'm not sure how severe this is and
whether there can be something to remedy it - abort() is probably the last
thing we want to do here, since the admin may not want to crash the whole VM
due to one vring failure on one device.

At this time, the vcpus are still stopped. If these checks fail in
virtio_load(), -1 will be returned, and the migration will be rolled
back. But virtio_vmstate_change() returns nothing, so if we want to
roll back the migration after the checks fail, I think we need abort().

 }
 }
 }

-if (vdc->post_load) {
-ret = vdc->post_load(vdev);
-if (ret) {
-return ret;
-}
-}
-
-return 0;
+return;
 }

 void virtio_cleanup(VirtIODevice *vdev)
@@ -3722,6 +3733,11 @@ static void virtio_vmstate_change(void *opaque, bool running, RunState state)
 BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 bool backend_run = running && virtio_device_started(vdev, vdev->status);
+
+if (vdev->delay_check) {
+virtio_load_check_delay(vdev);
+vdev->delay_check = false;
+}
 vdev->vm_running = running;

 if (backend_run) {
@@ -3789,6 +3805,7 @@ void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
 virtio_vmstate_change, vdev);
 vdev->device_endian = virtio_default_endian();
 vdev->use_guest_notifier_mask = true;
+vdev->delay_check = false;
 }

 /*
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index acfd4df125..269e80d04a 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -135,6 +135,8 @@ struct VirtIODevice
 AddressSpace *dma_as;
 QLIST_HEAD(, VirtQueue) *vector_queues;
 QTAILQ_ENTRY(VirtIODevice) next;
+/* @delay_check: delay checks in virtio_load */
+bool delay_check;

I think it covers more than the check?  It also initializes variables like
used_idx and shadow_avail_idx.  I'm not sure how vital they are, but I'd
just avoid using the word "check" if not sure (e.g. "load_delay", or
"load_finalize"?).

OK. I prefer to use "load_finalize".

Thanks!

 };

 struct VirtioDeviceClass {
-- 
2.20.1


[RFC v3 3/3] migration: reduce time of loading non-iterable vmstate

2022-12-13 Thread Chuang Xu
The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms

Signed-off-by: Chuang Xu 
---
 migration/savevm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..19785e5a54 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2617,6 +2617,9 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 uint8_t section_type;
 int ret = 0;
 
+/* call memory_region_transaction_begin() before loading vmstate */
+memory_region_transaction_begin();
+
 retry:
 while (true) {
 section_type = qemu_get_byte(f);
@@ -2684,6 +2687,10 @@ out:
 goto retry;
 }
 }
+
+/* call memory_region_transaction_commit() after loading vmstate */
+memory_region_transaction_commit();
+
 return ret;
 }
 
-- 
2.20.1




[RFC v3 2/3] virtio: support delay of checks in virtio_load()

2022-12-13 Thread Chuang Xu
Delay checks in virtio_load() to avoid possible address_space_to_flatview() call
during memory region's begin/commit.

Signed-off-by: Chuang Xu 
---
 hw/virtio/virtio.c | 37 +++--
 include/hw/virtio/virtio.h |  2 ++
 2 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index eb6347ab5d..f556e565c6 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3642,8 +3642,26 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
 vdev->start_on_kick = true;
 }
 
+vdev->delay_check = true;
+
+if (vdc->post_load) {
+ret = vdc->post_load(vdev);
+if (ret) {
+return ret;
+}
+}
+
+return 0;
+}
+
+static void virtio_load_check_delay(VirtIODevice *vdev)
+{
 RCU_READ_LOCK_GUARD();
-for (i = 0; i < num; i++) {
+for (int i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+if (vdev->vq[i].vring.num == 0) {
+break;
+}
+
 if (vdev->vq[i].vring.desc) {
 uint16_t nheads;
 
@@ -3696,19 +3714,12 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
  i, vdev->vq[i].vring.num,
  vdev->vq[i].last_avail_idx,
  vdev->vq[i].used_idx);
-return -1;
+abort();
 }
 }
 }
 
-if (vdc->post_load) {
-ret = vdc->post_load(vdev);
-if (ret) {
-return ret;
-}
-}
-
-return 0;
+return;
 }
 
 void virtio_cleanup(VirtIODevice *vdev)
@@ -3722,6 +3733,11 @@ static void virtio_vmstate_change(void *opaque, bool running, RunState state)
 BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 bool backend_run = running && virtio_device_started(vdev, vdev->status);
+
+if (vdev->delay_check) {
+virtio_load_check_delay(vdev);
+vdev->delay_check = false;
+}
 vdev->vm_running = running;
 
 if (backend_run) {
@@ -3789,6 +3805,7 @@ void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
 virtio_vmstate_change, vdev);
 vdev->device_endian = virtio_default_endian();
 vdev->use_guest_notifier_mask = true;
+vdev->delay_check = false;
 }
 
 /*
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index acfd4df125..269e80d04a 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -135,6 +135,8 @@ struct VirtIODevice
 AddressSpace *dma_as;
 QLIST_HEAD(, VirtQueue) *vector_queues;
 QTAILQ_ENTRY(VirtIODevice) next;
+/* @delay_check: delay checks in virtio_load */
+bool delay_check;
 };
 
 struct VirtioDeviceClass {
-- 
2.20.1




[RFC v3 1/3] memory: add depth assert in address_space_to_flatview

2022-12-13 Thread Chuang Xu
Before using any flatview, sanity check we're not during a memory
region transaction or the map can be invalid.

Signed-off-by: Chuang Xu 
---
 include/exec/memory.h | 9 +
 softmmu/memory.c  | 1 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..b43cd46084 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1069,8 +1069,17 @@ struct FlatView {
 MemoryRegion *root;
 };
 
+static unsigned memory_region_transaction_depth;
+
 static inline FlatView *address_space_to_flatview(AddressSpace *as)
 {
+/*
+ * Before using any flatview, sanity check we're not during a memory
+ * region transaction or the map can be invalid.  Note that this can
+ * also be called during commit phase of memory transaction, but that
+ * should also only happen when the depth decreases to 0 first.
+ */
+assert(memory_region_transaction_depth == 0);
return qatomic_rcu_read(&as->current_map);
 }
 
diff --git a/softmmu/memory.c b/softmmu/memory.c
index bc0be3f62c..f177c40cd8 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -37,7 +37,6 @@
 
 //#define DEBUG_UNASSIGNED
 
-static unsigned memory_region_transaction_depth;
 static bool memory_region_update_pending;
 static bool ioeventfd_update_pending;
 unsigned int global_dirty_tracking;
-- 
2.20.1




[RFC v3 0/3] migration: reduce time of loading non-iterable vmstate

2022-12-13 Thread Chuang Xu
Hi!

In this version:

- move virtio_load_check_delay() from virtio_memory_listener_commit() to 
  virtio_vmstate_change().
- add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
  will be called when delay_check is true.

Please review, Chuang.

[v2]

- rebase to latest upstream.
- add sanity check to address_space_to_flatview().
- postpone the init of the vring cache until migration's loading completes. 

[v1]

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms




Re: [RFC v2 2/3] virtio: support delay of checks in virtio_load()

2022-12-13 Thread Chuang Xu
On 2022/12/13 4:18 AM, Peter Xu wrote:

On Tue, Dec 13, 2022 at 12:49:41AM +0800, Chuang Xu wrote:

+bool migration_enable_load_check_delay;

I'm just afraid this is still too hacky.

One thing is that this variable is only set at a specific phase during
migration to cover that commit().  The other thing is I'm not sure we can
always rely on the commit() happening 100% of the time - what if there are
no memory layout changes throughout the whole process of vm load?  That'll
be skipped if memory_region_update_pending == false, as I said.

Yes, you're right. I wanted to set memory_region_update_pending to true at
the beginning of qemu_loadvm_state_main(), but somehow I forgot this detail..

So far the best I can come up with is we allow each virtio device to
register a vm state change handler (during virtio_load) to do the rest,
then in the handler it unregisters itself so it only runs once right before
the VM starts.  But I'm not sure whether the virtio developers will be
happy with it.  Maybe worth a try.

Feel free to have a look at like kvmvapic_vm_state_change() if you think
that idea worth exploring.

That's a good idea!

But I don't think it's necessary to register a new vm state change handler.
Maybe we just need to add a delay_check flag to VirtIODevice and do those
delayed checks in virtio_vmstate_change() when delay_check is true.

Later I'll upload the v3 patches.

Thanks!
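
For reference, a rough sketch of the handler-based alternative discussed
above (hypothetical names, modeled loosely on kvmvapic_vm_state_change();
assumes the qemu_add/del_vm_change_state_handler() API and a new
VMChangeStateEntry pointer stored in VirtIODevice):

static void virtio_load_finalize_cb(void *opaque, bool running, RunState state)
{
    VirtIODevice *vdev = opaque;

    if (!running) {
        return;
    }
    virtio_load_check_delay(vdev);        /* run the delayed checks once */
    qemu_del_vm_change_state_handler(vdev->load_finalize_entry);
    vdev->load_finalize_entry = NULL;
}

/* registered at the end of virtio_load(): */
vdev->load_finalize_entry =
    qemu_add_vm_change_state_handler(virtio_load_finalize_cb, vdev);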


Re: [RFC v2 0/3] migration: reduce time of loading non-iterable vmstate

2022-12-13 Thread Chuang Xu
On 2022/12/13 4:23 AM, Peter Xu wrote:

On Tue, Dec 13, 2022 at 12:49:39AM +0800, Chuang Xu wrote:

Hi!

Chuang,


In this version:

- rebase to latest upstream.
- add sanity check to address_space_to_flatview().
- postpone the init of the vring cache until migration's loading completes.

Since there'll be other changes besides migration, please consider also
copying the relevant maintainers on both memory and virtio in your next
post:

$ ./scripts/get_maintainer.pl -f softmmu/memory.c -f hw/virtio/virtio.c
Paolo Bonzini   (supporter:Memory API)
Peter Xu   (supporter:Memory API)
David Hildenbrand   (supporter:Memory API)
"Philippe Mathieu-Daudé"  (reviewer:Memory API)
"Michael S. Tsirkin"  (supporter:virtio)
qemu-devel@nongnu.org (open list:All patches CC here)



Sorry I forgot to update the cc list..

Thanks for your reminder!


[RFC v2 0/3] migration: reduce time of loading non-iterable vmstate

2022-12-12 Thread Chuang Xu


Hi!

In this version:

- rebase to latest upstream.
- add sanity check to address_space_to_flatview().
- postpone the init of the vring cache until migration's loading completes. 

Please review, Chuang.

[v1]

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms




[RFC v2 3/3] migration: reduce time of loading non-iterable vmstate

2022-12-12 Thread Chuang Xu
The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms

Signed-off-by: Chuang Xu 
---
 migration/savevm.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..68a7a99b79 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2617,6 +2617,9 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 uint8_t section_type;
 int ret = 0;
 
+/* call memory_region_transaction_begin() before loading vmstate */
+memory_region_transaction_begin();
+
 retry:
 while (true) {
 section_type = qemu_get_byte(f);
@@ -2684,6 +2687,16 @@ out:
 goto retry;
 }
 }
+
+/*
+ * call memory_region_transaction_commit() after loading non-iterable
+ * vmstate, make sure the migration_enable_load_check_delay flag is
+ * true during commit.
+ */
+migration_enable_load_check_delay = true;
+memory_region_transaction_commit();
+migration_enable_load_check_delay = false;
+
 return ret;
 }
 
-- 
2.20.1




[RFC v2 1/3] memory: add depth assert in address_space_to_flatview

2022-12-12 Thread Chuang Xu
Before using any flatview, sanity check we're not during a memory
region transaction or the map can be invalid.

Signed-off-by: Chuang Xu 
---
 include/exec/memory.h | 9 +
 softmmu/memory.c  | 1 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..b43cd46084 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1069,8 +1069,17 @@ struct FlatView {
 MemoryRegion *root;
 };
 
+static unsigned memory_region_transaction_depth;
+
 static inline FlatView *address_space_to_flatview(AddressSpace *as)
 {
+/*
+ * Before using any flatview, sanity check we're not during a memory
+ * region transaction or the map can be invalid.  Note that this can
+ * also be called during commit phase of memory transaction, but that
+ * should also only happen when the depth decreases to 0 first.
+ */
+assert(memory_region_transaction_depth == 0);
return qatomic_rcu_read(&as->current_map);
 }
 
diff --git a/softmmu/memory.c b/softmmu/memory.c
index bc0be3f62c..f177c40cd8 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -37,7 +37,6 @@
 
 //#define DEBUG_UNASSIGNED
 
-static unsigned memory_region_transaction_depth;
 static bool memory_region_update_pending;
 static bool ioeventfd_update_pending;
 unsigned int global_dirty_tracking;
-- 
2.20.1




[RFC v2 2/3] virtio: support delay of checks in virtio_load()

2022-12-12 Thread Chuang Xu
Delay checks in virtio_load() to avoid possible address_space_to_flatview() call
during memory region's begin/commit.

Signed-off-by: Chuang Xu 
---
 hw/virtio/virtio.c  | 33 ++---
 include/sysemu/sysemu.h |  1 +
 softmmu/globals.c   |  3 +++
 3 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index eb6347ab5d..3e3fa2a89d 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -33,6 +33,7 @@
 #include "hw/virtio/virtio-access.h"
 #include "sysemu/dma.h"
 #include "sysemu/runstate.h"
+#include "sysemu/sysemu.h"
 #include "standard-headers/linux/virtio_ids.h"
 #include "standard-headers/linux/vhost_types.h"
 #include "standard-headers/linux/virtio_blk.h"
@@ -3642,8 +3643,20 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
 vdev->start_on_kick = true;
 }
 
+if (vdc->post_load) {
+ret = vdc->post_load(vdev);
+if (ret) {
+return ret;
+}
+}
+
+return 0;
+}
+
+static void virtio_load_check_delay(VirtIODevice *vdev)
+{
 RCU_READ_LOCK_GUARD();
-for (i = 0; i < num; i++) {
+for (int i = 0; i < VIRTIO_QUEUE_MAX; i++) {
 if (vdev->vq[i].vring.desc) {
 uint16_t nheads;
 
@@ -3696,19 +3709,12 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
  i, vdev->vq[i].vring.num,
  vdev->vq[i].last_avail_idx,
  vdev->vq[i].used_idx);
-return -1;
+abort();
 }
 }
 }
 
-if (vdc->post_load) {
-ret = vdc->post_load(vdev);
-if (ret) {
-return ret;
-}
-}
-
-return 0;
+return;
 }
 
 void virtio_cleanup(VirtIODevice *vdev)
@@ -4158,7 +4164,12 @@ static void virtio_memory_listener_commit(MemoryListener *listener)
 if (vdev->vq[i].vring.num == 0) {
 break;
 }
-virtio_init_region_cache(vdev, i);
+
+if (migration_enable_load_check_delay) {
+virtio_load_check_delay(vdev);
+} else {
+virtio_init_region_cache(vdev, i);
+}
 }
 }
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 6a7a31e64d..0523091445 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -12,6 +12,7 @@ extern int only_migratable;
 extern const char *qemu_name;
 extern QemuUUID qemu_uuid;
 extern bool qemu_uuid_set;
+extern bool migration_enable_load_check_delay;
 
 const char *qemu_get_vm_name(void);
 
diff --git a/softmmu/globals.c b/softmmu/globals.c
index 527edbefdd..1bd8f6c978 100644
--- a/softmmu/globals.c
+++ b/softmmu/globals.c
@@ -65,3 +65,6 @@ bool qemu_uuid_set;
 uint32_t xen_domid;
 enum xen_mode xen_mode = XEN_EMULATE;
 bool xen_domid_restrict;
+
+bool migration_enable_load_check_delay;
+
-- 
2.20.1




Re: [External] Re: [RFC PATCH] migration: reduce time of loading non-iterable vmstate

2022-12-09 Thread Chuang Xu
Hi, Peter!

This email is a supplement to my previous email 7 hours ago.

On 2022/12/9 12:00 AM, Peter Xu wrote:

If the assert works, that'll be even better.  I'm actually worried this
can trigger like what you mentioned in the virtio path; I didn't expect it
to come that soon.  So if there's a minimal set of cases and we can fix them
up easily, that'll be great.  Hopefully there aren't too many, or we'll need
to revisit the whole idea.

Thanks,


Only delaying virtio_init_region_cache() results in failures of the checks
and cache operations that follow the original virtio_init_region_cache().

Here are the patches related to these checks and cache operations:
https://gitlab.com/qemu-project/qemu/-/commit/1abeb5a65d515f8a8a9cfc4a82342f731bd9321f
https://gitlab.com/qemu-project/qemu/-/commit/be1fea9bc286f64c6c995bb0d7145a0b738aeddb
https://gitlab.com/qemu-project/qemu/-/commit/b796fcd1bf2978aed15748db04e054f34789e9eb
https://gitlab.com/qemu-project/qemu/-/commit/bccdef6b1a204db0f41ffb6e24ce373e4d7890d4

I think I should try to postpone these checks and caches too..

Thanks!


Re: [RFC PATCH] migration: reduce time of loading non-iterable vmstate

2022-12-08 Thread Chuang Xu
On 2022/12/9 12:00 AM, Peter Xu wrote:

On Thu, Dec 08, 2022 at 10:39:11PM +0800, Chuang Xu wrote:

On 2022/12/8 6:08 AM, Peter Xu wrote:

On Thu, Dec 08, 2022 at 12:07:03AM +0800, Chuang Xu wrote:

On 2022/12/6 12:28 AM, Peter Xu wrote:

Chuang,

No worry on the delay; you're faster than when I read yours. :)

On Mon, Dec 05, 2022 at 02:56:15PM +0800, Chuang Xu wrote:

As a start, maybe you can try to poison address_space_to_flatview() (by
e.g. checking the start_pack_mr_change flag and asserting it is not set)
during this process, to see whether any call stack can even try to
dereference a flatview.

It's just that I haven't figured out a good way to "prove" its validity, even
though I think this is an interesting idea worth exploring to shrink the downtime.

Thanks for your suggestions!
I used a thread-local variable to identify whether the current thread is the
migration thread (the main thread of the target qemu), and I modified the code
of qemu_coroutine_switch to make sure the thread-local variable is true only in
the process_incoming_migration_co call stack. If the target qemu detects that
start_pack_mr_change is set and address_space_to_flatview() is called in a
non-migrating thread or non-migrating coroutine, it will crash.

Are you using the thread var just to avoid the assert triggering in the
migration thread when committing memory changes?

I think _maybe_ another cleaner way to sanity check this is directly upon
the depth:

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
  /*
   * Before using any flatview, sanity check we're not during a memory
   * region transaction or the map can be invalid.  Note that this can
   * also be called during commit phase of memory transaction, but that
   * should also only happen when the depth decreases to 0 first.
   */
  assert(memory_region_transaction_depth == 0);
  return qatomic_rcu_read(&as->current_map);
}

That should also cover the safe cases of memory transaction commits during
migration.


Peter, I tried this way and found that the target qemu will crash.

Here is the gdb backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7ff2929d851a in __GI_abort () at abort.c:118
#2  0x7ff2929cfe67 in __assert_fail_base (fmt=,
assertion=assertion@entry=0x55a32578cdc0
"memory_region_transaction_depth == 0", file=file@entry=0x55a32575d9b0
"/data00/migration/qemu-5.2.0/include/exec/memory.h",
 line=line@entry=766, function=function@entry=0x55a32578d6e0
<__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at
assert.c:92
#3  0x7ff2929cff12 in __GI___assert_fail
(assertion=assertion@entry=0x55a32578cdc0
"memory_region_transaction_depth == 0", file=file@entry=0x55a32575d9b0
"/data00/migration/qemu-5.2.0/include/exec/memory.h",
line=line@entry=766,
 function=function@entry=0x55a32578d6e0
<__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at
assert.c:101
#4  0x55a324b2ed5e in address_space_to_flatview (as=0x55a326132580
) at
/data00/migration/qemu-5.2.0/include/exec/memory.h:766
#5  0x55a324e79559 in address_space_to_flatview (as=0x55a326132580
) at ../softmmu/memory.c:811
#6  address_space_get_flatview (as=0x55a326132580
) at ../softmmu/memory.c:805
#7  0x55a324e96474 in address_space_cache_init
(cache=cache@entry=0x55a32a4fb000, as=,
addr=addr@entry=68404985856, len=len@entry=4096, is_write=false) at
../softmmu/physmem.c:3307
#8  0x55a324ea9cba in virtio_init_region_cache
(vdev=0x55a32985d9a0, n=0) at ../hw/virtio/virtio.c:185
#9  0x55a324eaa615 in virtio_load (vdev=0x55a32985d9a0,
f=, version_id=) at
../hw/virtio/virtio.c:3203
#10 0x55a324c6ab96 in vmstate_load_state
(f=f@entry=0x55a329dc0c00, vmsd=0x55a325fc1a60 ,
opaque=0x55a32985d9a0, version_id=1) at ../migration/vmstate.c:143
#11 0x55a324cda138 in vmstate_load (f=0x55a329dc0c00,
se=0x55a329941c90) at ../migration/savevm.c:913
#12 0x55a324cdda34 in qemu_loadvm_section_start_full
(mis=0x55a3284ef9e0, f=0x55a329dc0c00) at ../migration/savevm.c:2741
#13 qemu_loadvm_state_main (f=f@entry=0x55a329dc0c00,
mis=mis@entry=0x55a3284ef9e0) at ../migration/savevm.c:2939
#14 0x55a324cdf66a in qemu_loadvm_state (f=0x55a329dc0c00) at
../migration/savevm.c:3021
#15 0x55a324d14b4e in process_incoming_migration_co
(opaque=) at ../migration/migration.c:574
#16 0x55a32501ae3b in coroutine_trampoline (i0=,
i1=) at ../util/coroutine-ucontext.c:173
#17 0x7ff2929e8000 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x7ffed80dc2a0 in ?? ()
#19 0x in ?? ()

address_space_cache_init() is the only caller of address_space_to_flatview
I can find in vmstate_load call stack so far. Although I think the mr used
by address_space_cache_init() won't be affected by the delay of
memory_region_transaction_commit(), we really need a mechanism to prevent
the modified mr from being used.

Maybe we can build a stale

Re: [RFC PATCH] migration: reduce time of loading non-iterable vmstate

2022-12-08 Thread Chuang Xu


On 2022/12/8 6:08 AM, Peter Xu wrote:

On Thu, Dec 08, 2022 at 12:07:03AM +0800, Chuang Xu wrote:

On 2022/12/6 12:28 AM, Peter Xu wrote:

Chuang,

No worry on the delay; you're faster than when I read yours. :)

On Mon, Dec 05, 2022 at 02:56:15PM +0800, Chuang Xu wrote:

As a start, maybe you can try to poison address_space_to_flatview() (by
e.g. checking the start_pack_mr_change flag and asserting it is not set)
during this process, to see whether any call stack can even try to
dereference a flatview.

It's just that I haven't figured out a good way to "prove" its validity, even
though I think this is an interesting idea worth exploring to shrink the downtime.

Thanks for your suggestions!
I used a thread-local variable to identify whether the current thread is the
migration thread (the main thread of the target qemu), and I modified the code
of qemu_coroutine_switch to make sure the thread-local variable is true only in
the process_incoming_migration_co call stack. If the target qemu detects that
start_pack_mr_change is set and address_space_to_flatview() is called in a
non-migrating thread or non-migrating coroutine, it will crash.

Are you using the thread var just to avoid the assert triggering in the
migration thread when committing memory changes?

I think _maybe_ another cleaner way to sanity check this is directly upon
the depth:

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
  /*
   * Before using any flatview, sanity check we're not during a memory
   * region transaction or the map can be invalid.  Note that this can
   * also be called during commit phase of memory transaction, but that
   * should also only happen when the depth decreases to 0 first.
   */
  assert(memory_region_transaction_depth == 0);
  return qatomic_rcu_read(&as->current_map);
}

That should also cover the safe cases of memory transaction commits during
migration.


Peter, I tried this way and found that the target qemu will crash.

Here is the gdb backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7ff2929d851a in __GI_abort () at abort.c:118
#2  0x7ff2929cfe67 in __assert_fail_base (fmt=, 
assertion=assertion@entry=0x55a32578cdc0 "memory_region_transaction_depth == 0", 
file=file@entry=0x55a32575d9b0 "/data00/migration/qemu-5.2.0/include/exec/memory.h",
 line=line@entry=766, function=function@entry=0x55a32578d6e0 
<__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at assert.c:92
#3  0x7ff2929cff12 in __GI___assert_fail (assertion=assertion@entry=0x55a32578cdc0 
"memory_region_transaction_depth == 0", file=file@entry=0x55a32575d9b0 
"/data00/migration/qemu-5.2.0/include/exec/memory.h", line=line@entry=766,
 function=function@entry=0x55a32578d6e0 <__PRETTY_FUNCTION__.20463> 
"address_space_to_flatview") at assert.c:101
#4  0x55a324b2ed5e in address_space_to_flatview (as=0x55a326132580 
) at 
/data00/migration/qemu-5.2.0/include/exec/memory.h:766
#5  0x55a324e79559 in address_space_to_flatview (as=0x55a326132580 
) at ../softmmu/memory.c:811
#6  address_space_get_flatview (as=0x55a326132580 ) at 
../softmmu/memory.c:805
#7  0x55a324e96474 in address_space_cache_init (cache=cache@entry=0x55a32a4fb000, 
as=, addr=addr@entry=68404985856, len=len@entry=4096, 
is_write=false) at ../softmmu/physmem.c:3307
#8  0x55a324ea9cba in virtio_init_region_cache (vdev=0x55a32985d9a0, n=0) 
at ../hw/virtio/virtio.c:185
#9  0x55a324eaa615 in virtio_load (vdev=0x55a32985d9a0, f=, 
version_id=) at ../hw/virtio/virtio.c:3203
#10 0x55a324c6ab96 in vmstate_load_state (f=f@entry=0x55a329dc0c00, 
vmsd=0x55a325fc1a60 , opaque=0x55a32985d9a0, version_id=1) 
at ../migration/vmstate.c:143
#11 0x55a324cda138 in vmstate_load (f=0x55a329dc0c00, se=0x55a329941c90) at 
../migration/savevm.c:913
#12 0x55a324cdda34 in qemu_loadvm_section_start_full (mis=0x55a3284ef9e0, 
f=0x55a329dc0c00) at ../migration/savevm.c:2741
#13 qemu_loadvm_state_main (f=f@entry=0x55a329dc0c00, 
mis=mis@entry=0x55a3284ef9e0) at ../migration/savevm.c:2939
#14 0x55a324cdf66a in qemu_loadvm_state (f=0x55a329dc0c00) at 
../migration/savevm.c:3021
#15 0x55a324d14b4e in process_incoming_migration_co (opaque=) at ../migration/migration.c:574
#16 0x55a32501ae3b in coroutine_trampoline (i0=, i1=) at ../util/coroutine-ucontext.c:173
#17 0x7ff2929e8000 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x7ffed80dc2a0 in ?? ()
#19 0x in ?? ()

address_space_cache_init() is the only caller of address_space_to_flatview
I can find in vmstate_load call stack so far. Although I think the mr used
by address_space_cache_init() won't be affected by the delay of
memory_region_transaction_commit(), we really need a mechanism to prevent
the modified mr from being used.

Maybe we can build a stale list:
If a subregion is added, add its parent to the stale list(consider

Re: [External] Re: [RFC PATCH] migration: reduce time of loading non-iterable vmstate

2022-12-07 Thread Chuang Xu


On 2022/12/6 12:28 AM, Peter Xu wrote:

Chuang,

No worry on the delay; you're faster than when I read yours. :)

On Mon, Dec 05, 2022 at 02:56:15PM +0800, Chuang Xu wrote:

As a start, maybe you can try to poison address_space_to_flatview() (by
e.g. checking the start_pack_mr_change flag and asserting it is not set)
during this process, to see whether any call stack can even try to
dereference a flatview.

It's just that I haven't figured out a good way to "prove" its validity, even
though I think this is an interesting idea worth exploring to shrink the downtime.

Thanks for your suggestions!
I used a thread-local variable to identify whether the current thread is the
migration thread (the main thread of the target qemu), and I modified the code
of qemu_coroutine_switch to make sure the thread-local variable is true only in
the process_incoming_migration_co call stack. If the target qemu detects that
start_pack_mr_change is set and address_space_to_flatview() is called in a
non-migrating thread or non-migrating coroutine, it will crash.

Are you using the thread var just to avoid the assert triggering in the
migration thread when committing memory changes?

I think _maybe_ another cleaner way to sanity check this is directly upon
the depth:

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
 /*
  * Before using any flatview, sanity check we're not during a memory
  * region transaction or the map can be invalid.  Note that this can
  * also be called during commit phase of memory transaction, but that
  * should also only happen when the depth decreases to 0 first.
  */
 assert(memory_region_transaction_depth == 0);
 return qatomic_rcu_read(&as->current_map);
}

That should also cover the safe cases of memory transaction commits during
migration.


Peter, I tried this way and found that the target qemu will crash.

Here is the gdb backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7ff2929d851a in __GI_abort () at abort.c:118
#2  0x7ff2929cfe67 in __assert_fail_base (fmt=, 
assertion=assertion@entry=0x55a32578cdc0 "memory_region_transaction_depth == 0", 
file=file@entry=0x55a32575d9b0 "/data00/migration/qemu-5.2.0/include/exec/memory.h",
line=line@entry=766, function=function@entry=0x55a32578d6e0 
<__PRETTY_FUNCTION__.20463> "address_space_to_flatview") at assert.c:92
#3  0x7ff2929cff12 in __GI___assert_fail (assertion=assertion@entry=0x55a32578cdc0 
"memory_region_transaction_depth == 0", file=file@entry=0x55a32575d9b0 
"/data00/migration/qemu-5.2.0/include/exec/memory.h", line=line@entry=766,
function=function@entry=0x55a32578d6e0 <__PRETTY_FUNCTION__.20463> 
"address_space_to_flatview") at assert.c:101
#4  0x55a324b2ed5e in address_space_to_flatview (as=0x55a326132580 
) at 
/data00/migration/qemu-5.2.0/include/exec/memory.h:766
#5  0x55a324e79559 in address_space_to_flatview (as=0x55a326132580 
) at ../softmmu/memory.c:811
#6  address_space_get_flatview (as=0x55a326132580 ) at 
../softmmu/memory.c:805
#7  0x55a324e96474 in address_space_cache_init (cache=cache@entry=0x55a32a4fb000, 
as=, addr=addr@entry=68404985856, len=len@entry=4096, 
is_write=false) at ../softmmu/physmem.c:3307
#8  0x55a324ea9cba in virtio_init_region_cache (vdev=0x55a32985d9a0, n=0) 
at ../hw/virtio/virtio.c:185
#9  0x55a324eaa615 in virtio_load (vdev=0x55a32985d9a0, f=, 
version_id=) at ../hw/virtio/virtio.c:3203
#10 0x55a324c6ab96 in vmstate_load_state (f=f@entry=0x55a329dc0c00, 
vmsd=0x55a325fc1a60 , opaque=0x55a32985d9a0, version_id=1) 
at ../migration/vmstate.c:143
#11 0x55a324cda138 in vmstate_load (f=0x55a329dc0c00, se=0x55a329941c90) at 
../migration/savevm.c:913
#12 0x55a324cdda34 in qemu_loadvm_section_start_full (mis=0x55a3284ef9e0, 
f=0x55a329dc0c00) at ../migration/savevm.c:2741
#13 qemu_loadvm_state_main (f=f@entry=0x55a329dc0c00, 
mis=mis@entry=0x55a3284ef9e0) at ../migration/savevm.c:2939
#14 0x55a324cdf66a in qemu_loadvm_state (f=0x55a329dc0c00) at 
../migration/savevm.c:3021
#15 0x55a324d14b4e in process_incoming_migration_co (opaque=) at ../migration/migration.c:574
#16 0x55a32501ae3b in coroutine_trampoline (i0=, i1=) at ../util/coroutine-ucontext.c:173
#17 0x7ff2929e8000 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x7ffed80dc2a0 in ?? ()
#19 0x in ?? ()

address_space_cache_init() is the only caller of address_space_to_flatview
I can find in vmstate_load call stack so far. Although I think the mr used
by address_space_cache_init() won't be affected by the delay of
memory_region_transaction_commit(), we really need a mechanism to prevent
the modified mr from being used.

Maybe we can build a stale list:
If a subregion is added, add its parent to the stale list (considering that
the new subregion's priority has uncertain effects on flatviews).
If a subregion is deleted, add itself to

Re: [RFC PATCH] migration: reduce time of loading non-iterable vmstate

2022-12-04 Thread Chuang Xu

Peter, I'm sorry I didn't reply to your email in time, because I was busy with
other work last week. Here is my latest progress.

On 2022/11/29 1:41 AM, Peter Xu wrote:

On Mon, Nov 28, 2022 at 05:42:43PM +0800, Chuang Xu wrote:

On 2022/11/25 12:40 AM, Peter Xu wrote:

On Fri, Nov 18, 2022 at 04:36:48PM +0800, Chuang Xu wrote:

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms

Signed-off-by: Chuang Xu

This is an interesting idea..  I think it means at least the address space
operations will all be messed up if happening during the precopy loading

Sorry, I don't quite understand the meaning of "messed up" here.. Maybe I need
more information about how the address space operations will be messed up.

AFAIK the major thing we do during commit of memory regions is to apply the
memory region changes to the rest (flatviews, or ioeventfds); basically it
makes everything match the new memory region layout.

If we allow memory region commit to be postponed for the whole loading
process, it means that, at least from the flatview POV, any further
operations like:

   address_space_write(&address_space_memory, ...)

could write to wrong places because the flat views are not updated.
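
Concretely, the hazard would look like this (an illustrative sketch only;
sysmem, new_mr and buf are placeholders):

memory_region_transaction_begin();
memory_region_add_subregion(sysmem, 0x1000, new_mr);    /* commit deferred */

/*
 * Until the final commit rebuilds the flatviews, writes are still routed
 * through the old map, so this can miss new_mr or hit a stale region:
 */
address_space_write(&address_space_memory, 0x1000,
                    MEMTXATTRS_UNSPECIFIED, buf, sizeof(buf));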


I have tested migration on normal qemu and optimized qemu repeatedly, and
I haven't observed any other operation on the target qemu's mr (such as
address_space_write...) happening so far.


progress, but I don't directly see its happening either.  For example, in
most post_load()s of vmsd I think the devices should just write directly to
its buffers, accessing MRs directly, even if they want DMAs or just update
fields to correct states.  Even so, I'm not super confident that holds

And I'm not sure whether the "its happening" means "begin/commit happening"
or "messed up happening"? If it's the former, Here are what I observe:
the stage of loading iterable vmstate doesn't call begin/commit, but the
stage of loading noniterable vmstate calls a large amount of begin/commit
in field->info->get() operation. For example:

#0  memory_region_transaction_commit () at ../softmmu/memory.c:1085
#1  0x559b6f683523 in pci_update_mappings (d=d@entry=0x7f5cd8682010) at 
../hw/pci/pci.c:1361
#2  0x559b6f683a1f in get_pci_config_device (f=<optimized out>, 
pv=0x7f5cd86820a0, size=256, field=<optimized out>) at ../hw/pci/pci.c:545
#3  0x559b6f5fcd86 in vmstate_load_state (f=f@entry=0x559b757eb4b0, 
vmsd=vmsd@entry=0x559b70909d40 <vmstate_pci_device>, 
opaque=opaque@entry=0x7f5cd8682010, version_id=2)
 at ../migration/vmstate.c:143
#4  0x559b6f68466f in pci_device_load (s=s@entry=0x7f5cd8682010, 
f=f@entry=0x559b757eb4b0) at ../hw/pci/pci.c:664
#5  0x559b6f6ad38a in virtio_pci_load_config (d=0x7f5cd8682010, 
f=0x559b757eb4b0) at ../hw/virtio/virtio-pci.c:181
#6  0x559b6f7dfe91 in virtio_load (vdev=0x7f5cd868a1a0, f=0x559b757eb4b0, 
version_id=1) at ../hw/virtio/virtio.c:3071
#7  0x559b6f5fcd86 in vmstate_load_state (f=f@entry=0x559b757eb4b0, 
vmsd=0x559b709ae260 <vmstate_virtio>, opaque=0x7f5cd868a1a0, 
version_id=1) at ../migration/vmstate.c:143
#8  0x559b6f62da48 in vmstate_load (f=0x559b757eb4b0, se=0x559b7591c010) at 
../migration/savevm.c:913
#9  0x559b6f631334 in qemu_loadvm_section_start_full (mis=0x559b73f1a580, 
f=0x559b757eb4b0) at ../migration/savevm.c:2741
#10 qemu_loadvm_state_main (f=f@entry=0x559b757eb4b0, 
mis=mis@entry=0x559b73f1a580) at ../migration/savevm.c:2937
#11 0x559b6f632faa in qemu_loadvm_state (f=0x559b757eb4b0) at 
../migration/savevm.c:3018
#12 0x559b6f6d2ece in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:574
#13 0x559b6f9f9f0b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173
#14 0x7f5cfeecf000 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x7fff04a2e8f0 in ?? ()
#16 0x0000000000000000 in ?? ()


true, not to mention any other side effects (e.g., would we release bql
during precopy for any reason?).

Copy Paolo and PeterM for some extra eyes.


What I observe is that during the loading process, the migration thread will
cond-wait for the vcpu threads to complete tasks such as kvm_apic_post_load,
and the rcu thread will acquire the bql to do the flatview_destroy operation.
So far, I haven't seen side effects from these two situations.

Yes that's something I'd worry about.

The current memory API should be

Re: [RFC PATCH] migration: reduce time of loading non-iterable vmstate

2022-11-28 Thread Chuang Xu


On 2022/11/25 at 12:40 AM, Peter Xu wrote:

On Fri, Nov 18, 2022 at 04:36:48PM +0800, Chuang Xu wrote:

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms

Signed-off-by: Chuang Xu

This is an interesting idea..  I think it means at least the address space
operations will all be messed up if happening during the precopy loading


Sorry, I don't quite understand the meaning of "messed up" here.. Maybe I need
more information about how the address space operations will be messed up.


progress, but I don't directly see its happening either.  For example, in
most post_load()s of vmsd I think the devices should just write directly to
its buffers, accessing MRs directly, even if they want DMAs or just update
fields to correct states.  Even so, I'm not super confident that holds


And I'm not sure whether the "its happening" means "begin/commit happening"
or "messed up happening"? If it's the former, Here are what I observe:
the stage of loading iterable vmstate doesn't call begin/commit, but the
stage of loading noniterable vmstate calls a large amount of begin/commit
in field->info->get() operation. For example:

#0  memory_region_transaction_commit () at ../softmmu/memory.c:1085
#1  0x559b6f683523 in pci_update_mappings (d=d@entry=0x7f5cd8682010) at 
../hw/pci/pci.c:1361
#2  0x559b6f683a1f in get_pci_config_device (f=<optimized out>, 
pv=0x7f5cd86820a0, size=256, field=<optimized out>) at ../hw/pci/pci.c:545
#3  0x559b6f5fcd86 in vmstate_load_state (f=f@entry=0x559b757eb4b0, 
vmsd=vmsd@entry=0x559b70909d40 <vmstate_pci_device>, 
opaque=opaque@entry=0x7f5cd8682010, version_id=2)
at ../migration/vmstate.c:143
#4  0x559b6f68466f in pci_device_load (s=s@entry=0x7f5cd8682010, 
f=f@entry=0x559b757eb4b0) at ../hw/pci/pci.c:664
#5  0x559b6f6ad38a in virtio_pci_load_config (d=0x7f5cd8682010, 
f=0x559b757eb4b0) at ../hw/virtio/virtio-pci.c:181
#6  0x559b6f7dfe91 in virtio_load (vdev=0x7f5cd868a1a0, f=0x559b757eb4b0, 
version_id=1) at ../hw/virtio/virtio.c:3071
#7  0x559b6f5fcd86 in vmstate_load_state (f=f@entry=0x559b757eb4b0, 
vmsd=0x559b709ae260 <vmstate_virtio>, opaque=0x7f5cd868a1a0, 
version_id=1) at ../migration/vmstate.c:143
#8  0x559b6f62da48 in vmstate_load (f=0x559b757eb4b0, se=0x559b7591c010) at 
../migration/savevm.c:913
#9  0x559b6f631334 in qemu_loadvm_section_start_full (mis=0x559b73f1a580, 
f=0x559b757eb4b0) at ../migration/savevm.c:2741
#10 qemu_loadvm_state_main (f=f@entry=0x559b757eb4b0, 
mis=mis@entry=0x559b73f1a580) at ../migration/savevm.c:2937
#11 0x559b6f632faa in qemu_loadvm_state (f=0x559b757eb4b0) at 
../migration/savevm.c:3018
#12 0x559b6f6d2ece in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:574
#13 0x559b6f9f9f0b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173
#14 0x7f5cfeecf000 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x7fff04a2e8f0 in ?? ()
#16 0x0000000000000000 in ?? ()


true, not to mention any other side effects (e.g., would we release bql
during precopy for any reason?).

Copy Paolo and PeterM for some extra eyes.


What I observe is that during the loading process, the migration thread will
cond-wait for the vcpu threads to complete tasks such as kvm_apic_post_load,
and the rcu thread will acquire the bql to do the flatview_destroy operation.
So far, I haven't seen side effects from these two situations.


---
  migration/migration.c | 1 +
  migration/migration.h | 2 ++
  migration/savevm.c| 8 
  3 files changed, 11 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index e6f8bc2478..ed20704552 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -224,6 +224,7 @@ void migration_object_init(void)
  qemu_sem_init(&current_incoming->postcopy_pause_sem_fast_load, 0);
  qemu_mutex_init(&current_incoming->page_request_mutex);
  current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
+current_incoming->start_pack_mr_change = false;
  
  migration_object_check(current_migration, &error_fatal);
  
diff --git a/migration/migration.h b/migration/migration.h

index 58b245b138..86597f5feb 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -186,6 +186,8 @@ struct MigrationIncomingState {
   * contains valid information.
   */
  Qemu

[RFC PATCH] migration: reduce time of loading non-iterable vmstate

2022-11-18 Thread Chuang Xu
The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (starting with the timestamp of source qemu stop and
ending with the timestamp of target qemu start). Most of the time is spent
committing memory region changes repeatedly.

This patch packs all the changes to memory region during the period of
loading non-iterable vmstate in a single memory transaction. With the
increase of devices, this patch will greatly improve the performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net device
- 16 4-queue vhost-user-blk device.

time of loading non-iterable vmstate
before  about 210 ms
after   about 40 ms

Signed-off-by: Chuang Xu 
---
 migration/migration.c | 1 +
 migration/migration.h | 2 ++
 migration/savevm.c| 8 
 3 files changed, 11 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index e6f8bc2478..ed20704552 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -224,6 +224,7 @@ void migration_object_init(void)
 qemu_sem_init(&current_incoming->postcopy_pause_sem_fast_load, 0);
 qemu_mutex_init(&current_incoming->page_request_mutex);
 current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
+current_incoming->start_pack_mr_change = false;
 
 migration_object_check(current_migration, &error_fatal);
 
diff --git a/migration/migration.h b/migration/migration.h
index 58b245b138..86597f5feb 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -186,6 +186,8 @@ struct MigrationIncomingState {
  * contains valid information.
  */
 QemuMutex page_request_mutex;
+
+bool start_pack_mr_change;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/savevm.c b/migration/savevm.c
index 48e85c052c..a073009a74 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2630,6 +2630,12 @@ retry:
 switch (section_type) {
 case QEMU_VM_SECTION_START:
 case QEMU_VM_SECTION_FULL:
+/* call memory_region_transaction_begin() before loading non-iterable vmstate */
+if (section_type == QEMU_VM_SECTION_FULL && !mis->start_pack_mr_change) {
+memory_region_transaction_begin();
+mis->start_pack_mr_change = true;
+}
+
 ret = qemu_loadvm_section_start_full(f, mis);
 if (ret < 0) {
 goto out;
@@ -2650,6 +2656,8 @@ retry:
 }
 break;
 case QEMU_VM_EOF:
+/* call memory_region_transaction_commit() after loading non-iterable vmstate */
+memory_region_transaction_commit();
 /* This is the end of migration */
 goto out;
 default:
-- 
2.20.1
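
For context on why one outer begin/commit pair batches everything: memory
transactions are depth-counted, so nested commits are cheap and only the
outermost commit rebuilds derived state. A simplified sketch of the idea
(not the exact QEMU code; the rebuild helper name is a stand-in):

/* Simplified sketch of depth-counted transactions. */
static unsigned transaction_depth;

void memory_region_transaction_begin(void)
{
    transaction_depth++;                 /* nested begins just nest */
}

void memory_region_transaction_commit(void)
{
    assert(transaction_depth > 0);
    if (--transaction_depth == 0) {
        /* Only the outermost commit rebuilds flat views and
         * ioeventfds; all batched updates are applied once here. */
        rebuild_flatviews_and_ioeventfds();   /* stand-in name */
    }
}

With the patch, every begin/commit executed inside field->info->get()
(e.g., via pci_update_mappings()) nests inside the outer transaction, so
the expensive rebuild runs once at QEMU_VM_EOF instead of once per device.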




Re: [PATCH v7 10/12] multifd: Support for zero pages transmission

2022-10-25 Thread chuang xu



On 2022/8/2 at 2:39 PM, Juan Quintela wrote:

This patch adds counters and similar.  Logic will be added on the
following patch.

Signed-off-by: Juan Quintela 

---

Added counters for duplicated/non duplicated pages.
Removed reviewed by from David.
Add total_zero_pages
---
  migration/multifd.h| 17 -
  migration/multifd.c| 36 +---
  migration/ram.c|  2 --
  migration/trace-events |  8 
  4 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index cd389d18d2..a1b852200d 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -47,7 +47,10 @@ typedef struct {
  /* size of the next packet that contains pages */
  uint32_t next_packet_size;
  uint64_t packet_num;
-uint64_t unused[4];/* Reserved for future use */
+/* zero pages */
+uint32_t zero_pages;
+uint32_t unused32[1];/* Reserved for future use */
+uint64_t unused64[3];/* Reserved for future use */
  char ramblock[256];
  uint64_t offset[];
  } __attribute__((packed)) MultiFDPacket_t;
@@ -127,6 +130,8 @@ typedef struct {
  uint64_t num_packets;
  /* non zero pages sent through this channel */
  uint64_t total_normal_pages;
+/* zero pages sent through this channel */
+uint64_t total_zero_pages;
  /* buffers to send */
  struct iovec *iov;
  /* number of iovs used */
@@ -135,6 +140,10 @@ typedef struct {
  ram_addr_t *normal;
  /* num of non zero pages */
  uint32_t normal_num;
+/* Pages that are  zero */
+ram_addr_t *zero;
+/* num of zero pages */
+uint32_t zero_num;
  /* used for compression methods */
  void *data;
  }  MultiFDSendParams;
@@ -184,12 +193,18 @@ typedef struct {
  uint8_t *host;
  /* non zero pages recv through this channel */
  uint64_t total_normal_pages;
+/* zero pages recv through this channel */
+uint64_t total_zero_pages;
  /* buffers to recv */
  struct iovec *iov;
  /* Pages that are not zero */
  ram_addr_t *normal;
  /* num of non zero pages */
  uint32_t normal_num;
+/* Pages that are  zero */
+ram_addr_t *zero;
+/* num of zero pages */
+uint32_t zero_num;
  /* used for de-compression methods */
  void *data;
  } MultiFDRecvParams;
diff --git a/migration/multifd.c b/migration/multifd.c
index 68fc9f8e88..4473d9f834 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -263,6 +263,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
  packet->normal_pages = cpu_to_be32(p->normal_num);
  packet->next_packet_size = cpu_to_be32(p->next_packet_size);
  packet->packet_num = cpu_to_be64(p->packet_num);
+packet->zero_pages = cpu_to_be32(p->zero_num);
  
  if (p->pages->block) {

  strncpy(packet->ramblock, p->pages->block->idstr, 256);
@@ -323,7 +324,15 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
  p->next_packet_size = be32_to_cpu(packet->next_packet_size);
  p->packet_num = be64_to_cpu(packet->packet_num);
  
-if (p->normal_num == 0) {

+p->zero_num = be32_to_cpu(packet->zero_pages);
+if (p->zero_num > packet->pages_alloc - p->normal_num) {
+error_setg(errp, "multifd: received packet "
+   "with %u zero pages and expected maximum pages are %u",
+   p->zero_num, packet->pages_alloc - p->normal_num) ;
+return -1;
+}
+
+if (p->normal_num == 0 && p->zero_num == 0) {
  return 0;
  }
  
@@ -432,6 +441,8 @@ static int multifd_send_pages(QEMUFile *f)

  ram_counters.multifd_bytes += p->sent_bytes;
  qemu_file_acct_rate_limit(f, p->sent_bytes);
  p->sent_bytes = 0;
+ram_counters.normal += p->normal_num;
+ram_counters.duplicate += p->zero_num;
qemu_mutex_unlock(&p->mutex);
qemu_sem_post(&p->sem);
  
@@ -545,6 +556,8 @@ void multifd_save_cleanup(void)

  p->iov = NULL;
  g_free(p->normal);
  p->normal = NULL;
+g_free(p->zero);
+p->zero = NULL;
  multifd_send_state->ops->send_cleanup(p, _err);
  if (local_err) {
  migrate_set_error(migrate_get_current(), local_err);
@@ -666,6 +679,7 @@ static void *multifd_send_thread(void *opaque)
qemu_mutex_unlock(&p->mutex);
  
  p->normal_num = 0;

+p->zero_num = 0;
  
  if (use_zero_copy_send) {

  p->iovs_num = 0;
@@ -687,8 +701,8 @@ static void *multifd_send_thread(void *opaque)
  }
  multifd_send_fill_packet(p);
  
-trace_multifd_send(p->id, packet_num, p->normal_num, p->flags,

-   p->next_packet_size);
+trace_multifd_send(p->id, packet_num, p->normal_num, p->zero_num,
+   p->flags, p->next_packet_size);
  
  if (use_zero_copy_send) {

  /* 
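
A quick size check on the packet-format change above (mine, not from the
thread): the new fields exactly reuse the reserved space, since zero_pages
(4 bytes) + unused32[1] (4 bytes) + unused64[3] (24 bytes) = 32 bytes, the
same as the old unused[4] (4 x 8 bytes), so the packed wire format keeps
its size. An illustrative compile-time assertion would encode that:

/* Illustrative check, not part of the patch. */
QEMU_BUILD_BUG_ON(sizeof(uint32_t) + sizeof(uint32_t[1]) +
                  sizeof(uint64_t[3]) != sizeof(uint64_t[4]));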

Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX

2022-06-15 Thread chuang xu


On 2022/6/14 at 10:14 PM, Dr. David Alan Gilbert wrote:

I don't think we can tell which one of them triggered the error; so the
only thing I can suggest is that we document the need for optmem_max
setting; I wonder how we get a better answer than 'a few 100KB'?
I guess it's something like the number of packets inflight *
sizeof(cmsghdr) ?

Dave


Three cases with errno ENOBUFS are described in the official 
doc (https://www.kernel.org/doc/html/v5.12/networking/msg_zerocopy.html):


1.The socket option was not set

2.The socket exceeds its optmem limit

3.The user exceeds its ulimit on locked pages

For case 1, if the code logic is correct, this possibility can be ignored.

For case 2, I asked a kernel developer about the reason for "a few 
100KB". He said that the recommended value is meant to improve the 
performance of zero_copy send. If the NIC sends data slower than the 
data generation speed, then even if optmem is set to 100KB, there is 
a probability that sendmsg returns with errno ENOBUFS.


For case 3, if I do not set max locked memory for QEMU, the max 
locked memory will be unlimited. I set the max locked memory for QEMU 
and found that once the memory usage exceeds the max locked memory, OOM 
will occur. Does this mean that sendmsg cannot return with errno 
ENOBUFS at all when the user exceeds the ulimit on locked pages?


If the above is true, can we treat this errno as case 2?
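
For a rough sense of the optmem scale in case 2 (an estimate of mine, not
from the thread): each in-flight MSG_ZEROCOPY send pins roughly one
notification skb against optmem, on the order of a few hundred bytes of
truesize. At, say, 256 sends in flight and ~512 bytes per notification,
that is already 256 * 512 B = 128KB, which matches the RFC's "at least a
few 100KB" guidance and explains why a 20KB default overflows easily.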

I modified the code logic to call sendmsg again when the errno is 
ENOBUFS, and set optmem back to the initial 20KB (echo 20480 > 
/proc/sys/net/core/optmem_max); now the multifd zero_copy migration 
goes well.


Here are the changes I made to the code:


Signed-off-by: chuang xu 
---
 io/channel-socket.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index dc9c165de1..9267f55a1d 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -595,9 +595,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,

 #ifdef QEMU_MSG_ZEROCOPY
 case ENOBUFS:
 if (sflags & MSG_ZEROCOPY) {
-    error_setg_errno(errp, errno,
- "Process can't lock enough memory for 
using MSG_ZEROCOPY");

-    return -1;
+    goto retry;
 }
 break;
 #endif
--

Dave, what's your take?
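
As an aside (an editor's sketch, not what the mail proposes): since this
series also implements io_flush, a bounded retry that first drains
completed zero-copy notifications, releasing optmem, would avoid spinning
when the error queue stays full. Roughly, assuming the series'
qio_channel_socket_flush() helper and an arbitrary retry bound:

/* Editor's sketch, not part of the patch. */
static ssize_t sendmsg_zerocopy_bounded(QIOChannelSocket *sioc,
                                        struct msghdr *msg, Error **errp)
{
    int retries = 16;                 /* arbitrary bound (assumption) */
    ssize_t ret;

    for (;;) {
        ret = sendmsg(sioc->fd, msg, MSG_ZEROCOPY);
        if (ret >= 0 || errno != ENOBUFS || retries-- <= 0) {
            break;
        }
        /* Reap completed notifications to release optmem, then retry. */
        if (qio_channel_socket_flush(QIO_CHANNEL(sioc), NULL) < 0) {
            break;
        }
    }
    if (ret < 0) {
        error_setg_errno(errp, errno, "Unable to write to socket");
    }
    return ret;
}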

Best Regards,

chuang xu


Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX

2022-06-14 Thread chuang xu


On 2022/5/13 at 2:28 PM, Leonardo Bras wrote:

@@ -557,15 +578,31 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
  memcpy(CMSG_DATA(cmsg), fds, fdsize);
  }
  
+#ifdef QEMU_MSG_ZEROCOPY

+if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+sflags = MSG_ZEROCOPY;
+}
+#endif
+
   retry:
-ret = sendmsg(sioc->fd, &msg, 0);
+ret = sendmsg(sioc->fd, &msg, sflags);
  if (ret <= 0) {
-if (errno == EAGAIN) {
+switch (errno) {
+case EAGAIN:
  return QIO_CHANNEL_ERR_BLOCK;
-}
-if (errno == EINTR) {
+case EINTR:
  goto retry;
+#ifdef QEMU_MSG_ZEROCOPY
+case ENOBUFS:
+if (sflags & MSG_ZEROCOPY) {
+error_setg_errno(errp, errno,
+ "Process can't lock enough memory for using 
MSG_ZEROCOPY");
+return -1;
+}
+break;
+#endif
  }
+
  error_setg_errno(errp, errno,
   "Unable to write to socket");
  return -1;


Hi, Leo.

There are some other questions I would like to discuss with you.

I tested the multifd zero_copy migration and found that sometimes, even 
if the max locked memory of qemu was set to 16GB (much greater than 
`MULTIFD_PACKET_SIZE`), the error "Process can't lock enough memory for 
using MSG_ZEROCOPY" would still be reported.


I noticed that the 
doc (https://www.kernel.org/doc/html/v5.12/networking/msg_zerocopy.html) 
says "A zerocopy failure will return -1 with errno ENOBUFS. This happens 
if the socket option was not set, _the socket exceeds its optmem limit_ 
or the user exceeds its ulimit on locked pages."


I also found that the RFC (https://lwn.net/Articles/715279/) says "The 
change to allocate notification skbuffs from optmem requires ensuring 
that net.core.optmem is at least a few 100KB."


On my host, optmem was initially set to 20KB. I tried to change it to 
100KB (echo 102400 > /proc/sys/net/core/optmem_max) as the RFC says. Then 
I tested the multifd zero_copy migration repeatedly, and the error 
disappeared.


So when sendmsg returns -1 with errno ENOBUFS, should we distinguish 
between the error "socket exceeds optmem limit" and the error "user exceeds 
ulimit on locked pages"? Or is there any better way to avoid this problem?
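
One userspace heuristic for that distinction (a sketch with a hypothetical
helper, not something the series does) is to report the process's
RLIMIT_MEMLOCK alongside the optmem hint, since an unlimited memlock
ulimit points at optmem instead:

#include <sys/resource.h>

/* Hypothetical helper: hint at which limit likely caused ENOBUFS. */
static void zerocopy_enobufs_hint(Error **errp)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY) {
        error_setg_errno(errp, ENOBUFS,
                         "MSG_ZEROCOPY failed: locked-memory ulimit is "
                         "%llu bytes; raise it or net.core.optmem_max",
                         (unsigned long long)rl.rlim_cur);
    } else {
        error_setg_errno(errp, ENOBUFS,
                         "MSG_ZEROCOPY failed: consider raising "
                         "net.core.optmem_max");
    }
}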


Best Regards,

chuang xu


Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX

2022-06-08 Thread chuang xu



On 2022/6/8 at 1:24 PM, Leonardo Bras Soares Passos wrote:

I will send a fix shortly.
Is that ok if I include a "Reported-by: 徐闯" in the patch?


okay.

Best Regards,

chuang xu