[PATCH 0/2] Make coroutine annotations ready for static analysis
Clang has a generic __annotate__ attribute that can be used by static
analyzers to understand properties of functions and analyze the control
flow. Unlike TSA annotations, the __annotate__ attribute applies to
function pointers as well, which is very fortunate because many
BlockDriver functions run in coroutines.

Paolo

Alberto Faria (2):
  coroutine: annotate coroutine_fn for libclang
  block: Add no_coroutine_fn and coroutine_mixed_fn marker

 include/block/block-common.h | 11 +++
 include/qemu/coroutine.h     | 37
 2 files changed, 44 insertions(+), 4 deletions(-)

--
2.38.1
[PATCH 2/2] block: Add no_coroutine_fn and coroutine_mixed_fn marker
From: Alberto Faria

Add more annotations to functions, describing valid and invalid
calls from coroutine to non-coroutine context.

When applied to a function, no_coroutine_fn advertises that it should
not be called from coroutine_fn functions. This can be because the
function blocks or, in the case of generated_co_wrapper, to enforce that
coroutine_fn functions directly call the coroutine_fn that backs the
generated_co_wrapper.

coroutine_mixed_fn instead is for functions that can be called in both
coroutine and non-coroutine context, but will suspend when called in
coroutine context. Annotating them is a first step towards enforcing
that non-annotated functions are absolutely not going to suspend.

These can be used for example with the vrc tool from
https://github.com/bonzini/vrc:

    # find functions that *really* cannot be called from no_coroutine_fn
    (vrc) load --loader clang libblock.fa.p/meson-generated_.._block_block-gen.c.o
    # The comma is an "AND". The "path" here consists of a single node
    (vrc) paths [no_coroutine_fn,!coroutine_mixed_fn]
    bdrv_remove_persistent_dirty_bitmap
    bdrv_create
    bdrv_can_store_new_dirty_bitmap

    # find how coroutine_fns end up calling a mixed function
    (vrc) load --loader clang --force libblock.fa.p/*.c.o
    # regular expression search
    (vrc) paths [coroutine_fn] [!no_coroutine_fn]* [coroutine_mixed_fn]
    ...
    bdrv_pread <- vhdx_log_write <- vhdx_log_write_and_flush <- vhdx_co_writev
    ...

Signed-off-by: Alberto Faria
[Rebase, add coroutine_mixed_fn. - Paolo]
Signed-off-by: Paolo Bonzini
---
 include/block/block-common.h | 11 +++
 include/qemu/coroutine.h     | 33 +
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index 4749c46a5e7e..cce79bd00135 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -50,11 +50,14 @@
  * - co_wrapper_mixed_bdrv_rdlock are co_wrapper_mixed functions but
  *   automatically take and release the graph rdlock when creating a new
  *   coroutine.
+ *
+ * These functions should not be called from a coroutine_fn; instead,
+ * call the wrapped function directly.
  */
-#define co_wrapper
-#define co_wrapper_mixed
-#define co_wrapper_bdrv_rdlock
-#define co_wrapper_mixed_bdrv_rdlock
+#define co_wrapper                   no_coroutine_fn
+#define co_wrapper_mixed             no_coroutine_fn coroutine_mixed_fn
+#define co_wrapper_bdrv_rdlock       no_coroutine_fn
+#define co_wrapper_mixed_bdrv_rdlock no_coroutine_fn coroutine_mixed_fn

 #include "block/dirty-bitmap.h"
 #include "block/blockjob.h"
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index b0c97f6fb7ad..5f5ab8136a3a 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -28,6 +28,27 @@
  * These functions are re-entrant and may be used outside the global mutex.
  */

+/**
+ * Mark a function that can suspend when executed in coroutine context,
+ * but can handle running in non-coroutine context too.
+ *
+ * Functions that execute in coroutine context cannot be called directly from
+ * normal functions. In the future it would be nice to enable compiler or
+ * static checker support for catching such errors. This annotation might make
+ * it possible and in the meantime it serves as documentation.
+ *
+ * For example:
+ *
+ *   static void coroutine_fn foo(void) {
+ *
+ *   }
+ */
+#ifdef __clang__
+#define coroutine_mixed_fn __attribute__((__annotate__("coroutine_mixed_fn")))
+#else
+#define coroutine_mixed_fn
+#endif
+
 /**
  * Mark a function that executes in coroutine context
  *
@@ -48,6 +69,18 @@
 #define coroutine_fn
 #endif

+/**
+ * Mark a function that should never be called from a coroutine context
+ *
+ * This typically means that there is an analogous, coroutine_fn function that
+ * should be used instead.
+ */
+#ifdef __clang__
+#define no_coroutine_fn __attribute__((__annotate__("no_coroutine_fn")))
+#else
+#define no_coroutine_fn
+#endif
+
 typedef struct Coroutine Coroutine;

 /**
--
2.38.1
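To make the intended calling rules concrete, here is a minimal sketch of how the three markers relate. It is not QEMU code: every demo_* name is hypothetical, and the demo_in_coroutine flag is only a stand-in for a real check such as qemu_in_coroutine(). The only real feature used is Clang's __annotate__ attribute, which has no runtime effect.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical markers mirroring the patch; empty on non-Clang compilers. */
#ifdef __clang__
#define demo_coroutine_fn       __attribute__((__annotate__("coroutine_fn")))
#define demo_no_coroutine_fn    __attribute__((__annotate__("no_coroutine_fn")))
#define demo_coroutine_mixed_fn __attribute__((__annotate__("coroutine_mixed_fn")))
#else
#define demo_coroutine_fn
#define demo_no_coroutine_fn
#define demo_coroutine_mixed_fn
#endif

/* Stand-in for a real "am I in a coroutine?" query (assumption). */
static bool demo_in_coroutine;

/* Only valid in coroutine context. */
static int demo_coroutine_fn demo_co_read(int x)
{
    assert(demo_in_coroutine);
    return x + 1;
}

/* Like a co_wrapper: must not be called from coroutine context. */
static int demo_no_coroutine_fn demo_read_blocking(int x)
{
    int ret;

    assert(!demo_in_coroutine);
    demo_in_coroutine = true;    /* pretend to enter a new coroutine */
    ret = demo_co_read(x);
    demo_in_coroutine = false;   /* pretend the coroutine terminated */
    return ret;
}

/* Like a co_wrapper_mixed: usable in both contexts. */
static int demo_no_coroutine_fn demo_coroutine_mixed_fn demo_read(int x)
{
    if (demo_in_coroutine) {
        return demo_co_read(x);
    }
    return demo_read_blocking(x);
}
```

A checker that understands the annotations would flag a demo_coroutine_fn function calling demo_read_blocking() or demo_read(), and suggest calling demo_co_read() directly instead.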
[PATCH 1/2] coroutine: annotate coroutine_fn for libclang
From: Alberto Faria

Clang has a generic __annotate__ attribute that can be used by
static analyzers to understand properties of functions and
analyze the control flow. Furthermore, unlike TSA annotations, the
__annotate__ attribute applies to function pointers as well.

As a first step towards static analysis of coroutine_fn markers,
attach the attribute to the marker when compiling with clang.

Signed-off-by: Alberto Faria
Signed-off-by: Paolo Bonzini
---
 include/qemu/coroutine.h | 4
 1 file changed, 4 insertions(+)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 89650a2d7fab..b0c97f6fb7ad 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -42,7 +42,11 @@
  *
  *   }
  */
+#ifdef __clang__
+#define coroutine_fn __attribute__((__annotate__("coroutine_fn")))
+#else
 #define coroutine_fn
+#endif

 typedef struct Coroutine Coroutine;

--
2.38.1
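For readers unfamiliar with the attribute, the following is a minimal sketch of the same pattern applied to both a function definition and a function pointer in a driver table. The names my_coroutine_fn, DemoDriver, demo_co_flush and demo_flush are hypothetical and not part of QEMU; the only thing taken from the patch is the #ifdef __clang__ wrapper around __attribute__((__annotate__(...))).

```c
#include <assert.h>
#include <stddef.h>

/* Expands to a Clang annotation, and to nothing on other compilers. */
#ifdef __clang__
#define my_coroutine_fn __attribute__((__annotate__("coroutine_fn")))
#else
#define my_coroutine_fn
#endif

/* Hypothetical driver table: the marker is attached to a function pointer. */
typedef struct DemoDriver {
    int my_coroutine_fn (*co_flush)(int fd);
} DemoDriver;

/* Hypothetical callback carrying the same marker on its definition. */
static int my_coroutine_fn demo_co_flush(int fd)
{
    return fd >= 0 ? 0 : -1;
}

static const DemoDriver demo_driver = {
    .co_flush = demo_co_flush,
};

/* Calls through the annotated pointer; the annotation changes nothing at
 * run time, it is only visible to tools that inspect the Clang AST. */
static int demo_flush(const DemoDriver *drv, int fd)
{
    assert(drv != NULL && drv->co_flush != NULL);
    return drv->co_flush(fd);
}
```

Because the marker is an ordinary macro, code built with other compilers sees an empty definition and is unaffected.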
Re: [PULL 00/19] Next 8.0 patches
On Thu, 15 Dec 2022 at 09:39, Juan Quintela wrote:
>
> The following changes since commit 5204b499a6cae4dfd9fe762d5e6e82224892383b:
>
>   mailmap: Fix Stefan Weil author email (2022-12-13 15:56:57 -0500)
>
> are available in the Git repository at:
>
>   https://gitlab.com/juan.quintela/qemu.git tags/next-8.0-pull-request
>
> for you to fetch changes up to 7f401b80445e8746202a6d643410ba1b9eeb3cb1:
>
>   migration: Drop rs->f (2022-12-15 10:30:37 +0100)
>
> Migration patches for 8.0
>
> Hi
>
> These are the patches that I had to drop from the last PULL request because
> they weren't fixes:
> - AVX2 is dropped, Intel posted a fix, I have to redo it
> - Fix for out of order channels is out
>   Daniel nacked it and I need to redo it
>

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM
[PULL v3 00/50] Block layer patches
The following changes since commit 48804eebd4a327e4b11f902ba80a00876ee53a43: Merge tag 'pull-misc-2022-12-14' of https://repo.or.cz/qemu/armbru into staging (2022-12-15 10:13:46 +) are available in the Git repository at: https://repo.or.cz/qemu/kevin.git tags/for-upstream for you to fetch changes up to 1b3ff9feb942c2ad0b01ac931e99ad451ab0ef39: block: GRAPH_RDLOCK for functions only called by co_wrappers (2022-12-15 16:08:23 +0100) v3: - Dropped "configure: Enable -Wthread-safety if present" because FreeBSD has TSA annotations in its pthread locking functions, so we would have to annotate the use of every lock in QEMU first before we can enable it. v2: - Changed TSA capability name to "mutex" to work with older clang versions. The tsan-build CI job succeeds now. Block layer patches - Code cleanups around block graph modification - Simplify drain - coroutine_fn correctness fixes, including splitting generated coroutine wrappers into co_wrapper (to be called only from non-coroutine context) and co_wrapper_mixed (both coroutine and non-coroutine context) - Introduce a block graph rwlock Emanuele Giuseppe Esposito (21): block-io: introduce coroutine_fn duplicates for bdrv_common_block_status_above callers block-copy: add coroutine_fn annotations nbd/server.c: add coroutine_fn annotations block-backend: replace bdrv_*_above with blk_*_above block/vmdk: add coroutine_fn annotations block: avoid duplicating filename string in bdrv_create block: distinguish between bdrv_create running in coroutine and not block: bdrv_create_file is a coroutine_fn block: rename generated_co_wrapper in co_wrapper_mixed block-coroutine-wrapper.py: introduce co_wrapper block-coroutine-wrapper.py: support functions without bs arg block-coroutine-wrapper.py: support also basic return types block: convert bdrv_create to co_wrapper block/dirty-bitmap: convert coroutine-only functions to co_wrapper graph-lock: Implement guard macros async: Register/unregister aiocontext in graph lock list block: 
wrlock in bdrv_replace_child_noperm block: remove unnecessary assert_bdrv_graph_writable() block: assert that graph read and writes are performed correctly block-coroutine-wrapper.py: introduce annotations that take the graph rdlock block: use co_wrapper_mixed_bdrv_rdlock in functions taking the rdlock Kevin Wolf (24): qed: Don't yield in bdrv_qed_co_drain_begin() test-bdrv-drain: Don't yield in .bdrv_co_drained_begin/end() block: Revert .bdrv_drained_begin/end to non-coroutine_fn block: Remove drained_end_counter block: Inline bdrv_drain_invoke() block: Fix locking for bdrv_reopen_queue_child() block: Drain individual nodes during reopen block: Don't use subtree drains in bdrv_drop_intermediate() stream: Replace subtree drain with a single node drain block: Remove subtree drains block: Call drain callbacks only once block: Remove ignore_bds_parents parameter from drain_begin/end. block: Drop out of coroutine in bdrv_do_drained_begin_quiesce() block: Don't poll in bdrv_replace_child_noperm() block: Remove poll parameter from bdrv_parent_drained_begin_single() block: Factor out bdrv_drain_all_begin_nopoll() Import clang-tsa.h clang-tsa: Add TSA_ASSERT() macro clang-tsa: Add macros for shared locks test-bdrv-drain: Fix incorrrect drain assumptions block: Fix locking in external_snapshot_prepare() graph-lock: TSA annotations for lock/unlock functions Mark assert_bdrv_graph_readable/writable() GRAPH_RD/WRLOCK block: GRAPH_RDLOCK for functions only called by co_wrappers Paolo Bonzini (1): graph-lock: Introduce a lock to protect block graph operations Vladimir Sementsov-Ogievskiy (4): block: Inline bdrv_detach_child() block: drop bdrv_remove_filter_or_cow_child block: bdrv_refresh_perms(): allow external tran block: refactor bdrv_list_refresh_perms to allow any list of nodes docs/devel/block-coroutine-wrapper.rst | 6 +- block/block-gen.h | 11 +- block/coroutines.h | 21 +- include/block/aio.h| 9 + include/block/block-common.h | 27 ++- include/block/block-copy.h | 5 +- 
include/block/block-global-state.h | 15 +- include/block/block-io.h | 136 +-- include/block/block_int-common.h | 49 ++-- include/block/block_int-global-state.h | 17 -- include/block/block_int-io.h | 12 - include/block/block_int.h | 1 + include/block/dirty-bitmap.h | 10 +- include/block/graph-lock.h | 280 +++ include/qemu/c
Re: [PULL v2 00/51] Block layer patches
On 15.12.2022 at 15:44, Peter Maydell wrote:
> On Thu, 15 Dec 2022 at 11:59, Kevin Wolf wrote:
> >
> > The following changes since commit 5204b499a6cae4dfd9fe762d5e6e82224892383b:
> >
> >   mailmap: Fix Stefan Weil author email (2022-12-13 15:56:57 -0500)
> >
> > are available in the Git repository at:
> >
> >   https://repo.or.cz/qemu/kevin.git tags/for-upstream
> >
> > for you to fetch changes up to 347fe9e156a3e00c40ae1802978276a1f7d5545f:
> >
> >   block: GRAPH_RDLOCK for functions only called by co_wrappers (2022-12-15
> > 10:11:45 +0100)
> >
> > v2:
> > - Changed TSA capability name to "mutex" to work with older clang
> >   versions. The tsan-build CI job succeeds now.
> >
> > Block layer patches
> >
> > - Code cleanups around block graph modification
> > - Simplify drain
> > - coroutine_fn correctness fixes, including splitting generated
> >   coroutine wrappers into co_wrapper (to be called only from
> >   non-coroutine context) and co_wrapper_mixed (both coroutine and
> >   non-coroutine context)
> > - Introduce a block graph rwlock
>
> This fails to compile on the FreeBSD 12 and 13 jobs:
> https://gitlab.com/qemu-project/qemu/-/jobs/3479763741
> https://gitlab.com/qemu-project/qemu/-/jobs/3479763746
>
> The compiler is producing -Wthread-safety-analysis
> warnings on code in qemu-thread-posix.c, which are a
> compile failure because of -Werror.

Hmm... FreeBSD actually annotates its pthread locking functions for TSA,
so all callers need to be annotated as well. I guess it's nice in
theory, but hard to enable for a huge codebase like QEMU...

I'll just drop "configure: Enable -Wthread-safety if present" for now.

Maybe we can have a configure check later to enable it by default on
glibc at least. Or we really need to go through all locks in QEMU and
annotate them properly. This might be a bit too painful, though, so we
may end up leaving FreeBSD unchecked even if that seems to be the OS to
care most about it...

Kevin
Re: [PULL v2 00/51] Block layer patches
On Thu, 15 Dec 2022 at 11:59, Kevin Wolf wrote:
>
> The following changes since commit 5204b499a6cae4dfd9fe762d5e6e82224892383b:
>
>   mailmap: Fix Stefan Weil author email (2022-12-13 15:56:57 -0500)
>
> are available in the Git repository at:
>
>   https://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to 347fe9e156a3e00c40ae1802978276a1f7d5545f:
>
>   block: GRAPH_RDLOCK for functions only called by co_wrappers (2022-12-15
> 10:11:45 +0100)
>
> v2:
> - Changed TSA capability name to "mutex" to work with older clang
>   versions. The tsan-build CI job succeeds now.
>
> Block layer patches
>
> - Code cleanups around block graph modification
> - Simplify drain
> - coroutine_fn correctness fixes, including splitting generated
>   coroutine wrappers into co_wrapper (to be called only from
>   non-coroutine context) and co_wrapper_mixed (both coroutine and
>   non-coroutine context)
> - Introduce a block graph rwlock

This fails to compile on the FreeBSD 12 and 13 jobs:
https://gitlab.com/qemu-project/qemu/-/jobs/3479763741
https://gitlab.com/qemu-project/qemu/-/jobs/3479763746

The compiler is producing -Wthread-safety-analysis
warnings on code in qemu-thread-posix.c, which are a
compile failure because of -Werror.

thanks
-- PMM
Re: [PATCH] blkdebug: ignore invalid rules in non-coroutine context
On 15.12.2022 at 14:02, Paolo Bonzini wrote:
> blkdebug events can be called from either non-coroutine or coroutine
> contexts. However, suspend actions only make sense from within
> a coroutine. Currently, using those actions would lead to an abort() in
> qemu_coroutine_yield() ("Co-routine is yielding to no one"). Catch them
> and print an error instead.
>
> Signed-off-by: Paolo Bonzini
> ---
>  block.c                          |  2 +-
>  block/blkdebug.c                 | 10 --
>  include/block/block-io.h         |  2 +-
>  include/block/block_int-common.h |  3 ++-
>  4 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/block.c b/block.c
> index 3f2bd128570e..49c66475c73e 100644
> --- a/block.c
> +++ b/block.c
> @@ -6334,7 +6334,7 @@ BlockStatsSpecific
> *bdrv_get_specific_stats(BlockDriverState *bs)
>      return drv->bdrv_get_specific_stats(bs);
>  }
>
> -void bdrv_debug_event(BlockDriverState *bs, BlkdebugEvent event)
> +void coroutine_mixed_fn bdrv_debug_event(BlockDriverState *bs, BlkdebugEvent
> event)

coroutine_mixed_fn isn't a thing. I assume this depends on some patch
you haven't sent yet?

Kevin
[PATCH] qemu-io: do not reinvent the blk_pwrite_zeroes wheel
qemu-io's do_co_pwrite_zeroes is reinventing the coroutine wrapper
blk_pwrite_zeroes. Just use the real thing directly.

Signed-off-by: Paolo Bonzini
---
 qemu-io-cmds.c | 55 +-
 1 file changed, 9 insertions(+), 46 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 952dc940f1df..7a412d6512fb 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -572,54 +572,17 @@ static int do_pwrite(BlockBackend *blk, char *buf, int64_t offset,
     return 1;
 }

-typedef struct {
-    BlockBackend *blk;
-    int64_t offset;
-    int64_t bytes;
-    int64_t *total;
-    int flags;
-    int ret;
-    bool done;
-} CoWriteZeroes;
-
-static void coroutine_fn co_pwrite_zeroes_entry(void *opaque)
-{
-    CoWriteZeroes *data = opaque;
-
-    data->ret = blk_co_pwrite_zeroes(data->blk, data->offset, data->bytes,
-                                     data->flags);
-    data->done = true;
-    if (data->ret < 0) {
-        *data->total = data->ret;
-        return;
-    }
-
-    *data->total = data->bytes;
-}
-
-static int do_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+static int do_pwrite_zeroes(BlockBackend *blk, int64_t offset,
                                int64_t bytes, int flags, int64_t *total)
 {
-    Coroutine *co;
-    CoWriteZeroes data = {
-        .blk    = blk,
-        .offset = offset,
-        .bytes  = bytes,
-        .total  = total,
-        .flags  = flags,
-        .done   = false,
-    };
+    int ret = blk_pwrite_zeroes(blk, offset, bytes,
+                                flags | BDRV_REQ_ZERO_WRITE);

-    co = qemu_coroutine_create(co_pwrite_zeroes_entry, &data);
-    bdrv_coroutine_enter(blk_bs(blk), co);
-    while (!data.done) {
-        aio_poll(blk_get_aio_context(blk), true);
-    }
-    if (data.ret < 0) {
-        return data.ret;
-    } else {
-        return 1;
+    if (ret < 0) {
+        return ret;
     }
+    *total = bytes;
+    return 1;
 }

 static int do_write_compressed(BlockBackend *blk, char *buf, int64_t offset,
@@ -1042,7 +1005,7 @@ static void write_help(void)
 " -C, -- report statistics in a machine parsable format\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
 " -u, -- with -z, allow unmapping\n"
-" -z, -- write zeroes using blk_co_pwrite_zeroes\n"
+" -z, -- write zeroes using blk_pwrite_zeroes\n"
 "\n");
 }

@@ -1199,7 +1162,7 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
     if (bflag) {
         ret = do_save_vmstate(blk, buf, offset, count, &total);
     } else if (zflag) {
-        ret = do_co_pwrite_zeroes(blk, offset, count, flags, &total);
+        ret = do_pwrite_zeroes(blk, offset, count, flags, &total);
     } else if (cflag) {
         ret = do_write_compressed(blk, buf, offset, count, &total);
     } else {
--
2.38.1
[PATCH] blkdebug: ignore invalid rules in non-coroutine context
blkdebug events can be called from either non-coroutine or coroutine
contexts. However, suspend actions only make sense from within
a coroutine. Currently, using those actions would lead to an abort() in
qemu_coroutine_yield() ("Co-routine is yielding to no one"). Catch them
and print an error instead.

Signed-off-by: Paolo Bonzini
---
 block.c                          |  2 +-
 block/blkdebug.c                 | 10 --
 include/block/block-io.h         |  2 +-
 include/block/block_int-common.h |  3 ++-
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index 3f2bd128570e..49c66475c73e 100644
--- a/block.c
+++ b/block.c
@@ -6334,7 +6334,7 @@ BlockStatsSpecific *bdrv_get_specific_stats(BlockDriverState *bs)
     return drv->bdrv_get_specific_stats(bs);
 }

-void bdrv_debug_event(BlockDriverState *bs, BlkdebugEvent event)
+void coroutine_mixed_fn bdrv_debug_event(BlockDriverState *bs, BlkdebugEvent event)
 {
     IO_CODE();
     if (!bs || !bs->drv || !bs->drv->bdrv_debug_event) {
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 4265ca125e25..ce297961b7db 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -31,6 +31,7 @@
 #include "block/qdict.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
+#include "qemu/error-report.h"
 #include "qapi/qapi-visit-block-core.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qlist.h"
@@ -837,7 +838,7 @@ static void process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
     }
 }

-static void blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event)
+static void coroutine_mixed_fn blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event)
 {
     BDRVBlkdebugState *s = bs->opaque;
     struct BlkdebugRule *rule, *next;
@@ -855,7 +856,12 @@ static void blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event)
     }

     while (actions_count[ACTION_SUSPEND] > 0) {
-        qemu_coroutine_yield();
+        if (qemu_in_coroutine()) {
+            qemu_coroutine_yield();
+        } else {
+            error_report("Non-coroutine event %s cannot suspend\n",
+                         BlkdebugEvent_lookup.array[event]);
+        }
         actions_count[ACTION_SUSPEND]--;
     }
 }
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 1fa717a545a0..0e7032a23936 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -175,7 +175,7 @@ void *qemu_try_blockalign0(BlockDriverState *bs, size_t size);
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);

-void bdrv_debug_event(BlockDriverState *bs, BlkdebugEvent event);
+void coroutine_mixed_fn bdrv_debug_event(BlockDriverState *bs, BlkdebugEvent event);

 #define BLKDBG_EVENT(child, evt) \
     do { \
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index c34c525fa6ba..1d4fd5094a5b 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -726,7 +726,8 @@ struct BlockDriver {
     int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_check)(
         BlockDriverState *bs, BdrvCheckResult *result, BdrvCheckMode fix);

-    void (*bdrv_debug_event)(BlockDriverState *bs, BlkdebugEvent event);
+    void coroutine_mixed_fn (*bdrv_debug_event)(BlockDriverState *bs,
+        BlkdebugEvent event);

     /* io queue for linux-aio */
     void (*bdrv_io_plug)(BlockDriverState *bs);
--
2.38.1
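The guard added to blkdebug_debug_event() can be illustrated in isolation. The sketch below is a toy model, not the QEMU implementation: demo_in_coroutine stands in for qemu_in_coroutine(), demo_yield() for qemu_coroutine_yield(), and the counters exist only so the behaviour can be observed. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool demo_in_coroutine;   /* stand-in for qemu_in_coroutine() */
static int demo_yield_count;     /* how often we "suspended" */
static int demo_error_count;     /* how often a suspend was rejected */

/* Stand-in for qemu_coroutine_yield(): only legal inside a coroutine. */
static void demo_yield(void)
{
    assert(demo_in_coroutine);
    demo_yield_count++;
}

/* Consume all pending suspend actions; outside coroutine context each one
 * is reported as an error instead of aborting the process. */
static void demo_process_suspends(int pending, const char *event_name)
{
    while (pending > 0) {
        if (demo_in_coroutine) {
            demo_yield();
        } else {
            fprintf(stderr, "Non-coroutine event %s cannot suspend\n",
                    event_name);
            demo_error_count++;
        }
        pending--;
    }
}
```

Note that the pending count is decremented in both branches, so an invalid rule is consumed rather than retried forever.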
[PATCH] block: remove bdrv_coroutine_enter
It has only one caller---inline it and remove the function.

Signed-off-by: Paolo Bonzini
---
 block.c                  | 6 --
 block/block-backend.c    | 2 +-
 include/block/block-io.h | 5 -
 3 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/block.c b/block.c
index 9c2ac757e495..3f2bd128570e 100644
--- a/block.c
+++ b/block.c
@@ -7177,12 +7177,6 @@ void coroutine_fn bdrv_co_unlock(BlockDriverState *bs)
     }
 }

-void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
-{
-    IO_CODE();
-    aio_co_enter(bdrv_get_aio_context(bs), co);
-}
-
 static void bdrv_do_remove_aio_context_notifier(BdrvAioNotifier *ban)
 {
     GLOBAL_STATE_CODE();
diff --git a/block/block-backend.c b/block/block-backend.c
index 2852a892de6c..a3e7901f291e 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1555,7 +1555,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset,
     acb->has_returned = false;

     co = qemu_coroutine_create(co_entry, acb);
-    bdrv_coroutine_enter(blk_bs(blk), co);
+    aio_co_enter(blk_get_aio_context(blk), co);

     acb->has_returned = true;
     if (acb->rwco.ret != NOT_DONE) {
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 2ed6214909d8..1fa717a545a0 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -210,11 +210,6 @@ AioContext *coroutine_fn bdrv_co_enter(BlockDriverState *bs);
  */
 void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx);

-/**
- * Transfer control to @co in the aio context of @bs
- */
-void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co);
-
 AioContext *child_of_bds_get_parent_aio_context(BdrvChild *c);

 void bdrv_io_plug(BlockDriverState *bs);
--
2.38.1
Re: [PATCH 00/14] block: Move more functions to coroutines
On 13/12/2022 at 09:53, Kevin Wolf wrote:
> This series converts some IO_CODE() functions to coroutine_fn because
> they access the graph and will need to hold the graph lock in the
> future. IO_CODE() functions can be called from iothreads, so taking the
> graph lock requires the function to run in coroutine context.
>
> Pretty much all of the changes in this series were posted by Emanuele
> before as part of "Protect the block layer with a rwlock: part 3". The
> major difference is that in the old version, the patches did two things
> at once: Converting functions to coroutine_fn, and adding the locking to
> them. This series does only the coroutine conversion. The locking part
> will be in another series which now comes with TSA annotations and makes
> the locking related changes big enough to have separate patches.
>

Reviewed-by: Emanuele Giuseppe Esposito

> Emanuele Giuseppe Esposito (14):
>   block-coroutine-wrapper: support void functions
>   block: Convert bdrv_io_plug() to co_wrapper
>   block: Convert bdrv_io_unplug() to co_wrapper
>   block: Rename refresh_total_sectors to bdrv_refresh_total_sectors
>   block: Convert bdrv_refresh_total_sectors() to co_wrapper_mixed
>   block-backend: use bdrv_getlength instead of blk_getlength
>   block: use bdrv_co_refresh_total_sectors when possible
>   block: Convert bdrv_get_allocated_file_size() to co_wrapper
>   block: Convert bdrv_get_info() to co_wrapper_mixed
>   block: Convert bdrv_is_inserted() to co_wrapper
>   block: Convert bdrv_eject() to co_wrapper
>   block: convert bdrv_lock_medium in co_wrapper
>   block: Convert bdrv_debug_event to co_wrapper_mixed
>   block: Rename newly converted BlockDriver IO coroutine functions
>
>  include/block/block-io.h           | 36 +
>  include/block/block_int-common.h   | 26 ++
>  include/block/block_int-io.h       |  5 +-
>  include/sysemu/block-backend-io.h  | 31 ---
>  block.c                            | 82 ++
>  block/blkdebug.c                   |  4 +-
>  block/blkio.c                      |  6 +--
>  block/blklogwrites.c               |  2 +-
>  block/blkreplay.c                  |  2 +-
>  block/blkverify.c                  |  2 +-
>  block/block-backend.c              | 36 ++---
>  block/commit.c                     |  4 +-
>  block/copy-on-read.c               | 12 ++---
>  block/crypto.c                     |  6 +--
>  block/curl.c                       |  8 +--
>  block/file-posix.c                 | 48 -
>  block/file-win32.c                 | 12 ++---
>  block/filter-compress.c            | 10 ++--
>  block/gluster.c                    | 16 +++---
>  block/io.c                         | 76 +--
>  block/iscsi.c                      |  8 +--
>  block/mirror.c                     |  6 +--
>  block/nbd.c                        |  6 +--
>  block/nfs.c                        |  2 +-
>  block/null.c                       |  8 +--
>  block/nvme.c                       |  6 +--
>  block/preallocate.c                |  2 +-
>  block/qcow.c                       |  2 +-
>  block/qcow2-refcount.c             |  2 +-
>  block/qcow2.c                      |  6 +--
>  block/qed.c                        |  4 +-
>  block/quorum.c                     |  2 +-
>  block/raw-format.c                 | 14 ++---
>  block/rbd.c                        |  4 +-
>  block/replication.c                |  2 +-
>  block/ssh.c                        |  2 +-
>  block/throttle.c                   |  2 +-
>  block/vdi.c                        |  2 +-
>  block/vhdx.c                       |  2 +-
>  block/vmdk.c                       |  4 +-
>  block/vpc.c                        |  2 +-
>  blockdev.c                         |  8 ++-
>  hw/scsi/scsi-disk.c                |  5 ++
>  tests/unit/test-block-iothread.c   |  3 ++
>  scripts/block-coroutine-wrapper.py | 20 ++--
>  block/meson.build                  |  1 +
>  46 files changed, 316 insertions(+), 233 deletions(-)
>
[PULL v2 00/51] Block layer patches
The following changes since commit 5204b499a6cae4dfd9fe762d5e6e82224892383b: mailmap: Fix Stefan Weil author email (2022-12-13 15:56:57 -0500) are available in the Git repository at: https://repo.or.cz/qemu/kevin.git tags/for-upstream for you to fetch changes up to 347fe9e156a3e00c40ae1802978276a1f7d5545f: block: GRAPH_RDLOCK for functions only called by co_wrappers (2022-12-15 10:11:45 +0100) v2: - Changed TSA capability name to "mutex" to work with older clang versions. The tsan-build CI job succeeds now. Block layer patches - Code cleanups around block graph modification - Simplify drain - coroutine_fn correctness fixes, including splitting generated coroutine wrappers into co_wrapper (to be called only from non-coroutine context) and co_wrapper_mixed (both coroutine and non-coroutine context) - Introduce a block graph rwlock Emanuele Giuseppe Esposito (21): block-io: introduce coroutine_fn duplicates for bdrv_common_block_status_above callers block-copy: add coroutine_fn annotations nbd/server.c: add coroutine_fn annotations block-backend: replace bdrv_*_above with blk_*_above block/vmdk: add coroutine_fn annotations block: avoid duplicating filename string in bdrv_create block: distinguish between bdrv_create running in coroutine and not block: bdrv_create_file is a coroutine_fn block: rename generated_co_wrapper in co_wrapper_mixed block-coroutine-wrapper.py: introduce co_wrapper block-coroutine-wrapper.py: support functions without bs arg block-coroutine-wrapper.py: support also basic return types block: convert bdrv_create to co_wrapper block/dirty-bitmap: convert coroutine-only functions to co_wrapper graph-lock: Implement guard macros async: Register/unregister aiocontext in graph lock list block: wrlock in bdrv_replace_child_noperm block: remove unnecessary assert_bdrv_graph_writable() block: assert that graph read and writes are performed correctly block-coroutine-wrapper.py: introduce annotations that take the graph rdlock block: use 
co_wrapper_mixed_bdrv_rdlock in functions taking the rdlock Kevin Wolf (25): qed: Don't yield in bdrv_qed_co_drain_begin() test-bdrv-drain: Don't yield in .bdrv_co_drained_begin/end() block: Revert .bdrv_drained_begin/end to non-coroutine_fn block: Remove drained_end_counter block: Inline bdrv_drain_invoke() block: Fix locking for bdrv_reopen_queue_child() block: Drain individual nodes during reopen block: Don't use subtree drains in bdrv_drop_intermediate() stream: Replace subtree drain with a single node drain block: Remove subtree drains block: Call drain callbacks only once block: Remove ignore_bds_parents parameter from drain_begin/end. block: Drop out of coroutine in bdrv_do_drained_begin_quiesce() block: Don't poll in bdrv_replace_child_noperm() block: Remove poll parameter from bdrv_parent_drained_begin_single() block: Factor out bdrv_drain_all_begin_nopoll() Import clang-tsa.h clang-tsa: Add TSA_ASSERT() macro clang-tsa: Add macros for shared locks configure: Enable -Wthread-safety if present test-bdrv-drain: Fix incorrrect drain assumptions block: Fix locking in external_snapshot_prepare() graph-lock: TSA annotations for lock/unlock functions Mark assert_bdrv_graph_readable/writable() GRAPH_RD/WRLOCK block: GRAPH_RDLOCK for functions only called by co_wrappers Paolo Bonzini (1): graph-lock: Introduce a lock to protect block graph operations Vladimir Sementsov-Ogievskiy (4): block: Inline bdrv_detach_child() block: drop bdrv_remove_filter_or_cow_child block: bdrv_refresh_perms(): allow external tran block: refactor bdrv_list_refresh_perms to allow any list of nodes docs/devel/block-coroutine-wrapper.rst | 6 +- configure | 1 + block/block-gen.h | 11 +- block/coroutines.h | 21 +- include/block/aio.h| 9 + include/block/block-common.h | 27 ++- include/block/block-copy.h | 5 +- include/block/block-global-state.h | 15 +- include/block/block-io.h | 136 +-- include/block/block_int-common.h | 49 ++-- include/block/block_int-global-state.h | 17 -- 
include/block/block_int-io.h | 12 - include/block/block_int.h | 1 + include/block/dirty-bitmap.h | 10 +- include/block/graph-lock.h | 280 +++ include/qemu/clang-tsa.h | 114 ++ include/sysemu/block-backend-io.h | 77 --- block.c| 404 ++
[PULL 05/19] migration: Take bitmap mutex when completing ram migration
From: Peter Xu

Any call to ram_find_and_save_block() needs to take the bitmap mutex. We
used to not take it for most of ram_save_complete() because we thought
we were the only one left using the bitmap, but that is no longer true
after the preempt full patchset was applied, since the return path can
be taking it too.

Signed-off-by: Peter Xu
Reviewed-by: Juan Quintela
Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: Juan Quintela
---
 migration/ram.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 8aad17c429..cc72c24c18 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3406,6 +3406,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     /* try transferring iterative blocks of memory */

     /* flush all remaining blocks regardless of rate limiting */
+    qemu_mutex_lock(&rs->bitmap_mutex);
     while (true) {
         int pages;

@@ -3419,6 +3420,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
             break;
         }
     }
+    qemu_mutex_unlock(&rs->bitmap_mutex);

     flush_compressed_data(rs);
     ram_control_after_iterate(f, RAM_CONTROL_FINISH);
--
2.38.1
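The locking shape of the fix, taking the mutex once around the whole flush loop rather than assuming exclusive access, can be sketched with plain pthreads. This is an illustration under assumptions, not the migration code: DemoState, demo_find_and_save() and demo_save_complete() are hypothetical, and a one-word bitmap replaces the real dirty bitmap.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical state: a one-word dirty bitmap protected by a mutex. */
typedef struct DemoState {
    pthread_mutex_t bitmap_mutex;
    unsigned long dirty;
    int sent;
} DemoState;

/* Sends one dirty page if any is left. Caller must hold bitmap_mutex. */
static int demo_find_and_save(DemoState *s)
{
    if (s->dirty == 0) {
        return 0;
    }
    s->dirty &= s->dirty - 1;    /* clear the lowest set bit */
    s->sent++;
    return 1;
}

/* Flush every remaining page while holding the mutex for the whole loop,
 * so a concurrent bitmap user cannot interleave with it. */
static int demo_save_complete(DemoState *s)
{
    int total = 0;

    pthread_mutex_lock(&s->bitmap_mutex);
    while (1) {
        int pages = demo_find_and_save(s);

        if (pages == 0) {
            break;
        }
        total += pages;
    }
    pthread_mutex_unlock(&s->bitmap_mutex);
    return total;
}
```

The break leaves the loop with the mutex still held, which is why the unlock sits after the loop rather than inside it.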
Re: [PULL 00/51] Block layer patches
Am 14.12.2022 um 23:35 hat Peter Maydell geschrieben: > On Wed, 14 Dec 2022 at 13:45, Kevin Wolf wrote: > > > > The following changes since commit 5204b499a6cae4dfd9fe762d5e6e82224892383b: > > > > mailmap: Fix Stefan Weil author email (2022-12-13 15:56:57 -0500) > > > > are available in the Git repository at: > > > > https://repo.or.cz/qemu/kevin.git tags/for-upstream > > > > for you to fetch changes up to 2ad19e5dc950d4b340894846b9e71c0b20f9a1cc: > > > > block: GRAPH_RDLOCK for functions only called by co_wrappers (2022-12-14 > > 13:13:07 +0100) > > > > > > Block layer patches > > > > - Code cleanups around block graph modification > > - Simplify drain > > - coroutine_fn correctness fixes, including splitting generated > > coroutine wrappers into co_wrapper (to be called only from > > non-coroutine context) and co_wrapper_mixed (both coroutine and > > non-coroutine context) > > - Introduce a block graph rwlock > > > > > > Fails to build on the tsan-build job: > https://gitlab.com/qemu-project/qemu/-/jobs/3476176683 > > In file included from ../hw/nvram/fw_cfg-interface.c:10: > In file included from /builds/qemu-project/qemu/include/hw/nvram/fw_cfg.h:7: > In file included from /builds/qemu-project/qemu/include/sysemu/dma.h:15: > In file included from /builds/qemu-project/qemu/include/block/block.h:27: > In file included from > /builds/qemu-project/qemu/include/block/block-global-state.h:27: > In file included from > /builds/qemu-project/qemu/include/block/block-common.h:27: > In file included from /builds/qemu-project/qemu/include/block/aio.h:25: > /builds/qemu-project/qemu/include/block/graph-lock.h:62:31: error: > invalid capability name 'graph-lock'; capability name must be 'mutex' > or 'role' [-Werror,-Wthread-safety-attributes] > typedef struct TSA_CAPABILITY("graph-lock") BdrvGraphLock { >^ > > (I see the same error on my x86 macos system.) 
Ah, surprise, clang 11 lifted this arbitrary restriction for capability names and that it existed in older compiler versions isn't documented (any more?). We can either just name it "mutex" and have slightly misleading error messages (it's semantically not a mutex, but an rwlock), or add a configure check and leave TSA disabled if it doesn't work. I think I'll try the former for now, "mutex 'graph_lock'" should still be good enough to know what it means. Kevin
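The first option Kevin describes (naming the capability "mutex") could look roughly like the sketch below. This is only an illustration under assumptions: the TSA_CAPABILITY definition here is a guess at what such a wrapper macro might expand to, and ToyGraphLock with its readers field is a hypothetical stand-in, not the real BdrvGraphLock.

```c
#include <assert.h>

/*
 * Assumption: the wrapper macro maps to clang's capability attribute and
 * expands to nothing on other compilers. The real definition may differ.
 */
#if defined(__clang__)
#define TSA_CAPABILITY(name) __attribute__((capability(name)))
#else
#define TSA_CAPABILITY(name)
#endif

/*
 * Per the error message quoted above, the affected clang versions only
 * accept "mutex" or "role" as capability names, so "mutex" is used here
 * even though the object is semantically an rwlock.
 */
typedef struct TSA_CAPABILITY("mutex") ToyGraphLock {
    int readers; /* hypothetical field, only so the struct is not empty */
} ToyGraphLock;

/* Trivial accessor so the type is exercised by something. */
static int toy_graph_lock_readers(const ToyGraphLock *lock)
{
    return lock->readers;
}
```

The second option from the message (a configure check that leaves TSA disabled) would keep the more descriptive name at the cost of losing the analysis on the affected compilers.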
[PULL 07/19] migration: Cleanup xbzrle zero page cache update logic
From: Peter Xu The major change is to replace "!save_page_use_compression()" with "xbzrle_enabled" to make the intent clear. Reasoning: (1) When compression is enabled, "!save_page_use_compression()" is exactly the same as checking "xbzrle_enabled". (2) When compression is disabled, "!save_page_use_compression()" always returns true. We used to try calling the xbzrle code in that case, but after this change we won't, and we shouldn't need to. While at it, drop the xbzrle_enabled check in xbzrle_cache_zero_page() because with this change it is no longer needed. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 00a2e30322..7124ff531c 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -741,10 +741,6 @@ void mig_throttle_counter_reset(void) */ static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr) { -if (!rs->xbzrle_enabled) { -return; -} - /* We don't care if this fails to allocate a new cache page * as long as it updated an old one */ cache_insert(XBZRLE.cache, current_addr, XBZRLE.zero_target_page, @@ -2301,7 +2297,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss) /* Must let xbzrle know, otherwise a previous (now 0'd) cached * page would be stale */ -if (!save_page_use_compression(rs)) { +if (rs->xbzrle_enabled) { XBZRLE_cache_lock(); xbzrle_cache_zero_page(rs, block->offset + offset); XBZRLE_cache_unlock(); -- 2.38.1
[PULL 16/19] migration: Move last_sent_block into PageSearchStatus
From: Peter Xu Since we use PageSearchStatus to represent a channel, it makes perfect sense to keep last_sent_block (aka, leverage RAM_SAVE_FLAG_CONTINUE) to be per-channel rather than global because each channel can be sending different pages on ramblocks. Hence move it from RAMState into PageSearchStatus. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 71 - 1 file changed, 41 insertions(+), 30 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 3194997738..1233ff53ac 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -89,6 +89,8 @@ XBZRLECacheStats xbzrle_counters; struct PageSearchStatus { /* The migration channel used for a specific host page */ QEMUFile*pss_channel; +/* Last block from where we have sent data */ +RAMBlock *last_sent_block; /* Current block being searched */ RAMBlock*block; /* Current page to search from */ @@ -368,8 +370,6 @@ struct RAMState { int uffdio_fd; /* Last block that we have visited searching for dirty pages */ RAMBlock *last_seen_block; -/* Last block from where we have sent data */ -RAMBlock *last_sent_block; /* Last dirty target page we have sent */ ram_addr_t last_page; /* last ram version we have seen */ @@ -684,16 +684,17 @@ exit: * * Returns the number of bytes written * - * @f: QEMUFile where to send the data + * @pss: current PSS channel status * @block: block that contains the page we want to send * @offset: offset inside the block for the page * in the lower bits, it contains flags */ -static size_t save_page_header(RAMState *rs, QEMUFile *f, RAMBlock *block, +static size_t save_page_header(PageSearchStatus *pss, RAMBlock *block, ram_addr_t offset) { size_t size, len; -bool same_block = (block == rs->last_sent_block); +bool same_block = (block == pss->last_sent_block); +QEMUFile *f = pss->pss_channel; if (same_block) { offset |= RAM_SAVE_FLAG_CONTINUE; @@ -706,7 +707,7 @@ static size_t save_page_header(RAMState *rs, 
QEMUFile *f, RAMBlock *block, qemu_put_byte(f, len); qemu_put_buffer(f, (uint8_t *)block->idstr, len); size += 1 + len; -rs->last_sent_block = block; +pss->last_sent_block = block; } return size; } @@ -790,17 +791,19 @@ static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr) * -1 means that xbzrle would be longer than normal * * @rs: current RAM state + * @pss: current PSS channel * @current_data: pointer to the address of the page contents * @current_addr: addr of the page * @block: block that contains the page we want to send * @offset: offset inside the block for the page */ -static int save_xbzrle_page(RAMState *rs, QEMUFile *file, +static int save_xbzrle_page(RAMState *rs, PageSearchStatus *pss, uint8_t **current_data, ram_addr_t current_addr, RAMBlock *block, ram_addr_t offset) { int encoded_len = 0, bytes_xbzrle; uint8_t *prev_cached_page; +QEMUFile *file = pss->pss_channel; if (!cache_is_cached(XBZRLE.cache, current_addr, ram_counters.dirty_sync_count)) { @@ -865,7 +868,7 @@ static int save_xbzrle_page(RAMState *rs, QEMUFile *file, } /* Send XBZRLE based compressed page */ -bytes_xbzrle = save_page_header(rs, file, block, +bytes_xbzrle = save_page_header(pss, block, offset | RAM_SAVE_FLAG_XBZRLE); qemu_put_byte(file, ENCODING_FLAG_XBZRLE); qemu_put_be16(file, encoded_len); @@ -1296,19 +1299,19 @@ void ram_release_page(const char *rbname, uint64_t offset) * Returns the size of data written to the file, 0 means the page is not * a zero page * - * @rs: current RAM state - * @file: the file where the data is saved + * @pss: current PSS channel * @block: block that contains the page we want to send * @offset: offset inside the block for the page */ -static int save_zero_page_to_file(RAMState *rs, QEMUFile *file, +static int save_zero_page_to_file(PageSearchStatus *pss, RAMBlock *block, ram_addr_t offset) { uint8_t *p = block->host + offset; +QEMUFile *file = pss->pss_channel; int len = 0; if (buffer_is_zero(p, TARGET_PAGE_SIZE)) { -len += 
save_page_header(rs, file, block, offset | RAM_SAVE_FLAG_ZERO); +len += save_page_header(pss, block, offset | RAM_SAVE_FLAG_ZERO); qemu_put_byte(file, 0); len += 1; ram_release_page(block->idstr, offset); @@ -1321,14 +1324,14 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file, * * Returns the number of pages written. * - * @rs: current RAM state + * @pss: current PSS channel * @block
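To illustrate why last_sent_block belongs to the channel rather than to global state, here is a minimal standalone sketch. It is not the QEMU code: ToyBlock, ToyChannel, toy_save_page_header() and the TOY_FLAG_CONTINUE value are all hypothetical names chosen for this example, and the in-memory buffer stands in for a QEMUFile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bit, picked only for this sketch. */
#define TOY_FLAG_CONTINUE 0x20u

typedef struct ToyBlock {
    const char *idstr;
} ToyBlock;

/* One instance per channel; each remembers the block it last announced. */
typedef struct ToyChannel {
    const ToyBlock *last_sent_block;
    uint8_t buf[256];
    size_t len;
} ToyChannel;

/*
 * Append a page header to the channel buffer: a big-endian 64-bit
 * offset|flags word, followed by a length-prefixed block name only when
 * the block differs from the one this channel sent last.
 * Returns the number of bytes appended.
 */
static size_t toy_save_page_header(ToyChannel *ch, const ToyBlock *block,
                                   uint64_t offset)
{
    size_t start = ch->len;
    bool same_block = (block == ch->last_sent_block);
    size_t name_len = same_block ? 0 : strlen(block->idstr);

    assert(name_len <= 255);
    assert(ch->len + 8 + (same_block ? 0 : 1 + name_len) <= sizeof(ch->buf));

    if (same_block) {
        offset |= TOY_FLAG_CONTINUE;
    }
    for (int shift = 56; shift >= 0; shift -= 8) {
        ch->buf[ch->len++] = (uint8_t)(offset >> shift);
    }
    if (!same_block) {
        ch->buf[ch->len++] = (uint8_t)name_len;
        memcpy(ch->buf + ch->len, block->idstr, name_len);
        ch->len += name_len;
        ch->last_sent_block = block;
    }
    return ch->len - start;
}
```

With two ToyChannel instances, a block already announced on one channel is still announced in full on the other, which is the property a single global last_sent_block could not provide when channels send pages from different ramblocks.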
[PULL 03/19] migration: Export ram_transferred_ram()
Signed-off-by: Juan Quintela Reviewed-by: Dr. David Alan Gilbert Reviewed-by: David Edmondson Reviewed-by: Leonardo Bras --- migration/ram.h | 2 ++ migration/ram.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/migration/ram.h b/migration/ram.h index c7af65ac74..e844966f69 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -65,6 +65,8 @@ int ram_load_postcopy(QEMUFile *f, int channel); void ram_handle_compressed(void *host, uint8_t ch, uint64_t size); +void ram_transferred_add(uint64_t bytes); + int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr); bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset); void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr); diff --git a/migration/ram.c b/migration/ram.c index 1338e47665..2cbe707bfc 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -422,7 +422,7 @@ uint64_t ram_bytes_remaining(void) MigrationStats ram_counters; -static void ram_transferred_add(uint64_t bytes) +void ram_transferred_add(uint64_t bytes) { if (runstate_is_running()) { ram_counters.precopy_bytes += bytes; -- 2.38.1
[PULL 10/19] migration: Yield bitmap_mutex properly when sending/sleeping
From: Peter Xu Don't take the bitmap mutex when sending pages, or when being throttled by migration_rate_limit() (which is a bit tricky to call it here in ram code, but seems still helpful). It prepares for the possibility of concurrently sending pages in >1 threads using the function ram_save_host_page() because all threads may need the bitmap_mutex to operate on bitmaps, so that either sendmsg() or any kind of qemu_sem_wait() blocking for one thread will not block the other from progressing. Signed-off-by: Peter Xu Reviewed-by: Juan Quintela Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Juan Quintela --- migration/ram.c | 46 +++--- 1 file changed, 35 insertions(+), 11 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 6e3dc845c5..5379164749 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2452,9 +2452,14 @@ static void postcopy_preempt_reset_channel(RAMState *rs) * a host page in which case the remainder of the hostpage is sent. * Only dirty target pages are sent. Note that the host page size may * be a huge page for this block. + * * The saving stops at the boundary of the used_length of the block * if the RAMBlock isn't a multiple of the host page size. * + * The caller must be with ram_state.bitmap_mutex held to call this + * function. Note that this function can temporarily release the lock, but + * when the function is returned it'll make sure the lock is still held. 
+ * * Returns the number of pages written or negative on error * * @rs: current RAM state @@ -2462,6 +2467,7 @@ static void postcopy_preempt_reset_channel(RAMState *rs) */ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss) { +bool page_dirty, preempt_active = postcopy_preempt_active(); int tmppages, pages = 0; size_t pagesize_bits = qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS; @@ -2485,22 +2491,40 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss) break; } +page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page); + /* Check the pages is dirty and if it is send it */ -if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) { +if (page_dirty) { +/* + * Properly yield the lock only in postcopy preempt mode + * because both migration thread and rp-return thread can + * operate on the bitmaps. + */ +if (preempt_active) { +qemu_mutex_unlock(&rs->bitmap_mutex); +} tmppages = ram_save_target_page(rs, pss); -if (tmppages < 0) { -return tmppages; +if (tmppages >= 0) { +pages += tmppages; +/* + * Allow rate limiting to happen in the middle of huge pages if + * something is sent in the current iteration. + */ +if (pagesize_bits > 1 && tmppages > 0) { +migration_rate_limit(); +} } - -pages += tmppages; -/* - * Allow rate limiting to happen in the middle of huge pages if - * something is sent in the current iteration. - */ -if (pagesize_bits > 1 && tmppages > 0) { -migration_rate_limit(); +if (preempt_active) { +qemu_mutex_lock(&rs->bitmap_mutex); } +} else { +tmppages = 0; +} + +if (tmppages < 0) { +return tmppages; } + pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page); } while ((pss->page < hostpage_boundary) && offset_in_ramblock(pss->block, -- 2.38.1
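The locking contract described in the new comment (caller holds the lock, the function may drop it temporarily, and it is held again on return, including on the error path) can be sketched in isolation as follows. This is a simplified model under stated assumptions: ToyLock only tracks a "held" flag so the contract can be checked with assertions and is not a real mutex, and every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a mutex: it only records whether it is currently held. */
typedef struct ToyLock {
    bool held;
} ToyLock;

static void toy_lock(ToyLock *l)   { assert(!l->held); l->held = true; }
static void toy_unlock(ToyLock *l) { assert(l->held);  l->held = false; }

/* A "send" callback: returns pages sent, or a negative error. */
typedef int (*ToySendFn)(ToyLock *lock, void *opaque);

/*
 * Must be called with the lock held. The dirty state has already been
 * sampled under the lock (page_dirty). If yield_lock is set, the lock is
 * dropped around the potentially blocking send and re-taken afterwards,
 * so the lock is held on every return path.
 */
static int toy_save_one_page(ToyLock *lock, bool page_dirty, bool yield_lock,
                             ToySendFn send, void *opaque)
{
    int sent = 0;

    assert(lock->held);
    if (page_dirty) {
        if (yield_lock) {
            toy_unlock(lock);
        }
        sent = send(lock, opaque);
        if (yield_lock) {
            toy_lock(lock);
        }
    }
    assert(lock->held);
    return sent; /* errors are reported only after the lock is re-taken */
}

/* Example callbacks: record whether the lock was held during the send. */
static int toy_send_ok(ToyLock *lock, void *opaque)
{
    *(bool *)opaque = lock->held;
    return 1;
}

static int toy_send_fail(ToyLock *lock, void *opaque)
{
    *(bool *)opaque = lock->held;
    return -1;
}
```

The point of checking the error only after re-locking mirrors the restructured hunk above, where the "tmppages < 0" return was moved below the qemu_mutex_lock() call.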
[PULL 00/19] Next 8.0 patches
The following changes since commit 5204b499a6cae4dfd9fe762d5e6e82224892383b:

  mailmap: Fix Stefan Weil author email (2022-12-13 15:56:57 -0500)

are available in the Git repository at:

  https://gitlab.com/juan.quintela/qemu.git tags/next-8.0-pull-request

for you to fetch changes up to 7f401b80445e8746202a6d643410ba1b9eeb3cb1:

  migration: Drop rs->f (2022-12-15 10:30:37 +0100)

Migration patches for 8.0

Hi

These are the patches that I had to drop from the last PULL request
because they weren't fixes:
- AVX2 is dropped; Intel posted a fix, and I have to redo it
- Fix for out of order channels is out; Daniel nacked it and I need to redo it

Juan Quintela (4):
  multifd: Create page_size fields into both MultiFD{Recv,Send}Params
  multifd: Create page_count fields into both MultiFD{Recv,Send}Params
  migration: Export ram_transferred_ram()
  migration: Export ram_release_page()

Peter Xu (15):
  migration: Take bitmap mutex when completing ram migration
  migration: Add postcopy_preempt_active()
  migration: Cleanup xbzrle zero page cache update logic
  migration: Trivial cleanup save_page_header() on same block check
  migration: Remove RAMState.f references in compression code
  migration: Yield bitmap_mutex properly when sending/sleeping
  migration: Use atomic ops properly for page accountings
  migration: Teach PSS about host page
  migration: Introduce pss_channel
  migration: Add pss_init()
  migration: Make PageSearchStatus part of RAMState
  migration: Move last_sent_block into PageSearchStatus
  migration: Send requested page directly in rp-return thread
  migration: Remove old preempt code around state maintainance
  migration: Drop rs->f

 migration/migration.h    |   7 -
 migration/multifd.h      |   8 +
 migration/ram.h          |  23 ++
 migration/migration.c    |  47 +--
 migration/multifd-zlib.c |  14 +-
 migration/multifd-zstd.c |  12 +-
 migration/multifd.c      |  27 +-
 migration/ram.c          | 735 ++-
 8 files changed, 422 insertions(+), 451 deletions(-)

--
2.38.1
[PULL 06/19] migration: Add postcopy_preempt_active()
From: Peter Xu Add a helper that tells whether postcopy preempt is enabled and postcopy is currently active. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index cc72c24c18..00a2e30322 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -162,6 +162,11 @@ out: return ret; } +static bool postcopy_preempt_active(void) +{ +return migrate_postcopy_preempt() && migration_in_postcopy(); +} + bool ramblock_is_ignored(RAMBlock *block) { return !qemu_ram_is_migratable(block) || @@ -2433,7 +2438,7 @@ static void postcopy_preempt_choose_channel(RAMState *rs, PageSearchStatus *pss) /* We need to make sure rs->f always points to the default channel elsewhere */ static void postcopy_preempt_reset_channel(RAMState *rs) { -if (migrate_postcopy_preempt() && migration_in_postcopy()) { +if (postcopy_preempt_active()) { rs->postcopy_channel = RAM_CHANNEL_PRECOPY; rs->f = migrate_get_current()->to_dst_file; trace_postcopy_preempt_reset_channel(); @@ -2471,7 +2476,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss) return 0; } -if (migrate_postcopy_preempt() && migration_in_postcopy()) { +if (postcopy_preempt_active()) { postcopy_preempt_choose_channel(rs, pss); } -- 2.38.1
[PULL 12/19] migration: Teach PSS about host page
From: Peter Xu

Migration code has a lot to do with host pages. Teaching the PSS core about the idea of a host page helps a lot and makes the code cleaner. Meanwhile, this prepares for future changes that can leverage the new PSS helpers introduced by this patch to send a host page in another thread.

Three more fields are introduced for this:

(1) host_page_sending: this is set to true when QEMU is sending a host page, false otherwise.

(2) host_page_{start|end}: these point to the start/end of the host page we're sending, and they are only valid when host_page_sending==true.

For example, when we look up the next dirty page on the ramblock with host_page_sending==true, we'll not try to look for anything beyond the current host page boundary. This can be slightly more efficient than the current code, because currently we set pss->page to the next dirty bit (which can be beyond the current host page boundary) and reset it to the host page boundary if we find it goes beyond that.

With the above, we can easily make migration_bitmap_find_dirty() self-contained by updating pss->page properly. The rs* parameter is removed because it was not even used in the old code.

When sending a host page, we should use the pss helpers like this:

- pss_host_page_prepare(pss): called before sending a host page
- pss_within_range(pss): whether we're still working on the current host page
- pss_host_page_finish(pss): called after sending a host page

Then we can use ram_save_target_page() to save one small page.

Currently ram_save_host_page() is still the only user. If another function is added to send host pages in the future (e.g. in the return path thread), it should follow the same style.

Reviewed-by: Dr.
David Alan Gilbert Signed-off-by: Peter Xu Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 95 +++-- 1 file changed, 76 insertions(+), 19 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index f4cd9038f4..4d7b50ef79 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -481,6 +481,11 @@ struct PageSearchStatus { * postcopy pages via postcopy preempt channel. */ bool postcopy_target_channel; +/* Whether we're sending a host page */ +bool host_page_sending; +/* The start/end of current host page. Only valid if host_page_sending==true */ +unsigned long host_page_start; +unsigned long host_page_end; }; typedef struct PageSearchStatus PageSearchStatus; @@ -858,26 +863,38 @@ static int save_xbzrle_page(RAMState *rs, uint8_t **current_data, } /** - * migration_bitmap_find_dirty: find the next dirty page from start + * pss_find_next_dirty: find the next dirty page of current ramblock * - * Returns the page offset within memory region of the start of a dirty page + * This function updates pss->page to point to the next dirty page index + * within the ramblock to migrate, or the end of ramblock when nothing + * found. Note that when pss->host_page_sending==true it means we're + * during sending a host page, so we won't look for dirty page that is + * outside the host page boundary. 
* - * @rs: current RAM state - * @rb: RAMBlock where to search for dirty pages - * @start: page where we start the search + * @pss: the current page search status */ -static inline -unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, - unsigned long start) +static void pss_find_next_dirty(PageSearchStatus *pss) { +RAMBlock *rb = pss->block; unsigned long size = rb->used_length >> TARGET_PAGE_BITS; unsigned long *bitmap = rb->bmap; if (ramblock_is_ignored(rb)) { -return size; +/* Points directly to the end, so we know no dirty page */ +pss->page = size; +return; } -return find_next_bit(bitmap, size, start); +/* + * If during sending a host page, only look for dirty pages within the + * current host page being send. + */ +if (pss->host_page_sending) { +assert(pss->host_page_end); +size = MIN(size, pss->host_page_end); +} + +pss->page = find_next_bit(bitmap, size, pss->page); } static void migration_clear_memory_region_dirty_bitmap(RAMBlock *rb, @@ -1563,7 +1580,9 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again) pss->postcopy_requested = false; pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY; -pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page); +/* Update pss->page for the next dirty bit in ramblock */ +pss_find_next_dirty(pss); + if (pss->complete_round && pss->block == rs->last_seen_block && pss->page >= rs->last_page) { /* @@ -2452,6 +2471,44 @@ static void postcopy_preempt_reset_channel(RAMState *rs) } } +/* Should be called before sending a host page */ +static void pss_host_page_prepare(PageSearchStatus *pss) +{ +/* How many guest pa
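The prepare / within_range / finish protocol and the bounded dirty-page search described in the commit message can be sketched outside QEMU as below. This is a simplified model under stated assumptions: ToyPss uses a plain bool array instead of a bitmap, page indices are in units of target pages, and all toy_* names are hypothetical rather than the real helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyPss {
    bool *dirty;            /* one entry per target page */
    size_t nr_pages;        /* number of target pages in the block */
    size_t page;            /* current page to search from */
    bool host_page_sending; /* true while a host page is being sent */
    size_t host_page_start; /* valid only if host_page_sending */
    size_t host_page_end;   /* valid only if host_page_sending */
} ToyPss;

/* Called before sending a host page: record its boundaries. */
static void toy_host_page_prepare(ToyPss *pss, size_t pages_per_host_page)
{
    pss->host_page_sending = true;
    pss->host_page_start = pss->page - pss->page % pages_per_host_page;
    pss->host_page_end = pss->host_page_start + pages_per_host_page;
}

/* Are we still inside the host page (and the block) being sent? */
static bool toy_within_range(const ToyPss *pss)
{
    assert(pss->host_page_sending);
    return pss->page < pss->nr_pages &&
           pss->page >= pss->host_page_start &&
           pss->page < pss->host_page_end;
}

/* Called after sending a host page: invalidate the boundaries. */
static void toy_host_page_finish(ToyPss *pss)
{
    pss->host_page_sending = false;
    pss->host_page_start = pss->host_page_end = 0;
}

/*
 * Move pss->page to the next dirty page. While a host page is being
 * sent, the search stops at the host page boundary instead of running
 * to the end of the block; pss->page equals the limit if nothing is found.
 */
static void toy_find_next_dirty(ToyPss *pss)
{
    size_t limit = pss->nr_pages;
    size_t i = pss->page;

    if (pss->host_page_sending && pss->host_page_end < limit) {
        limit = pss->host_page_end;
    }
    while (i < limit && !pss->dirty[i]) {
        i++;
    }
    pss->page = i < limit ? i : limit;
}

/* Send every dirty target page of the host page containing pss->page. */
static size_t toy_send_host_page(ToyPss *pss, size_t pages_per_host_page)
{
    size_t sent = 0;

    assert(pss->page < pss->nr_pages);
    toy_host_page_prepare(pss, pages_per_host_page);
    do {
        if (pss->dirty[pss->page]) {
            pss->dirty[pss->page] = false; /* "clear dirty, then send" */
            sent++;
        }
        toy_find_next_dirty(pss);
    } while (toy_within_range(pss));
    toy_host_page_finish(pss);
    return sent;
}
```

After toy_send_host_page() returns, pss->page sits at the host page boundary (or the end of the block), so a later unbounded search resumes from there without the "overshoot and reset" step the commit message mentions.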
[PULL 01/19] multifd: Create page_size fields into both MultiFD{Recv, Send}Params
We were calling qemu_target_page_size() left and right. Signed-off-by: Juan Quintela Reviewed-by: Leonardo Bras --- migration/multifd.h | 4 migration/multifd-zlib.c | 14 ++ migration/multifd-zstd.c | 12 +--- migration/multifd.c | 18 -- 4 files changed, 23 insertions(+), 25 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 519f498643..86fb9982b3 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -80,6 +80,8 @@ typedef struct { bool registered_yank; /* packet allocated len */ uint32_t packet_len; +/* guest page size */ +uint32_t page_size; /* multifd flags for sending ram */ int write_flags; @@ -143,6 +145,8 @@ typedef struct { QIOChannel *c; /* packet allocated len */ uint32_t packet_len; +/* guest page size */ +uint32_t page_size; /* syncs main thread and channels */ QemuSemaphore sem_sync; diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c index 18213a9513..37770248e1 100644 --- a/migration/multifd-zlib.c +++ b/migration/multifd-zlib.c @@ -116,7 +116,6 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp) static int zlib_send_prepare(MultiFDSendParams *p, Error **errp) { struct zlib_data *z = p->data; -size_t page_size = qemu_target_page_size(); z_stream *zs = &z->zs; uint32_t out_size = 0; int ret; @@ -135,8 +134,8 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp) * with compression. zlib does not guarantee that this is safe, * therefore copy the page before calling deflate(). 
*/ -memcpy(z->buf, p->pages->block->host + p->normal[i], page_size); -zs->avail_in = page_size; +memcpy(z->buf, p->pages->block->host + p->normal[i], p->page_size); +zs->avail_in = p->page_size; zs->next_in = z->buf; zs->avail_out = available; @@ -242,12 +241,11 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p) static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp) { struct zlib_data *z = p->data; -size_t page_size = qemu_target_page_size(); z_stream *zs = &z->zs; uint32_t in_size = p->next_packet_size; /* we measure the change of total_out */ uint32_t out_size = zs->total_out; -uint32_t expected_size = p->normal_num * page_size; +uint32_t expected_size = p->normal_num * p->page_size; uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK; int ret; int i; @@ -274,7 +272,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp) flush = Z_SYNC_FLUSH; } -zs->avail_out = page_size; +zs->avail_out = p->page_size; zs->next_out = p->host + p->normal[i]; /* @@ -288,8 +286,8 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp) do { ret = inflate(zs, flush); } while (ret == Z_OK && zs->avail_in - && (zs->total_out - start) < page_size); -if (ret == Z_OK && (zs->total_out - start) < page_size) { + && (zs->total_out - start) < p->page_size); +if (ret == Z_OK && (zs->total_out - start) < p->page_size) { error_setg(errp, "multifd %u: inflate generated too few output", p->id); return -1; diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c index d788d309f2..f4a8e1ed1f 100644 --- a/migration/multifd-zstd.c +++ b/migration/multifd-zstd.c @@ -113,7 +113,6 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp) static int zstd_send_prepare(MultiFDSendParams *p, Error **errp) { struct zstd_data *z = p->data; -size_t page_size = qemu_target_page_size(); int ret; uint32_t i; @@ -128,7 +127,7 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp) flush = ZSTD_e_flush; } z->in.src = p->pages->block->host + 
p->normal[i]; -z->in.size = page_size; +z->in.size = p->page_size; z->in.pos = 0; /* @@ -241,8 +240,7 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp) { uint32_t in_size = p->next_packet_size; uint32_t out_size = 0; -size_t page_size = qemu_target_page_size(); -uint32_t expected_size = p->normal_num * page_size; +uint32_t expected_size = p->normal_num * p->page_size; uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK; struct zstd_data *z = p->data; int ret; @@ -265,7 +263,7 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp) for (i = 0; i < p->normal_num; i++) { z->out.dst = p->host + p->normal[i]; -z->out.size = page_size; +z->out.size = p->page_size; z->out.pos = 0; /* @@ -279,8 +277,8 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp) do {
[PULL 14/19] migration: Add pss_init()
From: Peter Xu Helper to init PSS structures. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 571d780987..d81bf7b183 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -542,6 +542,14 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block, static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss, bool postcopy_requested); +/* NOTE: page is the PFN not real ram_addr_t. */ +static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page) +{ +pss->block = rb; +pss->page = page; +pss->complete_round = false; +} + static void *do_data_compress(void *opaque) { CompressParam *param = opaque; @@ -2650,9 +2658,7 @@ static int ram_find_and_save_block(RAMState *rs) rs->last_page = 0; } -pss.block = rs->last_seen_block; -pss.page = rs->last_page; -pss.complete_round = false; +pss_init(&pss, rs->last_seen_block, rs->last_page); do { again = true; -- 2.38.1
[PULL 15/19] migration: Make PageSearchStatus part of RAMState
From: Peter Xu We used to allocate the PSS structure on the stack for precopy when sending pages. Make it static, so as to describe per-channel ram migration status. Here we declare RAM_CHANNEL_MAX instances, preparing for postcopy to use it, even though this patch has yet to start using the 2nd instance. This should not have any functional change per se, but it already starts to export PSS information via the RAMState, so that e.g. one PSS channel can start to reference the other PSS channel. Always protect PSS access using the same RAMState.bitmap_mutex. We already do so, so no code change is needed, just some comment updates. Maybe we should consider renaming bitmap_mutex some day, as it is becoming a bigger and more commonly used mutex for ram states, but leave that for later. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 112 ++-- 1 file changed, 61 insertions(+), 51 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index d81bf7b183..3194997738 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -85,6 +85,46 @@ XBZRLECacheStats xbzrle_counters; +/* used by the search for pages to send */ +struct PageSearchStatus { +/* The migration channel used for a specific host page */ +QEMUFile*pss_channel; +/* Current block being searched */ +RAMBlock*block; +/* Current page to search from */ +unsigned long page; +/* Set once we wrap around */ +bool complete_round; +/* + * [POSTCOPY-ONLY] Whether current page is explicitly requested by + * postcopy. When set, the request is "urgent" because the dest QEMU + * threads are waiting for us. + */ +bool postcopy_requested; +/* + * [POSTCOPY-ONLY] The target channel to use to send current page. + * + * Note: This may _not_ match with the value in postcopy_requested + * above. Let's imagine the case where the postcopy request is exactly + * the page that we're sending in progress during precopy.
In this case + * we'll have postcopy_requested set to true but the target channel + * will be the precopy channel (so that we don't split brain on that + * specific page since the precopy channel already contains partial of + * that page data). + * + * Besides that specific use case, postcopy_target_channel should + * always be equal to postcopy_requested, because by default we send + * postcopy pages via postcopy preempt channel. + */ +bool postcopy_target_channel; +/* Whether we're sending a host page */ +bool host_page_sending; +/* The start/end of current host page. Invalid if host_page_sending==false */ +unsigned long host_page_start; +unsigned long host_page_end; +}; +typedef struct PageSearchStatus PageSearchStatus; + /* struct contains XBZRLE cache and a static page used by the compression */ static struct { @@ -319,6 +359,11 @@ typedef struct { struct RAMState { /* QEMUFile used for this migration */ QEMUFile *f; +/* + * PageSearchStatus structures for the channels when send pages. + * Protected by the bitmap_mutex. 
+ */ +PageSearchStatus pss[RAM_CHANNEL_MAX]; /* UFFD file descriptor, used in 'write-tracking' migration */ int uffdio_fd; /* Last block that we have visited searching for dirty pages */ @@ -362,7 +407,12 @@ struct RAMState { uint64_t target_page_count; /* number of dirty bits in the bitmap */ uint64_t migration_dirty_pages; -/* Protects modification of the bitmap and migration dirty pages */ +/* + * Protects: + * - dirty/clear bitmap + * - migration_dirty_pages + * - pss structures + */ QemuMutex bitmap_mutex; /* The RAMBlock used in the last src_page_requests */ RAMBlock *last_req_rb; @@ -451,46 +501,6 @@ void dirty_sync_missed_zero_copy(void) ram_counters.dirty_sync_missed_zero_copy++; } -/* used by the search for pages to send */ -struct PageSearchStatus { -/* The migration channel used for a specific host page */ -QEMUFile*pss_channel; -/* Current block being searched */ -RAMBlock*block; -/* Current page to search from */ -unsigned long page; -/* Set once we wrap around */ -bool complete_round; -/* - * [POSTCOPY-ONLY] Whether current page is explicitly requested by - * postcopy. When set, the request is "urgent" because the dest QEMU - * threads are waiting for us. - */ -bool postcopy_requested; -/* - * [POSTCOPY-ONLY] The target channel to use to send current page. - * - * Note: This may _not_ match with the value in postcopy_requested - * above. Let's imagine the case where the postcopy request is exactly - * the page that we're sending in progress during precopy. In this case - * we'l
[PULL 19/19] migration: Drop rs->f
From: Peter Xu Now with rs->pss we can already cache channels in pss->pss_channel. That pss_channel contains more information than rs->f because it's per-channel. So rs->f could be replaced by rs->pss[RAM_CHANNEL_PRECOPY].pss_channel, while rs->f itself is a bit vague now. Note that vanilla postcopy still sends pages via pss[RAM_CHANNEL_PRECOPY]; that's slightly confusing but it reflects the reality. Then, after the replacement we can safely drop rs->f. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 1ae093fb61..334309f1c6 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -323,8 +323,6 @@ struct RAMSrcPageRequest { /* State of RAM for migration */ struct RAMState { -/* QEMUFile used for this migration */ -QEMUFile *f; /* * PageSearchStatus structures for the channels when send pages. * Protected by the bitmap_mutex.
@@ -2532,8 +2530,6 @@ static int ram_find_and_save_block(RAMState *rs) } if (found) { -/* Cache rs->f in pss_channel (TODO: remove rs->f) */ -pss->pss_channel = rs->f; pages = ram_save_host_page(rs, pss); } } while (!pages && again); @@ -3089,7 +3085,7 @@ static void ram_state_resume_prepare(RAMState *rs, QEMUFile *out) ram_state_reset(rs); /* Update RAMState cache of output QEMUFile */ -rs->f = out; +rs->pss[RAM_CHANNEL_PRECOPY].pss_channel = out; trace_ram_state_resume_prepare(pages); } @@ -3180,7 +3176,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) return -1; } } -(*rsp)->f = f; +(*rsp)->pss[RAM_CHANNEL_PRECOPY].pss_channel = f; WITH_RCU_READ_LOCK_GUARD() { qemu_put_be64(f, ram_bytes_total_common(true) | RAM_SAVE_FLAG_MEM_SIZE); @@ -3315,7 +3311,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) out: if (ret >= 0 && migration_is_setup_or_active(migrate_get_current()->state)) { -ret = multifd_send_sync_main(rs->f); +ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel); if (ret < 0) { return ret; } @@ -3385,7 +3381,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) return ret; } -ret = multifd_send_sync_main(rs->f); +ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel); if (ret < 0) { return ret; } -- 2.38.1
[PULL 17/19] migration: Send requested page directly in rp-return thread
From: Peter Xu

With all the facilities ready, send the requested page directly in the
rp-return thread rather than queuing it in the request queue, if and only
if postcopy preempt is enabled. This is possible because postcopy preempt
uses a separate channel for sending urgent pages. The only shared data is
the bitmap, which is protected by bitmap_mutex.

Note that since we're moving ownership of the urgent channel from the
migration thread to the rp thread, the rp thread also becomes responsible
for managing the qemufile, e.g. properly closing it when migration is
paused. For this, let migration_release_from_dst_file() cover shutdown of
the urgent channel too, and rename it to migration_release_dst_files() to
better show what it does.

Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: Peter Xu
Reviewed-by: Juan Quintela
Signed-off-by: Juan Quintela
---
 migration/migration.c |  35 +++--
 migration/ram.c       | 112 ++
 2 files changed, 131 insertions(+), 16 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index de83c50f51..c1d4d76d0c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2848,8 +2848,11 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
     return 0;
 }
-/* Release ms->rp_state.from_dst_file in a safe way */
-static void migration_release_from_dst_file(MigrationState *ms)
+/*
+ * Release ms->rp_state.from_dst_file (and postcopy_qemufile_src if
+ * existed) in a safe way.
+ */
+static void migration_release_dst_files(MigrationState *ms)
 {
     QEMUFile *file;
@@ -2862,6 +2865,18 @@ static void migration_release_from_dst_file(MigrationState *ms)
         ms->rp_state.from_dst_file = NULL;
     }
+    /*
+     * Do the same to postcopy fast path socket too if there is. No
+     * locking needed because this qemufile should only be managed by
+     * return path thread.
+     */
+    if (ms->postcopy_qemufile_src) {
+        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
+        qemu_file_shutdown(ms->postcopy_qemufile_src);
+        qemu_fclose(ms->postcopy_qemufile_src);
+        ms->postcopy_qemufile_src = NULL;
+    }
+
     qemu_fclose(file);
 }
@@ -3006,7 +3021,7 @@ out:
              * Maybe there is something we can do: it looks like a
              * network down issue, and we pause for a recovery.
              */
-            migration_release_from_dst_file(ms);
+            migration_release_dst_files(ms);
             rp = NULL;
             if (postcopy_pause_return_path_thread(ms)) {
                 /*
@@ -3024,7 +3039,7 @@ out:
     }
     trace_source_return_path_thread_end();
-    migration_release_from_dst_file(ms);
+    migration_release_dst_files(ms);
     rcu_unregister_thread();
     return NULL;
 }
@@ -3547,18 +3562,6 @@ static MigThrError postcopy_pause(MigrationState *s)
         qemu_file_shutdown(file);
         qemu_fclose(file);
-        /*
-         * Do the same to postcopy fast path socket too if there is. No
-         * locking needed because no racer as long as we do this before setting
-         * status to paused.
-         */
-        if (s->postcopy_qemufile_src) {
-            migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
-            qemu_file_shutdown(s->postcopy_qemufile_src);
-            qemu_fclose(s->postcopy_qemufile_src);
-            s->postcopy_qemufile_src = NULL;
-        }
-
         migrate_set_state(&s->state, s->state,
                           MIGRATION_STATUS_POSTCOPY_PAUSED);
diff --git a/migration/ram.c b/migration/ram.c
index 1233ff53ac..16ade7cb70 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -546,6 +546,8 @@ static QemuThread *decompress_threads;
 static QemuMutex decomp_done_lock;
 static QemuCond decomp_done_cond;
+static int ram_save_host_page_urgent(PageSearchStatus *pss);
+
 static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
                                  ram_addr_t offset, uint8_t *source_buf);
@@ -560,6 +562,16 @@ static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page)
     pss->complete_round = false;
 }
+/*
+ * Check whether two PSSs are actively sending the same page. Return true
+ * if it is, false otherwise.
+ */
+static bool pss_overlap(PageSearchStatus *pss1, PageSearchStatus *pss2)
+{
+    return pss1->host_page_sending && pss2->host_page_sending &&
+        (pss1->host_page_start == pss2->host_page_start);
+}
+
 static void *do_data_compress(void *opaque)
 {
     CompressParam *param = opaque;
@@ -2260,6 +2272,57 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
         return -1;
     }
+    /*
+     * When with postcopy preempt, we send back the page directly in the
+     * rp-return thread.
+     */
+    if (postcopy_preempt_active()) {
+        ram_addr_t page_start = start >> TARGET_PAGE_BITS;
+        size_t page_size = qemu_ram_page
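
As a standalone illustration of the pss_overlap() check added by this patch, here is a minimal sketch on a stripped-down structure. PssSketch and pss_sketch_overlap() are hypothetical names used only for this example; the real PageSearchStatus has more members. Two senders overlap only when both are in the middle of sending a host page and that host page starts at the same page index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for PageSearchStatus. */
typedef struct {
    bool host_page_sending;         /* currently sending a host page? */
    unsigned long host_page_start;  /* first page index of that host page */
} PssSketch;

/* Same predicate as pss_overlap() in the patch, on the reduced struct. */
static bool pss_sketch_overlap(const PssSketch *a, const PssSketch *b)
{
    assert(a != NULL && b != NULL);
    return a->host_page_sending && b->host_page_sending &&
           a->host_page_start == b->host_page_start;
}
```

An idle sender never overlaps with anything, whatever stale value its host_page_start field holds, which is why both host_page_sending flags are tested first.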
[PULL 11/19] migration: Use atomic ops properly for page accountings
From: Peter Xu

To prepare for thread-safe page accounting, at least the counters below
need to be accessed only atomically:

    ram_counters.transferred
    ram_counters.duplicate
    ram_counters.normal
    ram_counters.postcopy_bytes

There are many other counters, but they are not accessed outside the
migration thread, so they are still safe to access without atomic ops.

Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: Peter Xu
Reviewed-by: Juan Quintela
Signed-off-by: Juan Quintela
---
 migration/ram.h       | 20
 migration/migration.c | 10 +-
 migration/multifd.c   |  4 ++--
 migration/ram.c       | 40
 4 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/migration/ram.h b/migration/ram.h
index 038d52f49f..81cbb0947c 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -32,7 +32,27 @@
 #include "qapi/qapi-types-migration.h"
 #include "exec/cpu-common.h"
 #include "io/channel.h"
+#include "qemu/stats64.h"
+/*
+ * These are the migration statistic counters that need to be updated using
+ * atomic ops (can be accessed by more than one thread). Here since we
+ * cannot modify MigrationStats directly to use Stat64 as it was defined in
+ * the QAPI scheme, we define an internal structure to hold them, and we
+ * propagate the real values when QMP queries happen.
+ *
+ * IOW, the corresponding fields within ram_counters on these specific
+ * fields will be always zero and not being used at all; they're just
+ * placeholders to make it QAPI-compatible.
+ */
+typedef struct {
+    Stat64 transferred;
+    Stat64 duplicate;
+    Stat64 normal;
+    Stat64 postcopy_bytes;
+} MigrationAtomicStats;
+
+extern MigrationAtomicStats ram_atomic_counters;
 extern MigrationStats ram_counters;
 extern XBZRLECacheStats xbzrle_counters;
 extern CompressionStats compression_counters;
diff --git a/migration/migration.c b/migration/migration.c
index f485eea5fb..de83c50f51 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1049,13 +1049,13 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
     info->has_ram = true;
     info->ram = g_malloc0(sizeof(*info->ram));
-    info->ram->transferred = ram_counters.transferred;
+    info->ram->transferred = stat64_get(&ram_atomic_counters.transferred);
     info->ram->total = ram_bytes_total();
-    info->ram->duplicate = ram_counters.duplicate;
+    info->ram->duplicate = stat64_get(&ram_atomic_counters.duplicate);
     /* legacy value. It is not used anymore */
     info->ram->skipped = 0;
-    info->ram->normal = ram_counters.normal;
-    info->ram->normal_bytes = ram_counters.normal * page_size;
+    info->ram->normal = stat64_get(&ram_atomic_counters.normal);
+    info->ram->normal_bytes = info->ram->normal * page_size;
     info->ram->mbps = s->mbps;
     info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
     info->ram->dirty_sync_missed_zero_copy =
@@ -1066,7 +1066,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
     info->ram->pages_per_second = s->pages_per_second;
     info->ram->precopy_bytes = ram_counters.precopy_bytes;
     info->ram->downtime_bytes = ram_counters.downtime_bytes;
-    info->ram->postcopy_bytes = ram_counters.postcopy_bytes;
+    info->ram->postcopy_bytes = stat64_get(&ram_atomic_counters.postcopy_bytes);
     if (migrate_use_xbzrle()) {
         info->has_xbzrle_cache = true;
diff --git a/migration/multifd.c b/migration/multifd.c
index b8dc559d24..000ca4d4ec 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -432,7 +432,7 @@ static int multifd_send_pages(QEMUFile *f)
     transferred = ((uint64_t) pages->num) * p->page_size + p->packet_len;
     qemu_file_acct_rate_limit(f, transferred);
     ram_counters.multifd_bytes += transferred;
-    ram_counters.transferred += transferred;
+    stat64_add(&ram_atomic_counters.transferred, transferred);
     qemu_mutex_unlock(&p->mutex);
     qemu_sem_post(&p->sem);
@@ -624,7 +624,7 @@ int multifd_send_sync_main(QEMUFile *f)
         p->pending_job++;
         qemu_file_acct_rate_limit(f, p->packet_len);
         ram_counters.multifd_bytes += p->packet_len;
-        ram_counters.transferred += p->packet_len;
+        stat64_add(&ram_atomic_counters.transferred, p->packet_len);
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_post(&p->sem);
diff --git a/migration/ram.c b/migration/ram.c
index 5379164749..f4cd9038f4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -425,18 +425,25 @@ uint64_t ram_bytes_remaining(void)
                        0;
 }
+/*
+ * NOTE: not all stats in ram_counters are used in reality. See comments
+ * for struct MigrationAtomicStats. The ultimate result of ram migration
+ * counters will be a merged version with both ram_counters and the atomic
+ * fields in ram_atomic_counters.
+ */
 MigrationStats ram_counters;
+MigrationAtomicStats ram_atomic_counter
[PULL 09/19] migration: Remove RAMState.f references in compression code
From: Peter Xu

Remove references to RAMState.f in compress_page_with_multi_thread() and
flush_compressed_data().

The compression code is not compatible with having more than one channel
(it currently has no way to know which channel to flush the compressed
data to). To keep it simple, always flush to the default to_dst_file
channel until someone adds multi-channel support, since rs->f can now
really change (after postcopy preempt was introduced).

There should be no functional change after this patch is applied:
whenever rs->f is referenced in the compression code, it must be
to_dst_file.

Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: Peter Xu
Signed-off-by: Juan Quintela
---
 migration/ram.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 41475431fc..6e3dc845c5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1461,6 +1461,7 @@ static bool save_page_use_compression(RAMState *rs);
 static void flush_compressed_data(RAMState *rs)
 {
+    MigrationState *ms = migrate_get_current();
     int idx, len, thread_count;
     if (!save_page_use_compression(rs)) {
@@ -1479,7 +1480,7 @@ static void flush_compressed_data(RAMState *rs)
     for (idx = 0; idx < thread_count; idx++) {
         qemu_mutex_lock(&comp_param[idx].mutex);
         if (!comp_param[idx].quit) {
-            len = qemu_put_qemu_file(rs->f, comp_param[idx].file);
+            len = qemu_put_qemu_file(ms->to_dst_file, comp_param[idx].file);
             /*
              * it's safe to fetch zero_page without holding comp_done_lock
              * as there is no further request submitted to the thread,
@@ -1498,11 +1499,11 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block,
     param->offset = offset;
 }
-static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
-                                           ram_addr_t offset)
+static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
 {
     int idx, thread_count, bytes_xmit = -1, pages = -1;
     bool wait = migrate_compress_wait_thread();
+    MigrationState *ms = migrate_get_current();
     thread_count = migrate_compress_threads();
     qemu_mutex_lock(&comp_done_lock);
@@ -1510,7 +1511,8 @@ retry:
     for (idx = 0; idx < thread_count; idx++) {
         if (comp_param[idx].done) {
             comp_param[idx].done = false;
-            bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
+            bytes_xmit = qemu_put_qemu_file(ms->to_dst_file,
+                                            comp_param[idx].file);
             qemu_mutex_lock(&comp_param[idx].mutex);
             set_compress_params(&comp_param[idx], block, offset);
             qemu_cond_signal(&comp_param[idx].cond);
@@ -2263,7 +2265,7 @@ static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
         return false;
     }
-    if (compress_page_with_multi_thread(rs, block, offset) > 0) {
+    if (compress_page_with_multi_thread(block, offset) > 0) {
         return true;
     }
--
2.38.1
[PULL 18/19] migration: Remove old preempt code around state maintainance
From: Peter Xu

With the new code sending pages in the rp-return thread, there is little
point in keeping much of the old code that maintains the preempt state in
the migration thread, because the new way should always be faster.

If we always send pages in the rp-return thread anyway, we no longer need
the logic that maintains preempt state, because we now serialize things
using the mutex directly instead of using those fields.

It's unfortunate to have had that code for only a short period, but it was
an intermediate step that let us notice the next bottleneck in the
migration thread. The best we can do now is drop the unnecessary code, as
long as the new code is stable, to reduce the maintenance burden. It's
actually a good thing, because the new "send pages in the rp-return
thread" model is (IMHO) even cleaner and performs better.

Remove the old code that was responsible for maintaining preempt states,
and at the same time remove the x-postcopy-preempt-break-huge parameter,
because with concurrent sender threads we don't really need to break huge
pages anymore.

Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: Peter Xu
Reviewed-by: Juan Quintela
Signed-off-by: Juan Quintela
---
 migration/migration.h |   7 -
 migration/migration.c |   2 -
 migration/ram.c       | 291 +-
 3 files changed, 3 insertions(+), 297 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index cdad8aceaa..ae4ffd3454 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -340,13 +340,6 @@ struct MigrationState {
     bool send_configuration;
     /* Whether we send section footer during migration */
     bool send_section_footer;
-    /*
-     * Whether we allow break sending huge pages when postcopy preempt is
-     * enabled. When disabled, we won't interrupt precopy within sending a
-     * host huge page, which is the old behavior of vanilla postcopy.
-     * NOTE: this parameter is ignored if postcopy preempt is not enabled.
-     */
-    bool postcopy_preempt_break_huge;
     /* Needed by postcopy-pause state */
     QemuSemaphore postcopy_pause_sem;
diff --git a/migration/migration.c b/migration/migration.c
index c1d4d76d0c..c3490c495d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4402,8 +4402,6 @@ static Property migration_properties[] = {
     DEFINE_PROP_SIZE("announce-step", MigrationState,
                       parameters.announce_step,
                       DEFAULT_MIGRATE_ANNOUNCE_STEP),
-    DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
-                      postcopy_preempt_break_huge, true),
     DEFINE_PROP_STRING("tls-creds", MigrationState, parameters.tls_creds),
     DEFINE_PROP_STRING("tls-hostname", MigrationState, parameters.tls_hostname),
     DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
diff --git a/migration/ram.c b/migration/ram.c
index 16ade7cb70..1ae093fb61 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -97,28 +97,6 @@ struct PageSearchStatus {
     unsigned long page;
     /* Set once we wrap around */
     bool complete_round;
-    /*
-     * [POSTCOPY-ONLY] Whether current page is explicitly requested by
-     * postcopy. When set, the request is "urgent" because the dest QEMU
-     * threads are waiting for us.
-     */
-    bool postcopy_requested;
-    /*
-     * [POSTCOPY-ONLY] The target channel to use to send current page.
-     *
-     * Note: This may _not_ match with the value in postcopy_requested
-     * above. Let's imagine the case where the postcopy request is exactly
-     * the page that we're sending in progress during precopy. In this case
-     * we'll have postcopy_requested set to true but the target channel
-     * will be the precopy channel (so that we don't split brain on that
-     * specific page since the precopy channel already contains partial of
-     * that page data).
-     *
-     * Besides that specific use case, postcopy_target_channel should
-     * always be equal to postcopy_requested, because by default we send
-     * postcopy pages via postcopy preempt channel.
-     */
-    bool postcopy_target_channel;
     /* Whether we're sending a host page */
     bool host_page_sending;
     /* The start/end of current host page. Invalid if host_page_sending==false */
@@ -343,20 +321,6 @@ struct RAMSrcPageRequest {
     QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
 };
-typedef struct {
-    /*
-     * Cached ramblock/offset values if preempted. They're only meaningful if
-     * preempted==true below.
-     */
-    RAMBlock *ram_block;
-    unsigned long ram_page;
-    /*
-     * Whether a postcopy preemption just happened. Will be reset after
-     * precopy recovered to background migration.
-     */
-    bool preempted;
-} PostcopyPreemptState;
-
 /* State of RAM for migration */
 struct RAMState {
     /* QEMUFile used for this migration */
@@ -419,14 +383,6 @@ struct RAMState {
     /*
[PULL 08/19] migration: Trivial cleanup save_page_header() on same block check
From: Peter Xu

The 2nd check on RAM_SAVE_FLAG_CONTINUE is a bit redundant. Use a boolean
to be clearer.

Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: Peter Xu
Reviewed-by: Juan Quintela
Signed-off-by: Juan Quintela
---
 migration/ram.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7124ff531c..41475431fc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -661,14 +661,15 @@ static size_t save_page_header(RAMState *rs, QEMUFile *f, RAMBlock *block,
                                ram_addr_t offset)
 {
     size_t size, len;
+    bool same_block = (block == rs->last_sent_block);
-    if (block == rs->last_sent_block) {
+    if (same_block) {
         offset |= RAM_SAVE_FLAG_CONTINUE;
     }
     qemu_put_be64(f, offset);
     size = 8;
-    if (!(offset & RAM_SAVE_FLAG_CONTINUE)) {
+    if (!same_block) {
         len = strlen(block->idstr);
         qemu_put_byte(f, len);
         qemu_put_buffer(f, (uint8_t *)block->idstr, len);
--
2.38.1
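
To see why a single boolean is enough, the header logic can be sketched against a plain byte buffer. This is a simplified illustration, not the QEMU function: sketch_page_header() and SKETCH_FLAG_CONTINUE are hypothetical names, the flag value is an assumption made for the example, and the caller passes same_block in directly instead of comparing against rs->last_sent_block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FLAG_CONTINUE 0x20   /* assumed flag value for this sketch */

/*
 * Write a page header into buf and return the number of bytes written.
 * buf must have room for 8 + 1 + strlen(idstr) bytes.
 */
static size_t sketch_page_header(uint8_t *buf, const char *idstr,
                                 bool same_block, uint64_t offset)
{
    size_t size = 0;

    if (same_block) {
        offset |= SKETCH_FLAG_CONTINUE;
    }
    /* Big-endian 64-bit offset with the flag folded into the low bits. */
    for (int i = 0; i < 8; i++) {
        buf[size++] = (uint8_t)(offset >> (56 - 8 * i));
    }
    /* The block name is only needed when the block changed. */
    if (!same_block) {
        size_t len = strlen(idstr);

        assert(len <= UINT8_MAX);
        buf[size++] = (uint8_t)len;
        memcpy(buf + size, idstr, len);
        size += len;
    }
    return size;
}
```

Both branches are driven by the same decision, so testing the boolean twice states the intent more directly than setting a flag bit and reading it back from offset.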
[PULL 13/19] migration: Introduce pss_channel
From: Peter Xu

Introduce pss_channel for PageSearchStatus, defined as "the migration
channel to be used to transfer this host page".

We used to have rs->f, which is a mirror of MigrationState.to_dst_file.

After the initial version of postcopy preempt, rs->f can be changed
dynamically depending on which channel we want to use.

But that later work still doesn't grant full concurrency of sending pages
in e.g. different threads, because rs->f can be either the PRECOPY channel
or the POSTCOPY channel. This needs to be per-thread too.

PageSearchStatus is actually a good struct to leverage if we want to have
multiple threads sending pages. Sending a single guest page may not make
sense, so we make the granule the "host page", and in the PSS structure we
allow specifying a QEMUFile* to migrate a specific host page. This opens
the possibility of specifying different channels in different threads with
different PSS structures.

The PSS prefix can be slightly misleading here, because e.g. for the
upcoming usage by the postcopy channel/thread it's not "searching" (or
scanning) at all but sending the explicit page that was requested.
However, since PSS has existed for some years, keep it as-is until someone
complains.

This patch mostly (simply) replaces rs->f with pss->pss_channel. No
functional change intended for this patch yet. But it does prepare to
finally drop rs->f, and make ram_save_guest_page() thread safe.

Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: Peter Xu
Reviewed-by: Juan Quintela
Signed-off-by: Juan Quintela
---
 migration/ram.c | 70 +++--
 1 file changed, 38 insertions(+), 32 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 4d7b50ef79..571d780987 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -453,6 +453,8 @@ void dirty_sync_missed_zero_copy(void)
 /* used by the search for pages to send */
 struct PageSearchStatus {
+    /* The migration channel used for a specific host page */
+    QEMUFile    *pss_channel;
     /* Current block being searched */
     RAMBlock    *block;
     /* Current page to search from */
@@ -775,9 +777,9 @@ static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
-                            ram_addr_t current_addr, RAMBlock *block,
-                            ram_addr_t offset)
+static int save_xbzrle_page(RAMState *rs, QEMUFile *file,
+                            uint8_t **current_data, ram_addr_t current_addr,
+                            RAMBlock *block, ram_addr_t offset)
 {
     int encoded_len = 0, bytes_xbzrle;
     uint8_t *prev_cached_page;
@@ -845,11 +847,11 @@ static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
     }
     /* Send XBZRLE based compressed page */
-    bytes_xbzrle = save_page_header(rs, rs->f, block,
+    bytes_xbzrle = save_page_header(rs, file, block,
                                     offset | RAM_SAVE_FLAG_XBZRLE);
-    qemu_put_byte(rs->f, ENCODING_FLAG_XBZRLE);
-    qemu_put_be16(rs->f, encoded_len);
-    qemu_put_buffer(rs->f, XBZRLE.encoded_buf, encoded_len);
+    qemu_put_byte(file, ENCODING_FLAG_XBZRLE);
+    qemu_put_be16(file, encoded_len);
+    qemu_put_buffer(file, XBZRLE.encoded_buf, encoded_len);
     bytes_xbzrle += encoded_len + 1 + 2;
     /*
      * Like compressed_size (please see update_compress_thread_counts),
@@ -1305,9 +1307,10 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
+static int save_zero_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
+                          ram_addr_t offset)
 {
-    int len = save_zero_page_to_file(rs, rs->f, block, offset);
+    int len = save_zero_page_to_file(rs, file, block, offset);
     if (len) {
         stat64_add(&ram_atomic_counters.duplicate, 1);
@@ -1324,15 +1327,15 @@ static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
  *
  * Return true if the pages has been saved, otherwise false is returned.
  */
-static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
-                              int *pages)
+static bool control_save_page(PageSearchStatus *pss, RAMBlock *block,
+                              ram_addr_t offset, int *pages)
 {
     uint64_t bytes_xmit = 0;
     int ret;
     *pages = -1;
-    ret = ram_control_save_page(rs->f, block->offset, offset, TARGET_PAGE_SIZE,
-                                &bytes_xmit);
+    ret = ram_control_save_page(pss->pss_channel, block->offset, offset,
+                                TARGET_PAGE_SIZE, &bytes_xmit);
     if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
         return
[PULL 04/19] migration: Export ram_release_page()
Signed-off-by: Juan Quintela
Reviewed-by: Leonardo Bras
---
 migration/ram.h | 1 +
 migration/ram.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/ram.h b/migration/ram.h
index e844966f69..038d52f49f 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -66,6 +66,7 @@ int ram_load_postcopy(QEMUFile *f, int channel);
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 void ram_transferred_add(uint64_t bytes);
+void ram_release_page(const char *rbname, uint64_t offset);
 int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
 bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset);
diff --git a/migration/ram.c b/migration/ram.c
index 2cbe707bfc..8aad17c429 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1234,7 +1234,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs)
     }
 }
-static void ram_release_page(const char *rbname, uint64_t offset)
+void ram_release_page(const char *rbname, uint64_t offset)
 {
     if (!migrate_release_ram() || !migration_in_postcopy()) {
         return;
--
2.38.1
[PULL 02/19] multifd: Create page_count fields into both MultiFD{Recv, Send}Params
We were recalculating it left and right. We plan to change that value in
upcoming patches.

Signed-off-by: Juan Quintela
Reviewed-by: Leonardo Bras
---
 migration/multifd.h | 4
 migration/multifd.c | 7 ---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 86fb9982b3..e2802a9ce2 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -82,6 +82,8 @@ typedef struct {
     uint32_t packet_len;
     /* guest page size */
     uint32_t page_size;
+    /* number of pages in a full packet */
+    uint32_t page_count;
     /* multifd flags for sending ram */
     int write_flags;
@@ -147,6 +149,8 @@ typedef struct {
     uint32_t packet_len;
     /* guest page size */
     uint32_t page_size;
+    /* number of pages in a full packet */
+    uint32_t page_count;
     /* syncs main thread and channels */
     QemuSemaphore sem_sync;
diff --git a/migration/multifd.c b/migration/multifd.c
index efffa77a76..b8dc559d24 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -279,7 +279,6 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
 {
     MultiFDPacket_t *packet = p->packet;
-    uint32_t page_count = MULTIFD_PACKET_SIZE / p->page_size;
     RAMBlock *block;
     int i;
@@ -306,10 +305,10 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
      * If we received a packet that is 100 times bigger than expected
      * just stop migration. It is a magic number.
      */
-    if (packet->pages_alloc > page_count) {
+    if (packet->pages_alloc > p->page_count) {
         error_setg(errp, "multifd: received packet "
                    "with size %u and expected a size of %u",
-                   packet->pages_alloc, page_count) ;
+                   packet->pages_alloc, p->page_count) ;
         return -1;
     }
@@ -944,6 +943,7 @@ int multifd_save_setup(Error **errp)
         p->iov = g_new0(struct iovec, page_count + 1);
         p->normal = g_new0(ram_addr_t, page_count);
         p->page_size = qemu_target_page_size();
+        p->page_count = page_count;
         if (migrate_use_zero_copy_send()) {
             p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
@@ -1191,6 +1191,7 @@ int multifd_load_setup(Error **errp)
         p->name = g_strdup_printf("multifdrecv_%d", i);
         p->iov = g_new0(struct iovec, page_count);
         p->normal = g_new0(ram_addr_t, page_count);
+        p->page_count = page_count;
         p->page_size = qemu_target_page_size();
     }
--
2.38.1
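
The pattern of computing page_count once at setup and reusing it in the receive-side sanity check can be sketched as below. ParamsSketch, params_setup() and packet_pages_ok() are hypothetical names, and packet_size is an assumed parameter standing in for MULTIFD_PACKET_SIZE; this is a reduced illustration rather than the multifd code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced stand-in for the multifd per-channel params. */
typedef struct {
    uint32_t page_size;    /* guest page size */
    uint32_t page_count;   /* number of pages in a full packet */
} ParamsSketch;

/* Compute page_count once, at setup time. */
static void params_setup(ParamsSketch *p, uint32_t packet_size,
                         uint32_t page_size)
{
    assert(page_size != 0);
    p->page_size = page_size;
    p->page_count = packet_size / page_size;
}

/* Receive-side sanity check that relies on the cached value. */
static bool packet_pages_ok(const ParamsSketch *p, uint32_t pages_alloc)
{
    return pages_alloc <= p->page_count;
}
```

Keeping the value in the params structure means later patches can change how it is derived in one place, without touching every function that needs it.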