Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
Am 12.11.2015 um 14:36 schrieb Markus Armbruster: > Peter Maydellwrites: > >> [...] it's usually painful to get a >> backtrace out of this kind of qtest, because it's clearly starting >> a whole pile of QEMUs and there's no way I know of to say "only >> run a few of these tests, not the whole huge pile". > > Not sure this helps, but here goes anyway: run the make with V=1, fish > out the gtester line, add -p /name/of/the/test. Here's an example from > my shell history: > > $ MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))} gtester -k > --verbose -m=quick -p /dicts/large_dict tests/check-qjson You can just run the test from the command line. For QTests you also need to pass a variable telling it which QEMU binary to use and since recently also which qemu-img or so. If someone has an idea how we could implement make check-qtest-- or check-unit- that would be appreciated. check-qtest- would seem to collide with our wildcard handling of targets, but might be handy, too. Regards, Andreas -- SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)
Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
On 10 November 2015 at 14:25, Juan Quintelawrote: > From: "Dr. David Alan Gilbert" > > When transmitting RAM pages, consume pages that have been queued by > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning. > > Note: > a) After a queued page the linear walk carries on from after the > unqueued page; there is a reasonable chance that the destination > was about to ask for other closeby pages anyway. > > b) We have to be careful of any assumptions that the page walking > code makes, in particular it does some short cuts on its first linear > walk that break as soon as we do a queued page. > > c) We have to be careful to not break up host-page size chunks, since > this makes it harder to place the pages on the destination. > > Signed-off-by: Dr. David Alan Gilbert > Reviewed-by: Juan Quintela > Signed-off-by: Juan Quintela I've just discovered that this is causing 'make check' failures on my OSX host (unfortunately something in my setup is causing 'make check' failures to not always cause a build failure, so I didn't notice earlier): manooth$ (make -C build/x86 -j8 && cd build/x86 && QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))} tests/ahci-test) [...] /x86_64/ahci/flush/simple: OK /x86_64/ahci/flush/retry: OK /x86_64/ahci/flush/migrate: qemu: qemu_mutex_lock: Invalid argument qemu-system-x86_64:Broken pipe Not a migration stream qemu-system-x86_64: load of migration failed: Invalid argument thanks -- PMM
Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
On 12 November 2015 at 12:04, Dr. David Alan Gilbertwrote: > * Peter Maydell (peter.mayd...@linaro.org) wrote: >> On 10 November 2015 at 14:25, Juan Quintela wrote: >> > From: "Dr. David Alan Gilbert" >> > >> > When transmitting RAM pages, consume pages that have been queued by >> > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning. >> > >> > Note: >> > a) After a queued page the linear walk carries on from after the >> > unqueued page; there is a reasonable chance that the destination >> > was about to ask for other closeby pages anyway. >> > >> > b) We have to be careful of any assumptions that the page walking >> > code makes, in particular it does some short cuts on its first linear >> > walk that break as soon as we do a queued page. >> > >> > c) We have to be careful to not break up host-page size chunks, since >> > this makes it harder to place the pages on the destination. >> > >> > Signed-off-by: Dr. David Alan Gilbert >> > Reviewed-by: Juan Quintela >> > Signed-off-by: Juan Quintela >> >> I've just discovered that this is causing 'make check' failures on >> my OSX host (unfortunately something in my setup is causing >> 'make check' failures to not always cause a build failure, so I >> didn't notice earlier): > > It's only failing on OSX? Every time or only sometimes? Only OSX, and always. I think OSX is pickier about mutexes really needing to be initialized before use. > If you can find a way to get a backtrace off that qemu_mutex_lock case > that would be great; I'd assume the later errors are the fall out from that. I'll have a look after lunch, but it's usually painful to get a backtrace out of this kind of qtest, because it's clearly starting a whole pile of QEMUs and there's no way I know of to say "only run a few of these tests, not the whole huge pile". thanks -- PMM
Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
* Peter Maydell (peter.mayd...@linaro.org) wrote: > On 12 November 2015 at 12:04, Dr. David Alan Gilbert >wrote: > > * Peter Maydell (peter.mayd...@linaro.org) wrote: > >> On 10 November 2015 at 14:25, Juan Quintela wrote: > >> > From: "Dr. David Alan Gilbert" > >> > > >> > When transmitting RAM pages, consume pages that have been queued by > >> > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning. > >> > > >> > Note: > >> > a) After a queued page the linear walk carries on from after the > >> > unqueued page; there is a reasonable chance that the destination > >> > was about to ask for other closeby pages anyway. > >> > > >> > b) We have to be careful of any assumptions that the page walking > >> > code makes, in particular it does some short cuts on its first linear > >> > walk that break as soon as we do a queued page. > >> > > >> > c) We have to be careful to not break up host-page size chunks, since > >> > this makes it harder to place the pages on the destination. > >> > > >> > Signed-off-by: Dr. David Alan Gilbert > >> > Reviewed-by: Juan Quintela > >> > Signed-off-by: Juan Quintela > >> > >> I've just discovered that this is causing 'make check' failures on > >> my OSX host (unfortunately something in my setup is causing > >> 'make check' failures to not always cause a build failure, so I > >> didn't notice earlier): > > > > It's only failing on OSX? Every time or only sometimes? > > Only OSX, and always. I think OSX is pickier about mutexes really > needing to be initialized before use. OK, at least an 'always' should be easier to debug. > > If you can find a way to get a backtrace off that qemu_mutex_lock case > > that would be great; I'd assume the later errors are the fall out from that. > > I'll have a look after lunch, but it's usually painful to get a > backtrace out of this kind of qtest, because it's clearly starting > a whole pile of QEMUs and there's no way I know of to say "only > run a few of these tests, not the whole huge pile". You could add an abort/assert into util/qemu-thread-posix.c qemu_mutex_lock in the error path. Could you also add: diff --git a/migration/migration.c b/migration/migration.c index 9bd2ce7..85e5766 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -93,6 +93,7 @@ MigrationState *migrate_get_current(void) }; if (!once) { +fprintf(stderr,"migrate_get_current do init of current_migration %d\n", getpid()); qemu_mutex_init(_migration.src_page_req_mutex); once = true; } diff --git a/migration/ram.c b/migration/ram.c index 4266687..72b46f2 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1036,6 +1036,7 @@ static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset, { RAMBlock *block = NULL; +fprintf(stderr,"unqueue_page %d\n", getpid()); qemu_mutex_lock(>src_page_req_mutex); if (!QSIMPLEQ_EMPTY(>src_page_requests)) { struct MigrationSrcPageRequest *entry = and make sure that the init happens before the first unqueue (you'll get loads of calls to unqueue). Dave > > thanks > -- PMM -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
On 12 November 2015 at 12:23, Dr. David Alan Gilbertwrote: > * Peter Maydell (peter.mayd...@linaro.org) wrote: >> On 12 November 2015 at 12:04, Dr. David Alan Gilbert >> wrote: >> > * Peter Maydell (peter.mayd...@linaro.org) wrote: >> >> On 10 November 2015 at 14:25, Juan Quintela wrote: >> >> > From: "Dr. David Alan Gilbert" >> >> > >> >> > When transmitting RAM pages, consume pages that have been queued by >> >> > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning. >> >> > >> >> > Note: >> >> > a) After a queued page the linear walk carries on from after the >> >> > unqueued page; there is a reasonable chance that the destination >> >> > was about to ask for other closeby pages anyway. >> >> > >> >> > b) We have to be careful of any assumptions that the page walking >> >> > code makes, in particular it does some short cuts on its first linear >> >> > walk that break as soon as we do a queued page. >> >> > >> >> > c) We have to be careful to not break up host-page size chunks, since >> >> > this makes it harder to place the pages on the destination. >> >> > >> >> > Signed-off-by: Dr. David Alan Gilbert >> >> > Reviewed-by: Juan Quintela >> >> > Signed-off-by: Juan Quintela >> >> >> >> I've just discovered that this is causing 'make check' failures on >> >> my OSX host (unfortunately something in my setup is causing >> >> 'make check' failures to not always cause a build failure, so I >> >> didn't notice earlier): >> > >> > It's only failing on OSX? Every time or only sometimes? >> >> Only OSX, and always. I think OSX is pickier about mutexes really >> needing to be initialized before use. > > OK, at least an 'always' should be easier to debug. > >> > If you can find a way to get a backtrace off that qemu_mutex_lock case >> > that would be great; I'd assume the later errors are the fall out from >> > that. >> >> I'll have a look after lunch, but it's usually painful to get a >> backtrace out of this kind of qtest, because it's clearly starting >> a whole pile of QEMUs and there's no way I know of to say "only >> run a few of these tests, not the whole huge pile". > > You could add an abort/assert into util/qemu-thread-posix.c qemu_mutex_lock > in the error path. abort/assert doesn't print a backtrace. I added some OSX backtrace-gathering/printing functions to the errorpath, and got this: 0 qemu-system-x86_64 0x00010c66d203 qemu_mutex_lock + 83 1 qemu-system-x86_64 0x00010c2ac7af unqueue_page + 47 2 qemu-system-x86_64 0x00010c2ac386 get_queued_page + 54 3 qemu-system-x86_64 0x00010c2ac135 ram_find_and_save_block + 165 4 qemu-system-x86_64 0x00010c2ab5a2 ram_save_iterate + 130 5 qemu-system-x86_64 0x00010c2afa2e qemu_savevm_state_iterate + 302 6 qemu-system-x86_64 0x00010c53acbb migration_thread + 571 7 libsystem_pthread.dylib 0x7fff9146c05a _pthread_body + 131 8 libsystem_pthread.dylib 0x7fff9146bfd7 _pthread_body + 0 9 libsystem_pthread.dylib 0x7fff914693ed thread_start + 13 > > Could you also add: > > diff --git a/migration/migration.c b/migration/migration.c > index 9bd2ce7..85e5766 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -93,6 +93,7 @@ MigrationState *migrate_get_current(void) > }; > > if (!once) { > +fprintf(stderr,"migrate_get_current do init of current_migration > %d\n", getpid()); > qemu_mutex_init(_migration.src_page_req_mutex); > once = true; > } > diff --git a/migration/ram.c b/migration/ram.c > index 4266687..72b46f2 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -1036,6 +1036,7 @@ static RAMBlock *unqueue_page(MigrationState *ms, > ram_addr_t *offset, > { > RAMBlock *block = NULL; > > +fprintf(stderr,"unqueue_page %d\n", getpid()); > qemu_mutex_lock(>src_page_req_mutex); > if (!QSIMPLEQ_EMPTY(>src_page_requests)) { > struct MigrationSrcPageRequest *entry = > > > and make sure that the init happens before the first unqueue (you'll get > loads of calls to unqueue). With that change, plus the backtracing: /x86_64/ahci/flush/retry: OK /x86_64/ahci/flush/migrate: migrate_get_current do init of current_migration 60427 migrate_get_current do init of current_migration 60428 unqueue_page 60427 0 qemu-system-x86_64 0x000101a751c3 qemu_mutex_lock + 83 1 qemu-system-x86_64 0x0001016b4749 unqueue_page + 89 2 qemu-system-x86_64 0x0001016b42f6 get_queued_page + 54 3 qemu-system-x86_64 0x0001016b40a5 ram_find_and_save_block + 165 4 qemu-system-x86_64 0x0001016b3512 ram_save_iterate +
Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
* Peter Maydell (peter.mayd...@linaro.org) wrote: > On 10 November 2015 at 14:25, Juan Quintelawrote: > > From: "Dr. David Alan Gilbert" > > > > When transmitting RAM pages, consume pages that have been queued by > > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning. > > > > Note: > > a) After a queued page the linear walk carries on from after the > > unqueued page; there is a reasonable chance that the destination > > was about to ask for other closeby pages anyway. > > > > b) We have to be careful of any assumptions that the page walking > > code makes, in particular it does some short cuts on its first linear > > walk that break as soon as we do a queued page. > > > > c) We have to be careful to not break up host-page size chunks, since > > this makes it harder to place the pages on the destination. > > > > Signed-off-by: Dr. David Alan Gilbert > > Reviewed-by: Juan Quintela > > Signed-off-by: Juan Quintela > > I've just discovered that this is causing 'make check' failures on > my OSX host (unfortunately something in my setup is causing > 'make check' failures to not always cause a build failure, so I > didn't notice earlier): It's only failing on OSX? Every time or only sometimes? > manooth$ (make -C build/x86 -j8 && cd build/x86 && > QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 > QTEST_QEMU_IMG=qemu-img MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % > 255 + 1))} tests/ahci-test) > [...] > > /x86_64/ahci/flush/simple: OK > /x86_64/ahci/flush/retry: OK > /x86_64/ahci/flush/migrate: qemu: qemu_mutex_lock: Invalid argument > qemu-system-x86_64:Broken pipe > Not a migration stream > qemu-system-x86_64: load of migration failed: Invalid argument OK, let me see if I can find - I do add a mutex in there (src_page_req_mutex) that I guess is the most likely culprit; it's initialised by migrate_get_current (once using a bool) so it should be OK. If you can find a way to get a backtrace off that qemu_mutex_lock case that would be great; I'd assume the later errors are the fall out from that. Dave > > thanks > -- PMM -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
* Peter Maydell (peter.mayd...@linaro.org) wrote: > On 12 November 2015 at 13:18, Peter Maydellwrote: > > On 12 November 2015 at 13:08, Dr. David Alan Gilbert > > wrote: > >> OK, can you try a simple migration by hand outside of the test harness; > >> just something simple like: > >> > >> ./bin/qemu-system-x86_64 -M pc -nographic > >> (qemu) migrate "exec: cat > /dev/null" > >> > >> and the same with q35 ? > > > > (qemu) migrate "exec: cat > /dev/null" > > migrate_get_current do init of current_migration 65307 > > unqueue_page 65307 > > 0 qemu-system-x86_64 0x0001067c01c3 qemu_mutex_lock > > + 83 > > This turns out to be because migrate_init() is corrupting the > mutex memory when it does "memset(s, 0, sizeof(*s))". Presumably > Linux's initialized-mutex is all-zeroes, but OSX's is not. OK, thanks for finding that; I've just smoke tested the following patch and will post it properly after I test it more thoroughly in a couple of hours. Dave commit 689d4964442c3ee34a2dac77411a30b96c214288 Author: Dr. David Alan Gilbert Date: Thu Nov 12 14:10:33 2015 + migration_init: Fix lock initialisation/make it explicit Peter reported a lock error on MacOS after my a82d593b patch. migrate_get_current does one-time initialisation of a bunch of variables. migrate_init does reinitialisation even on a 2nd migrate after a cancel. The problem here was that I'd initialised the mutex in migrate_get_current, and the memset in migrate_init corrupted it. Remove the memset and replace it by explicit initialisation of fields that need initialising; this also turns out to be simpler than the old code that had to preserve some fields. Reported-by: Peter Maydell Signed-off-by: Dr. David Alan Gilbert Fixes: a82d593b diff --git a/migration/migration.c b/migration/migration.c index 9bd2ce7..7e4e27b 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -902,38 +902,31 @@ bool migration_in_postcopy(MigrationState *s) MigrationState *migrate_init(const MigrationParams *params) { MigrationState *s = migrate_get_current(); -int64_t bandwidth_limit = s->bandwidth_limit; -bool enabled_capabilities[MIGRATION_CAPABILITY_MAX]; -int64_t xbzrle_cache_size = s->xbzrle_cache_size; -int compress_level = s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL]; -int compress_thread_count = -s->parameters[MIGRATION_PARAMETER_COMPRESS_THREADS]; -int decompress_thread_count = -s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS]; -int x_cpu_throttle_initial = -s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL]; -int x_cpu_throttle_increment = -s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT]; - -memcpy(enabled_capabilities, s->enabled_capabilities, - sizeof(enabled_capabilities)); -memset(s, 0, sizeof(*s)); +/* + * Reinitialise all migration state, except + * parameters/capabilities that the user set, and + * locks. + */ +s->bytes_xfer = 0; +s->xfer_limit = 0; +s->cleanup_bh = 0; +s->file = NULL; +s->state = MIGRATION_STATUS_NONE; s->params = *params; -memcpy(s->enabled_capabilities, enabled_capabilities, - sizeof(enabled_capabilities)); -s->xbzrle_cache_size = xbzrle_cache_size; - -s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level; -s->parameters[MIGRATION_PARAMETER_COMPRESS_THREADS] = - compress_thread_count; -s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] = - decompress_thread_count; -s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL] = -x_cpu_throttle_initial; -s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] = -x_cpu_throttle_increment; -s->bandwidth_limit = bandwidth_limit; +s->rp_state.from_dst_file = NULL; +s->rp_state.error = false; +s->mbps = 0.0; +s->downtime = 0; +s->expected_downtime = 0; +s->dirty_pages_rate = 0; +s->dirty_bytes_rate = 0; +s->setup_time = 0; +s->dirty_sync_count = 0; +s->start_postcopy = false; +s->migration_thread_running = false; +s->last_req_rb = NULL; + migrate_set_state(s, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP); QSIMPLEQ_INIT(>src_page_requests); -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
* Peter Maydell (peter.mayd...@linaro.org) wrote: > abort/assert doesn't print a backtrace. > > I added some OSX backtrace-gathering/printing functions to the errorpath, > and got this: > > 0 qemu-system-x86_64 0x00010c66d203 qemu_mutex_lock + > 83 > 1 qemu-system-x86_64 0x00010c2ac7af unqueue_page + 47 OK, it is the mutex that I added in that patch. > 2 qemu-system-x86_64 0x00010c2ac386 get_queued_page + > 54 > 3 qemu-system-x86_64 0x00010c2ac135 > ram_find_and_save_block + 165 > 4 qemu-system-x86_64 0x00010c2ab5a2 > ram_save_iterate + 130 > 5 qemu-system-x86_64 0x00010c2afa2e > qemu_savevm_state_iterate + 302 > 6 qemu-system-x86_64 0x00010c53acbb > migration_thread + 571 > 7 libsystem_pthread.dylib 0x7fff9146c05a _pthread_body + 131 > 8 libsystem_pthread.dylib 0x7fff9146bfd7 _pthread_body + 0 > 9 libsystem_pthread.dylib 0x7fff914693ed thread_start + 13 > > > > > > Could you also add: > > > > diff --git a/migration/migration.c b/migration/migration.c > > index 9bd2ce7..85e5766 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -93,6 +93,7 @@ MigrationState *migrate_get_current(void) > > }; > > > > if (!once) { > > +fprintf(stderr,"migrate_get_current do init of current_migration > > %d\n", getpid()); > > qemu_mutex_init(_migration.src_page_req_mutex); > > once = true; > > } > > diff --git a/migration/ram.c b/migration/ram.c > > index 4266687..72b46f2 100644 > > --- a/migration/ram.c > > +++ b/migration/ram.c > > @@ -1036,6 +1036,7 @@ static RAMBlock *unqueue_page(MigrationState *ms, > > ram_addr_t *offset, > > { > > RAMBlock *block = NULL; > > > > +fprintf(stderr,"unqueue_page %d\n", getpid()); > > qemu_mutex_lock(>src_page_req_mutex); > > if (!QSIMPLEQ_EMPTY(>src_page_requests)) { > > struct MigrationSrcPageRequest *entry = > > > > > > and make sure that the init happens before the first unqueue (you'll get > > loads of calls to unqueue). > > With that change, plus the backtracing: > > /x86_64/ahci/flush/retry: OK > /x86_64/ahci/flush/migrate: migrate_get_current do init of > current_migration 60427 > migrate_get_current do init of current_migration 60428 > unqueue_page 60427 > 0 qemu-system-x86_64 0x000101a751c3 qemu_mutex_lock + > 83 > 1 qemu-system-x86_64 0x0001016b4749 unqueue_page + 89 OK, so the lock I added is apparently being initialised before it's being locked, so that's good. OK, can you try a simple migration by hand outside of the test harness; just something simple like: ./bin/qemu-system-x86_64 -M pc -nographic (qemu) migrate "exec: cat > /dev/null" and the same with q35 ? Dave > 2 qemu-system-x86_64 0x0001016b42f6 get_queued_page + > 54 > 3 qemu-system-x86_64 0x0001016b40a5 > ram_find_and_save_block + 165 > 4 qemu-system-x86_64 0x0001016b3512 > ram_save_iterate + 130 > 5 qemu-system-x86_64 0x0001016b79be > qemu_savevm_state_iterate + 302 > 6 qemu-system-x86_64 0x000101942c7b > migration_thread + 571 > 7 libsystem_pthread.dylib 0x7fff9146c05a _pthread_body + 131 > 8 libsystem_pthread.dylib 0x7fff9146bfd7 _pthread_body + 0 > 9 libsystem_pthread.dylib 0x7fff914693ed thread_start + 13 > qemu: qemu_mutex_lock: Invalid argument > qemu-system-x86_64:Broken pipe > Not a migration stream > qemu-system-x86_64: load of migration failed: Invalid argument > > thanks > -- PMM -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
Peter Maydellwrites: > On 12 November 2015 at 12:04, Dr. David Alan Gilbert > wrote: >> * Peter Maydell (peter.mayd...@linaro.org) wrote: >>> On 10 November 2015 at 14:25, Juan Quintela wrote: >>> > From: "Dr. David Alan Gilbert" >>> > >>> > When transmitting RAM pages, consume pages that have been queued by >>> > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning. >>> > >>> > Note: >>> > a) After a queued page the linear walk carries on from after the >>> > unqueued page; there is a reasonable chance that the destination >>> > was about to ask for other closeby pages anyway. >>> > >>> > b) We have to be careful of any assumptions that the page walking >>> > code makes, in particular it does some short cuts on its first linear >>> > walk that break as soon as we do a queued page. >>> > >>> > c) We have to be careful to not break up host-page size chunks, since >>> > this makes it harder to place the pages on the destination. >>> > >>> > Signed-off-by: Dr. David Alan Gilbert >>> > Reviewed-by: Juan Quintela >>> > Signed-off-by: Juan Quintela >>> >>> I've just discovered that this is causing 'make check' failures on >>> my OSX host (unfortunately something in my setup is causing >>> 'make check' failures to not always cause a build failure, so I >>> didn't notice earlier): >> >> It's only failing on OSX? Every time or only sometimes? > > Only OSX, and always. I think OSX is pickier about mutexes really > needing to be initialized before use. > >> If you can find a way to get a backtrace off that qemu_mutex_lock case >> that would be great; I'd assume the later errors are the fall out from that. > > I'll have a look after lunch, but it's usually painful to get a > backtrace out of this kind of qtest, because it's clearly starting > a whole pile of QEMUs and there's no way I know of to say "only > run a few of these tests, not the whole huge pile". Not sure this helps, but here goes anyway: run the make with V=1, fish out the gtester line, add -p /name/of/the/test. Here's an example from my shell history: $ MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))} gtester -k --verbose -m=quick -p /dicts/large_dict tests/check-qjson
Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
On 12 November 2015 at 13:08, Dr. David Alan Gilbertwrote: > OK, can you try a simple migration by hand outside of the test harness; > just something simple like: > > ./bin/qemu-system-x86_64 -M pc -nographic > (qemu) migrate "exec: cat > /dev/null" > > and the same with q35 ? (qemu) migrate "exec: cat > /dev/null" migrate_get_current do init of current_migration 65307 unqueue_page 65307 0 qemu-system-x86_64 0x0001067c01c3 qemu_mutex_lock + 83 1 qemu-system-x86_64 0x0001063ff749 unqueue_page + 89 2 qemu-system-x86_64 0x0001063ff2f6 get_queued_page + 54 3 qemu-system-x86_64 0x0001063ff0a5 ram_find_and_save_block + 165 4 qemu-system-x86_64 0x0001063fe512 ram_save_iterate + 130 5 qemu-system-x86_64 0x0001064029be qemu_savevm_state_iterate + 302 6 qemu-system-x86_64 0x00010668dc7b migration_thread + 571 7 libsystem_pthread.dylib 0x7fff9146c05a _pthread_body + 131 8 libsystem_pthread.dylib 0x7fff9146bfd7 _pthread_body + 0 9 libsystem_pthread.dylib 0x7fff914693ed thread_start + 13 (identical behaviour for q35 and pc). thanks -- PMM
Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
On 12 November 2015 at 13:18, Peter Maydellwrote: > On 12 November 2015 at 13:08, Dr. David Alan Gilbert > wrote: >> OK, can you try a simple migration by hand outside of the test harness; >> just something simple like: >> >> ./bin/qemu-system-x86_64 -M pc -nographic >> (qemu) migrate "exec: cat > /dev/null" >> >> and the same with q35 ? > > (qemu) migrate "exec: cat > /dev/null" > migrate_get_current do init of current_migration 65307 > unqueue_page 65307 > 0 qemu-system-x86_64 0x0001067c01c3 qemu_mutex_lock + > 83 This turns out to be because migrate_init() is corrupting the mutex memory when it does "memset(s, 0, sizeof(*s))". Presumably Linux's initialized-mutex is all-zeroes, but OSX's is not. thanks -- PMM
Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
Peter Maydellwrote: > On 12 November 2015 at 13:18, Peter Maydell wrote: >> On 12 November 2015 at 13:08, Dr. David Alan Gilbert >> wrote: >>> OK, can you try a simple migration by hand outside of the test harness; >>> just something simple like: >>> >>> ./bin/qemu-system-x86_64 -M pc -nographic >>> (qemu) migrate "exec: cat > /dev/null" >>> >>> and the same with q35 ? >> >> (qemu) migrate "exec: cat > /dev/null" >> migrate_get_current do init of current_migration 65307 >> unqueue_page 65307 >> 0 qemu-system-x86_64 0x0001067c01c3 qemu_mutex_lock + >> 83 > > This turns out to be because migrate_init() is corrupting the > mutex memory when it does "memset(s, 0, sizeof(*s))". Presumably > Linux's initialized-mutex is all-zeroes, but OSX's is not. grrr, that is probably the case. Thanks for pinponting this. > > thanks > -- PMM
Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
* Juan Quintela (quint...@redhat.com) wrote: > "Dr. David Alan Gilbert"wrote: > > * Peter Maydell (peter.mayd...@linaro.org) wrote: > >> On 12 November 2015 at 13:18, Peter Maydell > >> wrote: > >> > On 12 November 2015 at 13:08, Dr. David Alan Gilbert > >> > wrote: > >> >> OK, can you try a simple migration by hand outside of the test harness; > >> >> just something simple like: > >> >> > >> >> ./bin/qemu-system-x86_64 -M pc -nographic > >> >> (qemu) migrate "exec: cat > /dev/null" > >> >> > >> >> and the same with q35 ? > >> > > >> > (qemu) migrate "exec: cat > /dev/null" > >> > migrate_get_current do init of current_migration 65307 > >> > unqueue_page 65307 > >> > 0 qemu-system-x86_64 0x0001067c01c3 > >> > qemu_mutex_lock + 83 > >> > >> This turns out to be because migrate_init() is corrupting the > >> mutex memory when it does "memset(s, 0, sizeof(*s))". Presumably > >> Linux's initialized-mutex is all-zeroes, but OSX's is not. > > > > OK, thanks for finding that; I've just smoke tested the following > > patch and will post it properly after I test it more thoroughly in > > a couple of hours. > > I did a patch that was almost identical. It is passing for me virt-test. and the one I posted seems to survive postcopy as well; so looks good. Dave > > Later, Juan. -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
"Dr. David Alan Gilbert"wrote: > * Peter Maydell (peter.mayd...@linaro.org) wrote: >> On 12 November 2015 at 13:18, Peter Maydell wrote: >> > On 12 November 2015 at 13:08, Dr. David Alan Gilbert >> > wrote: >> >> OK, can you try a simple migration by hand outside of the test harness; >> >> just something simple like: >> >> >> >> ./bin/qemu-system-x86_64 -M pc -nographic >> >> (qemu) migrate "exec: cat > /dev/null" >> >> >> >> and the same with q35 ? >> > >> > (qemu) migrate "exec: cat > /dev/null" >> > migrate_get_current do init of current_migration 65307 >> > unqueue_page 65307 >> > 0 qemu-system-x86_64 0x0001067c01c3 qemu_mutex_lock >> > + 83 >> >> This turns out to be because migrate_init() is corrupting the >> mutex memory when it does "memset(s, 0, sizeof(*s))". Presumably >> Linux's initialized-mutex is all-zeroes, but OSX's is not. > > OK, thanks for finding that; I've just smoke tested the following > patch and will post it properly after I test it more thoroughly in > a couple of hours. I did a patch that was almost identical. It is passing for me virt-test. Later, Juan.
[Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
From: "Dr. David Alan Gilbert"When transmitting RAM pages, consume pages that have been queued by MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning. Note: a) After a queued page the linear walk carries on from after the unqueued page; there is a reasonable chance that the destination was about to ask for other closeby pages anyway. b) We have to be careful of any assumptions that the page walking code makes, in particular it does some short cuts on its first linear walk that break as soon as we do a queued page. c) We have to be careful to not break up host-page size chunks, since this makes it harder to place the pages on the destination. Signed-off-by: Dr. David Alan Gilbert Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 249 +--- trace-events| 2 + 2 files changed, 220 insertions(+), 31 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 8302d09..d09d5ab 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -548,9 +548,9 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data, * Returns: byte offset within memory region of the start of a dirty page */ static inline -ram_addr_t migration_bitmap_find_and_reset_dirty(RAMBlock *rb, - ram_addr_t start, - ram_addr_t *ram_addr_abs) +ram_addr_t migration_bitmap_find_dirty(RAMBlock *rb, + ram_addr_t start, + ram_addr_t *ram_addr_abs) { unsigned long base = rb->offset >> TARGET_PAGE_BITS; unsigned long nr = base + (start >> TARGET_PAGE_BITS); @@ -567,15 +567,24 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(RAMBlock *rb, next = find_next_bit(bitmap, size, nr); } -if (next < size) { -clear_bit(next, bitmap); -migration_dirty_pages--; -} *ram_addr_abs = next << TARGET_PAGE_BITS; return (next - base) << TARGET_PAGE_BITS; } -/* Called with rcu_read_lock() to protect migration_bitmap */ +static inline bool migration_bitmap_clear_dirty(ram_addr_t addr) +{ +bool ret; +int nr = addr >> TARGET_PAGE_BITS; +unsigned long *bitmap = atomic_rcu_read(_bitmap_rcu)->bmap; + +ret = test_and_clear_bit(nr, bitmap); + +if (ret) { +migration_dirty_pages--; +} +return ret; +} + static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length) { unsigned long *bitmap; @@ -974,9 +983,8 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block, static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss, bool *again, ram_addr_t *ram_addr_abs) { -pss->offset = migration_bitmap_find_and_reset_dirty(pss->block, - pss->offset, - ram_addr_abs); +pss->offset = migration_bitmap_find_dirty(pss->block, pss->offset, + ram_addr_abs); if (pss->complete_round && pss->block == last_seen_block && pss->offset >= last_offset) { /* @@ -1015,6 +1023,107 @@ static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss, } } +/* + * Helper for 'get_queued_page' - gets a page off the queue + * ms: MigrationState in + * *offset: Used to return the offset within the RAMBlock + * ram_addr_abs: global offset in the dirty/sent bitmaps + * + * Returns: block (or NULL if none available) + */ +static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset, + ram_addr_t *ram_addr_abs) +{ +RAMBlock *block = NULL; + +qemu_mutex_lock(>src_page_req_mutex); +if (!QSIMPLEQ_EMPTY(>src_page_requests)) { +struct MigrationSrcPageRequest *entry = +QSIMPLEQ_FIRST(>src_page_requests); +block = entry->rb; +*offset = entry->offset; +*ram_addr_abs = (entry->offset + entry->rb->offset) & +TARGET_PAGE_MASK; + +if (entry->len > TARGET_PAGE_SIZE) { +entry->len -= TARGET_PAGE_SIZE; +entry->offset += TARGET_PAGE_SIZE; +} else { +memory_region_unref(block->mr); +QSIMPLEQ_REMOVE_HEAD(>src_page_requests, next_req); +g_free(entry); +} +} +qemu_mutex_unlock(>src_page_req_mutex); + +return block; +} + +/* + * Unqueue a page from the queue fed by postcopy page requests; skips pages + * that are already sent (!dirty) + * + * ms: MigrationState in + * pss: PageSearchStatus structure updated with found block/offset + * ram_addr_abs: global offset in the dirty/sent bitmaps + * + * Returns: true if a queued page is
[Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
From: "Dr. David Alan Gilbert"When transmitting RAM pages, consume pages that have been queued by MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning. Note: a) After a queued page the linear walk carries on from after the unqueued page; there is a reasonable chance that the destination was about to ask for other closeby pages anyway. b) We have to be careful of any assumptions that the page walking code makes, in particular it does some short cuts on its first linear walk that break as soon as we do a queued page. c) We have to be careful to not break up host-page size chunks, since this makes it harder to place the pages on the destination. Signed-off-by: Dr. David Alan Gilbert Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 249 +--- trace-events| 2 + 2 files changed, 220 insertions(+), 31 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 8302d09..d09d5ab 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -548,9 +548,9 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data, * Returns: byte offset within memory region of the start of a dirty page */ static inline -ram_addr_t migration_bitmap_find_and_reset_dirty(RAMBlock *rb, - ram_addr_t start, - ram_addr_t *ram_addr_abs) +ram_addr_t migration_bitmap_find_dirty(RAMBlock *rb, + ram_addr_t start, + ram_addr_t *ram_addr_abs) { unsigned long base = rb->offset >> TARGET_PAGE_BITS; unsigned long nr = base + (start >> TARGET_PAGE_BITS); @@ -567,15 +567,24 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(RAMBlock *rb, next = find_next_bit(bitmap, size, nr); } -if (next < size) { -clear_bit(next, bitmap); -migration_dirty_pages--; -} *ram_addr_abs = next << TARGET_PAGE_BITS; return (next - base) << TARGET_PAGE_BITS; } -/* Called with rcu_read_lock() to protect migration_bitmap */ +static inline bool migration_bitmap_clear_dirty(ram_addr_t addr) +{ +bool ret; +int nr = addr >> TARGET_PAGE_BITS; +unsigned long *bitmap = atomic_rcu_read(_bitmap_rcu)->bmap; + +ret = test_and_clear_bit(nr, bitmap); + +if (ret) { +migration_dirty_pages--; +} +return ret; +} + static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length) { unsigned long *bitmap; @@ -974,9 +983,8 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block, static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss, bool *again, ram_addr_t *ram_addr_abs) { -pss->offset = migration_bitmap_find_and_reset_dirty(pss->block, - pss->offset, - ram_addr_abs); +pss->offset = migration_bitmap_find_dirty(pss->block, pss->offset, + ram_addr_abs); if (pss->complete_round && pss->block == last_seen_block && pss->offset >= last_offset) { /* @@ -1015,6 +1023,107 @@ static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss, } } +/* + * Helper for 'get_queued_page' - gets a page off the queue + * ms: MigrationState in + * *offset: Used to return the offset within the RAMBlock + * ram_addr_abs: global offset in the dirty/sent bitmaps + * + * Returns: block (or NULL if none available) + */ +static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset, + ram_addr_t *ram_addr_abs) +{ +RAMBlock *block = NULL; + +qemu_mutex_lock(>src_page_req_mutex); +if (!QSIMPLEQ_EMPTY(>src_page_requests)) { +struct MigrationSrcPageRequest *entry = +QSIMPLEQ_FIRST(>src_page_requests); +block = entry->rb; +*offset = entry->offset; +*ram_addr_abs = (entry->offset + entry->rb->offset) & +TARGET_PAGE_MASK; + +if (entry->len > TARGET_PAGE_SIZE) { +entry->len -= TARGET_PAGE_SIZE; +entry->offset += TARGET_PAGE_SIZE; +} else { +memory_region_unref(block->mr); +QSIMPLEQ_REMOVE_HEAD(>src_page_requests, next_req); +g_free(entry); +} +} +qemu_mutex_unlock(>src_page_req_mutex); + +return block; +} + +/* + * Unqueue a page from the queue fed by postcopy page requests; skips pages + * that are already sent (!dirty) + * + * ms: MigrationState in + * pss: PageSearchStatus structure updated with found block/offset + * ram_addr_abs: global offset in the dirty/sent bitmaps + * + * Returns: true if a queued page is