Re: [Qemu-devel] [PATCH v3 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
Hi, I had time to investigate more on this problem.

On 16/05/2018 15:43, Laurent Vivier wrote:
> Hi Bala,
>
> I've tested your patch migrating a pseries between a P9 host and a P8
> host with 1G huge page size on the P9 side and 16MB on the P8 side, and
> the information is strange now.
>
> "remaining ram" doesn't change, and after a while it can be set to "0"
> and estimated downtime is 0 too, but the migration is not completed and
> "transferred ram" continues to increase.
>
> so I think there is a problem somewhere...
>
> thanks,
> Laurent
>
> On 01/05/2018 16:37, Balamuruhan S wrote:
>> Hi,
>>
>> Dave, David and Juan, if you guys are okay with the patch, please
>> help to merge it.
>>
>> Thanks,
>> Bala
>>
>> On Wed, Apr 25, 2018 at 12:40:40PM +0530, Balamuruhan S wrote:
>>> The expected_downtime value is not accurate with dirty_pages_rate *
>>> page_size; using ram_bytes_remaining would yield a correct value. It
>>> will initially be a gross over-estimate, but for non-converging
>>> migrations it should approach a reasonable estimate later on.
>>>
>>> Currently, bandwidth and expected_downtime are calculated in
>>> migration_update_counters() during each iteration of
>>> migration_thread(), whereas remaining ram is calculated in
>>> qmp_query_migrate() when we actually call "info migrate". Due to this
>>> there is some difference in the expected_downtime value being
>>> calculated.
>>>
>>> With this patch, bandwidth, expected_downtime and remaining ram are
>>> all calculated in migration_update_counters(), and "info migrate"
>>> retrieves the same values. With this approach we get a much closer
>>> estimate.
>>>
>>> Reported-by: Michael Roth
>>> Signed-off-by: Balamuruhan S
>>> ---
>>>  migration/migration.c | 11 ++++++++---
>>>  migration/migration.h |  1 +
>>>  2 files changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 52a5092add..5d721ee481 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
>>>      }
>>>
>>>      if (s->state != MIGRATION_STATUS_COMPLETED) {
>>> -        info->ram->remaining = ram_bytes_remaining();
>>> +        info->ram->remaining = s->ram_bytes_remaining;

Don't remove ram_bytes_remaining(): it is updated more often and gives
better information about the state of memory. (This is why, in my test
case, I see a frozen "remaining ram".)

>>>          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
>>>      }
>>>  }
>>> @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
>>>      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
>>>      time_spent = current_time - s->iteration_start_time;
>>>      bandwidth = (double)transferred / time_spent;
>>> +    s->ram_bytes_remaining = ram_bytes_remaining();
>>>      s->threshold_size = bandwidth * s->parameters.downtime_limit;

To have an accurate value, we must read the remaining ram just after
having updated the dirty pages count, so I think after
migration_bitmap_sync_range() in migration_bitmap_sync().

>>>
>>>      s->mbps = (((double) transferred * 8.0) /
>>> @@ -2237,8 +2238,12 @@ static void migration_update_counters(MigrationState *s,
>>>       * recalculate. 1 is a small enough number for our purposes
>>>       */
>>>      if (ram_counters.dirty_pages_rate && transferred > 1) {
>>> -        s->expected_downtime = ram_counters.dirty_pages_rate *
>>> -            qemu_target_page_size() / bandwidth;
>>> +        /*
>>> +         * It will initially be a gross over-estimate, but for
>>> +         * non-converging migrations it should approach a reasonable
>>> +         * estimate later on
>>> +         */
>>> +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
>>>      }
>>>
>>>      qemu_file_reset_rate_limit(s->to_dst_file);
>>> diff --git a/migration/migration.h b/migration/migration.h
>>> index 8d2f320c48..8584f8e22e 100644
>>> --- a/migration/migration.h
>>> +++ b/migration/migration.h
>>> @@ -128,6 +128,7 @@ struct MigrationState
>>>      int64_t downtime_start;
>>>      int64_t downtime;
>>>      int64_t expected_downtime;
>>> +    int64_t ram_bytes_remaining;
>>>      bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
>>>      int64_t setup_time;
>>>      /*
>>> --

I think you don't need to add ram_bytes_remaining; there is a
"remaining" field in ram_counters that seems unused. I think this fix
can be as simple as:

diff --git a/migration/migration.c b/migration/migration.c
index 1e99ec9..25b26f3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2712,14 +2712,7 @@ static void migration_update_counters(MigrationState *s,

     s->mbps = (((double) transferred * 8.0) /
                ((double) time_spent / 1000.0)) / 1000.0 / 1000.0;

-    /*
-     * if we haven't sent anything, we don't want to
-     * recalculate. 1 is a smal
Re: [Qemu-devel] [PATCH v3 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
On Wed, May 16, 2018 at 03:43:48PM +0200, Laurent Vivier wrote:
> Hi Bala,
>
> I've tested your patch migrating a pseries between a P9 host and a P8
> host with 1G huge page size on the P9 side and 16MB on the P8 side, and
> the information is strange now.

Hi Laurent,

Thank you for testing the patch. I have worked on recreating the same
setup, and my observation is that remaining ram keeps reducing, whereas
expected_downtime stays at 300, the same as downtime-limit, because it
gets assigned that value in migrate_fd_connect():

    s->expected_downtime = s->parameters.downtime_limit;

expected_downtime is not calculated immediately after the migration
starts; it takes time to calculate expected_downtime even without this
patch, because of the condition in migration_update_counters():

    /*
     * if we haven't sent anything, we don't want to
     * recalculate. 1 is a small enough number for our purposes
     */
    if (ram_counters.dirty_pages_rate && transferred > 1) {
        calculate expected_downtime
    }

> "remaining ram" doesn't change, and after a while it can be set to "0"
> and estimated downtime is 0 too, but the migration is not completed and

I see remaining ram reduce continuously to a point and then bump up
again.
The migration completes successfully after setting downtime-limit to the
expected_downtime that is calculated once the condition mentioned above
is met. Tested with this patch:

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off
Migration status: active
total time: 50753 milliseconds
expected downtime: 46710 milliseconds
setup: 15 milliseconds
transferred ram: 582332 kbytes
throughput: 95.33 mbps
remaining ram: 543552 kbytes
total ram: 8388864 kbytes
duplicate: 1983194 pages
skipped: 0 pages
normal: 140950 pages
normal bytes: 563800 kbytes
dirty sync count: 2
page size: 4 kbytes
dirty pages rate: 49351 pages

(qemu) migrate_set_parameter downtime-limit 46710

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off
Migration status: completed
total time: 118389 milliseconds
downtime: 20324 milliseconds
setup: 15 milliseconds
transferred ram: 1355349 kbytes
throughput: 94.07 mbps
remaining ram: 0 kbytes
total ram: 8388864 kbytes
duplicate: 2139396 pages
skipped: 0 pages
normal: 333485 pages
normal bytes: 1333940 kbytes
dirty sync count: 6
page size: 4 kbytes

> "transferred ram" continues to increase.

If we do not set the downtime-limit, the remaining ram and transferred
ram get bumped up and the migration continues indefinitely.

-- Bala

> so I think there is a problem somewhere...
>
> thanks,
> Laurent
>
> On 01/05/2018 16:37, Balamuruhan S wrote:
> > Hi,
> >
> > Dave, David and Juan, if you guys are okay with the patch, please
> > help to merge it.
> >
> > Thanks,
> > Bala
> >
> > On Wed, Apr 25, 2018 at 12:40:40PM +0530, Balamuruhan S wrote:
> >> The expected_downtime value is not accurate with dirty_pages_rate *
> >> page_size; using ram_bytes_remaining would yield a correct value. It
> >> will initially be a gross over-estimate, but for non-converging
> >> migrations it should approach a reasonable estimate later on.
> >>
> >> Currently, bandwidth and expected_downtime are calculated in
> >> migration_update_counters() during each iteration of
> >> migration_thread(), whereas remaining ram is calculated in
> >> qmp_query_migrate() when we actually call "info migrate". Due to this
> >> there is some difference in the expected_downtime value being
> >> calculated.
> >>
> >> With this patch, bandwidth, expected_downtime and remaining ram are
> >> all calculated in migration_update_counters(), and "info migrate"
> >> retrieves the same values. With this approach we get a much closer
> >> estimate.
> >>
> >> Reported-by: Michael Roth
> >> Signed-off-by: Balamuruhan S
> >> ---
> >>  migration/migration.c | 11 ++++++++---
> >>  migration/migration.h |  1 +
> >>  2 files changed, 9 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/migration/migration.c b/migration/migration.c
> >> index 52a5092add..5d721ee481 100644
> >> --- a/migration/migration.c
> >> +++ b/migration/migration.c
> >> @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> >>      }
> >>
> >>      if (s->state != MIGRATION_STATUS_COMPLETED) {
> >> -        info->ram->remaining = ram_bytes_remaining();
> >> +        info->ram->remain
Re: [Qemu-devel] [PATCH v3 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
Hi Bala,

I've tested your patch migrating a pseries between a P9 host and a P8
host with 1G huge page size on the P9 side and 16MB on the P8 side, and
the information is strange now.

"remaining ram" doesn't change, and after a while it can be set to "0"
and estimated downtime is 0 too, but the migration is not completed and
"transferred ram" continues to increase.

so I think there is a problem somewhere...

thanks,
Laurent

On 01/05/2018 16:37, Balamuruhan S wrote:
> Hi,
>
> Dave, David and Juan, if you guys are okay with the patch, please
> help to merge it.
>
> Thanks,
> Bala
>
> On Wed, Apr 25, 2018 at 12:40:40PM +0530, Balamuruhan S wrote:
>> The expected_downtime value is not accurate with dirty_pages_rate *
>> page_size; using ram_bytes_remaining would yield a correct value. It
>> will initially be a gross over-estimate, but for non-converging
>> migrations it should approach a reasonable estimate later on.
>>
>> Currently, bandwidth and expected_downtime are calculated in
>> migration_update_counters() during each iteration of
>> migration_thread(), whereas remaining ram is calculated in
>> qmp_query_migrate() when we actually call "info migrate". Due to this
>> there is some difference in the expected_downtime value being
>> calculated.
>>
>> With this patch, bandwidth, expected_downtime and remaining ram are
>> all calculated in migration_update_counters(), and "info migrate"
>> retrieves the same values. With this approach we get a much closer
>> estimate.
>>
>> Reported-by: Michael Roth
>> Signed-off-by: Balamuruhan S
>> ---
>>  migration/migration.c | 11 ++++++++---
>>  migration/migration.h |  1 +
>>  2 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 52a5092add..5d721ee481 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
>>      }
>>
>>      if (s->state != MIGRATION_STATUS_COMPLETED) {
>> -        info->ram->remaining = ram_bytes_remaining();
>> +        info->ram->remaining = s->ram_bytes_remaining;
>>          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
>>      }
>>  }
>> @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
>>      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
>>      time_spent = current_time - s->iteration_start_time;
>>      bandwidth = (double)transferred / time_spent;
>> +    s->ram_bytes_remaining = ram_bytes_remaining();
>>      s->threshold_size = bandwidth * s->parameters.downtime_limit;
>>
>>      s->mbps = (((double) transferred * 8.0) /
>> @@ -2237,8 +2238,12 @@ static void migration_update_counters(MigrationState *s,
>>       * recalculate. 1 is a small enough number for our purposes
>>       */
>>      if (ram_counters.dirty_pages_rate && transferred > 1) {
>> -        s->expected_downtime = ram_counters.dirty_pages_rate *
>> -            qemu_target_page_size() / bandwidth;
>> +        /*
>> +         * It will initially be a gross over-estimate, but for
>> +         * non-converging migrations it should approach a reasonable
>> +         * estimate later on
>> +         */
>> +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
>>      }
>>
>>      qemu_file_reset_rate_limit(s->to_dst_file);
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 8d2f320c48..8584f8e22e 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -128,6 +128,7 @@ struct MigrationState
>>      int64_t downtime_start;
>>      int64_t downtime;
>>      int64_t expected_downtime;
>> +    int64_t ram_bytes_remaining;
>>      bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
>>      int64_t setup_time;
>>      /*
>> --
>> 2.14.3
Re: [Qemu-devel] [PATCH v3 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
Hi,

Dave, David and Juan, if you guys are okay with the patch, please
help to merge it.

Thanks,
Bala

On Wed, Apr 25, 2018 at 12:40:40PM +0530, Balamuruhan S wrote:
> The expected_downtime value is not accurate with dirty_pages_rate *
> page_size; using ram_bytes_remaining would yield a correct value. It
> will initially be a gross over-estimate, but for non-converging
> migrations it should approach a reasonable estimate later on.
>
> Currently, bandwidth and expected_downtime are calculated in
> migration_update_counters() during each iteration of
> migration_thread(), whereas remaining ram is calculated in
> qmp_query_migrate() when we actually call "info migrate". Due to this
> there is some difference in the expected_downtime value being
> calculated.
>
> With this patch, bandwidth, expected_downtime and remaining ram are
> all calculated in migration_update_counters(), and "info migrate"
> retrieves the same values. With this approach we get a much closer
> estimate.
>
> Reported-by: Michael Roth
> Signed-off-by: Balamuruhan S
> ---
>  migration/migration.c | 11 ++++++++---
>  migration/migration.h |  1 +
>  2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 52a5092add..5d721ee481 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
>      }
>
>      if (s->state != MIGRATION_STATUS_COMPLETED) {
> -        info->ram->remaining = ram_bytes_remaining();
> +        info->ram->remaining = s->ram_bytes_remaining;
>          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
>      }
>  }
> @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
>      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
>      time_spent = current_time - s->iteration_start_time;
>      bandwidth = (double)transferred / time_spent;
> +    s->ram_bytes_remaining = ram_bytes_remaining();
>      s->threshold_size = bandwidth * s->parameters.downtime_limit;
>
>      s->mbps = (((double) transferred * 8.0) /
> @@ -2237,8 +2238,12 @@ static void migration_update_counters(MigrationState *s,
>       * recalculate. 1 is a small enough number for our purposes
>       */
>      if (ram_counters.dirty_pages_rate && transferred > 1) {
> -        s->expected_downtime = ram_counters.dirty_pages_rate *
> -            qemu_target_page_size() / bandwidth;
> +        /*
> +         * It will initially be a gross over-estimate, but for
> +         * non-converging migrations it should approach a reasonable
> +         * estimate later on
> +         */
> +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
>      }
>
>      qemu_file_reset_rate_limit(s->to_dst_file);
> diff --git a/migration/migration.h b/migration/migration.h
> index 8d2f320c48..8584f8e22e 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -128,6 +128,7 @@ struct MigrationState
>      int64_t downtime_start;
>      int64_t downtime;
>      int64_t expected_downtime;
> +    int64_t ram_bytes_remaining;
>      bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
>      int64_t setup_time;
>      /*
> --
> 2.14.3
[Qemu-devel] [PATCH v3 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
The expected_downtime value is not accurate with dirty_pages_rate *
page_size; using ram_bytes_remaining would yield a correct value. It
will initially be a gross over-estimate, but for non-converging
migrations it should approach a reasonable estimate later on.

Currently, bandwidth and expected_downtime are calculated in
migration_update_counters() during each iteration of migration_thread(),
whereas remaining ram is calculated in qmp_query_migrate() when we
actually call "info migrate". Due to this there is some difference in
the expected_downtime value being calculated.

With this patch, bandwidth, expected_downtime and remaining ram are all
calculated in migration_update_counters(), and "info migrate" retrieves
the same values. With this approach we get a much closer estimate.

Reported-by: Michael Roth
Signed-off-by: Balamuruhan S
---
 migration/migration.c | 11 ++++++++---
 migration/migration.h |  1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 52a5092add..5d721ee481 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
     }

     if (s->state != MIGRATION_STATUS_COMPLETED) {
-        info->ram->remaining = ram_bytes_remaining();
+        info->ram->remaining = s->ram_bytes_remaining;
         info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
     }
 }
@@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
     transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
     time_spent = current_time - s->iteration_start_time;
     bandwidth = (double)transferred / time_spent;
+    s->ram_bytes_remaining = ram_bytes_remaining();
     s->threshold_size = bandwidth * s->parameters.downtime_limit;

     s->mbps = (((double) transferred * 8.0) /
@@ -2237,8 +2238,12 @@ static void migration_update_counters(MigrationState *s,
      * recalculate. 1 is a small enough number for our purposes
      */
     if (ram_counters.dirty_pages_rate && transferred > 1) {
-        s->expected_downtime = ram_counters.dirty_pages_rate *
-            qemu_target_page_size() / bandwidth;
+        /*
+         * It will initially be a gross over-estimate, but for
+         * non-converging migrations it should approach a reasonable
+         * estimate later on
+         */
+        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
     }

     qemu_file_reset_rate_limit(s->to_dst_file);
diff --git a/migration/migration.h b/migration/migration.h
index 8d2f320c48..8584f8e22e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -128,6 +128,7 @@ struct MigrationState
     int64_t downtime_start;
     int64_t downtime;
     int64_t expected_downtime;
+    int64_t ram_bytes_remaining;
     bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
     int64_t setup_time;
     /*
--
2.14.3