[Qemu-devel] [PATCH 1/1] migration: calculate expected_downtime considering redirtied ram
From: Balamuruhan S

Currently we calculate expected_downtime as the time taken to transfer the
remaining ram, but while that remaining ram is being transferred a few pages
may be redirtied and will need to be retransferred, so it is better to account
for them to get a more accurate expected_downtime value.

Total ram to be transferred = remaining ram +
    (ram redirtied while the remaining ram gets transferred)

redirtied ram = dirty_pages_rate * time taken to transfer remaining ram
redirtied ram = dirty_pages_rate * (remaining ram / bandwidth)

expected_downtime = (remaining ram + redirtied ram) / bandwidth

Suggested-by: David Gibson
Suggested-by: Dr. David Alan Gilbert
Signed-off-by: Balamuruhan S
---
 migration/migration.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index ffc4d9e556..dc38e9a380 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2903,7 +2903,13 @@ static void migration_update_counters(MigrationState *s,
      * recalculate. 1 is a small enough number for our purposes
      */
     if (ram_counters.dirty_pages_rate && transferred > 1) {
-        s->expected_downtime = ram_counters.remaining / bandwidth;
+        /* Time required to transfer remaining ram */
+        remaining_ram_transfer_time = ram_counters.remaining / bandwidth;
+
+        /* ram redirtied while the remaining ram gets transferred */
+        newly_dirtied_ram = ram_counters.dirty_pages_rate * remaining_ram_transfer_time;
+
+        s->expected_downtime = (ram_counters.remaining + newly_dirtied_ram) / bandwidth;
     }

     qemu_file_reset_rate_limit(s->to_dst_file);
--
2.14.5
[Qemu-devel] [PATCH 0/1] migration: calculate expected_downtime considering redirtied ram
From: Balamuruhan S

Based on the earlier discussion with Dave and David Gibson about the
expected_downtime calculation,

https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg02418.html

the suggestion was that the calculation is not accurate and that we need to
consider the ram that gets redirtied during the time we would have actually
transferred the ram in the current iteration. So I have come up with a
calculation that accounts for the ram that could get redirtied while the
remaining ram of the current iteration is being transferred. This way, the
total ram to be transferred is remaining ram + redirtied ram, and dividing
by bandwidth yields a better expected_downtime value.

Please help to review and suggest on this approach.

Balamuruhan S (1):
  migration: calculate expected_downtime considering redirtied ram

 migration/migration.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--
2.14.5
Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
On 2018-04-03 11:40, Peter Xu wrote:
> On Sun, Apr 01, 2018 at 12:25:36AM +0530, Balamuruhan S wrote:
>> expected_downtime value is not accurate with dirty_pages_rate * page_size,
>> using ram_bytes_remaining would yield it correct.
>>
>> Signed-off-by: Balamuruhan S
>> ---
>>  migration/migration.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 58bd382730..4e43dc4f92 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2245,8 +2245,7 @@ static void migration_update_counters(MigrationState *s,
>>       * recalculate. 1 is a small enough number for our purposes
>>       */
>>      if (ram_counters.dirty_pages_rate && transferred > 1) {
>> -        s->expected_downtime = ram_counters.dirty_pages_rate *
>> -            qemu_target_page_size() / bandwidth;
>> +        s->expected_downtime = ram_bytes_remaining() / bandwidth;
>
> This field was removed in e4ed1541ac ("savevm: New save live migration
> method: pending", 2012-12-20), in which remaining RAM was used. And it was
> added back in 90f8ae724a ("migration: calculate expected_downtime",
> 2013-02-22), in which dirty rate was used. However I didn't find a clue on
> why we changed from using remaining RAM to using dirty rate... So I'll
> leave this question to Juan.
>
> Besides, I'm a bit confused about when we'd want such a value. AFAIU
> precopy is mostly used by setting up the target downtime beforehand, so we
> should already know the downtime beforehand. Then why do we want to
> observe such a thing?
Thanks Peter Xu for reviewing.

I tested precopy migration of a ppc guest backed by 16M hugepages; the page
granularity used by migration is 4K, so any page that gets dirtied results in
4096 pages being transmitted again. This caused the migration to continue
endlessly with the default migrate_parameters (downtime-limit: 300
milliseconds).

info migrate:

Migration status: active
total time: 130874 milliseconds
expected downtime: 1475 milliseconds
setup: 3475 milliseconds
transferred ram: 18197383 kbytes
throughput: 866.83 mbps
remaining ram: 376892 kbytes
total ram: 8388864 kbytes
duplicate: 1678265 pages
skipped: 0 pages
normal: 4536795 pages
normal bytes: 18147180 kbytes
dirty sync count: 6
page size: 4 kbytes
dirty pages rate: 39044 pages

In order to complete the migration I configured downtime-limit to 1475
milliseconds, but the migration was still endless. Later I calculated the
expected downtime from the remaining ram: 376892 kbytes / 866.83 mbps yielded
3478.34 milliseconds, and configuring that as the downtime-limit allowed the
migration to complete. This led to the conclusion that the reported expected
downtime is not accurate.

Regards,
Balamuruhan S