* Igor Redko (red...@virtuozzo.com) wrote:
> On 28.09.2015 22:22, Dr. David Alan Gilbert wrote:
> >* Denis V. Lunev (d...@openvz.org) wrote:
> >>From: Igor Redko <red...@virtuozzo.com>
> >>
> >>To get this estimate we must divide pending_size by bandwidth, according
> >>to the description of expected-downtime ("qmp-commands.hx:3246"):
> >>   "expected-downtime": only present while migration is active
> >>               total amount in ms for downtime that was calculated on
> >>               the last bitmap round (json-int)
> >>
> >>The previous version was just wrong because dirty_bytes_rate and bandwidth
> >>are both measured in Bytes/ms, so dividing the first by the second gives a
> >>dimensionless quantity.
> >>As the description above says, this value is shown during the active
> >>migration phase and is recalculated only after all memory has been
> >>transferred, and only if that took more than 1 sec. So maybe that is why
> >>nobody noticed the bug.
> >
> >While I agree the existing code looks wrong, I don't see how this is
> >any more correct.
> 
> This patch aims to fix the units of expected_downtime. It is reasonable that
> expected_downtime should be measured in milliseconds, while the existing
> implementation lacks any units.

I agree your units are correct where the old ones aren't, and I agree
it needs fixing.
However, I'm worried about whether the value in your fix is correct.
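
To make the units point concrete, here's a tiny stand-alone sketch (not QEMU
code; the names just mirror migration.c and the numbers are invented) showing
why pending_size / bandwidth comes out in ms while the old expression has no
units at all:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Invented numbers; only the units matter here. */
    uint64_t pending_size     = 512ULL * 1024 * 1024;  /* bytes left to send    */
    double   bandwidth        = 100.0 * 1024;          /* bytes per millisecond */
    double   dirty_bytes_rate =  20.0 * 1024;          /* bytes per millisecond */

    /* bytes / (bytes/ms) = ms: a time, which is what expected-downtime is. */
    printf("pending_size / bandwidth     = %.0f ms\n",
           pending_size / bandwidth);

    /* (bytes/ms) / (bytes/ms) = no units: what the old code reported. */
    printf("dirty_bytes_rate / bandwidth = %.2f (dimensionless)\n",
           dirty_bytes_rate / bandwidth);
    return 0;
}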

> >  I think 'pending_size' is an estimate of the number of bytes left
> >to transfer, the intention being that most of those are transferred
> >prior to pausing the machine, if those are transferred before pausing
> >then they aren't part of the downtime.
> >
> Yes, 'pending_size' is indeed an estimate of the number of bytes left to
> transfer.
> But the condition:
> >     if (s->dirty_bytes_rate && transferred_bytes > 10000) {
> slightly modifies the meaning of pending_size.
> dirty_bytes_rate is set in migration_sync(), which is called when
> pending_size < max_downtime * bandwidth. This estimate is higher than
> max_downtime by design.

I don't think that check really modifies the meaning of pending_size;
it's just a sanity check that means we don't start trying to predict downtime
when we've not transmitted much yet.
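
For reference, the shape of the logic I mean is roughly the following (a
simplified, free-standing sketch rather than the literal migration_thread()
body; the MigState struct here is a made-up stand-in):

#include <stdint.h>

/* Stand-in for the bits of migration state relevant to this discussion. */
typedef struct {
    uint64_t dirty_bytes_rate;   /* bytes dirtied per ms, 0 until a sync ran */
    int64_t  expected_downtime;  /* what query-migrate reports, in ms        */
} MigState;

/*
 * The guard doesn't change what pending_size means; it just refuses to
 * produce an estimate until we have a dirty-rate sample and have sent a
 * non-trivial amount of data.
 */
static void update_expected_downtime(MigState *s, uint64_t pending_size,
                                     uint64_t transferred_bytes,
                                     double bandwidth /* bytes per ms */)
{
    if (s->dirty_bytes_rate && transferred_bytes > 10000) {
        /* bytes / (bytes/ms) = ms */
        s->expected_downtime = pending_size / bandwidth;
    }
}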

> >It feels that:
> >    * If the guest wasn't dirtying pages, then you wouldn't have to
> >      pause the guest; if it was just dirtying them a little then you
> >      wouldn't have much to transfer after the pages you'd already
> >      sent; so if the guest dirties pages fast then the estimate should be
> >      larger; so 'dirty_bytes_rate' being on top of the fraction feels right.
> >
> >    * If the bandwidth is higher then the estimate should be smaller; so
> >      'bandwidth' being on the bottom of the fraction feels right.
> >
> >Dave
> >
> The 'expected_downtime' in the existing code takes one of two kinds of value:
>   * positive - dirty_bytes_rate is higher than bandwidth. In this
>     case migration doesn't complete.
>   * zero - bandwidth is higher than dirty_bytes_rate. In this case
>     migration is possible, but we don't have a downtime value.

OK, I don't think that should give zero; my argument being that since each
pass through RAM takes some time, you're always going to have some proportion
of RAM that's dirty.
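
A back-of-the-envelope illustration of that argument (all numbers invented):
whatever the guest dirties while a pass is running still has to be sent, so
the estimate has a non-zero floor whenever the guest dirties anything at all.

#include <stdio.h>

int main(void)
{
    /* Invented numbers, purely to illustrate the argument above. */
    double pass_time_ms = 2000.0;        /* time for one pass over RAM     */
    double dirty_rate   = 10.0 * 1024;   /* bytes the guest dirties per ms */
    double bandwidth    = 100.0 * 1024;  /* bytes we can send per ms       */

    /* Pages re-dirtied while the pass ran still have to be sent ... */
    double redirtied = dirty_rate * pass_time_ms;

    /* ... so the downtime estimate has a non-zero floor. */
    printf("at least %.0f ms of downtime\n", redirtied / bandwidth);
    return 0;
}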

> This patch has some imperfections. But if we look back into history, it
> seems that this patch just restores the original logic.
> The existing code was introduced by commit
> https://github.com/qemu/qemu/commit/90f8ae724a575861f093fbdbfd49a925bcfec327
> which claims that it merely restores the mistakenly deleted estimate
> (commit
> https://github.com/qemu/qemu/commit/e4ed1541ac9413eac494a03532e34beaf8a7d1c5).
> However, the estimate changed during that restore operation. The estimate
> before the removal (before e4ed1541ac9413eac494a03532e34beaf8a7d1c5) was just
> like the one in my patch.

Yes; remember I believe that the current code is wrong - I'm just not
sure your suggested fix is right.

> So maybe we should think about improving this estimate.
> I suggest using something like:
>     expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth
> 
> In my opinion this is more correct than the existing approach, since the last
> step of the migration process (before the pause) transfers max_size bytes
> (max_size = bandwidth * migrate_max_downtime() / 1000000). So the bytes that
> are dirtied during this step will be transferred during the downtime.
> That byte count is dirty_bytes_rate * max_size / bandwidth, i.e.
> migrate_max_downtime * dirty_bytes_rate, and dividing it by the bandwidth
> gives the formula:
>     expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth

Are you sure? I thought migrate_max_downtime governs the time *after* the
pause, and it doesn't make sense to me for the expectation of the system to be
related to the expectation of the user.
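
To see why that bothers me, here is a quick comparison of the two suggested
formulas with invented numbers; note that the second one scales with the
user-configured max_downtime even though nothing about the guest or the link
changes:

#include <stdio.h>

int main(void)
{
    /* Invented numbers, only for comparing the shapes of the two formulas. */
    double bandwidth        = 100.0;          /* KB per ms                     */
    double dirty_bytes_rate =  20.0;          /* KB per ms                     */
    double pending_size     = 512.0 * 1024;   /* KB still to transfer          */
    double max_downtime     = 300.0;          /* ms, the user-configured limit */

    /* The patch under discussion: time to drain what is currently pending. */
    printf("pending_size / bandwidth              = %.0f ms\n",
           pending_size / bandwidth);

    /* The alternative suggestion: proportional to the user's knob. */
    printf("max_downtime * dirty_rate / bandwidth = %.0f ms\n",
           max_downtime * dirty_bytes_rate / bandwidth);
    return 0;
}

Doubling max_downtime doubles the second figure while the guest and the link
stay unchanged, which is what I mean by tying the system's expectation to the
user's.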

Dave

> Igor
> 
> >>Signed-off-by: Igor Redko <red...@virtuozzo.com>
> >>Reviewed-by: Anna Melekhova <an...@virtuozzo.com>
> >>Signed-off-by: Denis V. Lunev <d...@openvz.org>
> >>CC: Juan Quintela <quint...@redhat.com>
> >>CC: Amit Shah <amit.s...@redhat.com>
> >>---
> >>  migration/migration.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >>diff --git a/migration/migration.c b/migration/migration.c
> >>index 662e77e..d55d545 100644
> >>--- a/migration/migration.c
> >>+++ b/migration/migration.c
> >>@@ -994,7 +994,7 @@ static void *migration_thread(void *opaque)
> >>              /* if we haven't sent anything, we don't want to recalculate
> >>                 10000 is a small enough number for our purposes */
> >>              if (s->dirty_bytes_rate && transferred_bytes > 10000) {
> >>-                s->expected_downtime = s->dirty_bytes_rate / bandwidth;
> >>+                s->expected_downtime = pending_size / bandwidth;
> >>              }
> >>
> >>              qemu_file_reset_rate_limit(s->file);
> >>--
> >>2.1.4
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> >
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
