> On 25 May 2017, at 01:36, Peter Xu <pet...@redhat.com> wrote:
>
> On Wed, May 24, 2017 at 01:02:25PM +0000, Felipe Franciosi wrote:
>>
>>> On 23 May 2017, at 05:27, Peter Xu <pet...@redhat.com> wrote:
>>>
>>> On Fri, May 19, 2017 at 10:59:02PM +0100, Felipe Franciosi wrote:
>>>> The first time migration_bitmap_sync() is called, bytes_xfer_prev is set
>>>> to ram_state.bytes_transferred which is, at this point, zero. The next
>>>> time migration_bitmap_sync() is called, an iteration has happened and
>>>> bytes_xfer_prev is set to 'x' bytes. Most likely, more than one second
>>>> has passed, so the auto-converge logic will be triggered and
>>>> bytes_xfer_now will also be set to 'x' bytes.
>>>>
>>>> This condition is currently masked by dirty_rate_high_cnt, which will
>>>> wait for a few iterations before throttling. It would otherwise always
>>>> assume zero bytes have been copied and therefore throttle the guest
>>>> (possibly) prematurely.
>>>>
>>>> Given bytes_xfer_prev is only used by the auto-converge logic, it
>>>> makes sense to only set its value after a check has been made against
>>>> bytes_xfer_now.
>>>>
>>>> Signed-off-by: Felipe Franciosi <fel...@nutanix.com>
>>>> ---
>>>> migration/ram.c | 4 ----
>>>> 1 file changed, 4 deletions(-)
>>>>
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index f59fdd4..793af39 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -670,10 +670,6 @@ static void migration_bitmap_sync(RAMState *rs)
>>>>
>>>>     rs->bitmap_sync_count++;
>>>>
>>>> -    if (!rs->bytes_xfer_prev) {
>>>> -        rs->bytes_xfer_prev = ram_bytes_transferred();
>>>> -    }
>>>> -
>>>>     if (!rs->time_last_bitmap_sync) {
>>>>         rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>>>>     }
>>>> --
>>>> 1.9.5
>>>>
>>>>
>>>
>>> I feel like this patch wants to correctly initialize bytes_xfer_prev,
>>> however I still see a problem.
>>> E.g., when the user specifies auto-converge
>>> during migration, then in the first iteration we'll always have a very
>>> small bytes_xfer_prev (with this patch, it'll be zero) with a very big
>>> bytes_xfer_now (which is the ram_bytes_transferred() value).
>>
>> Interesting point. Worth noting, that's no different from what happens
>> today anyway (bytes_xfer_prev would be initialised to the first non-zero
>> bytes_transferred). I therefore don't think it should stop or slow this
>> patch's acceptance.
>>
>>> If so, do you think squashing the change below together with the
>>> current one would be more accurate?
>>
>> As a matter of fact I had a different idea (below) to fix what you are
>> describing. I was still experimenting with this code, so I haven't sent a
>> patch yet. But I'm going to send a series soon for comments. Basically, I
>> think some other changes are required to make sure these numbers are
>> correct:
>
> I agree on most points.
>
>>
>> 1) dirty_rate_high_cnt++ >= 2 should be ++dirty_rate_high_cnt >= 2
>>    - The original commit msg from 070afca25 (Jason J. Herne) says that
>>      convergence should be triggered after two passes. The current code
>>      does it after three passes (and four passes the first time around;
>>      see number 2 below).
>>    - I personally feel this counter should go away altogether. If the
>>      migration is not converging and the VM is going to be throttled,
>>      there's no point stressing the network any further; just start
>>      throttling straight away.
>>
>> 2) dirty_pages_rate should be updated before the auto-converge logic.
>>    - Right now, we delay throttling by a further iteration, as
>>      dirty_pages_rate is set after the first pass through the
>>      auto-converge logic (it is zero the first time around).
>>    - The "if (rs->dirty_pages_rate && ..." part of the conditional can
>>      then be removed, as it won't ever be zero.
>
> For this one: why dirty_pages_rate cannot be zero?
Yeah, good point. If _nothing_ got dirtied in the period (possible, I
suppose), then the dirty rate would be zero.

> But I agree with
> you that it can be removed since even if it's zero, then the next
> check would fail as well (rs->num_dirty_pages_period *
> TARGET_PAGE_SIZE > (bytes_xfer_now - rs->bytes_xfer_prev) / 2).

Exactly. I suppose that check was introduced to cover the first
iteration, where the counter hadn't been initialised. But it's not
needed.

>
>>
>> 3) bytes_xfer counters should be updated alongside the dirty_pages
>>    counters (for the same period).
>>    - This fixes the issue you described, as the bytes_xfer_* values will
>>      then correspond to the same period as the dirty page counters.
>>
>> I'll send the series shortly. Thoughts in the meantime?
>
> I'll reply to that patchset then. Thanks.

Thanks for the quick feedback on this. I've included this change in the
patchset, so let's continue the conversation there.

Felipe

>
> --
> Peter Xu
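For reference, the reordering proposed in points 1-3 above boils down to something like the following standalone model. This is a sketch, not the actual migration/ram.c code: SyncState and should_throttle() are made-up names standing in for the relevant RAMState fields and the auto-converge block inside migration_bitmap_sync(), and TARGET_PAGE_SIZE is assumed to be 4096 here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TARGET_PAGE_SIZE 4096  /* assumed page size for this sketch */

/* Hypothetical stand-in for the two RAMState fields involved. */
typedef struct {
    uint64_t bytes_xfer_prev;        /* bytes sent as of the previous sync */
    uint64_t num_dirty_pages_period; /* pages dirtied during this period   */
} SyncState;

/*
 * Sketch of the reordered check: the bytes dirtied in the period are
 * compared against the bytes actually transferred in the same period,
 * bytes_xfer_prev is only updated after the comparison, and the
 * dirty_rate_high_cnt delay is dropped so throttling starts straight away.
 */
static bool should_throttle(SyncState *s, uint64_t bytes_xfer_now)
{
    bool throttle =
        s->num_dirty_pages_period * TARGET_PAGE_SIZE >
        (bytes_xfer_now - s->bytes_xfer_prev) / 2;

    /* Roll both counters forward together, so they cover the same period. */
    s->bytes_xfer_prev = bytes_xfer_now;
    s->num_dirty_pages_period = 0;
    return throttle;
}
```

With this shape, a period that sends 100 MiB while only 8 MiB of pages get dirtied leaves the guest alone, while a period that dirties 64 MiB but only manages to send 16 MiB trips the check on the very next sync, with no warm-up iterations.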