Re: [Qemu-devel] [PATCH 4/4] migration: use dirty_rate_high_cnt more aggressively

Peter Xu Thu, 25 May 2017 19:45:52 -0700

On Thu, May 25, 2017 at 11:20:02AM +0000, Felipe Franciosi wrote:
> + Matthew Rosato, original reviewer of 070afca25
> 
> > On 25 May 2017, at 02:03, Peter Xu <pet...@redhat.com> wrote:
> > 
> > On Wed, May 24, 2017 at 05:10:03PM +0100, Felipe Franciosi wrote:
> >> The commit message from 070afca25 suggests that dirty_rate_high_cnt
> >> should be used more aggressively to start throttling after two
> >> iterations instead of four. The code, however, only changes the auto
> >> convergence behaviour to throttle after three iterations. This makes the
> >> behaviour more aggressive by kicking off throttling after two iterations
> >> as originally intended.
> > 
> > For this one, I don't think fixing the code to match the commit
> > message is that important. Instead, for me this patch looks more like
> > something "changed iteration loops from 3 to 2".
> 
> I agree. If we decide a v2 is needed I can amend the commit message 
> accordingly.
> 
> > So the point is, what
> > would be the best practical number for it. And when we change an
> > existing value, we should have some reason, since it'll change
> > behavior of existing user (though I'm not sure whether this one will
> > affect much).
> 
> We've done a few tests with this internally using various workloads (DBs, 
> synthetic mem stress, &c.). In summary, and along the lines with how Qemu 
> implements auto convergence today, I would say this counter should be removed.
> 
> Consider that when live migrating a large-ish VM (100GB+ RAM), the network 
> will be under stress for (at least) several minutes. At the same time, the 
> sole purpose of this counter is to attempt convergence without having to 
> throttle the guest. That is, it defers throttling in the hope that either the 
> guest calms down or that the network gets less congested.
> 
> For real workloads, both cases are unlikely to happen. Throttling will 
> eventually be needed otherwise the migration will not converge.


I am not much experienced with using auto convergence with real
workload, but from what you explained I see it a reasonable change to
even remove it (that sounds more persuasive to me than "just try to
follow what the commit message said", with test results :), especially
considering that we have cpu-throttle-initial and
cpu-throttle-increment parameters as tunables, so we should be fine
even we want to tune the speed down a bit in the future.

> 
> > I believe with higher dirty_rate_high_cnt, we have more smooth
> > throttling, but it'll be slower in responding; While if lower or even
> > remove it, we'll get very fast throttling response speed but I guess
> > it may be more possible to report a false positive?
> 
> With a higher dirty_rate_high_cnt, migration will simply take longer (if not 
> converging). For real workloads, it doesn't change how much throttling is 
> required. For spiky or varied workloads, it gives a chance for migration to 
> converge without throttling, at the cost of massive network stress and a 
> longer overall migration time (which is bad user experience IMO).
> 
> > IMHO here 3 is
> > okay since after all we are solving the problem of unconverged
> > migration, so as long as we can converge, I think it'll be fine.
> 
> Based on 070afca25's commit message, Jason seemed to think that four was too 
> much and meant to change it to two. As explained above, I'd be in favour of 
> removing this counter altogether, or at least make it configurable (perhaps a 
> #define would be enough an improvement for now). This patch is intended as a 
> compromise by effectively using two.

If to be a tunable, maybe another MigrationParameter? But I don't know
whether it would really worth it since the other two can do more
fine-grained tunes already. So I would prefer to remove it as well if
thorough tests are carried out.

Maybe we can also wait to see how other people think about it.

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH 4/4] migration: use dirty_rate_high_cnt more aggressively

Reply via email to