On Wed, Jul 10, 2024 at 03:39:43PM +0000, Liu, Yuan1 wrote:
> > I don't think postcopy will trigger timeout failures - postcopy should use
> > constant time to complete a migration, that is guest memsize / bw.
> 
> Yes, the migration total time is predictable, failure due to timeout is 
> incorrect, 
> migration taking a long time may be more accurate.

It shouldn't: postcopy is run always together with precopy, so if you start
postcopy after one round of precopy, the total migration time should
alwways be smaller than if you run the precopy two rounds.

With postcopy after that migration completes, but for precopy two rounds of
migration will follow with a dirty sync which may say "there's unforunately
more dirty pages, let's move on with the 3rd round and more".

> 
> > The challenge is normally on the delay of page requests higher than
> > precopy, but in this case it might not be a big deal. And I wonder if on
> > 100G*2 cards it can also perform pretty well, as the delay might be
> > minimal
> > even if bandwidth is throttled.
> 
> I got your point, I don't have much experience in this area.
> So you mean to reserve a small amount of bandwidth on a NIC for postcopy 
> migration, and compare the migration performance with and without traffic
> on the NIC? Will data plane traffic affect page request delays in postcopy?

I'm not sure what's the "data plane" you're describing here, but logically
VMs should be migrated using mgmt networks, and should be somehow separate
from IOs within the VMs.

I'm not really asking for another test, sorry to cause confusions; it's
only about some pure discussions.  I just feel like postcopy wasn't really
seriously considered even for many valid cases, some of them postcopy can
play pretty well even without any modern hardwares requested.  There's no
need to prove which is better for this series.

Thanks,

-- 
Peter Xu


Reply via email to