"Chuang Xu" <[email protected]> writes:

> In our long-term experience in Bytedance, we've found that under
> the same load, live migration of larger VMs with more devices is
> often more difficult to converge (requiring a larger downtime limit).
>
> Through some testing and calculations, we conclude that bitmap sync time
> affects the calculation of live migration bandwidth.
>
> When the addresses processed are not aligned, a large number of
> clear_dirty ioctls occur (e.g. a misaligned 4MB memory region can
> generate 2048 clear_dirty ioctls from two different memory listeners),
> which increases the time required for bitmap sync and makes it
> harder for dirty pages to converge.
>
> For a 64C256G VM with 8 vhost-user-net NICs (32 queues per NIC) and
> 16 vhost-user-blk devices (4 queues per device), the sync time is as
> high as *73ms* (tested at a 10GB/s dirty rate; the sync time increases
> as the dirty page rate increases). Here is a breakdown of the sync time:
>
> - sync from kvm to ram_list: 2.5ms
> - vhost_log_sync: 3ms
> - sync aligned memory from ram_list to RAMBlock: 5ms
> - sync misaligned memory from ram_list to RAMBlock: 61ms
>
> After merging those fragmented clear_dirty ioctls, syncing
> misaligned memory from ram_list to RAMBlock takes only about 1ms,
> and the total sync time drops to *12ms*.
>
> Signed-off-by: Chuang Xu <[email protected]>

Reviewed-by: Fabiano Rosas <[email protected]>
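For readers wanting to sanity-check the numbers above, here is a small
illustrative sketch (not QEMU code; the page size and per-page-ioctl model
are assumptions for the arithmetic only) of why a misaligned 4MB region
observed by two memory listeners can produce 2048 clear-dirty calls, versus
two when the range is merged:

```python
# Hypothetical model (not QEMU code): count "clear dirty" ioctls when a
# misaligned region is cleared page-by-page versus as one merged range.
PAGE = 4096  # assume 4 KiB host pages

def clear_calls_per_page(region_bytes, listeners):
    """One ioctl per dirty page, issued by each memory listener."""
    pages = region_bytes // PAGE
    return pages * listeners

def clear_calls_merged(region_bytes, listeners):
    """One ioctl per contiguous merged range, per listener."""
    return 1 * listeners

# A 4 MiB misaligned region seen by two memory listeners:
print(clear_calls_per_page(4 << 20, 2))  # 2048 ioctls
print(clear_calls_merged(4 << 20, 2))    # 2 ioctls
```

This matches the 2048-ioctl figure quoted in the cover letter; the real
ioctl granularity inside KVM may differ, but the ratio explains why merging
the fragmented calls shrinks the misaligned sync cost so dramatically.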
