Peter Xu <pet...@redhat.com> writes:

> On Thu, Mar 30, 2023 at 03:03:10PM -0300, Fabiano Rosas wrote:
>> Hi folks,
>
> Hi,
>
>> 
>> I'm continuing the work done last year to add a new format of
>> migration stream that can be used to migrate large guests to a single
>> file in a performant way.
>> 
>> This is an early RFC with the previous code + my additions to support
>> multifd and direct IO. Let me know what you think!
>> 
>> Here are the reference links for previous discussions:
>> 
>> https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg01813.html
>> https://lists.gnu.org/archive/html/qemu-devel/2022-10/msg01338.html
>> https://lists.gnu.org/archive/html/qemu-devel/2022-10/msg05536.html
>> 
>> The series has 4 main parts:
>> 
>> 1) File migration: A new "file:" migration URI. So "file:mig" does the
>>    same as "exec:cat > mig". Patches 1-4 implement this;
>> 
>> 2) Fixed-ram format: A new format for the migration stream. Puts guest
>>    pages at their relative offsets in the migration file. This saves
>>    space in the worst case of RAM utilization because every page has a
>>    fixed offset in the migration file and (potentially) saves us time
>>    because we could write pages independently in parallel. It also
>>    gives alignment guarantees so we could use O_DIRECT. Patches 5-13
>>    implement this;
>> 
>> With patches 1-13 these two^ can be used with:
>> 
>> (qemu) migrate_set_capability fixed-ram on
>> (qemu) migrate[_incoming] file:mig
>
> Have you considered enabling the new fixed-ram format with postcopy when
> loading?
>
> Because pages sit at linear offsets, I think it can achieve very fast VM
> loads thanks to O(1) page lookup and local page fault resolution.
>

I don't think we have looked that much at the loading side yet. Good to
know that it has potential to be faster. I'll look into it. Thanks for
the suggestion.
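
To make the O(1) lookup idea concrete, here is a minimal Python sketch of
the fixed-offset layout. It is not QEMU code: PAGE_SIZE, pages_base and all
function names are hypothetical, and the 4 KiB page size is an assumption.
It only illustrates that a page's file offset depends on its index alone,
so pages can be written in any order (or rewritten in place) and a single
page can be fetched without scanning the stream.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed guest page size (hypothetical constant)


def page_file_offset(pages_base: int, page_index: int) -> int:
    """Return the fixed file offset of a page; it never depends on write order."""
    return pages_base + page_index * PAGE_SIZE


def write_page(fd: int, pages_base: int, page_index: int, data: bytes) -> None:
    """Write one page at its fixed offset using a positional write."""
    assert len(data) == PAGE_SIZE
    offset = page_file_offset(pages_base, page_index)
    # An aligned base keeps every page offset aligned, which O_DIRECT needs.
    assert offset % PAGE_SIZE == 0
    os.pwrite(fd, data, offset)


def read_page(fd: int, pages_base: int, page_index: int) -> bytes:
    """Fetch one page directly by index, with no scan of the file."""
    return os.pread(fd, PAGE_SIZE, page_file_offset(pages_base, page_index))


def demo():
    pages_base = 1024 * 1024  # hypothetical aligned start of the page area
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        # Pages arrive out of order, as they could from parallel writers.
        for idx in (7, 2, 5):
            write_page(fd, pages_base, idx, bytes([idx]) * PAGE_SIZE)
        # A page dirtied again overwrites its slot instead of growing the file.
        write_page(fd, pages_base, 2, b"\xaa" * PAGE_SIZE)
        size = os.fstat(fd).st_size
        return read_page(fd, pages_base, 5), read_page(fd, pages_base, 2), size
```

Because a rewritten page reuses its slot, the file size in this sketch is
bounded by the highest page index written, not by the number of writes.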

>> 
>> --> new in this series:
>> 
>> 3) MultiFD support: This is about making use of the parallelism
>>    allowed by the new format. We just need the threading and page
>>    queuing infrastructure that is already in place for
>>    multifd. Patches 14-24 implement this;
>> 
>> (qemu) migrate_set_capability fixed-ram on
>> (qemu) migrate_set_capability multifd on
>> (qemu) migrate_set_parameter multifd-channels 4
>> (qemu) migrate_set_parameter max-bandwidth 0
>> (qemu) migrate[_incoming] file:mig
>> 
>> 4) Add a new "direct-io" parameter and enable O_DIRECT for the
>>    properly aligned segments of the migration (mostly ram). Patch 25.
>> 
>> (qemu) migrate_set_parameter direct-io on
>> 
>> Thanks! Some data below:
>> =====
>> 
>> Outgoing migration to file. NVMe disk. XFS filesystem.
>> 
>> - Single migration runs of stopped 32G guest with ~90% RAM usage. Guest
>>   running `stress-ng --vm 4 --vm-bytes 90% --vm-method all --verify -t
>>   10m -v`:
>> 
>> migration type  | MB/s | pages/s |  ms
>> ----------------+------+---------+------
>> savevm io_uring |  434 |  102294 | 71473
>
> So I assume this is the non-live migration scenario.  Could you explain
> what io_uring means here?
>

This table is all non-live migration. This particular line is a snapshot
(hmp_savevm->save_snapshot). I thought it could be relevant because it
is another way by which we write RAM to disk.

The io_uring label is noise; I was initially under the impression that the
block device aio configuration affected this scenario.

>> file:           | 3017 |  855862 | 10301
>> fixed-ram       | 1982 |  330686 | 15637
>> ----------------+------+---------+------
>> fixed-ram + multifd + O_DIRECT
>>          2 ch.  | 5565 | 1500882 |  5576
>>          4 ch.  | 5735 | 1991549 |  5412
>>          8 ch.  | 5650 | 1769650 |  5489
>>         16 ch.  | 6071 | 1832407 |  5114
>>         32 ch.  | 6147 | 1809588 |  5050
>>         64 ch.  | 6344 | 1841728 |  4895
>>        128 ch.  | 6120 | 1915669 |  5085
>> ----------------+------+---------+------
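
As a rough sanity check on the table, the short Python sketch below
multiplies each row's MB/s by its elapsed time and compares elapsed times
against the "file:" row. The numbers are copied from the table; the row
labels and function names are only illustrative. Every row works out to
about 31,000 MB written (in the table's units), so the rows describe the
same amount of data and the ms column can be compared directly.

```python
# (label, MB/s, elapsed ms) copied from the table above; labels are illustrative.
ROWS = [
    ("savevm", 434, 71473),
    ("file:", 3017, 10301),
    ("fixed-ram", 1982, 15637),
    ("multifd 2 ch.", 5565, 5576),
    ("multifd 4 ch.", 5735, 5412),
    ("multifd 8 ch.", 5650, 5489),
    ("multifd 16 ch.", 6071, 5114),
    ("multifd 32 ch.", 6147, 5050),
    ("multifd 64 ch.", 6344, 4895),
    ("multifd 128 ch.", 6120, 5085),
]


def data_written_mb(mb_per_s: float, elapsed_ms: float) -> float:
    """Total data implied by a row: throughput times elapsed seconds."""
    return mb_per_s * elapsed_ms / 1000.0


def speedup(baseline_ms: float, elapsed_ms: float) -> float:
    """How many times faster a row finished than the baseline row."""
    return baseline_ms / elapsed_ms


def summarize(rows, baseline="file:"):
    """Map each label to (implied MB written, speedup versus the baseline)."""
    base_ms = next(ms for label, _, ms in rows if label == baseline)
    return {
        label: (data_written_mb(rate, ms), speedup(base_ms, ms))
        for label, rate, ms in rows
    }


if __name__ == "__main__":
    for label, (mb, factor) in summarize(ROWS).items():
        print(f"{label:16s} {mb:9.0f} MB  {factor:5.2f}x vs file:")
```

By this measure the 64-channel run finishes in a little under half the
time of the plain "file:" run, and going from 2 to 128 channels changes
the elapsed time by only a few hundred milliseconds.
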
>
> Thanks,
