On 4/7/22 3:57 PM, Claudio Fontana wrote:
> On 4/7/22 3:53 PM, Dr. David Alan Gilbert wrote:
>> * Claudio Fontana (cfont...@suse.de) wrote:
>>> On 4/5/22 10:35 AM, Dr. David Alan Gilbert wrote:
>>>> * Claudio Fontana (cfont...@suse.de) wrote:
>>>>> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
>>>>>> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
>>>>>>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>>>>>>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>>>>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>>>> * Claudio Fontana (cfont...@suse.de) wrote:
>>>>>>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default
>>>>>>>>>>>>>>>>>>> pipe size (64k).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Going through io_helper still seems to incur some penalty
>>>>>>>>>>>>>>>>>>> (~15%-ish) compared with direct qemu migration to a nc
>>>>>>>>>>>>>>>>>>> socket to a file.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfont...@suse.de>
>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance
>>>>>>>>>>>>>>>>>>> issue, so you can find the discussion about this in
>>>>>>>>>>>>>>>>>>> qemu-devel:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Current results show the experimental average maximum throughput
>>>>>>>>>>>>>> migrating to /dev/null for each FdWrapper pipe size (as per QEMU
>>>>>>>>>>>>>> QMP "query-migrate", tests repeated 5 times for each).
>>>>>>>>>>>>>> VM size is 60G, with most of the memory effectively touched
>>>>>>>>>>>>>> before migration by a user application allocating and touching
>>>>>>>>>>>>>> all memory with pseudorandom data.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 64K:   5200 Mbps (current situation)
>>>>>>>>>>>>>> 128K:  5800 Mbps
>>>>>>>>>>>>>> 256K: 20900 Mbps
>>>>>>>>>>>>>> 512K: 21600 Mbps
>>>>>>>>>>>>>> 1M:   22800 Mbps
>>>>>>>>>>>>>> 2M:   22800 Mbps
>>>>>>>>>>>>>> 4M:   22400 Mbps
>>>>>>>>>>>>>> 8M:   22500 Mbps
>>>>>>>>>>>>>> 16M:  22800 Mbps
>>>>>>>>>>>>>> 32M:  22900 Mbps
>>>>>>>>>>>>>> 64M:  22900 Mbps
>>>>>>>>>>>>>> 128M: 22800 Mbps
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The above is the throughput out of patched libvirt with
>>>>>>>>>>>>>> different pipe sizes for the FDWrapper.
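Growing a pipe on Linux is done with fcntl(F_SETPIPE_SZ); here is a minimal
standalone sketch of the idea, just for reference. The function name, the
1 MiB value and the fallback behaviour are illustrative only, not the actual
virfile.c change:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative only: try to grow a pipe to the requested size.  For an
 * unprivileged process the request is capped by /proc/sys/fs/pipe-max-size. */
static int example_set_pipe_size(int fd, int want)
{
    int sz = fcntl(fd, F_SETPIPE_SZ, want);
    if (sz < 0) {
        perror("F_SETPIPE_SZ");
        return -1;
    }
    /* The kernel may round the size up; the return value is the real capacity. */
    fprintf(stderr, "pipe size is now %d bytes\n", sz);
    return sz;
}

int main(void)
{
    int p[2];
    if (pipe(p) < 0)
        return 1;
    example_set_pipe_size(p[1], 1024 * 1024);   /* example: 1 MiB */
    close(p[0]);
    close(p[1]);
    return 0;
}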
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, it's bouncing around with noise after 1 MB. So I'd suggest
>>>>>>>>>>>>> that libvirt attempt to raise the pipe limit to 1 MB by default,
>>>>>>>>>>>>> but not try to go higher.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>>>>>>> I ran a qemu migration directly, issuing the appropriate QMP
>>>>>>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>>>>>>> {"execute": "migrate", "arguments": { "uri": "unix:///tmp/netcat.sock" } } :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> QMP: 37000 Mbps
>>>>>>>>>>>>>
>>>>>>>>>>>>>> So although the pipe size improves things (in particular the
>>>>>>>>>>>>>> large jump is at the 256K size, although 1M seems a very good
>>>>>>>>>>>>>> value), there is still a second bottleneck in there somewhere
>>>>>>>>>>>>>> that accounts for a loss of ~14200 Mbps in throughput.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Interesting addition: I tested quickly on a system with faster cpus
>>>>>>>>>>> and larger VM sizes, up to 200GB, and the difference in throughput
>>>>>>>>>>> libvirt vs qemu is basically the same, ~14500 Mbps:
>>>>>>>>>>>
>>>>>>>>>>> ~50000 Mbps qemu to netcat socket to /dev/null
>>>>>>>>>>> ~35500 Mbps virsh save to /dev/null
>>>>>>>>>>>
>>>>>>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a
>>>>>>>>>>> totally fair comparison because the VM sizes are different).
>>>>>>>>>>
>>>>>>>>>> It might be closer to RAM or cache bandwidth limited though; for an
>>>>>>>>>> extra copy.
>>>>>>>>>
>>>>>>>>> I was thinking about sendfile(2) in iohelper, but that probably
>>>>>>>>> can't work as the input fd is a socket, I am getting EINVAL.
>>>>>>>>
>>>>>>>> Yep, sendfile() requires the input to be a mmapable FD,
>>>>>>>> and the output to be a socket.
>>>>>>>>
>>>>>>>> Try splice() instead, which merely requires one end to be a
>>>>>>>> pipe, and the other end can be any FD afaik.
>>>>>>>>
>>>>>>>
>>>>>>> I did try splice(), but performance is worse by around 500%.
>>>>>>
>>>>>> Hmm, that's certainly unexpected !
>>>>>>
>>>>>>> Any ideas welcome,
>>>>>>
>>>>>> I learnt there is also a newer copy_file_range call, not sure if that's
>>>>>> any better.
>>>>>>
>>>>>> You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
>>>>>> want to copy everything IIRC.
>>>>>>
>>>>>> With regards,
>>>>>> Daniel
>>>>>>
>>>>>
>>>>> Crazy idea, would trying to use the parallel migration concept for
>>>>> migrating to/from a file make any sense?
>>>>>
>>>>> Not sure whether the qemu multifd implementation would apply directly;
>>>>> maybe it could be given another implementation for "toFile", trying to
>>>>> use more than one cpu to do the transfer?
>>>>
>>>> I can't see a way that would help; well, I could if you could
>>>> somehow have multiple io helper threads that dealt with it.
>>>
>>> The first issue I encounter here for both the "virsh save" and "virsh
>>> restore" scenarios is that libvirt uses fd: migration, not unix: migration.
>>> QEMU supports multifd for unix:, tcp:, vsock: as far as I can see.
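Coming back to the splice() attempt quoted above for a moment: the loop shape
it implies is roughly the following, socket -> pipe -> file, with two splice()
calls per chunk. This is a compile-only sketch; the names, the 1 MiB chunk
size and the minimal error handling are illustrative, not the actual iohelper
code:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch: drain sock_fd into out_fd via an intermediate pipe using splice(). */
int example_splice_copy(int sock_fd, int out_fd)
{
    int p[2];

    if (pipe(p) < 0)
        return -1;
    /* the pipe could additionally be grown with F_SETPIPE_SZ, as sketched earlier */

    for (;;) {
        ssize_t in = splice(sock_fd, NULL, p[1], NULL, 1024 * 1024,
                            SPLICE_F_MOVE | SPLICE_F_MORE);
        if (in == 0)
            break;                          /* EOF on the socket */
        if (in < 0) {
            if (errno == EINTR)
                continue;
            perror("splice in");
            return -1;
        }
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL, in,
                                 SPLICE_F_MOVE | SPLICE_F_MORE);
            if (out < 0) {
                if (errno == EINTR)
                    continue;
                perror("splice out");
                return -1;
            }
            in -= out;
        }
    }
    close(p[0]);
    close(p[1]);
    return 0;
}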
>>>
>>> Current save procedure in QMP in short:
>>>
>>> {"execute":"migrate-set-capabilities", ...}
>>> {"execute":"migrate-set-parameters", ...}
>>> {"execute":"getfd","arguments":{"fdname":"migrate"}, ...}   fd=26
>>> QEMU_MONITOR_IO_SEND_FD: fd=26
>>> {"execute":"migrate","arguments":{"uri":"fd:migrate"}, ...}
>>>
>>>
>>> Current restore procedure in QMP in short:
>>>
>>> (start QEMU)
>>> {"execute":"migrate-incoming","arguments":{"uri":"fd:21"}, ...}
>>>
>>>
>>> Should I investigate changing libvirt to use unix: for save/restore?
>>> Or should I look into changing qemu to somehow accept fd: for multifd,
>>> meaning I guess providing multiple fd: uris in the migrate command?
>>
>> So I'm not sure this is the right direction; i.e. if multifd is the
>> right answer to your problem.
>
> Of course, just exploring the space.
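For reference, the unix:-based sequence I am experimenting with looks roughly
like this on the QMP side (the channel count and socket path here are made up;
something on the other end of the socket still has to accept the connections
and write the data out):

{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"multifd","state":true}]}}
{"execute":"migrate-set-parameters","arguments":{"multifd-channels":4}}
{"execute":"migrate","arguments":{"uri":"unix:///tmp/save-multifd.sock"}}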
I have some progress on multifd, if we can call it that: I wrote a simple
program that sets up a unix socket, listens for N_CHANNELS + 1 connections
there, sets up the multifd parameters, and runs the migration, spawning a
thread for each incoming connection from QEMU and creating a file to store
the migration data coming from qemu (optionally using O_DIRECT).

This program plays the role of an "iohelper"-like thing, basically just
copying things over, making O_DIRECT possible.

I save the data streams to multiple files; this works, but for the actual
results I will have to move to a better hardware setup (enterprise nvme +
fast cpu, under various memory configurations).

The intuition would be that if we have enough cpus to spare (no libvirt in
the picture as mentioned for now), say, the same 4 cpus already allocated
for a certain VM to run, we can use those cpus (now "free" since we
suspended the guest) to compress each multifd channel (multifd-zstd?
multifd-zlib?), thus reducing the amount of data that needs to go to disk,
making use of those cpus.

Work in progress... (a rough sketch of the helper's accept/copy loop is at
the bottom of this mail).

>
>> However, I think the qemu code probably really really wants to be a
>> socket.
>
> Understood, I'll try to bend libvirt to use unix:/// and see how far I get,
>
> Thanks,
>
> Claudio
>
>>
>> Dave
>>
>>>
>>> Thank you for your help,
>>>
>>> Claudio
>>>
>
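The helper mentioned above is roughly the following shape: a sketch only,
with a made-up socket path, file names and channel count, minimal error
handling, and O_DIRECT left commented out (using it additionally requires
aligned buffers and aligned write sizes):

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define N_CHANNELS 4
#define BUF_SIZE   (1024 * 1024)

struct chan { int conn_fd; int file_fd; pthread_t tid; };

/* Copy one incoming stream from QEMU to its backing file. */
static void *copy_stream(void *opaque)
{
    struct chan *c = opaque;
    char *buf = malloc(BUF_SIZE);
    ssize_t n;

    while ((n = read(c->conn_fd, buf, BUF_SIZE)) > 0) {
        if (write(c->file_fd, buf, n) != n) {
            perror("write");
            break;
        }
    }
    free(buf);
    close(c->conn_fd);
    close(c->file_fd);
    return NULL;
}

int main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX,
                                .sun_path = "/tmp/save-multifd.sock" };
    struct chan chans[N_CHANNELS + 1];
    int listen_fd = socket(AF_UNIX, SOCK_STREAM, 0);

    unlink(addr.sun_path);
    if (listen_fd < 0 ||
        bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listen_fd, N_CHANNELS + 1) < 0) {
        perror("socket/bind/listen");
        return 1;
    }

    /* ... here the migration would be started over QMP as sketched earlier ... */

    /* main channel + multifd channels: one thread and one output file each */
    for (int i = 0; i < N_CHANNELS + 1; i++) {
        char path[64];
        snprintf(path, sizeof(path), "/tmp/migration-channel-%d", i);
        chans[i].conn_fd = accept(listen_fd, NULL, NULL);
        chans[i].file_fd = open(path,
                                O_WRONLY | O_CREAT | O_TRUNC /* | O_DIRECT */,
                                0600);
        pthread_create(&chans[i].tid, NULL, copy_stream, &chans[i]);
    }
    for (int i = 0; i < N_CHANNELS + 1; i++)
        pthread_join(chans[i].tid, NULL);

    close(listen_fd);
    return 0;
}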