On 4/7/22 3:57 PM, Claudio Fontana wrote:
> On 4/7/22 3:53 PM, Dr. David Alan Gilbert wrote:
>> * Claudio Fontana (cfont...@suse.de) wrote:
>>> On 4/5/22 10:35 AM, Dr. David Alan Gilbert wrote:
>>>> * Claudio Fontana (cfont...@suse.de) wrote:
>>>>> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
>>>>>> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
>>>>>>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>>>>>>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>>>>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>>>> * Claudio Fontana (cfont...@suse.de) wrote:
>>>>>>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default
>>>>>>>>>>>>>>>>>>> pipe size (64k).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Going through io_helper still seems to incur some penalty
>>>>>>>>>>>>>>>>>>> (~15%-ish) compared with direct qemu migration to a nc
>>>>>>>>>>>>>>>>>>> socket to a file.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfont...@suse.de>
>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance
>>>>>>>>>>>>>>>>>>> issue, so you can find the discussion about this in
>>>>>>>>>>>>>>>>>>> qemu-devel:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Current results show the experimental average maximum throughput
>>>>>>>>>>>>>> migrating to /dev/null for each FdWrapper pipe size (as per QEMU
>>>>>>>>>>>>>> QMP "query-migrate", tests repeated 5 times for each).
>>>>>>>>>>>>>> VM size is 60G, with most of the memory effectively touched
>>>>>>>>>>>>>> before migration by a user application allocating and touching
>>>>>>>>>>>>>> all memory with pseudorandom data.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 64K:   5200 Mbps (current situation)
>>>>>>>>>>>>>> 128K:  5800 Mbps
>>>>>>>>>>>>>> 256K: 20900 Mbps
>>>>>>>>>>>>>> 512K: 21600 Mbps
>>>>>>>>>>>>>> 1M:   22800 Mbps
>>>>>>>>>>>>>> 2M:   22800 Mbps
>>>>>>>>>>>>>> 4M:   22400 Mbps
>>>>>>>>>>>>>> 8M:   22500 Mbps
>>>>>>>>>>>>>> 16M:  22800 Mbps
>>>>>>>>>>>>>> 32M:  22900 Mbps
>>>>>>>>>>>>>> 64M:  22900 Mbps
>>>>>>>>>>>>>> 128M: 22800 Mbps
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The above is the throughput out of patched libvirt with
>>>>>>>>>>>>>> different pipe sizes for the FDWrapper.
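Growing a pipe on Linux is done with fcntl(F_SETPIPE_SZ); here is a minimal
standalone sketch of the idea, just for reference. The function name, the
1 MiB value and the fallback behaviour are illustrative only, not the actual
virfile.c change:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative only: try to grow a pipe to the requested size.  For an
 * unprivileged process the request is capped by /proc/sys/fs/pipe-max-size. */
static int example_set_pipe_size(int fd, int want)
{
    int sz = fcntl(fd, F_SETPIPE_SZ, want);
    if (sz < 0) {
        perror("F_SETPIPE_SZ");
        return -1;
    }
    /* The kernel may round the size up; the return value is the real capacity. */
    fprintf(stderr, "pipe size is now %d bytes\n", sz);
    return sz;
}

int main(void)
{
    int p[2];
    if (pipe(p) < 0)
        return 1;
    example_set_pipe_size(p[1], 1024 * 1024);   /* example: 1 MiB */
    close(p[0]);
    close(p[1]);
    return 0;
}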
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, it's bouncing around with noise after 1 MB. So I'd suggest
>>>>>>>>>>>>> that libvirt attempt to raise the pipe limit to 1 MB by default,
>>>>>>>>>>>>> but not try to go higher.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>>>>>>> I ran a qemu migration directly, issuing the appropriate QMP
>>>>>>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>>>>>>> {"execute": "migrate", "arguments": { "uri": "unix:///tmp/netcat.sock" } } :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> QMP: 37000 Mbps
>>>>>>>>>>>>>
>>>>>>>>>>>>>> So although the pipe size improves things (in particular the
>>>>>>>>>>>>>> large jump is at the 256K size, although 1M seems a very good
>>>>>>>>>>>>>> value), there is still a second bottleneck in there somewhere
>>>>>>>>>>>>>> that accounts for a loss of ~14200 Mbps in throughput.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Interesting addition: I tested quickly on a system with faster cpus
>>>>>>>>>>> and larger VM sizes, up to 200GB, and the difference in throughput
>>>>>>>>>>> libvirt vs qemu is basically the same, ~14500 Mbps:
>>>>>>>>>>>
>>>>>>>>>>> ~50000 Mbps qemu to netcat socket to /dev/null
>>>>>>>>>>> ~35500 Mbps virsh save to /dev/null
>>>>>>>>>>>
>>>>>>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a
>>>>>>>>>>> totally fair comparison because the VM sizes are different).
>>>>>>>>>>
>>>>>>>>>> It might be closer to RAM or cache bandwidth limited though; for an
>>>>>>>>>> extra copy.
>>>>>>>>>
>>>>>>>>> I was thinking about sendfile(2) in iohelper, but that probably
>>>>>>>>> can't work as the input fd is a socket, I am getting EINVAL.
>>>>>>>>
>>>>>>>> Yep, sendfile() requires the input to be a mmapable FD,
>>>>>>>> and the output to be a socket.
>>>>>>>>
>>>>>>>> Try splice() instead, which merely requires one end to be a
>>>>>>>> pipe, and the other end can be any FD afaik.
>>>>>>>>
>>>>>>>
>>>>>>> I did try splice(), but performance is worse by around 500%.
>>>>>>
>>>>>> Hmm, that's certainly unexpected !
>>>>>>
>>>>>>> Any ideas welcome,
>>>>>>
>>>>>> I learnt there is also a newer copy_file_range call, not sure if that's
>>>>>> any better.
>>>>>>
>>>>>> You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
>>>>>> want to copy everything IIRC.
>>>>>>
>>>>>> With regards,
>>>>>> Daniel
>>>>>>
>>>>>
>>>>> Crazy idea, would trying to use the parallel migration concept for
>>>>> migrating to/from a file make any sense?
>>>>>
>>>>> Not sure whether the qemu multifd implementation would apply directly;
>>>>> maybe it could be given another implementation for "toFile", trying to
>>>>> use more than one cpu to do the transfer?
>>>>
>>>> I can't see a way that would help; well, I could if you could
>>>> somehow have multiple io helper threads that dealt with it.
>>>
>>> The first issue I encounter here for both the "virsh save" and "virsh
>>> restore" scenarios is that libvirt uses fd: migration, not unix: migration.
>>> QEMU supports multifd for unix:, tcp:, vsock: as far as I can see.
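Coming back to the splice() attempt quoted above for a moment: the loop shape
it implies is roughly the following, socket -> pipe -> file, with two splice()
calls per chunk. This is a compile-only sketch; the names, the 1 MiB chunk
size and the minimal error handling are illustrative, not the actual iohelper
code:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch: drain sock_fd into out_fd via an intermediate pipe using splice(). */
int example_splice_copy(int sock_fd, int out_fd)
{
    int p[2];

    if (pipe(p) < 0)
        return -1;
    /* the pipe could additionally be grown with F_SETPIPE_SZ, as sketched earlier */

    for (;;) {
        ssize_t in = splice(sock_fd, NULL, p[1], NULL, 1024 * 1024,
                            SPLICE_F_MOVE | SPLICE_F_MORE);
        if (in == 0)
            break;                          /* EOF on the socket */
        if (in < 0) {
            if (errno == EINTR)
                continue;
            perror("splice in");
            return -1;
        }
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL, in,
                                 SPLICE_F_MOVE | SPLICE_F_MORE);
            if (out < 0) {
                if (errno == EINTR)
                    continue;
                perror("splice out");
                return -1;
            }
            in -= out;
        }
    }
    close(p[0]);
    close(p[1]);
    return 0;
}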
>>>
>>> Current save procedure in QMP in short:
>>>
>>> {"execute":"migrate-set-capabilities", ...}
>>> {"execute":"migrate-set-parameters", ...}
>>> {"execute":"getfd","arguments":{"fdname":"migrate"}, ...}   fd=26
>>> QEMU_MONITOR_IO_SEND_FD: fd=26
>>> {"execute":"migrate","arguments":{"uri":"fd:migrate"}, ...}
>>>
>>>
>>> Current restore procedure in QMP in short:
>>>
>>> (start QEMU)
>>> {"execute":"migrate-incoming","arguments":{"uri":"fd:21"}, ...}
>>>
>>>
>>> Should I investigate changing libvirt to use unix: for save/restore?
>>> Or should I look into changing qemu to somehow accept fd: for multifd,
>>> meaning I guess providing multiple fd: uris in the migrate command?
>>
>> So I'm not sure this is the right direction; i.e. if multifd is the
>> right answer to your problem.
>
> Of course, just exploring the space.
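For reference, the unix:-based sequence I am experimenting with looks roughly
like this on the QMP side (the channel count and socket path here are made up;
something on the other end of the socket still has to accept the connections
and write the data out):

{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"multifd","state":true}]}}
{"execute":"migrate-set-parameters","arguments":{"multifd-channels":4}}
{"execute":"migrate","arguments":{"uri":"unix:///tmp/save-multifd.sock"}}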
I have some progress on multifd, if we can call it that: I wrote a simple
program that sets up a unix socket, listens for N_CHANNELS + 1 connections
there, sets up the multifd parameters, and runs the migration, spawning a
thread for each incoming connection from QEMU and creating a file to store
the migration data coming from qemu (optionally using O_DIRECT).

This program plays the role of an "iohelper"-like thing, basically just
copying things over, making O_DIRECT possible.

I save the data streams to multiple files; this works, but for the actual
results I will have to move to a better hardware setup (enterprise nvme +
fast cpu, under various memory configurations).

The intuition would be that if we have enough cpus to spare (no libvirt in
the picture as mentioned for now), say, the same 4 cpus already allocated
for a certain VM to run, we can use those cpus (now "free" since we
suspended the guest) to compress each multifd channel (multifd-zstd?
multifd-zlib?), thus reducing the amount of data that needs to go to disk,
making use of those cpus.

Work in progress... (a rough sketch of the helper's accept/copy loop is at
the bottom of this mail).

>
>> However, I think the qemu code probably really really wants to be a
>> socket.
>
> Understood, I'll try to bend libvirt to use unix:/// and see how far I get,
>
> Thanks,
>
> Claudio
>
>>
>> Dave
>>
>>>
>>> Thank you for your help,
>>>
>>> Claudio
>>>
>
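The helper mentioned above is roughly the following shape: a sketch only,
with a made-up socket path, file names and channel count, minimal error
handling, and O_DIRECT left commented out (using it additionally requires
aligned buffers and aligned write sizes):

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define N_CHANNELS 4
#define BUF_SIZE   (1024 * 1024)

struct chan { int conn_fd; int file_fd; pthread_t tid; };

/* Copy one incoming stream from QEMU to its backing file. */
static void *copy_stream(void *opaque)
{
    struct chan *c = opaque;
    char *buf = malloc(BUF_SIZE);
    ssize_t n;

    while ((n = read(c->conn_fd, buf, BUF_SIZE)) > 0) {
        if (write(c->file_fd, buf, n) != n) {
            perror("write");
            break;
        }
    }
    free(buf);
    close(c->conn_fd);
    close(c->file_fd);
    return NULL;
}

int main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX,
                                .sun_path = "/tmp/save-multifd.sock" };
    struct chan chans[N_CHANNELS + 1];
    int listen_fd = socket(AF_UNIX, SOCK_STREAM, 0);

    unlink(addr.sun_path);
    if (listen_fd < 0 ||
        bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listen_fd, N_CHANNELS + 1) < 0) {
        perror("socket/bind/listen");
        return 1;
    }

    /* ... here the migration would be started over QMP as sketched earlier ... */

    /* main channel + multifd channels: one thread and one output file each */
    for (int i = 0; i < N_CHANNELS + 1; i++) {
        char path[64];
        snprintf(path, sizeof(path), "/tmp/migration-channel-%d", i);
        chans[i].conn_fd = accept(listen_fd, NULL, NULL);
        chans[i].file_fd = open(path,
                                O_WRONLY | O_CREAT | O_TRUNC /* | O_DIRECT */,
                                0600);
        pthread_create(&chans[i].tid, NULL, copy_stream, &chans[i]);
    }
    for (int i = 0; i < N_CHANNELS + 1; i++)
        pthread_join(chans[i].tid, NULL);

    close(listen_fd);
    return 0;
}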