* Claudio Fontana (cfont...@suse.de) wrote:
> On 8/18/22 14:38, Dr. David Alan Gilbert wrote:
> > * Nikolay Borisov (nbori...@suse.com) wrote:
> >> [adding Juan and David to cc as I had missed them. ]
> >
> > Hi Nikolay,
> >
> >> On 11.08.22 г. 16:47 ч., Nikolay Borisov wrote:
> >>> Hello,
> >>>
> >>> I'm currently looking into implementing a 'file:' uri for migration save
> >>> in qemu. Ideally the solution will be O_DIRECT compatible. I'm aware of
> >>> the branch https://gitlab.com/berrange/qemu/-/tree/mig-file. In the
> >>> process of brainstorming how a solution would like the a couple of
> >>> questions transpired that I think warrant wider discussion in the
> >>> community.
> >
> > OK, so this seems to be a continuation with Claudio and Daniel and co as
> > of a few months back. I'd definitely be leaving libvirt sides of the
> > question here to Dan, and so that also means definitely looking at that
> > tree above.
>
> Hi Dave, yes, Nikolai is trying to continue on the qemu side.
>
> We have something working with libvirt for our short term needs which offers
> good performance,
> but it is clear that that simple solution is barred for upstream libvirt
> merging.
>
>
> >
> >>> First, implementing a solution which is self-contained within qemu would
> >>> be easy enough( famous last words) but the gist is one has to only care
> >>> about the format within qemu. However, I'm being told that what libvirt
> >>> does is prepend its own custom header to the resulting saved file, then
> >>> slipstreams the migration stream from qemu. Now with the solution that I
> >>> envision I intend to keep all write-related logic inside qemu, this
> >>> means there's no way to incorporate the logic of libvirt.
> >>> The reason I'd
> >>> like to keep the write process within qemu is to avoid an extra copy of
> >>> data between the two processes (qemu outging migration and libvirt),
> >>> with the current fd approach qemu is passed an fd, data is copied
> >>> between qemu/libvirt and finally the libvirt_iohelper writes the data.
> >>> So the question which remains to be answered is how would libvirt make
> >>> use of this new functionality in qemu? I was thinking something along
> >>> the lines of :
> >>>
> >>> 1. Qemu writes its migration stream to a file, ideally on a filesystem
> >>> which supports reflink - xfs/btrfs
> >>>
> >>> 2. Libvirt writes it's header to a separate file
> >>> 2.1 Reflinks the qemu's stream right after its header
> >>> 2.2 Writes its trailer
> >>>
> >>> 3. Unlink() qemu's file, now only libvirt's file remains on-disk.
> >>>
> >>> I wouldn't call this solution hacky though it definitely leaves some
> >>> bitter aftertaste.
> >
> > Wouldn't it be simpler to tell libvirt to write it's header, then tell
> > qemu to append everything?
>
> I would think so as well.
>
> >
> >>> Another solution would be to extend the 'fd:' protocol to allow multiple
> >>> descriptors (for multifd) support to be passed in. The reason dup()
> >>> can't be used is because in order for multifd to be supported it's
> >>> required to be able to write to multiple, non-overlapping regions of the
> >>> file. And duplicated fd's share their offsets etc. But that really seems
> >>> more or less hacky. Alternatively it's possible that pwrite() are used
> >>> to write to non-overlapping regions in the file. Any feedback is
> >>> welcomed.
> >
> > I do like the idea of letting fd: take multiple fd's.
>
> Fine in my view, I think we will still need then a helper process in libvirt
> to merge the data into a single file, no?
> In case the libvirt multifd to single file multithreaded helper I proposed
> before is helpful as a reference you could reuse/modify those patches.
Eww that's messy isn't it.
(You don't fancy a huge sparse file do you?)

> Maybe this new way will be acceptable to libvirt,
> ie avoiding the multifd code -> socket, but still merging the data from the
> multiple fds into a single file?

It feels to me like the problem here is really what we want is something
closer to a dump than the migration code; you don't need all that
overhead of the code to deal with live migration bitmaps and dirty pages
that aren't going to happen.
Something that just does a nice single write(2) (for each memory
region); and then ties the device state on.

Dave

> >
> > Dave
> >
>
> Thanks for your comments,
>
> Claudio
>
> >>>
> >>>
> >>> Regards,
> >>> Nikolay
> >>
>
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
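
For reference, a minimal sketch of the offset point raised in the thread:
descriptors obtained with dup() share one file offset, whereas pwrite()
takes an explicit offset and so can fill non-overlapping regions of a
single file in any order. This is not qemu or libvirt code. The function
names (demo_shared_offset, pwrite_all, write_regions) and the fixed-size
region layout are assumptions made up for illustration, and Python is used
only to keep the example short.

```python
import os


def demo_shared_offset(path):
    """Write through an fd and its dup(); return the offset the dup sees."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    dup_fd = os.dup(fd)
    try:
        os.write(fd, b"AAAA")
        # Both descriptors refer to the same open file description, so the
        # write above also advanced the offset seen through dup_fd.
        offset_seen_by_dup = os.lseek(dup_fd, 0, os.SEEK_CUR)
        os.write(dup_fd, b"BBBB")  # lands after "AAAA", not at offset 0
    finally:
        os.close(dup_fd)
        os.close(fd)
    return offset_seen_by_dup


def pwrite_all(fd, data, offset):
    """Call pwrite() until all of data is stored at the given offset."""
    view = memoryview(data)
    while view:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written


def write_regions(path, chunks, region_size):
    """Place chunk i at offset i * region_size (hypothetical layout)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Reverse order on purpose: pwrite() neither uses nor updates the
        # shared file offset, so the order of the writes does not matter.
        for index in reversed(range(len(chunks))):
            if len(chunks[index]) > region_size:
                raise ValueError("chunk does not fit in its region")
            pwrite_all(fd, chunks[index], index * region_size)
    finally:
        os.close(fd)
```

With a single descriptor and pwrite(), each writer only needs to know its
own region, and any unwritten gaps are left as holes that read back as
zeros.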