On 10/11/23 17:29, Daniel P. Berrangé wrote:
> On Wed, Oct 11, 2023 at 04:56:12PM +0200, Claudio Fontana wrote:
>>
>> On 10/11/23 16:05, Daniel P. Berrangé wrote:
>>>
>>> Instead of using 'getfd' though we have to use 'add-fd'.
>>>
>>> Anyway, this lets us do FD passing as normal, while also
>>> letting us specify the offset.
>>>
>>> {"execute": "add-fd", "arguments": {"fdset-id":"migrate"}}
>>> {"execute": "migrate", "arguments":
>>> {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}
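
A minimal sketch of the FD-passing step behind the quoted commands, in Python. It assumes that the fd travels as SCM_RIGHTS ancillary data on the QMP unix socket, that "fdset-id" is an integer in the QMP schema (so the set would be referenced as /dev/fdset/<number>), and that the socket has already completed the QMP capabilities handshake. The helper names qmp_add_fd and migrate_cmd are hypothetical, and the ",offset=" URI suffix is taken from the proposal quoted above, not from a released interface.

```python
import json
import socket


def qmp_add_fd(sock, fd, fdset_id):
    """Send an add-fd command with `fd` attached as SCM_RIGHTS ancillary data.

    Hypothetical helper: `sock` is assumed to be a connected AF_UNIX stream
    socket that has already negotiated QMP capabilities.
    """
    cmd = {"execute": "add-fd", "arguments": {"fdset-id": fdset_id}}
    payload = json.dumps(cmd).encode() + b"\r\n"
    # socket.send_fds (Python 3.9+) wraps sendmsg() with SCM_RIGHTS.
    socket.send_fds(sock, [payload], [fd])
    # Path under which the fd set would be referenced afterwards.
    return "/dev/fdset/%d" % fdset_id


def migrate_cmd(fdset_id, offset):
    """Build the migrate command from the thread, using a numeric fd set id."""
    uri = "file:/dev/fdset/%d,offset=%d" % (fdset_id, offset)
    return {
        "execute": "migrate",
        "arguments": {"detach": True, "blk": False, "inc": False, "uri": uri},
    }
```

With virsh, the --pass-fds option would take the place of send_fds here; the JSON payloads stay the same.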

Hi Daniel,

the "add-fd" is the part that I don't understand at all. Should we actually
pass an fd there like with getfd, already open with the savevm file?

Something in pseudocode like:

    virsh qemu-monitor-command --pass-fds 10 --cmd='{"execute": "add-fd", "arguments": {"fdset-id":10}}'

?

Should we use "opaque" instead of "fdset-id" if you want to actually set it to "migrate"?

And how to reference it later?

    virsh qemu-monitor-command --cmd='{"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}'

?

"opaque" does not seem to get me a reachable /dev/fdset/migrate though.

I can currently trigger the migration to the URI file:/mnt/nvme/savevm, so that
seems to work fine; it's the file:/dev/fdset part that I am still unable to
glue together.

Thanks for any idea,

Claudio

>>>
>>>> Internally, the QEMU multifd code just reads and writes using pread,
>>>> pwrite, so there is in any case just one fd to worry about,
>>>> but who should own it, libvirt or QEMU?
>>>
>>> How about both :-)
>>
>> I need to familiarize a bit with this, there are pieces I am missing. Can
>> you correct here?
>>
>> OPTION 1)
>>
>> libvirt opens the file and has the FD, writes the header, marks the offset,
>> then we dup the FD in libvirt for the benefit of QEMU, optionally set the
>> flags of the dup to "O_DIRECT" (the usual case) depending on --bypass-cache,
>> pass the duped FD to QEMU,
>> QEMU does all the pread/pwrite on it with the correct offset (since it knows
>> it from the file:// URI optional offset parameter),
>> then libvirt closes the duped fd,
>> libvirt rewrites the header using the original fd (needed to update the
>> metadata),
>> libvirt closes the original fd
>>
>>
>> OPTION 2)
>>
>> libvirt opens the file and has the FD, writes the header, marks the offset,
>> then we pass the FD to QEMU,
>> QEMU dups the FD and sets it as "O_DIRECT" depending on a passed parameter,
>> QEMU does all the pread/pwrite on it with the correct offset (since it knows
>> it from the file:// URI optional offset parameter),
>> QEMU closes the duped FD,
>> libvirt rewrites the header using the original fd (needed to update the
>> metadata),
>> libvirt closes the original fd
>>
>>
>> I don't remember if the QEMU changes for the file offsets optimization are
>> already "block friendly", i.e. they operate correctly whatever the state of
>> O_DIRECT or ~O_DIRECT,
>> I think so. They were designed with O_DIRECT in mind.
>
> The 'file' protocol as it exists currently is not O_DIRECT
> capable. It is not writing aligned buffers to aligned offsets
> in the file. It is still running the regular old migration
> stream format over the file, not taking advantage of it being
> random access.
>
> What's needed is the followup "fixed ram" format adaptation.
> Use of that format should imply O_DIRECT, so in fact we
> don't need an explicit 'bypass_cache' parameter in QAPI,
> just a way to ask for the 'fixed ram' format.
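
The fd ownership flow of OPTION 1 quoted above can be sketched roughly as follows. This is an illustration only: both roles run in one process, the names manager_prepare, writer_save and manager_finish are hypothetical, and HEADER_SIZE is an assumed fixed header reservation, not libvirt's actual save-image layout.

```python
import os

HEADER_SIZE = 4096  # hypothetical space reserved for the manager's header


def _pad_header(header):
    if len(header) > HEADER_SIZE:
        raise ValueError("header does not fit in the reserved space")
    return header.ljust(HEADER_SIZE, b"\0")


def manager_prepare(path, header):
    """'libvirt' role: open the file, write the header, mark the offset, dup."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    os.pwrite(fd, _pad_header(header), 0)
    # The dup is what would be handed to the writer; the offset goes in the URI.
    return fd, os.dup(fd), HEADER_SIZE


def writer_save(fd, base_offset, chunks):
    """'QEMU' role: positional writes only, all relative to base_offset."""
    pos = base_offset
    for chunk in chunks:
        pos += os.pwrite(fd, chunk, pos)
    return pos - base_offset  # number of payload bytes written


def manager_finish(orig_fd, dup_fd, header):
    """'libvirt' role: close the dup, rewrite the header, close the original."""
    os.close(dup_fd)
    os.pwrite(orig_fd, _pad_header(header), 0)
    os.close(orig_fd)
```

Because only pread/pwrite-style positional I/O is used, neither side depends on the shared file position. One Linux detail to keep in mind: a dup()ed descriptor shares its file status flags with the original, so a flag set with F_SETFL on one is visible through the other; the sketch therefore leaves O_DIRECT out.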
>
>> So I would tend to see OPTION 1) as more attractive as QEMU does not need to
>> care about another parameter, whatever has been chosen in libvirt in terms
>> of bypass cache is handled in libvirt.
>
> The 'fixed ram' format will only take care of I/O for the
> main RAM blocks which are nicely aligned and can be written
> to aligned file offsets. The general device vmstate I/O
> probably can't be assumed to be aligned. While we could
> futz around with QEMUFile so that it bounce buffers vmstate
> to an aligned region and flushes it in page sized chunks
> that's probably too much of a pain.
>
> IOW, actually I think what QEMU would likely want to
> do is
>
>  1. qemu_open -> get a FD *without* O_DIRECT set
>  2. write some vmstate stuff
>  3. turn on O_DIRECT
>  4. write RAM in fixed locations
>  5. turn off O_DIRECT
>  6. write remaining vmstate
>
> With regards,
> Daniel
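
The six steps above can be sketched in Python as follows. This is an illustration under assumptions, not QEMU code: it is Linux-only (os.O_DIRECT), it treats the system page size as the alignment unit, it falls back to buffered writes when the filesystem refuses O_DIRECT, and the function names and file layout are hypothetical.

```python
import fcntl
import mmap
import os

PAGE = mmap.PAGESIZE  # assumed alignment unit for buffers and file offsets


def set_o_direct(fd, enable):
    """Toggle O_DIRECT on an already-open fd via F_GETFL/F_SETFL."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    if enable:
        flags |= os.O_DIRECT
    else:
        flags &= ~os.O_DIRECT
    fcntl.fcntl(fd, fcntl.F_SETFL, flags)


def save_sketch(path, early_vmstate, ram_pages, late_vmstate):
    # 1. open the file *without* O_DIRECT
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # 2. unaligned vmstate goes through the page cache
        used = os.pwrite(fd, early_vmstate, 0)
        ram_start = -(-used // PAGE) * PAGE  # round up to a page boundary

        # 3. turn on O_DIRECT (some filesystems reject it)
        try:
            set_o_direct(fd, True)
            direct = True
        except OSError:
            direct = False

        # 4. RAM pages: page-aligned buffer, page-aligned offset and length.
        #    Each entry of ram_pages must be at most PAGE bytes long.
        buf = mmap.mmap(-1, PAGE)  # anonymous mappings are page-aligned
        for i, page in enumerate(ram_pages):
            buf[:] = page.ljust(PAGE, b"\0")
            os.pwrite(fd, buf, ram_start + i * PAGE)
        buf.close()

        # 5. turn off O_DIRECT again
        if direct:
            set_o_direct(fd, False)

        # 6. remaining unaligned vmstate
        ram_end = ram_start + len(ram_pages) * PAGE
        os.pwrite(fd, late_vmstate, ram_end)
    finally:
        os.close(fd)
    return ram_start, ram_end
```

The aligned mmap buffer stands in for guest RAM, which is already page-aligned; only the vmstate writes in steps 2 and 6 use arbitrary lengths, which is why they are done with O_DIRECT switched off.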