On 10/11/23 17:29, Daniel P. Berrangé wrote:
> On Wed, Oct 11, 2023 at 04:56:12PM +0200, Claudio Fontana wrote:
>>
>> On 10/11/23 16:05, Daniel P. Berrangé wrote:
>>>
>>> Instead of using 'getfd' though we have to use 'add-fd'.
>>>
>>> Anyway, this lets us do FD passing as normal, while also
>>> letting us specify the offset.
>>>
>>>  {"execute": "add-fd", "arguments": {"fdset-id":"migrate"}}
>>>  {"execute": "migrate", "arguments": 
>>> {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}


Hi Daniel,

the "add-fd" is the part that I don't understand at all.

Should we actually pass an fd there, as with getfd, already opened on the 
savevm file?
Something in pseudocode like:

virsh qemu-monitor-command --pass-fds 10 --cmd='{"execute": "add-fd", 
"arguments": {"fdset-id":10}}' ?

Should we use "opaque" instead of "fdset-id" if you want to actually set it to 
"migrate"?
And how do we reference it later?

virsh qemu-monitor-command --cmd='{"execute": "migrate", "arguments": 
{"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}'

?

"opaque" does not seem to get me a reachable /dev/fdset/migrate though.

I can currently trigger the migration to the URI file:/mnt/nvme/savevm, so that 
seems to work fine.
It's the file:/dev/fdset part that I am still unable to glue together.
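
For reference, a minimal sketch of how the two QMP messages might fit together. It assumes that "add-fd" takes an integer "fdset-id" plus an optional free-form "opaque" string, and that the resulting set is then addressed as /dev/fdset/<id>. The helper names and the id 10 are made up for illustration, and the sketch only builds the JSON: the fd itself still has to be passed alongside the "add-fd" command, which is not shown here.

```python
import json


def add_fd_cmd(fdset_id, opaque=None):
    """Build an add-fd message; fdset_id is assumed to be an integer."""
    args = {"fdset-id": fdset_id}
    if opaque is not None:
        # "opaque" is only a descriptive label, not a name usable in a path.
        args["opaque"] = opaque
    return {"execute": "add-fd", "arguments": args}


def migrate_to_fdset_cmd(fdset_id, offset):
    """Build a migrate message pointing at /dev/fdset/<id> with an offset."""
    uri = f"file:/dev/fdset/{fdset_id},offset={offset}"
    return {
        "execute": "migrate",
        "arguments": {"detach": True, "blk": False, "inc": False, "uri": uri},
    }


if __name__ == "__main__":
    print(json.dumps(add_fd_cmd(10, opaque="migrate")))
    print(json.dumps(migrate_to_fdset_cmd(10, 124456)))
```

Under these assumptions the string "migrate" would go into "opaque", while the later reference uses the numeric id.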

Thanks for any ideas,

Claudio


>>>
>>>> Internally, the QEMU multifd code just reads and writes using pread, 
>>>> pwrite, so there is in any case just one fd to worry about,
>>>> but who should own it, libvirt or QEMU?
>>>
>>> How about both :-)
>>
>> I need to familiarize myself a bit with this; there are pieces I am missing. 
>> Can you correct me here?
>>
>> OPTION 1)
>>
>> libvirt opens the file and has the FD, writes the header, marks the offset,
>> then we dup the FD in libvirt for the benefit of QEMU, optionally set the 
>> flags of the dup to "O_DIRECT" (the usual case) depending on --bypass-cache,
>> pass the duped FD to QEMU,
>> QEMU does all the pread/pwrite on it with the correct offset (since it knows 
>> it from the file:// URI optional offset parameter),
>> then libvirt closes the duped fd
>> libvirt rewrites the header using the original fd (needed to update the 
>> metadata),
>> libvirt closes the original fd
>>
>>
>> OPTION 2)
>>
>> libvirt opens the file and has the FD, writes the header, marks the offset,
>> then we pass the FD to QEMU,
>> QEMU dups the FD and sets it as "O_DIRECT" depending on a passed parameter,
>> QEMU does all the pread/pwrite on it with the correct offset (since it knows 
>> it from the file:// URI optional offset parameter),
>> QEMU closes the duped FD,
>> libvirt rewrites the header using the original fd (needed to update the 
>> metadata),
>> libvirt closes the original fd
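
The flow common to both options can be sketched roughly as follows. This is an illustration only: one process plays both the "libvirt" and the "QEMU" role, HEADER_SIZE and the "LEN=" header layout are invented, and O_DIRECT is left out entirely.

```python
import os
import tempfile

HEADER_SIZE = 4096  # hypothetical header size, not libvirt's real layout


def save_with_shared_fd(path, payload):
    """Write a placeholder header, the payload at an offset, then the header."""
    # "libvirt" side: open the file and reserve space for the header.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.pwrite(fd, b"\0" * HEADER_SIZE, 0)
        offset = HEADER_SIZE  # what would go into the URI's offset parameter

        # Duplicate the descriptor for the "QEMU" side.
        qemu_fd = os.dup(fd)
        try:
            # "QEMU" side: positional writes only, so the shared file
            # position is never touched.
            written = 0
            while written < len(payload):
                written += os.pwrite(qemu_fd, payload[written:], offset + written)
        finally:
            os.close(qemu_fd)

        # "libvirt" side: rewrite the header with the final metadata.
        header = b"LEN=%d" % len(payload)
        os.pwrite(fd, header.ljust(HEADER_SIZE, b"\0"), 0)
        return offset
    finally:
        os.close(fd)


def load_payload(path, offset, length):
    """Read the payload back from the given offset."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "savevm")
        off = save_with_shared_fd(target, b"guest-ram" * 100)
        print(off, len(load_payload(target, off, 900)))
```

One thing to keep in mind for either option: descriptors created with dup() refer to the same open file description, so file status flags changed with fcntl(F_SETFL) on one of them are also seen through the other.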
>>
>>
>> I don't remember if QEMU changes for the file offsets optimization are 
>> already "block friendly", i.e. they operate correctly whatever the state of 
>> O_DIRECT or ~O_DIRECT,
>> I think so. They have been designed with O_DIRECT in mind.
> 
> The 'file' protocol as it exists currently is not O_DIRECT
> capable. It is not writing aligned buffers to aligned offsets
> in the file. It is still running the regular old migration
> stream format over the file, not taking advantage of it being
> random access.
> 
> What's needed is the followup "fixed ram" format adaptation.
> Use of that format should imply O_DIRECT, so in fact we
> don't need an explicit 'bypass_cache' parameter in QAPI,
> just a way to ask for the 'fixed ram' format.
> 
>> So I would tend to see OPTION 1) as more attractive, as QEMU does not need to 
>> care about another parameter: whatever has been chosen in libvirt in terms 
>> of bypass cache is handled in libvirt.
> 
> The 'fixed ram' format will only take care of I/O for the
> main RAM blocks which are nicely aligned and can be written
> to aligned file offsets. The general device vmstate I/O
> probably can't be assumed to be aligned. While we could
> futz around with QEMUFile so that it bounce-buffers vmstate
> to an aligned region and flushes it in page-sized chunks,
> that's probably too much of a pain.
> 
> IOW, actually I think what QEMU would likely want to
> do is
> 
>  1. qemu_open  -> get a FD *without* O_DIRECT set
>  2. write some vmstate stuff
>  3. turn on O_DIRECT
>  4. write RAM in fixed locations
>  5. turn off O_DIRECT
>  6. write remaining vmstate
> 
> With regards,
> Daniel
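
Steps 3 and 5 of the sequence above (turning O_DIRECT on and off on an already open FD) could be done with fcntl(F_GETFL/F_SETFL). The following is a rough sketch, not QEMU code: the helper names are invented, and whether F_SETFL accepts O_DIRECT depends on the platform and the filesystem. While O_DIRECT is set, buffers, lengths and file offsets generally also need to be suitably aligned, which the sketch does not attempt.

```python
import fcntl
import os


def set_status_flag(fd, flag, enabled):
    """Set or clear one file status flag on fd and return the new flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    flags = (flags | flag) if enabled else (flags & ~flag)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags)
    return fcntl.fcntl(fd, fcntl.F_GETFL)


def set_direct(fd, enabled):
    """Toggle O_DIRECT; os.O_DIRECT only exists on platforms that define it."""
    return set_status_flag(fd, os.O_DIRECT, enabled)
```

A caller would then write the first vmstate part, call set_direct(fd, True), write the RAM at its fixed locations, call set_direct(fd, False), and write the remaining vmstate.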
