On 17.04.2024 10:36, Daniel P. Berrangé wrote:
> On Tue, Apr 16, 2024 at 04:42:39PM +0200, Maciej S. Szmigiero wrote:
From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
VFIO device state transfer is currently done via the main migration channel.
This means that transfers from multiple VFIO devices are done sequentially
and via just a single common migration channel.
>> Such a way of transferring VFIO device state migration data reduces
>> performance and severely impacts the migration downtime (~50%) for VMs
>> that have multiple such devices with large state size - see the test
>> results below.
>> However, we already have a way to transfer migration data using multiple
>> connections - that's what multifd channels are.
>>
>> Unfortunately, multifd channels are currently utilized for RAM transfer
>> only. This patch set adds a new framework allowing their use for device
>> state transfer too.
>> The wire protocol is based on Avihai's x-channel-header patches, which
>> introduce a header for migration channels that allows the migration source
>> to explicitly indicate the migration channel type without having the
>> target deduce the channel type by peeking into the channel's content.
>> The new wire protocol can be switched on and off via the
>> migration.x-channel-header option for compatibility with older QEMU
>> versions and for testing. Switching the new wire protocol off also
>> disables device state transfer via multifd channels.
>> The device state transfer can happen either via the same multifd channels
>> that carry RAM data, mixed in with that RAM data (when
>> migration.x-multifd-channels-device-state is 0), or exclusively via
>> dedicated device state transfer channels (when
>> migration.x-multifd-channels-device-state > 0).
>> Using dedicated device state transfer multifd channels brings further
>> performance benefits since these channels don't need to participate in
>> the RAM sync process.
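
To make the configuration side concrete: assuming x-channel-header ends up
exposed as a migration capability and x-multifd-channels-device-state as a
migration parameter (the exact QMP wiring may differ in the final patches),
a 15-channel setup with 4 dedicated device state channels - the
configuration from the downtime tests mentioned below - could be requested
on the source roughly like this:

{ "execute": "migrate-set-capabilities",
  "arguments": { "capabilities": [
      { "capability": "x-channel-header", "state": true } ] } }
{ "execute": "migrate-set-parameters",
  "arguments": { "multifd-channels": 15,
                 "x-multifd-channels-device-state": 4 } }
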
> I'm not convinced there's any need to introduce the new "channel header"
> protocol messages. The multifd channels already have an initialization
> message that is extensible to allow extra semantics to be indicated.
> So if we want some of the multifd channels to be reserved for device
> state, we could indicate that via some data in the MultiFDInit_t
> message struct.
The reason for introducing x-channel-header was to avoid having to deduce
the channel type by peeking into the channel's content - where any channel
that does not start with QEMU_VM_FILE_MAGIC is currently treated as a
multifd one.
But if this isn't desired then, as you say, the multifd channel type can
be indicated by using some unused field of the MultiFDInit_t message.
Of course, this would still keep the QEMU_VM_FILE_MAGIC heuristic then.
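
As a rough sketch of that idea - MultiFDInit_t already reserves unused
bytes that could carry this; the channel_type field and its values below
are hypothetical, not something from the current code or this series:

#include <stdint.h>

/* Sketch based on MultiFDInit_t from migration/multifd.h; the
 * channel_type field is hypothetical, carved out of the existing
 * reserved bytes. */
typedef struct {
    uint32_t magic;           /* MULTIFD_MAGIC */
    uint32_t version;         /* MULTIFD_VERSION */
    unsigned char uuid[16];   /* QemuUUID of the source */
    uint8_t id;               /* channel number */
    uint8_t channel_type;     /* hypothetical: 0 = RAM,
                               * 1 = dedicated device state */
    uint8_t unused1[6];       /* remaining reserved bytes */
    uint64_t unused2[4];      /* reserved for future use */
} __attribute__((packed)) MultiFDInit_t;

Since a multifd channel's first message already identifies it via its
magic value, extending it this way avoids a new header message - at the
cost of keeping the QEMU_VM_FILE_MAGIC peek for the other channel types.
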
> That said, the idea of reserving channels specifically for VFIO doesn't
> make a whole lot of sense to me either.
>
> Once we've done the RAM transfer, and are in the switchover phase
> doing device state transfer, all the multifd channels are idle.
> We should just use all those channels to transfer the device state,
> in parallel. Reserving channels just guarantees many idle channels
> during RAM transfer, and further idle channels during vmstate
> transfer.
>
> IMHO it is more flexible to just use all available multifd channel
> resources all the time.
The reason for having dedicated device state channels is that they
provide lower downtime in my tests.
With either 15 or 11 mixed multifd channels (no dedicated device state
channels) I get a downtime of about 1250 ms.
With 15 total multifd channels, 4 of them dedicated to device state, the
downtime drops to about 1100 ms, so using dedicated channels brings
roughly a 14% downtime improvement (1250 / 1100 ≈ 1.14).
> Again the 'MultiFDPacket_t' struct has
> both 'flags' and unused fields, so it is extensible to indicate
> that it is being used for new types of data.
Yeah, that's what MULTIFD_FLAG_DEVICE_STATE in the packet header already
does in this patch set - it indicates that the packet contains device
state, not RAM data.
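
In other words, the receiving side just branches on that flag when
unpacking each multifd packet - a simplified sketch, where the two helper
functions are illustrative stand-ins rather than the exact functions from
the series:

/* Simplified sketch of the receive-side dispatch; the two helpers
 * below are illustrative, not the exact functions from the series.
 * p->flags holds the flags word from the unpacked MultiFDPacket_t
 * header. */
static int multifd_recv_dispatch(MultiFDRecvParams *p, Error **errp)
{
    if (p->flags & MULTIFD_FLAG_DEVICE_STATE) {
        /* packet payload is an opaque chunk of device state */
        return multifd_device_state_recv(p, errp);
    }
    /* otherwise the packet carries RAM pages, as before */
    return multifd_ram_recv(p, errp);
}
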
> With regards,
> Daniel
Best regards,
Maciej