On 10.10.23 06:00, Yajun Wu wrote:

On 10/9/2023 5:13 PM, Hanna Czenczek wrote:
External email: Use caution opening links or attachments


On 09.10.23 11:07, Hanna Czenczek wrote:
On 09.10.23 10:21, Hanna Czenczek wrote:
On 07.10.23 04:22, Yajun Wu wrote:
[...]

The main motivation of adding VHOST_USER_SET_STATUS is to let
backend DPDK know
when DRIVER_OK bit is valid. It's an indication of all VQ
configuration has sent,
otherwise DPDK has to rely on first queue pair is ready, then
receiving/applying
VQ configuration one by one.

During live migration, configuring VQ one by one is very time
consuming.
One question I have here is why it wasn’t then introduced in the live
migration code, but in the general VM stop/cont code instead. It does
seem time-consuming to do this every time the VM is paused and resumed.

Yes, VM stop/cont will call vhost_net_stop/vhost_net_start. Maybe because there's no device level stop/cont vhost message?

No, it is because qemu will reset the status in stop/cont*, which it should not do.  Aside from guest-initiated resets, the only thing where a reset comes into play is when the back-end is changed, e.g. during migration.  In that case, the source back-end will see a disconnect on the vhost-user socket and can then do whatever uninitialization it needs to do, and the destination front-end will need to be reconfigured by qemu anyway, because it’s just a case of the destination qemu initiating a fresh connection to a new back-end (except that it will need to restore the state from the source).

*Yes, technically, dpdk will ignore that reset, but it still stops the device on a different message (when it should just pause processing vrings), so the outcome is the same.


For VIRTIO
net vDPA, HW needs to know how many VQs are enabled to set
RSS(Receive-Side Scaling).

If you don’t want SET_STATUS message, backend can remove protocol
feature bit
VHOST_USER_PROTOCOL_F_STATUS.
The problem isn’t back-ends that don’t want the message, the problem
is that qemu uses the message wrongly, which prevents well-behaving
back-ends from implementing the message.

DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device
close/reset.
So the right thing to do for back-ends is to announce STATUS support
and then not implement it correctly?

GET_VRING_BASE should not reset the close or reset the device, by the
way.  It should stop that one vring, not more.  We have a
RESET_DEVICE command for resetting.
I believe dpdk uses GET_VRING_BASE long before qemu has RESET_DEVICE?

I don’t think it matters who came first.  What matters is the specification, and that dpdk decided to rely on implementation-specific behavior without having all involved parties agree by matters of putting that in the specification.  And now dpdk clearly deviates from the specification as a result of that action, which can result in problems if the front-end doesn’t do what qemu always used to do.  (E.g. the front-end might just send GET_VRING_BASE for all vrings when suspending the guest, and then only send kicks on resume to re-start the vrings.  dpdk would most likely be left in a state where the whole device is stopped, expecting DRIVER_OK.  Same thing in general for front-ends that don’t support F_STATUS.)

It's a compatible issue. For new backend implements, we can have better solution, right?

The fact that dpdk and qemu deviate from the specification is a problem as-is.

I'm not involved in discussion about adding SET_STATUS in Vhost
protocol. This feature
is essential for vDPA(same as vhost-vdpa implements
VHOST_VDPA_SET_STATUS).
So from what I gather from your response is that there is only a
single use for SET_STATUS, which is the DRIVER_OK bit.  If so,
documenting that all other bits are to be ignored by both back-end
and front-end would be fine by me.

I’m not fully serious about that suggestion, but I hear the strong
implication that nothing but DRIVER_OK was of any concern, and this
is really important to note when we talk about the status of the
STATUS feature in vhost today.  It seems to me now that it was not
intended to be the virtio-level status byte, but just a DRIVER_OK
signalling path from front-end to back-end.  That makes it a
vhost-level protocol feature to me.
On second thought, it just is a pure vhost-level protocol feature, and
has nothing to do with the virtio status byte as-is.  The only stated
purpose is for the front-end to send DRIVER_OK after migration, but
migration is transparent to the guest, so the guest would never change
the status byte during migration.  Therefore, if this feature is
essential, we will never be able to have a status byte that is
transparently shared between guest and back-end device, i.e. the
virtio status byte.
On third thought, scratch that.  The guest wouldn’t set it, but
naturally, after migration, the front-end will need to restore the
status byte from the source, so the front-end will always need to set
it, even if it were otherwise used controlled only by the guest and the
back-end device.  So technically, this doesn’t prevent such a use case.
(In practice, it isn’t controlled by the guest right now, but that could
be fixed.)
I only tested the feature with DPDK(the only backend use it today?). Max defined the protocol and added the corresponding code in DPDK before I added QEMU support. If other backend or different device type want to use this, we can have further discussion?

So as far as I understand, the feature is supposed to rely on implementation-specific behavior between specifically qemu as a front-end and dpdk as a back-end, nothing else.  Honestly, that to me is a very good reason to deprecate it.  That would make it clear that any implementation that implements it does so because it relies on implementation-specific behavior from other implementations.

Option 2 is to fix it.  It is not right to use this broadly defined feature with its clear protocol as given in the virtio specification just to set and clear a single bit (DRIVER_OK).  The vhost-user specification points to that virtio protocol.  We must adhere to the protocol.  And note that we must not reset devices just because the VM is paused/resumed.  (That is why I wanted to deprecate SET_STATUS, so that Stefan’s series would introduce RESET_DEVICE where we need it, and we can (for now) ignore the SET_STATUS 0 in vhost_dev_stop().)

Option 3 would be to just be honest in the specification, and limit the scope of F_STATUS to say the only bit that matters is DRIVER_OK.  I would say this is not really different from deprecating, though it wouldn’t affect your case.  However, I understand Alex relies on a full status byte.  I’m still interested to know why that is.

Option 4 is of course not to do anything, and leave everything as-is, waiting for the next person to stir the hornet’s nest.

Cc-ing Alex on this mail, because to me, this seems like an important
detail when he plans on using the byte in the future. If we need a
virtio status byte, I can’t see how we could use the existing F_STATUS
for it.

Hanna



Reply via email to