> On 15 Apr 2019, at 12:37, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>
> On Thu, Apr 11, 2019 at 01:45:06AM +0300, Liran Alon wrote:
>>
>>
>>> On 9 Apr 2019, at 17:29, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>>
>>> On Thu, Mar 21, 2019 at 09:55:42AM +0200, Nir Weiner wrote:
>>>> Originally migration was not possible with vhost-scsi because
>>>> as part of migration, the source host target SCSI device state
>>>> needs to be saved and loaded into the destination host target SCSI
>>>> device. This cannot be done by QEMU.
>>>>
>>>> As this can be handled either by an external orchestrator or by having
>>>> shared storage (e.g. iSCSI), there is no reason to prevent the
>>>> orchestrator from explicitly specifying that it wishes to enable
>>>> migration even when the VM has a vhost-scsi device.
>>>>
>>>> Liran Alon (1):
>>>> vhost-scsi: Allow user to enable migration
>>>>
>>>> Nir Weiner (2):
>>>> vhost-scsi: The vhost device should be stopped when the VM is not
>>>> running
>>>> vhost-scsi: Add vmstate descriptor
>>>>
>>>> hw/scsi/vhost-scsi.c | 57 ++++++++++++++++++++++-----
>>>> include/hw/virtio/vhost-scsi-common.h | 1 +
>>>> 2 files changed, 48 insertions(+), 10 deletions(-)
>>>
>>> What happens when migration is attempted while there is in-flight I/O in
>>> the vring?
>>>
>>> Stefan
>>
>> What do you define as “in-flight I/O”? I think we need to separate the
>> discussion here into multiple cases:
>>
>> 1) The I/O request was written to the vring but the guest has not yet
>> written to the doorbell:
>> This is the simplest case. No state is lost as the vring is migrated from
>> source host to dest host as part of guest memory migration.
>> Also, the vring properties (e.g. guest base address) are transferred from
>> source host to dest host as part of the vhost-scsi VMState.
>> (See patch 2/3 of series which adds VMSTATE_VIRTIO_DEVICE to vhost-scsi
>> VMState which will cause virtio_save() to save vring on migration stream).
>>
>> 2) The I/O request was written to the vring and the guest has written to
>> the doorbell:
>> The I/O request was submitted and processed by the vhost-scsi backend.
>> Therefore, the I/O request was sent to the remote TGT server.
>> If the TGT server has performed the write and returned an ACK that the
>> guest has not yet received, the guest is expected to initiate the write
>> again (similar to what happens after a momentary TGT server downtime or
>> network downtime). So this isn’t supposed to be an issue.
>
> This scenario doesn't fit into the virtio-scsi model since each
> virtio-scsi request requires a response. Otherwise there is a vring
> descriptor leak.
>
> If the migration scheme can leak vring descriptors then it is incorrect.
> Especially repeated migrations could cause the vring to become totally
> filled with leaked descriptors.
>
> I think the case you've described cannot happen based on the
> drivers/vhost/scsi.c code. It seems that clearing the endpoint flushes
> and waits for inflight requests to complete. The
> VHOST_SCSI_CLEAR_ENDPOINT ioctl blocks until the remote target has
> returned an ACK for all in-flight requests.
Hmm, interesting. I missed this part, and it seems you are right.
(The kernel path is vhost_scsi_ioctl() -> vhost_scsi_clear_endpoint() ->
vhost_scsi_flush(), which is reached from QEMU’s vhost_scsi_set_status() ->
vhost_scsi_stop() -> vhost_scsi_clear_endpoint() ->
vhost_ops->vhost_scsi_clear_endpoint().)
>
> This patch series needs a clear explanation of how the running
> vhost_scsi.ko instance is quiesced such that:
>
> 1. Requests in-flight to the remote target are completed before
> migration handover. (Fancier approaches to live migration are
> possible but this patch series doesn't seem to implement them, so we
> need at least this basic guarantee for correctness.)
From what I understand, this is guaranteed by the code-path mentioned above.
> 2. No further I/O requests are submitted to the remote target after the
> vm stops running.
This is true because at this point the VM is stopped and the vhost backend
has also been stopped.
See QEMU’s vhost_scsi_set_status() -> vhost_scsi_stop() ->
vhost_scsi_common_stop() -> vhost_dev_stop().
> 3. No further guest memory access takes place after the vm stops running.
As the vhost backend is stopped at this point, there can be no further guest
memory accesses.
>
> These properties are critical for correct vhost live migration and I'm
> not confident yet based on this patch series. Please review the code
> (including kernel vhost code) and include an explanation in code
> comments (if possible) or commit descriptions.
I agree, and we can submit a v2 of the patch series with better comments
describing the above.
But do you think something is missing from this description? I believe it
covers all your points.
Thanks for the insightful review,
-Liran
>
> Thanks,
> Stefan