> On 15 Apr 2019, at 12:37, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>
> On Thu, Apr 11, 2019 at 01:45:06AM +0300, Liran Alon wrote:
>>
>>
>>> On 9 Apr 2019, at 17:29, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>>
>>> On Thu, Mar 21, 2019 at 09:55:42AM +0200, Nir Weiner wrote:
>>>> Originally migration was not possible with vhost-scsi because
>>>> as part of migration, the source host target SCSI device state
>>>> needs to be saved and loaded into the destination host target SCSI
>>>> device. This cannot be done by QEMU.
>>>>
>>>> As this can be handled either by an external orchestrator or by having
>>>> shared storage (e.g. iSCSI), there is no reason to prevent the
>>>> orchestrator from explicitly specifying that it wishes to enable
>>>> migration even when the VM has a vhost-scsi device.
>>>>
>>>> Liran Alon (1):
>>>> vhost-scsi: Allow user to enable migration
>>>>
>>>> Nir Weiner (2):
>>>> vhost-scsi: The vhost device should be stopped when the VM is not
>>>> running
>>>> vhost-scsi: Add vmstate descriptor
>>>>
>>>> hw/scsi/vhost-scsi.c | 57 ++++++++++++++++++++++-----
>>>> include/hw/virtio/vhost-scsi-common.h | 1 +
>>>> 2 files changed, 48 insertions(+), 10 deletions(-)
>>>
>>> What happens when migration is attempted while there is in-flight I/O in
>>> the vring?
>>>
>>> Stefan
>>
>> What do you define as “in-flight I/O”? I think we need to separate the
>> discussion here into multiple cases:
>>
>> 1) The I/O request was written to the vring but the guest has not yet
>> written to the doorbell:
>> This is the simplest case. No state is lost as the vring is migrated from
>> source host to dest host as part of guest memory migration.
>> Also, the vring properties (e.g. guest base address) are transferred from
>> source host to dest host as part of the vhost-scsi VMState.
>> (See patch 2/3 of series which adds VMSTATE_VIRTIO_DEVICE to vhost-scsi
>> VMState which will cause virtio_save() to save vring on migration stream).
>>
>> 2) The I/O request was written to the vring and the guest has written to
>> the doorbell:
>> The I/O request was submitted and processed by the vhost-scsi backend.
>> Therefore, the I/O request was sent to the remote TGT server.
>> If the TGT server has performed the write and returned an ACK that the
>> guest has not yet received, the guest is expected to initiate the write
>> again (similar to what happens after a momentary TGT server downtime or
>> network downtime). So this isn’t supposed to be an issue.
>
> This scenario doesn't fit into the virtio-scsi model since each
> virtio-scsi request requires a response. Otherwise there is a vring
> descriptor leak.
>
> If the migration scheme can leak vring descriptors then it is incorrect.
> Especially repeated migrations could cause the vring to become totally
> filled with leaked descriptors.
>
> I think the case you've described cannot happen based on the
> drivers/vhost/scsi.c code. It seems that clearing the endpoint flushes
> and waits for inflight requests to complete. The
> VHOST_SCSI_CLEAR_ENDPOINT ioctl blocks until the remote target has
> returned an ACK for all in-flight requests.
Hmm, interesting. I missed this part, and it seems you are right.
(The kernel path is vhost_scsi_ioctl() -> vhost_scsi_clear_endpoint() ->
vhost_scsi_flush(), which is reached from QEMU’s vhost_scsi_set_status() ->
vhost_scsi_stop() -> vhost_scsi_clear_endpoint() ->
vhost_ops->vhost_scsi_clear_endpoint().)
>
> This patch series needs a clear explanation of how the running
> vhost_scsi.ko instance is quiesced such that:
>
> 1. Requests in-flight to the remote target are completed before
> migration handover. (Fancier approaches to live migration are
> possible but this patch series doesn't seem to implement them, so we
> need at least this basic guarantee for correctness.)
From what I understand, this is guaranteed by the code-path mentioned above.
> 2. No further I/O requests are submitted to the remote target after the
> vm stops running.
This is true because at this point the VM is stopped and the vhost backend
has also been stopped.
See QEMU’s vhost_scsi_set_status() -> vhost_scsi_stop() ->
vhost_scsi_common_stop() -> vhost_dev_stop().
> 3. No further guest memory access takes place after the vm stops running.
As the vhost backend is stopped at this point, there can be no further guest
memory accesses.
>
> These properties are critical for correct vhost live migration and I'm
> not confident yet based on this patch series. Please review the code
> (including kernel vhost code) and include an explanation in code
> comments (if possible) or commit descriptions.
I agree, and we can submit a v2 of the patch series with better comments
describing the above.
But do you think something is missing from this description? I believe it
covers all your points.
Thanks for the insightful review,
-Liran
>
> Thanks,
> Stefan