live migration vs device assignment (was Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC)

2015-12-07 Thread Michael S. Tsirkin
On Tue, Nov 24, 2015 at 09:35:17PM +0800, Lan Tianyu wrote:
> This patchset is to propose a solution of adding live migration
> support for SRIOV NIC.

I thought about what this is doing at a high level, and I do see some
value in what you are trying to do, but I also think we need to clarify
the motivation a bit more: what you are saying is not really what the
patches are doing.

And with that clearer understanding of the motivation in mind (assuming
it actually captures a real need), I would also like to suggest some
changes.

TLDR:
- split this into 3 unrelated efforts/patchsets
- try implementing this host-side only using VT-d dirty tracking
- if making guest changes, make them in a way that makes many devices benefit
- measure speed before trying to improve it

---

First, this does not actually help you migrate with an active assigned
device: the guest still needs to deactivate the device before the VM
is moved around.

What the patches are actually able to do, instead, is three things.
My suggestion is to split them up and work on them separately;
there's really no need to have all of them together.

I discuss all three below, but if any item needs more discussion,
please snip and start a separate thread for it.


1. Starting live migration with device running.
This might help speed up networking during pre-copy where there is a
long warm-up phase.

Note: to complete migration, one also has to do something to stop
the device, but that's a separate item, since an existing hot-unplug
request will do that just as well.


Proposed change of approach:
One option is to write into the DMA memory to make it dirty.  Your
patches do this within the driver, but doing it in the generic DMA
unmap code seems more elegant, as it will help all devices.  An
interesting note: on unplug, the driver unmaps all of its DMA memory,
so this works out fine.
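
To make that concrete, here is a minimal sketch of what dirty-on-unmap
could look like if hooked into the guest's generic DMA unmap path.
The function name, the enable flag and the assumption that the unmap
site still has the buffer's virtual address are all illustrative;
nothing below is taken from the patches.

/*
 * Illustrative sketch only: touch each page of a DMA buffer when the
 * guest unmaps it, so host-side dirty logging picks the pages up
 * during pre-copy.  The hook name and enable flag are hypothetical;
 * a real hook would live in the dma-mapping core.
 */
#include <linux/types.h>
#include <linux/dma-mapping.h>
#include <linux/mm.h>

static bool dma_dirty_on_unmap;		/* enabled only while migrating */

static void dma_dirty_unmapped_range(void *vaddr, size_t size)
{
	volatile char *p = vaddr;
	size_t off;

	if (!dma_dirty_on_unmap || !size)
		return;

	/* One write per page is enough for the hypervisor's dirty-page
	 * tracking to notice; writing the value back avoids changing
	 * the buffer contents. */
	for (off = 0; off < size; off += PAGE_SIZE)
		p[off] = p[off];
	p[size - 1] = p[size - 1];	/* cover the final page too */
}

The enable flag is also where the "turn tracking on only for
migration" interface mentioned below would hook in.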


Some benchmarking will be needed to show the performance overhead.
It is likely non-zero, so an interface would be needed
to enable this tracking before starting migration.


According to the VT-d spec, bit 6 in the PTE is the dirty bit.  Why
don't we use this to detect memory changes made by the device?
Specifically, periodically scan pages that we have already sent,
atomically test-and-clear the dirty bit in the IOMMU PTE, and if it
was set, resend the page.
The interface could simply be a VFIO ioctl that is given a range of
memory and has VFIO do the scan and report the dirty bits to
userspace.
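
A rough sketch of what that interface could look like from userspace
follows.  The ioctl name, number and structure layout are invented
for illustration; no such VFIO ioctl exists in the kernels this
thread is discussing.

/*
 * Hypothetical VFIO interface: userspace passes an IOVA range and a
 * bitmap buffer; the kernel walks the VT-d second-level page tables,
 * atomically test-and-clears the dirty bit (bit 6) in each PTE and
 * reports which pages were dirty.  All names here are made up.
 */
#include <stdint.h>
#include <linux/types.h>
#include <linux/ioctl.h>
#include <sys/ioctl.h>

struct vfio_iommu_dirty_bitmap {
	__u32 argsz;
	__u32 flags;
	__u64 iova;	/* start of the range to scan */
	__u64 size;	/* length of the range in bytes */
	__u64 bitmap;	/* userspace pointer, one bit per page */
};

/* Made-up ioctl number, for the sketch only. */
#define VFIO_IOMMU_GET_DIRTY_BITMAP	_IO(';', 200)

/* Usage from the migration thread: scan a range and merge the result
 * into the dirty log (container_fd is the VFIO container fd). */
static int vfio_sync_dirty_log(int container_fd, uint64_t iova,
			       uint64_t size, unsigned long *bitmap)
{
	struct vfio_iommu_dirty_bitmap req = {
		.argsz = sizeof(req),
		.iova = iova,
		.size = size,
		.bitmap = (uintptr_t)bitmap,
	};

	return ioctl(container_fd, VFIO_IOMMU_GET_DIRTY_BITMAP, &req);
}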

This might be slower than having the guest write into the DMA page,
since e.g. PML does not work here.

We could go for a mixed approach, where we negotiate with the
guest: if guest can write into memory on unmap, then
skip the scanning, otherwise do scanning of IOMMU PTEs
as described above.

I would suggest starting with plain IOMMU PTE polling on the host.
If you see that there is a performance problem, optimize later by
enabling the guest-side writes if required.

2. (Presumably) faster device stop.
After the warm-up phase, we need to enter the stop-and-copy phase.
At that point, the device needs to be stopped.
One way to do this is to send a request to the guest while
we continue to track and send memory changes.
I am not sure whether this is what you are doing,
but I'm assuming it is.

I don't know what you do on the host; I guess you could send a
removal request to the guest and keep sending page updates meanwhile.
Once the guest's eject/stop acknowledgement is received on the host,
you can enter stop-and-copy.

Your patches seem to stop the device with a custom device-specific
register, but more generic interfaces, such as device removal, could
also work, even if they are less optimal.
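
As an outline, the host-side sequence using device removal as the
stop mechanism could look roughly like the sketch below.  qmp_send()
and qmp_wait_event() are stand-ins for a QMP client; only the
device_del command and the DEVICE_DELETED event are real QEMU
interfaces, so treat this as pseudo-code for the ordering rather than
working management code.

/*
 * Sequence sketch only.  The qmp_* helpers just print what a real
 * QMP client would send to / wait for from QEMU's monitor socket.
 */
#include <stdio.h>

static void qmp_send(const char *json)
{
	printf("-> %s\n", json);		/* stand-in for a QMP command */
}

static void qmp_wait_event(const char *event)
{
	printf("<- waiting for %s\n", event);	/* stand-in for event wait */
}

static void migrate_with_assigned_vf(void)
{
	/* 1. Pre-copy runs with the VF still active; device DMA is
	 *    tracked via IOMMU PTE scanning or dirty-on-unmap. */

	/* 2. Ask the guest to release the device (hot-unplug request). */
	qmp_send("{ \"execute\": \"device_del\","
		 " \"arguments\": { \"id\": \"vf0\" } }");

	/* 3. Keep sending memory updates until the guest acknowledges
	 *    the eject; QEMU emits DEVICE_DELETED at that point. */
	qmp_wait_event("DEVICE_DELETED");

	/* 4. Now enter stop-and-copy and complete the migration. */
}

int main(void)
{
	migrate_with_assigned_vf();
	return 0;
}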

The way you defined the interfaces, they don't seem device-specific
at all.  A new PCI capability ID reserved through the PCI SIG could
be one way to add the new interface, if it's needed.
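
For illustration only, here is one possible config-space layout for
such a generic migration capability; the field names, widths and
semantics are assumptions of this sketch, not what the patches define.

/*
 * Sketch of a generic "migration" PCI capability.  The capability ID
 * would need to be reserved through the PCI SIG (or a vendor-specific
 * capability, ID 0x09, could be used); everything below is invented
 * for illustration.
 */
#include <stdint.h>

struct pci_migration_cap {
	uint8_t  cap_id;	/* capability ID (to be assigned) */
	uint8_t  cap_next;	/* next capability pointer */
	uint8_t  version;
	uint8_t  guest_caps;	/* guest advertises support, e.g. dirty-on-unmap */
	uint16_t host_ctrl;	/* host writes: migration start / stop-device request */
	uint16_t guest_status;	/* guest writes: acknowledge / device quiesced */
} __attribute__((packed));

A guest_caps field along these lines would also feed the support
negotiation discussed below: a guest that never sets it falls back to
plain hot-unplug, or migration is blocked.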


We also need a way to know what the guest supports.  With hotplug we
know all modern guests support it, but with any custom code we need
negotiation, and then a fallback to either hot-unplug or blocking
migration.

Additionally, hot-unplug will unmap all DMA memory, so if the DMA
unmap callbacks do a write, you get that memory dirtied for free.

At the moment, device removal destroys state such as the IP address
and ARP cache, but we could have the guest move these around if
necessary.  Possibly this can be done in userspace with the guest
agent.  We could discuss guest kernel or firmware solutions if we
need to address corner cases such as network boot.

You might run into hotplug behaviour such as a 5-second timeout
until the device is actually detected.  That always seemed silly to
me; a simple if (!kvm) in that code might be justified.

The fact that guest cooperation is needed to complete migration is a
big problem IMHO.  In practice this means you need to give a lot of
CPU to a guest on an overcommitted host in order to be able to move
it out to another host.  Meanwhile, the guest can abuse the extra CPU
it was given.

Can't surprise removal be emulated instead?

Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-04 Thread Michael S. Tsirkin
On Fri, Dec 04, 2015 at 02:42:36PM +0800, Lan, Tianyu wrote:
> 
> On 12/2/2015 10:31 PM, Michael S. Tsirkin wrote:
> >>>We hope
> >>>to find a better way to make SRIOV NIC work in these cases and this is
> >>>worth to do since SRIOV NIC provides better network performance compared
> >>>with PV NIC.
> >If this is a performance optimization as the above implies,
> >you need to include some numbers, and document how you implemented
> >the switch and how you measured the performance.
> >
> 
> OK. Some ideas of my patches come from paper "CompSC: Live Migration with
> Pass-through Devices".
> http://www.cl.cam.ac.uk/research/srg/netos/vee_2012/papers/p109.pdf
> 
> It compares performance data for the PV/VF switching solution and for
> direct VF migration (Chapter 7: Discussion).
> 

I haven't read it, but I would like to note that you can't rely on
research papers.  If you propose a patch to be merged, you need to
measure its actual effect on modern Linux at the end of 2015.

> >>>Current patches have some issues. I think we can find
> >>>solutions for them and improve them step by step.


Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-04 Thread Lan, Tianyu


On 12/4/2015 4:05 PM, Michael S. Tsirkin wrote:

> I haven't read it, but I would like to note that you can't rely on
> research papers.  If you propose a patch to be merged, you need to
> measure its actual effect on modern Linux at the end of 2015.


Sure. Will do that.


Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-03 Thread Lan, Tianyu


On 12/2/2015 10:31 PM, Michael S. Tsirkin wrote:

>> We hope
>> to find a better way to make SRIOV NIC work in these cases and this is
>> worth to do since SRIOV NIC provides better network performance compared
>> with PV NIC.

> If this is a performance optimization, as the above implies, you
> need to include some numbers, and document how you implemented the
> switch and how you measured the performance.



OK. Some ideas in my patches come from the paper "CompSC: Live
Migration with Pass-through Devices":
http://www.cl.cam.ac.uk/research/srg/netos/vee_2012/papers/p109.pdf

It compares performance data for the PV/VF switching solution and for
direct VF migration (Chapter 7: Discussion).




>> Current patches have some issues. I think we can find
>> solutions for them and improve them step by step.



Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-03 Thread Alexander Duyck
On Wed, Dec 2, 2015 at 6:08 AM, Lan, Tianyu  wrote:
> On 12/1/2015 11:02 PM, Michael S. Tsirkin wrote:
>>>
>>> But
>>> it requires guest OS to do specific configurations inside and rely on
>>> bonding driver which blocks it work on Windows.
>>>  From performance side,
>>> putting VF and virtio NIC under bonded interface will affect their
>>> performance even when not do migration. These factors block to use VF
>>> NIC passthough in some user cases(Especially in the cloud) which require
>>> migration.
>>
>>
>> That's really up to guest. You don't need to do bonding,
>> you can just move the IP and mac from userspace, that's
>> possible on most OS-es.
>>
>> Or write something in guest kernel that is more lightweight if you are
>> so inclined. What we are discussing here is the host-guest interface,
>> not the in-guest interface.
>>
>>> Current solution we proposed changes NIC driver and Qemu. Guest Os
>>> doesn't need to do special thing for migration.
>>> It's easy to deploy
>>
>>
>>
>> Except of course these patches don't even work properly yet.
>>
>> And when they do, even minor changes in host side NIC hardware across
>> migration will break guests in hard to predict ways.
>
>
> Switching between PV and VF NIC will introduce network stop and the
> latency of hotplug VF is measurable. For some user cases(cloud service
> and OPNFV) which are sensitive to network stabilization and performance,
> these are not friend and blocks SRIOV NIC usage in these case. We hope
> to find a better way to make SRIOV NIC work in these cases and this is
> worth to do since SRIOV NIC provides better network performance compared
> with PV NIC. Current patches have some issues. I think we can find
> solutions for them and improve them step by step.

I still believe the concepts being put into use here are deeply
flawed.  You are assuming you can somehow complete the migration
while the device is active, and I seriously doubt that is the case.
You are going to cause data corruption, or worse, a kernel panic,
when you end up corrupting the guest memory.

You have to halt the device at some point in order to complete the
migration.  Now, I fully agree it is best to do this for as small a
window as possible.  I really think that your best approach would be
to embrace and extend the current solution that makes use of bonding.
The first step would be making it so that you don't have to hot-unplug
the VF until just before you halt the guest, instead of before you
start the migration.  Just doing that would yield a significant gain
in terms of performance during the migration.  In addition, something
like that should be doable without being overly invasive into the
drivers.  A few tweaks to the DMA API and you could probably have
that resolved.

As far as avoiding the hot-plug itself, that would be better handled
as a separate follow-up, and really belongs more to the PCI layer
than the NIC device drivers.  The device drivers should already have
code for handling a suspend/resume due to a power-cycle event.  If
you could make use of that, then it is just a matter of implementing
something in the hot-plug or PCIe drivers that would allow QEMU to
signal when the device needs to go into D3 and when it can resume
normal operation at D0.  You could probably use the PCI Bus Master
Enable bit as the test for whether the device is ready for migration
or not: if the bit is set you cannot migrate the VM, and if it is
cleared then you are ready to migrate.
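
A small sketch of that check, reading the VF's command register
through its config space in sysfs on the host; the device path is a
placeholder and this is just one way the test could be wired up, not
something taken from the patches.

/*
 * Illustration of the proposed test: bit 2 of the PCI command
 * register is Bus Master Enable.  If the guest still has bus
 * mastering enabled, the device is active and migration should wait.
 * (Config space is little-endian; the raw 16-bit read below is fine
 * on x86.)
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PCI_COMMAND		0x04	/* offset of the command register */
#define PCI_COMMAND_MASTER	0x04	/* Bus Master Enable bit */

static bool vf_ready_for_migration(const char *config_path)
{
	uint16_t cmd = PCI_COMMAND_MASTER;	/* assume busy on error */
	FILE *f = fopen(config_path, "rb");

	if (!f)
		return false;
	if (fseek(f, PCI_COMMAND, SEEK_SET) == 0 &&
	    fread(&cmd, sizeof(cmd), 1, f) != 1)
		cmd = PCI_COMMAND_MASTER;
	fclose(f);

	return !(cmd & PCI_COMMAND_MASTER);
}

int main(void)
{
	/* Placeholder BDF; substitute the assigned VF's address. */
	const char *path = "/sys/bus/pci/devices/0000:03:10.0/config";

	printf("VF ready for migration: %s\n",
	       vf_ready_for_migration(path) ? "yes" : "no");
	return 0;
}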


Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-02 Thread Michael S. Tsirkin
On Wed, Dec 02, 2015 at 10:08:25PM +0800, Lan, Tianyu wrote:
> On 12/1/2015 11:02 PM, Michael S. Tsirkin wrote:
> >>But
> >>it requires guest OS to do specific configurations inside and rely on
> >>bonding driver which blocks it work on Windows.
> >> From performance side,
> >>putting VF and virtio NIC under bonded interface will affect their
> >>performance even when not do migration. These factors block to use VF
> >>NIC passthough in some user cases(Especially in the cloud) which require
> >>migration.
> >
> >That's really up to guest. You don't need to do bonding,
> >you can just move the IP and mac from userspace, that's
> >possible on most OS-es.
> >
> >Or write something in guest kernel that is more lightweight if you are
> >so inclined. What we are discussing here is the host-guest interface,
> >not the in-guest interface.
> >
> >>Current solution we proposed changes NIC driver and Qemu. Guest Os
> >>doesn't need to do special thing for migration.
> >>It's easy to deploy
> >
> >
> >Except of course these patches don't even work properly yet.
> >
> >And when they do, even minor changes in host side NIC hardware across
> >migration will break guests in hard to predict ways.
> 
> Switching between PV and VF NIC will introduce network stop and the
> latency of hotplug VF is measurable.
> For some user cases(cloud service
> and OPNFV) which are sensitive to network stabilization and performance,
> these are not friend and blocks SRIOV NIC usage in these case.

I find this hard to credit: hotplug is not normally a data-path
operation.

> We hope
> to find a better way to make SRIOV NIC work in these cases and this is
> worth to do since SRIOV NIC provides better network performance compared
> with PV NIC.

If this is a performance optimization, as the above implies, you
need to include some numbers, and document how you implemented the
switch and how you measured the performance.

> Current patches have some issues. I think we can find
> solutions for them and improve them step by step.

-- 
MST


Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-02 Thread Lan, Tianyu

On 12/1/2015 11:02 PM, Michael S. Tsirkin wrote:

>> But
>> it requires guest OS to do specific configurations inside and rely on
>> bonding driver which blocks it work on Windows.
>> From performance side,
>> putting VF and virtio NIC under bonded interface will affect their
>> performance even when not do migration. These factors block to use VF
>> NIC passthough in some user cases(Especially in the cloud) which require
>> migration.


> That's really up to guest. You don't need to do bonding,
> you can just move the IP and mac from userspace, that's
> possible on most OS-es.
>
> Or write something in guest kernel that is more lightweight if you are
> so inclined. What we are discussing here is the host-guest interface,
> not the in-guest interface.


>> Current solution we proposed changes NIC driver and Qemu. Guest Os
>> doesn't need to do special thing for migration.
>> It's easy to deploy



> Except of course these patches don't even work properly yet.
>
> And when they do, even minor changes in host side NIC hardware across
> migration will break guests in hard to predict ways.


Switching between the PV and VF NIC introduces a network outage, and
the latency of hot-plugging the VF is measurable. For some use cases
(cloud services and OPNFV) which are sensitive to network stability
and performance, these effects are unfriendly and block SRIOV NIC
usage. We hope to find a better way to make SRIOV NICs work in these
cases, and this is worth doing since an SRIOV NIC provides better
network performance than a PV NIC. The current patches have some
issues; I think we can find solutions for them and improve them step
by step.



Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-01 Thread Michael S. Tsirkin
On Tue, Dec 01, 2015 at 02:26:57PM +0800, Lan, Tianyu wrote:
> 
> 
> On 11/30/2015 4:01 PM, Michael S. Tsirkin wrote:
> >It is still not very clear what it is you are trying to achieve, and
> >whether your patchset achieves it.  You merely say "adding live
> >migration" but it seems pretty clear this isn't about being able to
> >migrate a guest transparently, since you are adding a host/guest
> >handshake.
> >
> >This isn't about functionality either: I think that on KVM, it isn't
> >hard to live migrate if you can do a host/guest handshake, even today,
> >with no kernel changes:
> >1. before migration, expose a pv nic to guest (can be done directly on
> >   boot)
> >2. use e.g. a serial connection to move IP from an assigned device to pv nic
> >3. maybe move the mac as well
> >4. eject the assigned device
> >5. detect eject on host (QEMU generates a DEVICE_DELETED event when this
> >happens) and start migration
> >
> 
> This looks like the bonding driver solution

Why does it? Unlike bonding, this doesn't touch the data path or
any kernel code. Just run a script from the guest agent.

> which put pv nic and VF
> in one bonded interface under active-backup mode. The bonding driver
> will switch from VF to PV nic automatically when VF is unplugged during
> migration. This is the only available solution for VF NIC migration.

It really isn't. For one, there is also teaming.

> But
> it requires guest OS to do specific configurations inside and rely on
> bonding driver which blocks it work on Windows.
> From performance side,
> putting VF and virtio NIC under bonded interface will affect their
> performance even when not do migration. These factors block to use VF
> NIC passthough in some user cases(Especially in the cloud) which require
> migration.

That's really up to guest. You don't need to do bonding,
you can just move the IP and mac from userspace, that's
possible on most OS-es.

Or write something in guest kernel that is more lightweight if you are
so inclined. What we are discussing here is the host-guest interface,
not the in-guest interface.

> Current solution we proposed changes NIC driver and Qemu. Guest Os
> doesn't need to do special thing for migration.
> It's easy to deploy


Except of course these patches don't even work properly yet.

And when they do, even minor changes in host side NIC hardware across
migration will break guests in hard to predict ways.

> and
> all changes are in the NIC driver, NIC vendor can implement migration
> support just in the their driver.

Kernel code and hypervisor code are not easier to develop and deploy
than a userspace script.  If that is all the motivation there is,
that's a pretty small return on investment.

-- 
MST


Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-11-30 Thread Lan, Tianyu



On 11/30/2015 4:01 PM, Michael S. Tsirkin wrote:

> It is still not very clear what it is you are trying to achieve, and
> whether your patchset achieves it.  You merely say "adding live
> migration" but it seems pretty clear this isn't about being able to
> migrate a guest transparently, since you are adding a host/guest
> handshake.
>
> This isn't about functionality either: I think that on KVM, it isn't
> hard to live migrate if you can do a host/guest handshake, even today,
> with no kernel changes:
> 1. before migration, expose a pv nic to guest (can be done directly on
>    boot)
> 2. use e.g. a serial connection to move IP from an assigned device to pv nic
> 3. maybe move the mac as well
> 4. eject the assigned device
> 5. detect eject on host (QEMU generates a DEVICE_DELETED event when this
>    happens) and start migration



This looks like the bonding driver solution, which puts the PV NIC
and the VF in one bonded interface in active-backup mode. The bonding
driver switches from the VF to the PV NIC automatically when the VF
is unplugged during migration. This is the only available solution
for VF NIC migration today. But it requires the guest OS to do
specific configuration inside and relies on the bonding driver, which
prevents it from working on Windows. From the performance side,
putting the VF and the virtio NIC under a bonded interface affects
their performance even when not migrating. These factors block the
use of VF NIC passthrough in some use cases (especially in the cloud)
which require migration.

The solution we propose changes the NIC driver and QEMU. The guest OS
doesn't need to do anything special for migration. It is easy to
deploy, and since all changes are in the NIC driver, a NIC vendor can
implement migration support entirely within their driver.



Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-11-30 Thread Michael S. Tsirkin
On Tue, Nov 24, 2015 at 09:35:17PM +0800, Lan Tianyu wrote:
> This patchset is to propose a solution of adding live migration
> support for SRIOV NIC.
> 
> During migration, Qemu needs to let VF driver in the VM to know
> migration start and end. Qemu adds faked PCI migration capability
> to help to sync status between two sides during migration.
> 
> Qemu triggers VF's mailbox irq via sending MSIX msg when migration
> status is changed. VF driver tells Qemu its mailbox vector index
> via the new PCI capability. In some cases(NIC is suspended or closed),
> VF mailbox irq is freed and VF driver can disable irq injecting via
> new capability.   
> 
> VF driver will put down nic before migration and put up again on
> the target machine.

It is still not very clear what it is you are trying to achieve, and
whether your patchset achieves it.  You merely say "adding live
migration" but it seems pretty clear this isn't about being able to
migrate a guest transparently, since you are adding a host/guest
handshake.

This isn't about functionality either: I think that on KVM, it isn't
hard to live migrate if you can do a host/guest handshake, even today,
with no kernel changes:
1. before migration, expose a pv nic to guest (can be done directly on
  boot)
2. use e.g. a serial connection to move IP from an assigned device to pv nic
3. maybe move the mac as well
4. eject the assigned device
5. detect eject on host (QEMU generates a DEVICE_DELETED event when this
   happens) and start migration

Is this patchset a performance optimization then?
If so, it needs to be accompanied by some performance numbers.

-- 
MST