Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-05 Thread Michael S. Tsirkin
On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
> >> The two mechanisms referenced above would likely require coordination with
> >> QEMU and as such are open to discussion.  I haven't attempted to address
> >> them as I am not sure there is a consensus as of yet.  My personal
> >> preference would be to add a vendor-specific configuration block to the
> >> emulated pci-bridge interfaces created by QEMU that would allow us to
> >> essentially extend shpc to support guest live migration with pass-through
> >> devices.
> >
> > shpc?
> 
> That is kind of what I was thinking.  We basically need some mechanism
> to allow for the host to ask the device to quiesce.  It has been
> proposed to possibly even look at something like an ACPI interface
> since I know ACPI is used by QEMU to manage hot-plug in the standard
> case.
> 
> - Alex


Start by using hot-unplug for this!

Really use your patch guest side, and write host side
to allow starting migration with the device, but
defer completing it.

So

1.- host tells guest to start tracking memory writes
2.- guest acks
3.- migration starts
4.- most memory is migrated
5.- host tells guest to eject device
6.- guest acks
7.- stop vm and migrate rest of state


It will already be a win since hot unplug after migration starts and
most memory has been migrated is better than hot unplug before migration
starts.

Then measure downtime and profile. Then we can look at ways
to quiesce the device faster, which really means step 5 is replaced
with "host tells guest to quiesce device and dirty (or just unmap!)
all memory mapped for write by device".
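
As a rough guest-side sketch of that replacement step 5 (all names
below are hypothetical; this is an illustration, not an existing API),
the quiesce handler would halt new DMA and then unmap every
outstanding writable mapping, so the unmap path can dirty the pages
the device may have written:

#include <linux/dma-mapping.h>

struct my_rx_buf {
        dma_addr_t dma;
        size_t len;
};

struct my_dev {
        struct device *dev;
        struct my_rx_buf *rx_bufs;
        int num_rx_bufs;
};

void my_dev_stop_rx_tx(struct my_dev *priv);    /* hypothetical: halt new DMA */

/* Hypothetical quiesce handler: with a dma_mark_dirty() style hook in
 * the unmap path, each unmap below also dirties the unmapped pages. */
static void my_dev_quiesce(struct my_dev *priv)
{
        int i;

        my_dev_stop_rx_tx(priv);

        for (i = 0; i < priv->num_rx_bufs; i++) {
                struct my_rx_buf *rb = &priv->rx_bufs[i];

                if (!rb->dma)
                        continue;
                dma_unmap_single(priv->dev, rb->dma, rb->len,
                                 DMA_FROM_DEVICE);
                rb->dma = 0;
        }
}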

-- 
MST


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-05 Thread Dr. David Alan Gilbert
* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
> > >> The two mechanisms referenced above would likely require coordination 
> > >> with
> > >> QEMU and as such are open to discussion.  I haven't attempted to address
> > >> them as I am not sure there is a consensus as of yet.  My personal
> > >> preference would be to add a vendor-specific configuration block to the
> > >> emulated pci-bridge interfaces created by QEMU that would allow us to
> > >> essentially extend shpc to support guest live migration with pass-through
> > >> devices.
> > >
> > > shpc?
> > 
> > That is kind of what I was thinking.  We basically need some mechanism
> > to allow for the host to ask the device to quiesce.  It has been
> > proposed to possibly even look at something like an ACPI interface
> > since I know ACPI is used by QEMU to manage hot-plug in the standard
> > case.
> > 
> > - Alex
> 
> 
> Start by using hot-unplug for this!
> 
> Really use your patch guest side, and write host side
> to allow starting migration with the device, but
> defer completing it.
> 
> So
> 
> 1.- host tells guest to start tracking memory writes
> 2.- guest acks
> 3.- migration starts
> 4.- most memory is migrated
> 5.- host tells guest to eject device
> 6.- guest acks
> 7.- stop vm and migrate rest of state
> 
> 
> It will already be a win since hot unplug after migration starts and
> most memory has been migrated is better than hot unplug before migration
> starts.
> 
> Then measure downtime and profile. Then we can look at ways
> to quiesce the device faster, which really means step 5 is replaced
> with "host tells guest to quiesce device and dirty (or just unmap!)
> all memory mapped for write by device".


Doing a hot-unplug is going to upset the guest's network stack's view
of the world; that's something we don't want to change.

Dave

> 
> -- 
> MST
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-05 Thread Michael S. Tsirkin
On Tue, Jan 05, 2016 at 10:01:04AM +, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (m...@redhat.com) wrote:
> > On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
> > > >> The two mechanisms referenced above would likely require coordination 
> > > >> with
> > > >> QEMU and as such are open to discussion.  I haven't attempted to 
> > > >> address
> > > >> them as I am not sure there is a consensus as of yet.  My personal
> > > >> preference would be to add a vendor-specific configuration block to the
> > > >> emulated pci-bridge interfaces created by QEMU that would allow us to
> > > >> essentially extend shpc to support guest live migration with 
> > > >> pass-through
> > > >> devices.
> > > >
> > > > shpc?
> > > 
> > > That is kind of what I was thinking.  We basically need some mechanism
> > > to allow for the host to ask the device to quiesce.  It has been
> > > proposed to possibly even look at something like an ACPI interface
> > > since I know ACPI is used by QEMU to manage hot-plug in the standard
> > > case.
> > > 
> > > - Alex
> > 
> > 
> > Start by using hot-unplug for this!
> > 
> > Really use your patch guest side, and write host side
> > to allow starting migration with the device, but
> > defer completing it.
> > 
> > So
> > 
> > 1.- host tells guest to start tracking memory writes
> > 2.- guest acks
> > 3.- migration starts
> > 4.- most memory is migrated
> > 5.- host tells guest to eject device
> > 6.- guest acks
> > 7.- stop vm and migrate rest of state
> > 
> > 
> > It will already be a win since hot unplug after migration starts and
> > most memory has been migrated is better than hot unplug before migration
> > starts.
> > 
> > Then measure downtime and profile. Then we can look at ways
> > to quiesce the device faster, which really means step 5 is replaced
> > with "host tells guest to quiesce device and dirty (or just unmap!)
> > all memory mapped for write by device".
> 
> 
> Doing a hot-unplug is going to upset the guest's network stack's view
> of the world; that's something we don't want to change.
> 
> Dave

It might but if you store the IP and restore it quickly
after migration e.g. using guest agent, as opposed to DHCP,
then it won't.

It allows calming the device down in a generic way;
specific drivers can then implement the fast quiesce.
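
A minimal userspace sketch of the "store the IP and restore it" idea,
e.g. something a guest agent could run around the unplug/replug (the
interface name, the flow, and the lack of netmask/route handling are
all simplifying assumptions):

#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <netinet/in.h>

/* Get (set == 0) or set (set == 1) the IPv4 address of an interface.
 * Setting the address requires root.  Error handling is minimal. */
static int ipv4_addr(const char *ifname, struct sockaddr_in *sa, int set)
{
        struct ifreq ifr;
        int fd, ret;

        fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0)
                return -1;
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        if (set)
                memcpy(&ifr.ifr_addr, sa, sizeof(*sa));
        ret = ioctl(fd, set ? SIOCSIFADDR : SIOCGIFADDR, &ifr);
        if (!set && ret == 0)
                memcpy(sa, &ifr.ifr_addr, sizeof(*sa));
        close(fd);
        return ret;
}

int main(void)
{
        struct sockaddr_in saved;

        if (ipv4_addr("eth0", &saved, 0))       /* save before hot-unplug */
                return 1;
        /* ... device unplugged, VM migrates, device reappears ... */
        return ipv4_addr("eth0", &saved, 1) ? 1 : 0;    /* restore */
}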

> > 
> > -- 
> > MST
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-05 Thread Alexander Duyck
On Tue, Jan 5, 2016 at 1:40 AM, Michael S. Tsirkin  wrote:
> On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
>> >> The two mechanisms referenced above would likely require coordination with
>> >> QEMU and as such are open to discussion.  I haven't attempted to address
>> >> them as I am not sure there is a consensus as of yet.  My personal
>> >> preference would be to add a vendor-specific configuration block to the
>> >> emulated pci-bridge interfaces created by QEMU that would allow us to
>> >> essentially extend shpc to support guest live migration with pass-through
>> >> devices.
>> >
>> > shpc?
>>
>> That is kind of what I was thinking.  We basically need some mechanism
>> to allow for the host to ask the device to quiesce.  It has been
>> proposed to possibly even look at something like an ACPI interface
>> since I know ACPI is used by QEMU to manage hot-plug in the standard
>> case.
>>
>> - Alex
>
>
> Start by using hot-unplug for this!
>
> Really use your patch guest side, and write host side
> to allow starting migration with the device, but
> defer completing it.

Yeah, I'm fully on board with this idea, though I'm not really working
on this right now since, last I knew, the folks on this thread from
Intel were working on it.  My patches were mostly meant to be a nudge
in this direction so that we could get away from the driver-specific
code.

> So
>
> 1.- host tells guest to start tracking memory writes
> 2.- guest acks
> 3.- migration starts
> 4.- most memory is migrated
> 5.- host tells guest to eject device
> 6.- guest acks
> 7.- stop vm and migrate rest of state
>

Sounds about right.  The only way this differs from what I see as the
final solution for this is that instead of fully ejecting the device
in step 5, the driver would instead pause the device and give the host
something like 10 seconds to stop the VM and resume with the same
device connected if it is available.  We would probably also need to
look at a solution that would force the device to be ejected or abort
prior to starting the migration if it doesn't give us the ack in step
2.
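
A sketch of that "pause with a deadline" behavior on the guest side,
assuming a hypothetical pause/resume request delivered to the driver
(none of the my_dev_* names exist today; this is only an outline of
the design):

#include <linux/workqueue.h>
#include <linux/jiffies.h>

#define MIGRATION_GRACE (10 * HZ)       /* assumed 10 second window */

void my_dev_quiesce(void);      /* hypothetical: stop DMA, dirty/unmap buffers */
void my_dev_restart(void);      /* hypothetical: remap buffers, restart DMA */
void my_dev_force_detach(void); /* hypothetical: fall back to full eject */

static void migration_pause_timeout(struct work_struct *work)
{
        /* Host did not resume us within the grace period. */
        my_dev_force_detach();
}

static DECLARE_DELAYED_WORK(pause_work, migration_pause_timeout);

/* Host signalled "pause for migration". */
void my_dev_handle_pause(void)
{
        my_dev_quiesce();
        schedule_delayed_work(&pause_work, MIGRATION_GRACE);
}

/* Same device came back after migration, within the window. */
void my_dev_handle_resume(void)
{
        if (cancel_delayed_work_sync(&pause_work))
                my_dev_restart();
}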

> It will already be a win since hot unplug after migration starts and
> most memory has been migrated is better than hot unplug before migration
> starts.

Right.  Generally, the longer the VF can be maintained as a part of the
guest, the longer the guest sees improved network performance versus
using a purely virtual interface.

> Then measure downtime and profile. Then we can look at ways
> to quiesce the device faster, which really means step 5 is replaced
> with "host tells guest to quiesce device and dirty (or just unmap!)
> all memory mapped for write by device".

Step 5 will be the spot where we really need to start modifying
drivers.  Specifically, we probably need to go through and clean up
things so that we can reduce as many of the delays in the driver
suspend/resume path as possible.  I suspect there is quite a bit that
can be done there that would probably also improve boot and shutdown
times since those are also impacted by the devices.

- Alex


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-05 Thread Dr. David Alan Gilbert
* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Tue, Jan 05, 2016 at 10:01:04AM +, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
> > > > >> The two mechanisms referenced above would likely require 
> > > > >> coordination with
> > > > >> QEMU and as such are open to discussion.  I haven't attempted to 
> > > > >> address
> > > > >> them as I am not sure there is a consensus as of yet.  My personal
> > > > >> preference would be to add a vendor-specific configuration block to 
> > > > >> the
> > > > >> emulated pci-bridge interfaces created by QEMU that would allow us to
> > > > >> essentially extend shpc to support guest live migration with 
> > > > >> pass-through
> > > > >> devices.
> > > > >
> > > > > shpc?
> > > > 
> > > > That is kind of what I was thinking.  We basically need some mechanism
> > > > to allow for the host to ask the device to quiesce.  It has been
> > > > proposed to possibly even look at something like an ACPI interface
> > > > since I know ACPI is used by QEMU to manage hot-plug in the standard
> > > > case.
> > > > 
> > > > - Alex
> > > 
> > > 
> > > Start by using hot-unplug for this!
> > > 
> > > Really use your patch guest side, and write host side
> > > to allow starting migration with the device, but
> > > defer completing it.
> > > 
> > > So
> > > 
> > > 1.- host tells guest to start tracking memory writes
> > > 2.- guest acks
> > > 3.- migration starts
> > > 4.- most memory is migrated
> > > 5.- host tells guest to eject device
> > > 6.- guest acks
> > > 7.- stop vm and migrate rest of state
> > > 
> > > 
> > > It will already be a win since hot unplug after migration starts and
> > > most memory has been migrated is better than hot unplug before migration
> > > starts.
> > > 
> > > Then measure downtime and profile. Then we can look at ways
> > > to quiesce the device faster, which really means step 5 is replaced
> > > with "host tells guest to quiesce device and dirty (or just unmap!)
> > > all memory mapped for write by device".
> > 
> > 
> > Doing a hot-unplug is going to upset the guest's network stack's view
> > of the world; that's something we don't want to change.
> > 
> > Dave
> 
> It might but if you store the IP and restore it quickly
> after migration e.g. using guest agent, as opposed to DHCP,
> then it won't.

I thought if you hot-unplug then it will lose any outstanding connections
on that device.

> It allows calming the device down in a generic way;
> specific drivers can then implement the fast quiesce.

Except that if it breaks the guest networking it's useless.

Dave

> 
> > > 
> > > -- 
> > > MST
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-05 Thread Michael S. Tsirkin
On Tue, Jan 05, 2016 at 10:45:25AM +, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (m...@redhat.com) wrote:
> > On Tue, Jan 05, 2016 at 10:01:04AM +, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > > On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
> > > > > >> The two mechanisms referenced above would likely require 
> > > > > >> coordination with
> > > > > >> QEMU and as such are open to discussion.  I haven't attempted to 
> > > > > >> address
> > > > > >> them as I am not sure there is a consensus as of yet.  My personal
> > > > > >> preference would be to add a vendor-specific configuration block 
> > > > > >> to the
> > > > > >> emulated pci-bridge interfaces created by QEMU that would allow us 
> > > > > >> to
> > > > > >> essentially extend shpc to support guest live migration with 
> > > > > >> pass-through
> > > > > >> devices.
> > > > > >
> > > > > > shpc?
> > > > > 
> > > > > That is kind of what I was thinking.  We basically need some mechanism
> > > > > to allow for the host to ask the device to quiesce.  It has been
> > > > > proposed to possibly even look at something like an ACPI interface
> > > > > since I know ACPI is used by QEMU to manage hot-plug in the standard
> > > > > case.
> > > > > 
> > > > > - Alex
> > > > 
> > > > 
> > > > Start by using hot-unplug for this!
> > > > 
> > > > Really use your patch guest side, and write host side
> > > > to allow starting migration with the device, but
> > > > defer completing it.
> > > > 
> > > > So
> > > > 
> > > > 1.- host tells guest to start tracking memory writes
> > > > 2.- guest acks
> > > > 3.- migration starts
> > > > 4.- most memory is migrated
> > > > 5.- host tells guest to eject device
> > > > 6.- guest acks
> > > > 7.- stop vm and migrate rest of state
> > > > 
> > > > 
> > > > It will already be a win since hot unplug after migration starts and
> > > > most memory has been migrated is better than hot unplug before migration
> > > > starts.
> > > > 
> > > > Then measure downtime and profile. Then we can look at ways
> > > > to quiesce the device faster, which really means step 5 is replaced
> > > > with "host tells guest to quiesce device and dirty (or just unmap!)
> > > > all memory mapped for write by device".
> > > 
> > > 
> > > Doing a hot-unplug is going to upset the guest's network stack's view
> > > of the world; that's something we don't want to change.
> > > 
> > > Dave
> > 
> > It might but if you store the IP and restore it quickly
> > after migration e.g. using guest agent, as opposed to DHCP,
> > then it won't.
> 
> I thought if you hot-unplug then it will lose any outstanding connections
> on that device.

Which connections and which device?  TCP connections and an ethernet
device?  These are on different layers so of course you don't lose them.
Just do not change the IP address.

Some guests send a signal to applications to close connections
when all links go down. One can work around this
in a variety of ways.

> > It allows calming the device down in a generic way;
> > specific drivers can then implement the fast quiesce.
> 
> Except that if it breaks the guest networking it's useless.
> 
> Dave
> 
> > 
> > > > 
> > > > -- 
> > > > MST
> > > --
> > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-05 Thread Michael S. Tsirkin
On Tue, Jan 05, 2016 at 12:59:54PM +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 05, 2016 at 10:45:25AM +, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > On Tue, Jan 05, 2016 at 10:01:04AM +, Dr. David Alan Gilbert wrote:
> > > > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > > > On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
> > > > > > >> The two mechanisms referenced above would likely require 
> > > > > > >> coordination with
> > > > > > >> QEMU and as such are open to discussion.  I haven't attempted to 
> > > > > > >> address
> > > > > > >> them as I am not sure there is a consensus as of yet.  My 
> > > > > > >> personal
> > > > > > >> preference would be to add a vendor-specific configuration block 
> > > > > > >> to the
> > > > > > >> emulated pci-bridge interfaces created by QEMU that would allow 
> > > > > > >> us to
> > > > > > >> essentially extend shpc to support guest live migration with 
> > > > > > >> pass-through
> > > > > > >> devices.
> > > > > > >
> > > > > > > shpc?
> > > > > > 
> > > > > > That is kind of what I was thinking.  We basically need some 
> > > > > > mechanism
> > > > > > to allow for the host to ask the device to quiesce.  It has been
> > > > > > proposed to possibly even look at something like an ACPI interface
> > > > > > since I know ACPI is used by QEMU to manage hot-plug in the standard
> > > > > > case.
> > > > > > 
> > > > > > - Alex
> > > > > 
> > > > > 
> > > > > Start by using hot-unplug for this!
> > > > > 
> > > > > Really use your patch guest side, and write host side
> > > > > to allow starting migration with the device, but
> > > > > defer completing it.
> > > > > 
> > > > > So
> > > > > 
> > > > > 1.- host tells guest to start tracking memory writes
> > > > > 2.- guest acks
> > > > > 3.- migration starts
> > > > > 4.- most memory is migrated
> > > > > 5.- host tells guest to eject device
> > > > > 6.- guest acks
> > > > > 7.- stop vm and migrate rest of state
> > > > > 
> > > > > 
> > > > > It will already be a win since hot unplug after migration starts and
> > > > > most memory has been migrated is better than hot unplug before 
> > > > > migration
> > > > > starts.
> > > > > 
> > > > > Then measure downtime and profile. Then we can look at ways
> > > > > to quiesce the device faster, which really means step 5 is replaced
> > > > > with "host tells guest to quiesce device and dirty (or just unmap!)
> > > > > all memory mapped for write by device".
> > > > 
> > > > 
> > > > Doing a hot-unplug is going to upset the guest's network stack's view
> > > > of the world; that's something we don't want to change.
> > > > 
> > > > Dave
> > > 
> > > It might but if you store the IP and restore it quickly
> > > after migration e.g. using guest agent, as opposed to DHCP,
> > > then it won't.
> > 
> > I thought if you hot-unplug then it will lose any outstanding connections
> > on that device.
> > 
> > > It allows calming the device down in a generic way;
> > > specific drivers can then implement the fast quiesce.
> > 
> > Except that if it breaks the guest networking it's useless.
> > 
> > Dave
> 
> Is hot unplug useless then?

Actually, I misunderstood the question; unplug does not
have to break guest networking.

> > > 
> > > > > 
> > > > > -- 
> > > > > MST
> > > > --
> > > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-05 Thread Michael S. Tsirkin
On Tue, Jan 05, 2016 at 10:45:25AM +, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (m...@redhat.com) wrote:
> > On Tue, Jan 05, 2016 at 10:01:04AM +, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > > On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
> > > > > >> The two mechanisms referenced above would likely require 
> > > > > >> coordination with
> > > > > >> QEMU and as such are open to discussion.  I haven't attempted to 
> > > > > >> address
> > > > > >> them as I am not sure there is a consensus as of yet.  My personal
> > > > > >> preference would be to add a vendor-specific configuration block 
> > > > > >> to the
> > > > > >> emulated pci-bridge interfaces created by QEMU that would allow us 
> > > > > >> to
> > > > > >> essentially extend shpc to support guest live migration with 
> > > > > >> pass-through
> > > > > >> devices.
> > > > > >
> > > > > > shpc?
> > > > > 
> > > > > That is kind of what I was thinking.  We basically need some mechanism
> > > > > to allow for the host to ask the device to quiesce.  It has been
> > > > > proposed to possibly even look at something like an ACPI interface
> > > > > since I know ACPI is used by QEMU to manage hot-plug in the standard
> > > > > case.
> > > > > 
> > > > > - Alex
> > > > 
> > > > 
> > > > Start by using hot-unplug for this!
> > > > 
> > > > Really use your patch guest side, and write host side
> > > > to allow starting migration with the device, but
> > > > defer completing it.
> > > > 
> > > > So
> > > > 
> > > > 1.- host tells guest to start tracking memory writes
> > > > 2.- guest acks
> > > > 3.- migration starts
> > > > 4.- most memory is migrated
> > > > 5.- host tells guest to eject device
> > > > 6.- guest acks
> > > > 7.- stop vm and migrate rest of state
> > > > 
> > > > 
> > > > It will already be a win since hot unplug after migration starts and
> > > > most memory has been migrated is better than hot unplug before migration
> > > > starts.
> > > > 
> > > > Then measure downtime and profile. Then we can look at ways
> > > > to quiesce the device faster, which really means step 5 is replaced
> > > > with "host tells guest to quiesce device and dirty (or just unmap!)
> > > > all memory mapped for write by device".
> > > 
> > > 
> > > Doing a hot-unplug is going to upset the guest's network stack's view
> > > of the world; that's something we don't want to change.
> > > 
> > > Dave
> > 
> > It might but if you store the IP and restore it quickly
> > after migration e.g. using guest agent, as opposed to DHCP,
> > then it won't.
> 
> I thought if you hot-unplug then it will lose any outstanding connections
> on that device.
> 
> > It allows calming the device down in a generic way;
> > specific drivers can then implement the fast quiesce.
> 
> Except that if it breaks the guest networking it's useless.
> 
> Dave

Is hot unplug useless then?

> > 
> > > > 
> > > > -- 
> > > > MST
> > > --
> > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-05 Thread Dr. David Alan Gilbert
* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Tue, Jan 05, 2016 at 10:45:25AM +, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > On Tue, Jan 05, 2016 at 10:01:04AM +, Dr. David Alan Gilbert wrote:
> > > > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > > > On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
> > > > > > >> The two mechanisms referenced above would likely require 
> > > > > > >> coordination with
> > > > > > >> QEMU and as such are open to discussion.  I haven't attempted to 
> > > > > > >> address
> > > > > > >> them as I am not sure there is a consensus as of yet.  My 
> > > > > > >> personal
> > > > > > >> preference would be to add a vendor-specific configuration block 
> > > > > > >> to the
> > > > > > >> emulated pci-bridge interfaces created by QEMU that would allow 
> > > > > > >> us to
> > > > > > >> essentially extend shpc to support guest live migration with 
> > > > > > >> pass-through
> > > > > > >> devices.
> > > > > > >
> > > > > > > shpc?
> > > > > > 
> > > > > > That is kind of what I was thinking.  We basically need some 
> > > > > > mechanism
> > > > > > to allow for the host to ask the device to quiesce.  It has been
> > > > > > proposed to possibly even look at something like an ACPI interface
> > > > > > since I know ACPI is used by QEMU to manage hot-plug in the standard
> > > > > > case.
> > > > > > 
> > > > > > - Alex
> > > > > 
> > > > > 
> > > > > Start by using hot-unplug for this!
> > > > > 
> > > > > Really use your patch guest side, and write host side
> > > > > to allow starting migration with the device, but
> > > > > defer completing it.
> > > > > 
> > > > > So
> > > > > 
> > > > > 1.- host tells guest to start tracking memory writes
> > > > > 2.- guest acks
> > > > > 3.- migration starts
> > > > > 4.- most memory is migrated
> > > > > 5.- host tells guest to eject device
> > > > > 6.- guest acks
> > > > > 7.- stop vm and migrate rest of state
> > > > > 
> > > > > 
> > > > > It will already be a win since hot unplug after migration starts and
> > > > > most memory has been migrated is better than hot unplug before 
> > > > > migration
> > > > > starts.
> > > > > 
> > > > > Then measure downtime and profile. Then we can look at ways
> > > > > to quiesce the device faster, which really means step 5 is replaced
> > > > > with "host tells guest to quiesce device and dirty (or just unmap!)
> > > > > all memory mapped for write by device".
> > > > 
> > > > 
> > > > Doing a hot-unplug is going to upset the guest's network stack's view
> > > > of the world; that's something we don't want to change.
> > > > 
> > > > Dave
> > > 
> > > It might but if you store the IP and restore it quickly
> > > after migration e.g. using guest agent, as opposed to DHCP,
> > > then it won't.
> > 
> > I thought if you hot-unplug then it will lose any outstanding connections
> > on that device.
> > 
> > > It allows calming the device down in a generic way;
> > > specific drivers can then implement the fast quiesce.
> > 
> > Except that if it breaks the guest networking it's useless.
> > 
> > Dave
> 
> Is hot unplug useless then?

As a migration hack, yes, unless it's paired with a second network device
as a redundant route.
To do what's being suggested here, it's got to be done at the device level
and not visible to the networking stack.

Dave

> 
> > > 
> > > > > 
> > > > > -- 
> > > > > MST
> > > > --
> > > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-05 Thread Michael S. Tsirkin
On Tue, Jan 05, 2016 at 11:03:38AM +, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (m...@redhat.com) wrote:
> > On Tue, Jan 05, 2016 at 10:45:25AM +, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > > On Tue, Jan 05, 2016 at 10:01:04AM +, Dr. David Alan Gilbert wrote:
> > > > > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > > > > On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
> > > > > > > >> The two mechanisms referenced above would likely require 
> > > > > > > >> coordination with
> > > > > > > >> QEMU and as such are open to discussion.  I haven't attempted 
> > > > > > > >> to address
> > > > > > > >> them as I am not sure there is a consensus as of yet.  My 
> > > > > > > >> personal
> > > > > > > >> preference would be to add a vendor-specific configuration 
> > > > > > > >> block to the
> > > > > > > >> emulated pci-bridge interfaces created by QEMU that would 
> > > > > > > >> allow us to
> > > > > > > >> essentially extend shpc to support guest live migration with 
> > > > > > > >> pass-through
> > > > > > > >> devices.
> > > > > > > >
> > > > > > > > shpc?
> > > > > > > 
> > > > > > > That is kind of what I was thinking.  We basically need some 
> > > > > > > mechanism
> > > > > > > to allow for the host to ask the device to quiesce.  It has been
> > > > > > > proposed to possibly even look at something like an ACPI interface
> > > > > > > since I know ACPI is used by QEMU to manage hot-plug in the 
> > > > > > > standard
> > > > > > > case.
> > > > > > > 
> > > > > > > - Alex
> > > > > > 
> > > > > > 
> > > > > > Start by using hot-unplug for this!
> > > > > > 
> > > > > > Really use your patch guest side, and write host side
> > > > > > to allow starting migration with the device, but
> > > > > > defer completing it.
> > > > > > 
> > > > > > So
> > > > > > 
> > > > > > 1.- host tells guest to start tracking memory writes
> > > > > > 2.- guest acks
> > > > > > 3.- migration starts
> > > > > > 4.- most memory is migrated
> > > > > > 5.- host tells guest to eject device
> > > > > > 6.- guest acks
> > > > > > 7.- stop vm and migrate rest of state
> > > > > > 
> > > > > > 
> > > > > > It will already be a win since hot unplug after migration starts and
> > > > > > most memory has been migrated is better than hot unplug before 
> > > > > > migration
> > > > > > starts.
> > > > > > 
> > > > > > Then measure downtime and profile. Then we can look at ways
> > > > > > to quiesce the device faster, which really means step 5 is replaced
> > > > > > with "host tells guest to quiesce device and dirty (or just unmap!)
> > > > > > all memory mapped for write by device".
> > > > > 
> > > > > 
> > > > > Doing a hot-unplug is going to upset the guest's network stack's view
> > > > > of the world; that's something we don't want to change.
> > > > > 
> > > > > Dave
> > > > 
> > > > It might but if you store the IP and restore it quickly
> > > > after migration e.g. using guest agent, as opposed to DHCP,
> > > > then it won't.
> > > 
> > > I thought if you hot-unplug then it will lose any outstanding connections
> > > on that device.
> > > 
> > > > It allows calming the device down in a generic way;
> > > > specific drivers can then implement the fast quiesce.
> > > 
> > > Except that if it breaks the guest networking it's useless.
> > > 
> > > Dave
> > 
> > Is hot unplug useless then?
> 
> As a migration hack, yes,

That is based on the premise that it breaks connections, but it does not
have to.

> unless it's paired with a second network device
> as a redundant route.

You can do this too.

But this is not a must at all.

> To do what's being suggested here, it's got to be done at the device level
> and not visible to the networking stack.
> 
> Dave

The need for this was never demonstrated.

> > 
> > > > 
> > > > > > 
> > > > > > -- 
> > > > > > MST
> > > > > --
> > > > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> > > --
> > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-05 Thread Michael S. Tsirkin
On Tue, Jan 05, 2016 at 12:43:03PM +, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (m...@redhat.com) wrote:
> > On Tue, Jan 05, 2016 at 10:45:25AM +, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > > On Tue, Jan 05, 2016 at 10:01:04AM +, Dr. David Alan Gilbert wrote:
> > > > > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > > > > On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
> > > > > > > >> The two mechanisms referenced above would likely require 
> > > > > > > >> coordination with
> > > > > > > >> QEMU and as such are open to discussion.  I haven't attempted 
> > > > > > > >> to address
> > > > > > > >> them as I am not sure there is a consensus as of yet.  My 
> > > > > > > >> personal
> > > > > > > >> preference would be to add a vendor-specific configuration 
> > > > > > > >> block to the
> > > > > > > >> emulated pci-bridge interfaces created by QEMU that would 
> > > > > > > >> allow us to
> > > > > > > >> essentially extend shpc to support guest live migration with 
> > > > > > > >> pass-through
> > > > > > > >> devices.
> > > > > > > >
> > > > > > > > shpc?
> > > > > > > 
> > > > > > > That is kind of what I was thinking.  We basically need some 
> > > > > > > mechanism
> > > > > > > to allow for the host to ask the device to quiesce.  It has been
> > > > > > > proposed to possibly even look at something like an ACPI interface
> > > > > > > since I know ACPI is used by QEMU to manage hot-plug in the 
> > > > > > > standard
> > > > > > > case.
> > > > > > > 
> > > > > > > - Alex
> > > > > > 
> > > > > > 
> > > > > > Start by using hot-unplug for this!
> > > > > > 
> > > > > > Really use your patch guest side, and write host side
> > > > > > to allow starting migration with the device, but
> > > > > > defer completing it.
> > > > > > 
> > > > > > So
> > > > > > 
> > > > > > 1.- host tells guest to start tracking memory writes
> > > > > > 2.- guest acks
> > > > > > 3.- migration starts
> > > > > > 4.- most memory is migrated
> > > > > > 5.- host tells guest to eject device
> > > > > > 6.- guest acks
> > > > > > 7.- stop vm and migrate rest of state
> > > > > > 
> > > > > > 
> > > > > > It will already be a win since hot unplug after migration starts and
> > > > > > most memory has been migrated is better than hot unplug before 
> > > > > > migration
> > > > > > starts.
> > > > > > 
> > > > > > Then measure downtime and profile. Then we can look at ways
> > > > > > to quiesce the device faster, which really means step 5 is replaced
> > > > > > with "host tells guest to quiesce device and dirty (or just unmap!)
> > > > > > all memory mapped for write by device".
> > > > > 
> > > > > 
> > > > > Doing a hot-unplug is going to upset the guest's network stack's view
> > > > > of the world; that's something we don't want to change.
> > > > > 
> > > > > Dave
> > > > 
> > > > It might but if you store the IP and restore it quickly
> > > > after migration e.g. using guest agent, as opposed to DHCP,
> > > > then it won't.
> > > 
> > > I thought if you hot-unplug then it will lose any outstanding connections
> > > on that device.
> > 
> > Which connections and which device?  TCP connections and an ethernet
> > device?  These are on different layers so of course you don't lose them.
> > Just do not change the IP address.
> > 
> > Some guests send a signal to applications to close connections
> > when all links go down. One can work around this
> > in a variety of ways.
> 
> So, OK, I was surprised that a simple connection didn't go down when
> I tested and just removed the network card; I'd thought stuff was more
> aggressive when there was no route.
> But as you say, some stuff does close connections when the links go down/away
> so we do need to work around that; and any new outgoing connections get
> a 'no route to host'.


You can create a dummy device in the guest for the duration of the migration.
Use the guest agent to move the IP address there; that should be enough to trick
most guests.


>  So I'm still nervous what will break.
> 
> Dave

I'm not saying nothing breaks.  Far from it.  For example, some NAT
or firewall implementations keep state per interface, and these might
lose state (if using NAT/stateful firewall within guest).


So yes it *would* be useful to teach guests, for example, that a device
is "not dead, just resting" and that another device will shortly come
and take its place.


But the simple setup is already useful and worth supporting, and merging
things gradually will help this project finally get off the ground.


> > 
> > > > It allows calming the device down in a generic way;
> > > > specific drivers can then implement the fast quiesce.
> > > 
> > > Except that if it breaks the guest networking it's useless.
> > > 
> > > Dave
> > > 
> > > > 
> > > > > > 
> > > > > > -- 
> > > > > > MST
> > > > > --
> > > > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> > > --
> > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-05 Thread Dr. David Alan Gilbert
* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Tue, Jan 05, 2016 at 10:45:25AM +, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > On Tue, Jan 05, 2016 at 10:01:04AM +, Dr. David Alan Gilbert wrote:
> > > > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > > > On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
> > > > > > >> The two mechanisms referenced above would likely require 
> > > > > > >> coordination with
> > > > > > >> QEMU and as such are open to discussion.  I haven't attempted to 
> > > > > > >> address
> > > > > > >> them as I am not sure there is a consensus as of yet.  My 
> > > > > > >> personal
> > > > > > >> preference would be to add a vendor-specific configuration block 
> > > > > > >> to the
> > > > > > >> emulated pci-bridge interfaces created by QEMU that would allow 
> > > > > > >> us to
> > > > > > >> essentially extend shpc to support guest live migration with 
> > > > > > >> pass-through
> > > > > > >> devices.
> > > > > > >
> > > > > > > shpc?
> > > > > > 
> > > > > > That is kind of what I was thinking.  We basically need some 
> > > > > > mechanism
> > > > > > to allow for the host to ask the device to quiesce.  It has been
> > > > > > proposed to possibly even look at something like an ACPI interface
> > > > > > since I know ACPI is used by QEMU to manage hot-plug in the standard
> > > > > > case.
> > > > > > 
> > > > > > - Alex
> > > > > 
> > > > > 
> > > > > Start by using hot-unplug for this!
> > > > > 
> > > > > Really use your patch guest side, and write host side
> > > > > to allow starting migration with the device, but
> > > > > defer completing it.
> > > > > 
> > > > > So
> > > > > 
> > > > > 1.- host tells guest to start tracking memory writes
> > > > > 2.- guest acks
> > > > > 3.- migration starts
> > > > > 4.- most memory is migrated
> > > > > 5.- host tells guest to eject device
> > > > > 6.- guest acks
> > > > > 7.- stop vm and migrate rest of state
> > > > > 
> > > > > 
> > > > > It will already be a win since hot unplug after migration starts and
> > > > > most memory has been migrated is better than hot unplug before 
> > > > > migration
> > > > > starts.
> > > > > 
> > > > > Then measure downtime and profile. Then we can look at ways
> > > > > to quiesce the device faster, which really means step 5 is replaced
> > > > > with "host tells guest to quiesce device and dirty (or just unmap!)
> > > > > all memory mapped for write by device".
> > > > 
> > > > 
> > > > Doing a hot-unplug is going to upset the guest's network stack's view
> > > > of the world; that's something we don't want to change.
> > > > 
> > > > Dave
> > > 
> > > It might but if you store the IP and restore it quickly
> > > after migration e.g. using guest agent, as opposed to DHCP,
> > > then it won't.
> > 
> > I thought if you hot-unplug then it will lose any outstanding connections
> > on that device.
> 
> Which connections and which device?  TCP connections and an ethernet
> device?  These are on different layers so of course you don't lose them.
> Just do not change the IP address.
> 
> Some guests send a signal to applications to close connections
> when all links go down. One can work around this
> in a variety of ways.

So, OK, I was surprised that a simple connection didn't go down when
I tested and just removed the network card; I'd thought stuff was more
aggressive when there was no route.
But as you say, some stuff does close connections when the links go down/away
so we do need to work around that; and any new outgoing connections get
a 'no route to host'.  So I'm still nervous what will break.

Dave

> 
> > > It allows calming the device down in a generic way;
> > > specific drivers can then implement the fast quiesce.
> > 
> > Except that if it breaks the guest networking it's useless.
> > 
> > Dave
> > 
> > > 
> > > > > 
> > > > > -- 
> > > > > MST
> > > > --
> > > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2016-01-04 Thread Alexander Duyck
On Mon, Jan 4, 2016 at 12:41 PM, Konrad Rzeszutek Wilk wrote:
> On Sun, Dec 13, 2015 at 01:28:09PM -0800, Alexander Duyck wrote:
>> This patch set is meant to be the guest side code for a proof of concept
>> involving leaving pass-through devices in the guest during the warm-up
>> phase of guest live migration.  In order to accomplish this I have added a
>
> What does that mean? 'warm-up-phase'?

It is the first phase in a pre-copy migration.
https://en.wikipedia.org/wiki/Live_migration

Basically, in this phase all the memory is marked as dirty and then
copied.  Any memory that changes gets marked as dirty as well.
Currently DMA circumvents this, as the user-space dirty page tracking
isn't able to track DMA.
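
For reference, the mechanism this series adds works roughly like the
sketch below: on unmap or sync-for-cpu, perform a dummy write to each
page the device may have written, so the hypervisor's write-protection
based dirty logging faults and records the page.  This is a simplified
illustration of the idea, not the actual patch:

#include <linux/mm.h>
#include <linux/dma-direction.h>

static inline void dma_mark_dirty(void *addr, size_t size,
                                  enum dma_data_direction dir)
{
        unsigned long off;

        /* Only directions the device may have written need dirtying. */
        if (dir != DMA_FROM_DEVICE && dir != DMA_BIDIRECTIONAL)
                return;

        for (off = 0; off < size; off += PAGE_SIZE) {
                volatile char *p = (char *)addr + off;

                *p = *p;        /* dummy write faults the write-protected page */
        }
}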

>> new function called dma_mark_dirty that will mark the pages associated with
>> the DMA transaction as dirty in the case of either an unmap or a
>> sync_.*_for_cpu where the DMA direction is either DMA_FROM_DEVICE or
>> DMA_BIDIRECTIONAL.  The pass-through device must still be removed before
>> the stop-and-copy phase; however, allowing the device to be present should
>> significantly improve the performance of the guest during the warm-up
>> period.
>
> .. if the warm-up phase is short, I presume? If the warm-up phase takes
> a long time (a busy guest of 1TB size), it wouldn't help much, as the
> tracking of these DMAs may take quite long?
>
>>
>> This current implementation is very preliminary and there are a number of
>> items still missing.  Specifically in order to make this a more complete
>> solution we need to support:
>> 1.  Notifying hypervisor that drivers are dirtying DMA pages received
>
> .. And somehow giving the hypervisor the GPFN so it can retain the PFN in
> the VT-d as long as possible.

Yes, what has happened is that the host went through and marked all
memory as read-only.  So trying to do any operation that requires
write access triggers a page fault, which is then used by the host to
track pages that were dirtied.

>> 2.  Bypassing page dirtying when it is not needed.
>
> How would this work with the device doing DMA operations _after_ the
> migration?
> That is, the driver submits a DMA READ, migrates away, device is unplugged,
> VT-d context is torn down - the device does the DMA READ and gets a VT-d error...
>
> and what then? How should the device on the other host replay the DMA READ?

The device has to quiesce before the migration can occur.  We cannot
have any DMA mappings still open when we reach the stop-and-copy phase
of the migration.  The solution I have proposed here works for
streaming mappings, but doesn't solve the case for things like
dma_alloc_coherent, where a bidirectional mapping is maintained between
the CPU and the device.
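
To make the distinction concrete (an illustrative fragment, not code
from the series): streaming mappings have explicit unmap/sync points
where a dirtying hook can fire, while a coherent mapping has no such
boundary:

#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/* Streaming DMA: the unmap is an explicit hook point.
 * (Mapping-error checks omitted for brevity.) */
void rx_example(struct device *dev, void *buf, size_t len)
{
        dma_addr_t h = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

        /* ... device writes into buf ... */

        dma_unmap_single(dev, h, len, DMA_FROM_DEVICE);
        /* a dma_mark_dirty() style hook can fire here */
}

/* Coherent DMA: the device may write at any time; there is no
 * unmap/sync boundary at which to dirty pages. */
void ring_example(struct device *dev, size_t size)
{
        dma_addr_t h;
        void *ring = dma_alloc_coherent(dev, size, &h, GFP_KERNEL);

        if (!ring)
                return;
        /* device may DMA into 'ring' at any moment from here on */
        dma_free_coherent(dev, size, ring, h);
}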

>>
>> The two mechanisms referenced above would likely require coordination with
>> QEMU and as such are open to discussion.  I haven't attempted to address
>> them as I am not sure there is a consensus as of yet.  My personal
>> preference would be to add a vendor-specific configuration block to the
>> emulated pci-bridge interfaces created by QEMU that would allow us to
>> essentially extend shpc to support guest live migration with pass-through
>> devices.
>
> shpc?

That is kind of what I was thinking.  We basically need some mechanism
to allow for the host to ask the device to quiesce.  It has been
proposed to possibly even look at something like an ACPI interface
since I know ACPI is used by QEMU to manage hot-plug in the standard
case.

- Alex


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2015-12-14 Thread Michael S. Tsirkin
On Mon, Dec 14, 2015 at 03:20:26PM +0800, Yang Zhang wrote:
> On 2015/12/14 13:46, Alexander Duyck wrote:
> >On Sun, Dec 13, 2015 at 9:22 PM, Yang Zhang  wrote:
> >>On 2015/12/14 12:54, Alexander Duyck wrote:
> >>>
> >>>On Sun, Dec 13, 2015 at 6:27 PM, Yang Zhang 
> >>>wrote:
> 
> On 2015/12/14 5:28, Alexander Duyck wrote:
> >
> >
> >This patch set is meant to be the guest side code for a proof of concept
> >involving leaving pass-through devices in the guest during the warm-up
> >phase of guest live migration.  In order to accomplish this I have added
> >a
> >new function called dma_mark_dirty that will mark the pages associated
> >with
> >the DMA transaction as dirty in the case of either an unmap or a
> >sync_.*_for_cpu where the DMA direction is either DMA_FROM_DEVICE or
> >DMA_BIDIRECTIONAL.  The pass-through device must still be removed before
> >the stop-and-copy phase; however, allowing the device to be present
> >should
> >significantly improve the performance of the guest during the warm-up
> >period.
> >
> >This current implementation is very preliminary and there are a number of
> >items still missing.  Specifically in order to make this a more complete
> >solution we need to support:
> >1.  Notifying hypervisor that drivers are dirtying DMA pages received
> >2.  Bypassing page dirtying when it is not needed.
> >
> 
> Shouldn't the current log dirty mechanism already cover them?
> >>>
> >>>
> >>>The guest has no way of currently knowing that the hypervisor is doing
> >>>dirty page logging, and the log dirty mechanism currently has no way
> >>>of tracking device DMA accesses.  This change is meant to bridge the
> >>>two so that the guest device driver will force the SWIOTLB DMA API to
> >>>mark pages written to by the device as dirty.
> >>
> >>
> >>OK. This is what we called "dummy write mechanism". Actually, this is just a
> >>workaround until the IOMMU dirty bit is ready. Eventually, we need to change
> >>to use the hardware dirty bit. Besides, we may still lose the data if DMA
> >>happens during or just before the stop-and-copy phase.
> >
> >Right, this is a "dummy write mechanism" in order to allow for entry
> >tracking.  This only works completely if we force the hardware to
> >quiesce via a hot-plug event before we reach the stop-and-copy phase
> >of the migration.
> >
> >The IOMMU dirty bit approach is likely going to have a significant
> >number of challenges involved.  Looking over the driver and the data
> >sheet, it looks like the current implementation is using a form of huge
> >pages in the IOMMU; as such, we will need to tear that down and replace
> >it with 4K pages if we don't want to dirty large regions with each DMA
> 
> Yes, we need to split the huge page into small pages to get the small dirty
> range.
> 
> >transaction, and I'm not sure that is something we can change while
> >DMA is active to the affected regions.  In addition the data sheet
> 
> what changes do you mean?
> 
> >references the fact that the page table entries are stored in a
> >translation cache and in order to sync things up you have to
> >invalidate the entries.  I'm not sure what the total overhead would be
> >for invalidating something like a half million 4K pages to migrate a
> >guest with just 2G of RAM, but I would think that might be a bit
> 
> Do you mean the cost of submitting the flush request or the performance
> impact due to IOTLB misses? For the former, we have domain-selective
> invalidation. For the latter, it would be acceptable since live migration
> shouldn't last too long.

That's pretty weak - if migration time is short and speed does not
matter during migration, then all this work is useless; temporarily
switching to a virtual card would be preferable.

> >expensive given the fact that IOMMU accesses aren't known for being
> >incredibly fast when invalidating DMA on the host.
> >
> >- Alex
> >
> 
> 
> -- 
> best regards
> yang


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2015-12-13 Thread Yang Zhang

On 2015/12/14 5:28, Alexander Duyck wrote:

This patch set is meant to be the guest side code for a proof of concept
involving leaving pass-through devices in the guest during the warm-up
phase of guest live migration.  In order to accomplish this I have added a
new function called dma_mark_dirty that will mark the pages associated with
the DMA transaction as dirty in the case of either an unmap or a
sync_.*_for_cpu where the DMA direction is either DMA_FROM_DEVICE or
DMA_BIDIRECTIONAL.  The pass-through device must still be removed before
the stop-and-copy phase; however, allowing the device to be present should
significantly improve the performance of the guest during the warm-up
period.

This current implementation is very preliminary and there are a number of
items still missing.  Specifically in order to make this a more complete
solution we need to support:
1.  Notifying hypervisor that drivers are dirtying DMA pages received
2.  Bypassing page dirtying when it is not needed.



Shouldn't the current log dirty mechanism already cover them?



The two mechanisms referenced above would likely require coordination with
QEMU and as such are open to discussion.  I haven't attempted to address
them as I am not sure there is a consensus as of yet.  My personal
preference would be to add a vendor-specific configuration block to the
emulated pci-bridge interfaces created by QEMU that would allow us to
essentially extend shpc to support guest live migration with pass-through
devices.

The functionality in this patch set is currently disabled by default.  To
enable it you can select "SWIOTLB page dirtying" from the "Processor type
and features" menu.


Only SWIOTLB is supported?



---

Alexander Duyck (3):
   swiotlb: Fold static unmap and sync calls into calling functions
   xen/swiotlb: Fold static unmap and sync calls into calling functions
   x86: Create dma_mark_dirty to dirty pages used for DMA by VM guest


  arch/arm/include/asm/dma-mapping.h   |3 +
  arch/arm64/include/asm/dma-mapping.h |5 +-
  arch/ia64/include/asm/dma.h  |1
  arch/mips/include/asm/dma-mapping.h  |1
  arch/powerpc/include/asm/swiotlb.h   |1
  arch/tile/include/asm/dma-mapping.h  |1
  arch/unicore32/include/asm/dma-mapping.h |1
  arch/x86/Kconfig |   11 
  arch/x86/include/asm/swiotlb.h   |   26 
  drivers/xen/swiotlb-xen.c|   92 +-
  lib/swiotlb.c|   83 ---
  11 files changed, 123 insertions(+), 102 deletions(-)

--




--
best regards
yang


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2015-12-13 Thread Alexander Duyck
On Sun, Dec 13, 2015 at 6:27 PM, Yang Zhang  wrote:
> On 2015/12/14 5:28, Alexander Duyck wrote:
>>
>> This patch set is meant to be the guest side code for a proof of concept
>> involving leaving pass-through devices in the guest during the warm-up
>> phase of guest live migration.  In order to accomplish this I have added a
>> new function called dma_mark_dirty that will mark the pages associated
>> with
>> the DMA transaction as dirty in the case of either an unmap or a
>> sync_.*_for_cpu where the DMA direction is either DMA_FROM_DEVICE or
>> DMA_BIDIRECTIONAL.  The pass-through device must still be removed before
>> the stop-and-copy phase; however, allowing the device to be present should
>> significantly improve the performance of the guest during the warm-up
>> period.
>>
>> This current implementation is very preliminary and there are a number of
>> items still missing.  Specifically in order to make this a more complete
>> solution we need to support:
>> 1.  Notifying hypervisor that drivers are dirtying DMA pages received
>> 2.  Bypassing page dirtying when it is not needed.
>>
>
> Shouldn't the current log dirty mechanism already cover them?

The guest has no way of currently knowing that the hypervisor is doing
dirty page logging, and the log dirty mechanism currently has no way
of tracking device DMA accesses.  This change is meant to bridge the
two so that the guest device driver will force the SWIOTLB DMA API to
mark pages written to by the device as dirty.

>> The two mechanisms referenced above would likely require coordination with
>> QEMU and as such are open to discussion.  I haven't attempted to address
>> them as I am not sure there is a consensus as of yet.  My personal
>> preference would be to add a vendor-specific configuration block to the
>> emulated pci-bridge interfaces created by QEMU that would allow us to
>> essentially extend shpc to support guest live migration with pass-through
>> devices.
>>
>> The functionality in this patch set is currently disabled by default.  To
>> enable it you can select "SWIOTLB page dirtying" from the "Processor type
>> and features" menu.
>
>
> Only SWIOTLB is supported?

Yes.  For right now this only supports SWIOTLB.  The assumption here
is that SWIOTLB is in use for most cases where an IOMMU is not
present.  If an IOMMU is present in a virtualized guest, then the
IOMMU itself might be able to provide a separate mechanism for
dirty page tracking.

- Alex


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2015-12-13 Thread Yang Zhang

On 2015/12/14 12:54, Alexander Duyck wrote:

On Sun, Dec 13, 2015 at 6:27 PM, Yang Zhang  wrote:

On 2015/12/14 5:28, Alexander Duyck wrote:


This patch set is meant to be the guest side code for a proof of concept
involving leaving pass-through devices in the guest during the warm-up
phase of guest live migration.  In order to accomplish this I have added a
new function called dma_mark_dirty that will mark the pages associated
with
the DMA transaction as dirty in the case of either an unmap or a
sync_.*_for_cpu where the DMA direction is either DMA_FROM_DEVICE or
DMA_BIDIRECTIONAL.  The pass-through device must still be removed before
the stop-and-copy phase; however, allowing the device to be present should
significantly improve the performance of the guest during the warm-up
period.

This current implementation is very preliminary and there are a number of
items still missing.  Specifically in order to make this a more complete
solution we need to support:
1.  Notifying hypervisor that drivers are dirtying DMA pages received
2.  Bypassing page dirtying when it is not needed.



Shouldn't the current log dirty mechanism already cover them?


The guest has no way of currently knowing that the hypervisor is doing
dirty page logging, and the log dirty mechanism currently has no way
of tracking device DMA accesses.  This change is meant to bridge the
two so that the guest device driver will force the SWIOTLB DMA API to
mark pages written to by the device as dirty.


OK. This is what we called "dummy write mechanism". Actually, this is 
just a workaround before iommu dirty bit ready. Eventually, we need to 
change to use the hardware dirty bit. Besides, we may still lost the 
data if dma happens during/just before stop and copy phase.





The two mechanisms referenced above would likely require coordination with
QEMU and as such are open to discussion.  I haven't attempted to address
them as I am not sure there is a consensus as of yet.  My personal
preference would be to add a vendor-specific configuration block to the
emulated pci-bridge interfaces created by QEMU that would allow us to
essentially extend shpc to support guest live migration with pass-through
devices.

The functionality in this patch set is currently disabled by default.  To
enable it you can select "SWIOTLB page dirtying" from the "Processor type
and features" menu.



Only SWIOTLB is supported?


Yes.  For right now this only supports SWIOTLB.  The assumption here
is that SWIOTLB is in use for most cases where an IOMMU is not
present.  If an IOMMU is present in a virtualized guest then most
likely the IOMMU might be able to provide a separate mechanism for
dirty page tracking.

- Alex




--
best regards
yang


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2015-12-13 Thread Alexander Duyck
On Sun, Dec 13, 2015 at 9:22 PM, Yang Zhang  wrote:
> On 2015/12/14 12:54, Alexander Duyck wrote:
>>
>> On Sun, Dec 13, 2015 at 6:27 PM, Yang Zhang 
>> wrote:
>>>
>>> On 2015/12/14 5:28, Alexander Duyck wrote:


>>>> This patch set is meant to be the guest side code for a proof of concept
>>>> involving leaving pass-through devices in the guest during the warm-up
>>>> phase of guest live migration.  In order to accomplish this I have added
>>>> a new function called dma_mark_dirty that will mark the pages associated
>>>> with the DMA transaction as dirty in the case of either an unmap or a
>>>> sync_.*_for_cpu where the DMA direction is either DMA_FROM_DEVICE or
>>>> DMA_BIDIRECTIONAL.  The pass-through device must still be removed before
>>>> the stop-and-copy phase; however, allowing the device to be present
>>>> should significantly improve the performance of the guest during the
>>>> warm-up period.
>>>>
>>>> This current implementation is very preliminary and there are a number
>>>> of items still missing.  Specifically, in order to make this a more
>>>> complete solution we need to support:
>>>> 1.  Notifying the hypervisor that drivers are dirtying DMA pages received
>>>> 2.  Bypassing page dirtying when it is not needed.

>>>
>>> Shouldn't the current log dirty mechanism already cover them?
>>
>>
>> The guest currently has no way of knowing that the hypervisor is doing
>> dirty page logging, and the log dirty mechanism currently has no way
>> of tracking device DMA accesses.  This change is meant to bridge the
>> two so that the guest device driver will force the SWIOTLB DMA API to
>> mark pages written to by the device as dirty.
>
>
> OK. This is what we called the "dummy write mechanism". Actually, this is
> just a workaround until the IOMMU dirty bit is ready. Eventually, we need
> to change over to using the hardware dirty bit. Besides, we may still lose
> data if DMA happens during or just before the stop-and-copy phase.

Right, this is a "dummy write mechanism" in order to allow for entry
tracking.  This only works completely if we force the hardware to
quiesce via a hot-plug event before we reach the stop-and-copy phase
of the migration.
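
For example, the quiesce-by-unplug step can be driven from the QEMU
monitor before the stop-and-copy phase (the device id here is made up):

  (qemu) device_del hostdev0

The same device_del command is available over QMP for management stacks
such as libvirt.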

The IOMMU dirty bit approach is likely going to have a significant
number of challenges involved.  Looking over the driver and the data
sheet, it looks like the current implementation is using a form of huge
pages in the IOMMU; as such, we will need to tear that down and replace
it with 4K pages if we don't want to dirty large regions with each DMA
transaction, and I'm not sure that is something we can change while
DMA is active to the affected regions.  In addition, the data sheet
references the fact that the page table entries are stored in a
translation cache and in order to sync things up you have to
invalidate the entries.  I'm not sure what the total overhead would be
for invalidating something like a half million 4K pages to migrate a
guest with just 2G of RAM, but I would think that might be a bit
expensive given the fact that IOMMU accesses aren't known for being
incredibly fast when invalidating DMA on the host.
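
For scale, the arithmetic works out as follows: 2 GiB of guest memory is
2^31 bytes, and 2^31 / 2^12 = 2^19 = 524,288 distinct 4K pages, so a
fully mapped 2G guest really does mean roughly half a million page table
entries to invalidate.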

- Alex


Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking

2015-12-13 Thread Yang Zhang

On 2015/12/14 13:46, Alexander Duyck wrote:

> On Sun, Dec 13, 2015 at 9:22 PM, Yang Zhang  wrote:
>> On 2015/12/14 12:54, Alexander Duyck wrote:
>>> On Sun, Dec 13, 2015 at 6:27 PM, Yang Zhang  wrote:
>>>> On 2015/12/14 5:28, Alexander Duyck wrote:



>>>>> This patch set is meant to be the guest side code for a proof of concept
>>>>> involving leaving pass-through devices in the guest during the warm-up
>>>>> phase of guest live migration.  In order to accomplish this I have added
>>>>> a new function called dma_mark_dirty that will mark the pages associated
>>>>> with the DMA transaction as dirty in the case of either an unmap or a
>>>>> sync_.*_for_cpu where the DMA direction is either DMA_FROM_DEVICE or
>>>>> DMA_BIDIRECTIONAL.  The pass-through device must still be removed before
>>>>> the stop-and-copy phase; however, allowing the device to be present
>>>>> should significantly improve the performance of the guest during the
>>>>> warm-up period.
>>>>>
>>>>> This current implementation is very preliminary and there are a number
>>>>> of items still missing.  Specifically, in order to make this a more
>>>>> complete solution we need to support:
>>>>> 1.  Notifying the hypervisor that drivers are dirtying DMA pages received
>>>>> 2.  Bypassing page dirtying when it is not needed.



>>>> Shouldn't the current log dirty mechanism already cover them?



>>> The guest currently has no way of knowing that the hypervisor is doing
>>> dirty page logging, and the log dirty mechanism currently has no way
>>> of tracking device DMA accesses.  This change is meant to bridge the
>>> two so that the guest device driver will force the SWIOTLB DMA API to
>>> mark pages written to by the device as dirty.



>> OK. This is what we called the "dummy write mechanism". Actually, this is
>> just a workaround until the IOMMU dirty bit is ready. Eventually, we need
>> to change over to using the hardware dirty bit. Besides, we may still lose
>> data if DMA happens during or just before the stop-and-copy phase.


> Right, this is a "dummy write mechanism" in order to allow for entry
> tracking.  This only works completely if we force the hardware to
> quiesce via a hot-plug event before we reach the stop-and-copy phase
> of the migration.
>
> The IOMMU dirty bit approach is likely going to have a significant
> number of challenges involved.  Looking over the driver and the data
> sheet, it looks like the current implementation is using a form of huge
> pages in the IOMMU; as such, we will need to tear that down and replace
> it with 4K pages if we don't want to dirty large regions with each DMA


Yes, we need to split the huge pages into small pages so that we get
fine-grained dirty ranges.



> transaction, and I'm not sure that is something we can change while
> DMA is active to the affected regions.  In addition, the data sheet


Which changes do you mean?


> references the fact that the page table entries are stored in a
> translation cache and in order to sync things up you have to
> invalidate the entries.  I'm not sure what the total overhead would be
> for invalidating something like a half million 4K pages to migrate a
> guest with just 2G of RAM, but I would think that might be a bit


Do you mean the cost of submitting the flush request, or the performance
impact due to IOTLB misses?  For the former, we have domain-selective
invalidation.  For the latter, it should be acceptable since live
migration shouldn't last too long.



> expensive given the fact that IOMMU accesses aren't known for being
> incredibly fast when invalidating DMA on the host.
>
> - Alex




--
best regards
yang
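
To make the domain-selective invalidation mentioned above concrete, here
is a sketch modeled on the intel-iommu driver; the flush callback and the
granularity constant are assumptions based on include/linux/intel-iommu.h
rather than code from this thread:

#include <linux/intel-iommu.h>

/*
 * After splitting a domain's huge-page mappings into 4K pages, a single
 * domain-selective IOTLB flush (DSI) for domain 'did' invalidates every
 * cached translation for that domain, instead of issuing one
 * page-selective (PSI) flush per 4K page.
 */
static void flush_iotlb_for_domain(struct intel_iommu *iommu, u16 did)
{
	iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
}

The trade-off is precision: all of the domain's cached translations are
dropped, and the IOTLB is repopulated on demand by subsequent DMA.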