Re: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation

2015-05-07 Thread Dr. David Alan Gilbert
* Li, Liang Z (liang.z...@intel.com) wrote:
> > > > > Thanks Dave, I will retry according to your suggestion.
> > > >
> > > > Did that work for you?
> > > >
> > >
> > > Yes, it works.
> > 
> > Great.
> > 
> > > By the way, I found that the source guest will resume after about 15
> > > minutes if some network errors happen during postcopy. Is that
> > > the expected behavior?
> > > And do you have any plan for handling such errors?
> > 
> > Interesting; it shouldn't do that.  I think it's best for the source to
> > stay paused following an error.  Were you driving it directly or via libvirt?
> > 
> 
> I was driving it directly.

OK, thanks, I'll have a look at it.

Dave

> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation

2015-05-06 Thread Li, Liang Z
> > > > Thanks Dave, I will retry according to your suggestion.
> > >
> > > Did that work for you?
> > >
> >
> > Yes, it works.
> 
> Great.
> 
> > By the way, I found that the source guest will resume after about 15
> > minutes if some network errors happen during postcopy. Is that
> > the expected behavior?
> > And do you have any plan for handling such errors?
> 
> Interesting; it shouldn't do that.  I think it's best for the source to
> stay paused following an error.  Were you driving it directly or via libvirt?
> 

I was driving it directly.




Re: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation

2015-04-29 Thread Li, Liang Z
> * Li, Liang Z (liang.z...@intel.com) wrote:
> > Hi David,
> >
> > I have tried your v6 postcopy patches and found that they don't work. When
> > I tried to start postcopy during live migration, some errors were printed.
> > I just did the following things:
> >
> > On destination side, started the qemu like this:
> >
> > /root/vt-sync/post_copy_v6_qemu.git/x86_64-softmmu/qemu-system-x86_64
> > -enable-kvm -smp 2 -m 1024 -net none /mnt/jinshi_ia32e_rhel6u5.qcow2
> > -monitor stdio -incoming tcp:0:
> >
> > On source side, started the qemu like this:
> >
> > /root/vt-sync/post_copy_v6_qemu.git/x86_64-softmmu/qemu-system-x86_64
> > -enable-kvm -smp 2 -m 1024 -net none /mnt/jinshi_ia32e_rhel6u5.qcow2
> > -monitor stdio
> >
> > and then
> > (qemu) migrate_set_capability x-postcopy-ram on
> >
> > When I started the post copy with
> > (qemu) migrate -d tcp:localhost:
> >
> > I got the error message on the source side:
> >
> > (qemu) qemu-system-x86_64: socket_writev_buffer: Got err=104 for (131552/-1)
> >  qemu-system-x86_64: RP: Received invalid message 0x length 0x
> >
> > and the following error on the destination side:
> >
> > (qemu) qemu-system-x86_64: postcopy_ram_supported_by_host: No OS support
> > qemu-system-x86_64: load of migration failed: Operation not permitted
> 
> OK, the important error here is:
>    postcopy_ram_supported_by_host: No OS support
> 
> That's saying that on the destination either:
>    1) The kernel isn't the correct kernel with Andrea's userfault code
>       compiled in (check that userfaultfd is configured into the kernel
>       as well), or
>    2) When you built QEMU, it didn't find the syscall definition for
>       userfaultfd in the headers it was compiled against.
> 
> I think from that error it is (2), so make sure that when you build QEMU
> you're using the headers from that kernel, or use the extra-cflags hack
> that I mentioned in the cover letter.
> 
> Note that you need to use the kernel tree which I point to in the first
> message.  (The older kernel from v5 won't work.)
> 

Thanks Dave, I will retry according to your suggestion.

> Dave
> P.S. I'm on holiday this week, so not checking work email much.
> 
> >
> >
> > the dmesg printed:
> > [  233.456545] kvm: zapping shadow pages for mmio generation wraparound
> > [  239.785916] kvm [11926]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd
> >
> >
> > The v5 patches have no such errors. Do you have any suggestion?
> >
> > Liang
> >
> >

Re: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation

2015-04-29 Thread Dr. David Alan Gilbert
* Li, Liang Z (liang.z...@intel.com) wrote:
> Hi David,
> 
> I have tried your v6 postcopy patches and found that they don't work. When I
> tried to start postcopy during live migration, some errors were printed. I
> just did the following things:
> 
> On destination side, started the qemu like this:
> 
> /root/vt-sync/post_copy_v6_qemu.git/x86_64-softmmu/qemu-system-x86_64
> -enable-kvm -smp 2 -m 1024 -net none /mnt/jinshi_ia32e_rhel6u5.qcow2
> -monitor stdio -incoming tcp:0:
> 
> On source side, started the qemu like this:
> 
> /root/vt-sync/post_copy_v6_qemu.git/x86_64-softmmu/qemu-system-x86_64
> -enable-kvm -smp 2 -m 1024 -net none /mnt/jinshi_ia32e_rhel6u5.qcow2
> -monitor stdio
> 
> and then
> (qemu) migrate_set_capability x-postcopy-ram on
> 
> When I started the post copy with
> (qemu) migrate -d tcp:localhost:
> 
> I got the error message on the source side:
> 
> (qemu) qemu-system-x86_64: socket_writev_buffer: Got err=104 for (131552/-1)
>  qemu-system-x86_64: RP: Received invalid message 0x length0x
> 
> and the following error on the destination side:
> 
> (qemu) qemu-system-x86_64: postcopy_ram_supported_by_host: No OS support
> qemu-system-x86_64: load of migration failed: Operation not permitted

OK, the important error here is:
   postcopy_ram_supported_by_host: No OS support

That's saying that on the destination either:
   1) The kernel isn't the correct kernel with Andrea's userfault code
      compiled in (check that userfaultfd is configured into the kernel
      as well), or
   2) When you built QEMU, it didn't find the syscall definition for
      userfaultfd in the headers it was compiled against.

I think from that error it is (2), so make sure that when you build QEMU
you're using the headers from that kernel, or use the extra-cflags hack
that I mentioned in the cover letter.

Note that you need to use the kernel tree which I point to in the first message.
(The older kernel from v5 won't work.)
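
Cause (2) can be checked before rebuilding by looking at the headers the build will see. The following is only a minimal sketch and not part of the series: it scans header text for a `#define __NR_userfaultfd <n>` line and, if none is found, prints the `--extra-cflags` value from the cover letter. The function names and the header paths are assumptions made for illustration, and 323 is simply the number quoted in the cover letter for this kernel tree.

```python
import os
import re

# Syscall number quoted in the cover letter's --extra-cflags hint.
COVER_LETTER_NR = 323


def find_syscall_number(header_text, name="__NR_userfaultfd"):
    """Return the value of '#define <name> <number>' in header_text, or None."""
    pattern = re.compile(
        r"^\s*#\s*define\s+" + re.escape(name) + r"\s+(\d+)\b", re.MULTILINE
    )
    match = pattern.search(header_text)
    return int(match.group(1)) if match else None


def configure_hint(number):
    """Return the extra configure argument needed, or None if the headers suffice."""
    if number is not None:
        return None
    return '--extra-cflags="-D__NR_userfaultfd=%d"' % COVER_LETTER_NR


if __name__ == "__main__":
    # Hypothetical header locations; adjust for the distribution in use.
    candidates = [
        "/usr/include/asm/unistd_64.h",
        "/usr/include/x86_64-linux-gnu/asm/unistd_64.h",
    ]
    found = None
    for path in candidates:
        if os.path.exists(path):
            with open(path) as handle:
                found = find_syscall_number(handle.read())
            if found is not None:
                break
    hint = configure_hint(found)
    if hint is None:
        print("headers define __NR_userfaultfd")
    else:
        print("configure with: " + hint)
```

This only covers cause (2); a kernel without userfaultfd compiled in would still fail at run time even when the header check passes.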

Dave
P.S. I'm on holiday this week, so not checking work email much.

> 
> 
> the dmesg printed:
> [  233.456545] kvm: zapping shadow pages for mmio generation wraparound
> [  239.785916] kvm [11926]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd
> 
> 
> The v5 patches have no such errors. Do you have any suggestion?
> 
> Liang
> 
> 

Re: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation

2015-04-27 Thread Li, Liang Z
Hi David,

I have tried your v6 postcopy patches and found that they don't work. When I
tried to start postcopy during live migration, some errors were printed. I
just did the following things:

On destination side, started the qemu like this:

/root/vt-sync/post_copy_v6_qemu.git/x86_64-softmmu/qemu-system-x86_64
-enable-kvm -smp 2 -m 1024 -net none /mnt/jinshi_ia32e_rhel6u5.qcow2
-monitor stdio -incoming tcp:0:

On source side, started the qemu like this:

/root/vt-sync/post_copy_v6_qemu.git/x86_64-softmmu/qemu-system-x86_64
-enable-kvm -smp 2 -m 1024 -net none /mnt/jinshi_ia32e_rhel6u5.qcow2
-monitor stdio

and then
(qemu) migrate_set_capability x-postcopy-ram on

When I started the post copy with
(qemu) migrate -d tcp:localhost:

I got the error message on the source side:

(qemu) qemu-system-x86_64: socket_writev_buffer: Got err=104 for (131552/-1)
 qemu-system-x86_64: RP: Received invalid message 0x length0x

and the following error on the destination side:

(qemu) qemu-system-x86_64: postcopy_ram_supported_by_host: No OS support
qemu-system-x86_64: load of migration failed: Operation not permitted
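
A note on the source-side message: assuming err=104 is a Linux errno value, it decodes to ECONNRESET, which would fit the destination closing the connection once its load failed. A minimal sketch for decoding such values (the helper name is made up for illustration):

```python
import errno
import os


def describe_errno(code):
    """Return 'NAME: message' for an errno value on the current platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s: %s" % (name, os.strerror(code))


if __name__ == "__main__":
    # 104 is the value from the socket_writev_buffer message above.
    print(describe_errno(104))
```

Errno numbering is platform-specific, so the result is only meaningful when run on the same kind of host that produced the log.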


the dmesg printed:
[  233.456545] kvm: zapping shadow pages for mmio generation wraparound
[  239.785916] kvm [11926]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd


The v5 patches have no such errors. Do you have any suggestion?

Liang


> -----Original Message-----
> From: qemu-devel-bounces+liang.z.li=intel@nongnu.org
> [mailto:qemu-devel-bounces+liang.z.li=intel@nongnu.org] On Behalf Of
> Dr. David Alan Gilbert (git)
> Sent: Wednesday, April 15, 2015 1:03 AM
> To: qemu-devel@nongnu.org
> Cc: aarca...@redhat.com; yamah...@private.email.ne.jp;
> quint...@redhat.com; amit.s...@redhat.com; pbonz...@redhat.com;
> da...@gibson.dropbear.id.au; yayan...@cn.fujitsu.com
> Subject: [Qemu-devel] [PATCH v6 00/47] Postcopy implementation
> 
> From: "Dr. David Alan Gilbert" 
> 
>   This is the 6th cut of my version of postcopy; it is designed for use
> with the Linux kernel additions posted by Andrea Arcangeli here:
> 
> git clone --reference linux -b userfault18
> git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
> 
> (Note this is a different API from the last version)
> 
> This qemu series can be found at:
> 
> https://github.com/orbitfp7/qemu.git
> on the wp3-postcopy-v6 tag.
> 
> It addresses some but not yet all of the previous review comments; however
> there are a couple of large simplifications, so it seems worth posting to meet
> the new kernel API and to stop people reviewing dead code.
> 
> Note that the userfaultfd.h header is no longer included in this tree:
>   - if you're building with the appropriate kernel headers it should
>     find it
>   - if you're building on a host that doesn't have the kernel headers
>     installed in the right place then:
>       configure with:   --extra-cflags="-D__NR_userfaultfd=323"
>       cp include/uapi/linux/userfaultfd.h into somewhere in the include
>       path, e.g.  /usr/local/include/linux
> 
> v6
>   Removed the PMI bitmaps
>   - Andrea updated the kernel API so that userspace doesn't
>     need to do wakeups, and thus QEMU doesn't need to keep
>     track of which pages it's received; there is a price, which
>     is that we end up sending more dupes to the source, but it simplifies
>     stuff a lot and makes the normal paths a lot quicker.
>     (10s of lines changed in the kernel, 10%-ish simplification in this code!)
>   Changed discard message format to a simpler start/end address scheme
>     and reworked discard and chunking code to work in longs to match the bitmap
>   'qemu_get_buffer_less_copy' for postcopy pages
>   - avoids a userspace copy since the kernel now does it
>   - the new qemufile interface might also be useful for other places that
>     don't need a copy (maybe xbzrle?)
>   Changed the blockingness of the incoming fd
>   - it was incorrectly blocking during the precopy phase after postcopy was
>     enabled, causing the HMP to be unavailable.  It's now blocking only once
>     the postcopy thread starts up; since that thread is not a coroutine it
>     can't deal with the yields in qemu_file.
>   An error on the return-path now marks the migration as failed
> 
>   Fixups from Dave Gibson's comments
>     Removed can_postcopy, renamed save_complete to save_complete_precopy,
>     added save_complete_postcopy
>     Simplified loadvm loop exits
>     discard message format changes above
>     and many more smaller changes.
> 
>   small fixups for RCU
> 
> 
> This work has been partially funded by the EU Orbit project:
>   see http://www.orbitproject.eu/about/
> 
> TODO:
>   The major work is to rework the page send/receive loops so that supporting
>   larger host pages doesn't make it quite as messy.
> 
> Dr. David Alan Gilbert (47):
>   Start documenting how postcopy works.
>   Split header writing out of qemu_savevm_state_begin
>   q