Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-15 Thread Stefan Hajnoczi
On Thu, Aug 15, 2013 at 10:26:36AM +0800, Wenchao Xia wrote:
> > [...]
> >
> > Memory usage is predictable but guest uptime is unpredictable because
> > it waits until memory is written out.  This defeats the point of
> > live savevm.  The guest may be stalled arbitrarily.
>
>    I think it is adjustable. There is not much difference from fork(),
> except that it gives more precise control over the changed pages.
>    The kernel intercepts the change and stores the changed page in
> another page, similar to fork(). When the userspace QEMU code executes,
> it saves some pages to disk. The buffer acts like a lubricant: when
> buffer = MAX it equals fork() and the guest runs more lively; when
> buffer = 0 the guest runs less lively. A parameter would let the user
> find a good balance point.
>    It is harder to implement; I just want to show the idea.

You are right.  You could set a bigger buffer size to increase guest
uptime.

> > The fork child can minimize the chance of out-of-memory by using
> > madvise(MADV_DONTNEED) after pages have been written out.
>
>    It seems there is no way to make sure the written-out pages are the
> changed pages, so there is a good chance a written page is unchanged
> and still in use by the other QEMU process.

The KVM dirty log tells you which pages were touched.  The fork child
process could give priority to the pages which have been touched by the
guest.  They must be written out and marked madvise(MADV_DONTNEED) as
soon as possible.

I haven't looked at the vmsave data format yet to see if memory pages
can be saved in random order, but this might work.  It reduces the
likelihood of copy-on-write memory growth.

Stefan
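
A minimal sketch of the writeout order described above, assuming the
fork child was handed a snapshot of KVM's dirty bitmap taken at fork
time ('ram', 'npages', 'dirty' and the flat save-file layout are
illustrative names, not QEMU's):

    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define PAGE_SIZE 4096

    static int page_is_dirty(const unsigned long *bitmap, size_t i)
    {
        return (bitmap[i / (8 * sizeof(long))] >>
                (i % (8 * sizeof(long)))) & 1;
    }

    /* Pass 1 writes guest-touched pages, pass 2 the rest.  pwrite()
     * keeps each page at its own file offset, so writeout order does
     * not matter, and madvise(MADV_DONTNEED) drops the child's COW
     * copy as soon as the page is safely on disk. */
    static void save_ram(uint8_t *ram, size_t npages,
                         const unsigned long *dirty, int fd)
    {
        for (int pass = 0; pass < 2; pass++) {
            for (size_t i = 0; i < npages; i++) {
                if (page_is_dirty(dirty, i) != (pass == 0)) {
                    continue;
                }
                pwrite(fd, ram + i * PAGE_SIZE, PAGE_SIZE,
                       i * PAGE_SIZE);
                madvise(ram + i * PAGE_SIZE, PAGE_SIZE, MADV_DONTNEED);
            }
        }
    }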



Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-15 Thread Wenchao Xia

On 2013-8-15 15:49, Stefan Hajnoczi wrote:

> [...]
>
> The KVM dirty log tells you which pages were touched.  The fork child
> process could give priority to the pages which have been touched by the
> guest.  They must be written out and marked madvise(MADV_DONTNEED) as
> soon as possible.

   Hmm, if the dirty log still works normally in the child process and
reflects the memory status of the parent rather than the child, then
the problem could be solved: when there are too many dirty pages, the
child tells the parent to wait a while. But I haven't checked whether
kvm.ko behaves like that.







--
Best Regards

Wenchao Xia
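
For reference, fetching the dirty log from kvm.ko looks roughly like
the sketch below; whether the bitmap the child sees after fork() still
tracks the parent's guest is exactly the open question above.  'slot'
and 'slot_npages' are illustrative, and the memory slot must have been
registered with KVM_MEM_LOG_DIRTY_PAGES:

    #include <linux/kvm.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>

    /* One bit per page; the kernel expects the buffer rounded up to
     * 64-bit words and clears its internal log as a side effect. */
    static unsigned long *get_dirty_log(int vm_fd, int slot,
                                        size_t slot_npages)
    {
        size_t bytes = ((slot_npages + 63) / 64) * 8;
        unsigned long *bitmap = calloc(1, bytes);
        struct kvm_dirty_log log = {
            .slot = slot,
            .dirty_bitmap = bitmap,
        };

        if (!bitmap || ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
            free(bitmap);
            return NULL;
        }
        return bitmap;
    }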




Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-14 Thread Stefan Hajnoczi
On Wed, Aug 14, 2013 at 3:54 AM, Wenchao Xia xiaw...@linux.vnet.ibm.com wrote:
> [...]
>
>    I think the advantage is that memory usage is predictable, so a
> memory usage peak can be avoided by always saving the changed pages
> first. fork() does not know which pages are changed. I am not sure
> whether this would be a serious issue when the server's memory is
> heavily consumed, for example a 24G host emulating two 11G guests to
> provide powerful virtual servers.

Memory usage is predictable but guest uptime is unpredictable because
it waits until memory is written out.  This defeats the point of
live savevm.  The guest may be stalled arbitrarily.

The fork child can minimize the chance of out-of-memory by using
madvise(MADV_DONTNEED) after pages have been written out.

The way fork handles memory overcommit on Linux is configurable, but I
guess in a situation where memory runs out the Out-of-Memory Killer
will kill a process (probably QEMU since it is hogging so much
memory).

The risk of OOM can be avoided by running the traditional vmsave which
stops the guest instead of using live vmsave.

The other option is to live migrate to a file, but the disadvantage
there is that you cannot choose exactly when the state is saved; it
happens sometime after live migration is initiated.

There are trade-offs with all the approaches; it depends on what is
most important to you.

Stefan



Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-14 Thread Alex Bligh

On 14 Aug 2013, at 08:53, Stefan Hajnoczi wrote:

> The fork child can minimize the chance of out-of-memory by using
> madvise(MADV_DONTNEED) after pages have been written out.

This may also be helpful (see the last clause) before the writing starts:

    MADV_SEQUENTIAL
        Expect page references in sequential order.  (Hence, pages
        in the given range can be aggressively read ahead, and may
        be freed soon after they are accessed.)

-- 
Alex Bligh
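
A minimal sketch of that hint in the fork child, issued before the
writeout loop begins ('ram' and 'ram_size' are illustrative names for
the guest RAM mapping):

    #include <stdio.h>
    #include <sys/mman.h>

    /* Tell the kernel the child will sweep guest RAM front to back,
     * so pages may be read ahead and reclaimed soon after they are
     * written out. */
    static void hint_sequential_sweep(void *ram, size_t ram_size)
    {
        if (madvise(ram, ram_size, MADV_SEQUENTIAL) < 0) {
            perror("madvise(MADV_SEQUENTIAL)");
        }
    }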







Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-14 Thread Wenchao Xia
On 2013-8-14 15:53, Stefan Hajnoczi wrote:
> [...]
>
> Memory usage is predictable but guest uptime is unpredictable because
> it waits until memory is written out.  This defeats the point of
> live savevm.  The guest may be stalled arbitrarily.
 
   I think it is adjustable. There is not much difference from fork(),
except that it gives more precise control over the changed pages.
   The kernel intercepts the change and stores the changed page in
another page, similar to fork(). When the userspace QEMU code executes,
it saves some pages to disk. The buffer acts like a lubricant: when
buffer = MAX it equals fork() and the guest runs more lively; when
buffer = 0 the guest runs less lively. A parameter would let the user
find a good balance point.
   It is harder to implement; I just want to show the idea.

> The fork child can minimize the chance of out-of-memory by using
> madvise(MADV_DONTNEED) after pages have been written out.

   It seems there is no way to make sure the written-out pages are the
changed pages, so there is a good chance a written page is unchanged
and still in use by the other QEMU process.

 
 


-- 
Best Regards

Wenchao Xia
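
To make the buffer idea above concrete, here is a sketch of the
userspace drain loop it implies.  Everything KVM-side here is
hypothetical: kvm.ko has no such interface, and KVM_SET_WRITE_BUF /
KVM_DRAIN_WRITE_BUF are invented names and numbers, shown only to
illustrate the buffer-size knob:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    struct write_rec {        /* one intercepted write (hypothetical) */
        uint64_t gpa;         /* guest-physical address of the page */
        uint8_t  old[4096];   /* page contents before the write */
    };

    /* HYPOTHETICAL ioctls, invented for illustration only. */
    #define KVM_SET_WRITE_BUF   _IOW(0xAE, 0xf0, uint64_t)
    #define KVM_DRAIN_WRITE_BUF _IOR(0xAE, 0xf1, struct write_rec)

    /* buf_bytes is the knob described above: large ~ fork()-like
     * liveness, zero ~ stop-and-save. */
    static void drain_loop(int vm_fd, int save_fd, uint64_t buf_bytes)
    {
        struct write_rec rec;

        ioctl(vm_fd, KVM_SET_WRITE_BUF, &buf_bytes);
        /* The guest runs until the kernel buffer fills; then we wake
         * up and drain the old page contents to the save file. */
        while (ioctl(vm_fd, KVM_DRAIN_WRITE_BUF, &rec) > 0) {
            pwrite(save_fd, rec.old, sizeof(rec.old), rec.gpa);
        }
    }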




Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-13 Thread Stefan Hajnoczi
On Tue, Aug 13, 2013 at 4:53 AM, Wenchao Xia xiaw...@linux.vnet.ibm.com wrote:
> [...]
>
>    I have a worry about what glib says:
>
>    "On Unix, the GLib mainloop is incompatible with fork(). Any
>    program using the mainloop must either exec() or exit() from the
>    child without returning to the mainloop."

This is fine, the child just writes out the memory pages and exits.
It never returns to the glib mainloop.

>    There is another way to do it: intercept the write in kvm.ko (or
> other kernel code). Since the key is to intercept the memory change,
> and we can already do that in userspace in TCG mode, we could add the
> missing part in KVM mode. Another benefit of this approach is that the
> memory used can be controlled: for example, with an ioctl(), set a
> fixed-size buffer in which the kernel code keeps the intercepted write
> data, which avoids switching back to the userspace QEMU code too
> frequently. When the buffer is full, return to the userspace QEMU code
> and let it save the data to disk. I haven't checked exactly how Intel
> guest mode handles page faults, so I can't estimate the cost of
> switching between guest mode and root mode, but it should not be worse
> than fork().

The fork(2) approach is portable, covers both KVM and TCG, and doesn't
require kernel changes.  A kvm.ko kernel change also won't be
supported on existing KVM hosts.  These are big drawbacks and the
kernel approach would need to be significantly better than plain old
fork(2) to make it worthwhile.

Stefan
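
In code, the glib-safe pattern is just this minimal shape
(save_guest_ram() is an illustrative stand-in for the child's write
loop):

    #include <sys/types.h>
    #include <unistd.h>

    void save_guest_ram(int fd);  /* illustrative: child's write loop */

    static pid_t start_snapshot_child(int save_fd)
    {
        pid_t pid = fork();

        if (pid == 0) {
            save_guest_ram(save_fd);
            _exit(0);   /* leave directly: the child never returns to
                         * the glib mainloop, satisfying the rule */
        }
        return pid;     /* parent goes back to the mainloop as usual */
    }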



Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-13 Thread Wenchao Xia

On 2013-8-13 16:21, Stefan Hajnoczi wrote:

> [...]
>
> The fork(2) approach is portable, covers both KVM and TCG, and doesn't
> require kernel changes.  A kvm.ko kernel change also won't be
> supported on existing KVM hosts.  These are big drawbacks and the
> kernel approach would need to be significantly better than plain old
> fork(2) to make it worthwhile.
>
> Stefan


   I think the advantage is that memory usage is predictable, so a
memory usage peak can be avoided by always saving the changed pages
first. fork() does not know which pages are changed. I am not sure
whether this would be a serious issue when the server's memory is
heavily consumed, for example a 24G host emulating two 11G guests to
provide powerful virtual servers.

--
Best Regards

Wenchao Xia




Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-12 Thread Stefan Hajnoczi
On Fri, Aug 09, 2013 at 10:20:49AM +, Chijianchun wrote:
> Now in KVM, when taking a RAM snapshot the vcpus need to be stopped,
> which is an unfriendly restriction for users.
>
> Are there plans to achieve a RAM live snapshot feature?
>
> In my mind, snapshots cannot occupy too much additional memory, so
> when a page of memory needs to be changed, the old page must be
> flushed to the file first. But flushing to a file is much slower than
> memory, and while flushing, the vcpu or VM needs to be paused until
> the flush finishes, so it goes pause...resume...pause...resume,
> getting slower and slower.
>
> Is this idea feasible? Are there any other thoughts?

A few people have looked at live vmsave or guest RAM snapshots.

The idea that was discussed on qemu-devel@nongnu.org uses fork(2) to
capture the state of guest RAM and then send it back to the parent
process.  The guest is only paused for a brief instant during fork(2)
and can continue to run afterwards.

The child process is a simple loop that sends the contents of guest RAM
back to the parent process over a pipe or writes the memory pages to the
save file on disk.  It performs no logic besides writing out guest RAM.

Stefan
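
A minimal sketch of that parent/child split, assuming 'ram' and
'ram_size' name the guest RAM mapping and the caller pauses vcpus
around fork() (all names illustrative):

    #include <stdint.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define PAGE_SIZE 4096

    static pid_t snapshot_ram(uint8_t *ram, size_t ram_size, int save_fd)
    {
        pid_t child = fork();   /* guest pauses only around this call */

        if (child == 0) {
            /* Child: a simple loop over the copy-on-write view of
             * guest RAM; no logic besides writing the pages out. */
            for (size_t off = 0; off < ram_size; off += PAGE_SIZE) {
                if (write(save_fd, ram + off, PAGE_SIZE) != PAGE_SIZE) {
                    _exit(1);
                }
            }
            _exit(0);
        }
        return child;           /* parent resumes the guest at once */
    }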



Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-12 Thread Alex Bligh



--On 12 August 2013 11:59:03 +0200 Stefan Hajnoczi stefa...@gmail.com
wrote:

> The idea that was discussed on qemu-devel@nongnu.org uses fork(2) to
> capture the state of guest RAM and then send it back to the parent
> process.  The guest is only paused for a brief instant during fork(2)
> and can continue to run afterwards.

How would you capture the state of emulated hardware which might not
be in the guest RAM?

--
Alex Bligh



Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-12 Thread Stefan Hajnoczi
On Mon, Aug 12, 2013 at 12:26 PM, Alex Bligh a...@alex.org.uk wrote:
> --On 12 August 2013 11:59:03 +0200 Stefan Hajnoczi stefa...@gmail.com
> wrote:
>
>> The idea that was discussed on qemu-devel@nongnu.org uses fork(2) to
>> capture the state of guest RAM and then send it back to the parent
>> process.  The guest is only paused for a brief instant during fork(2)
>> and can continue to run afterwards.
>
> How would you capture the state of emulated hardware which might not
> be in the guest RAM?

Exactly the same way vmsave works today.  It calls the device's save
functions which serialize state to file.

The difference between today's vmsave and the fork(2) approach is that
QEMU does not need to wait for guest RAM to be written to file before
resuming the guest.

Stefan
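
For context, device state in QEMU is described declaratively; vmsave
walks registered descriptions like the one below and serializes the
listed fields into the save file.  A minimal sketch with an invented
device ('mydev' and 'reg' are illustrative; the header path is the
QEMU-internal one of this era):

    #include "migration/vmstate.h"   /* QEMU-internal header */

    typedef struct MyDevState {
        uint32_t reg;                /* illustrative device register */
    } MyDevState;

    static const VMStateDescription vmstate_mydev = {
        .name = "mydev",
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (VMStateField[]) {
            VMSTATE_UINT32(reg, MyDevState), /* written out on vmsave */
            VMSTATE_END_OF_LIST()
        },
    };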



Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-12 Thread Wenchao Xia

On 2013-8-12 19:33, Stefan Hajnoczi wrote:

> [...]
>
> Exactly the same way vmsave works today.  It calls the device's save
> functions which serialize state to file.
>
> The difference between today's vmsave and the fork(2) approach is that
> QEMU does not need to wait for guest RAM to be written to file before
> resuming the guest.
>
> Stefan


   I have a worry about what glib says:

   "On Unix, the GLib mainloop is incompatible with fork(). Any program
   using the mainloop must either exec() or exit() from the child
   without returning to the mainloop."

   There is another way to do it: intercept the write in kvm.ko (or
other kernel code). Since the key is to intercept the memory change,
and we can already do that in userspace in TCG mode, we could add the
missing part in KVM mode. Another benefit of this approach is that the
memory used can be controlled: for example, with an ioctl(), set a
fixed-size buffer in which the kernel code keeps the intercepted write
data, which avoids switching back to the userspace QEMU code too
frequently. When the buffer is full, return to the userspace QEMU code
and let it save the data to disk. I haven't checked exactly how Intel
guest mode handles page faults, so I can't estimate the cost of
switching between guest mode and root mode, but it should not be worse
than fork().


--
Best Regards

Wenchao Xia




Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-09 Thread Paolo Bonzini
On 09/08/2013 12:20, Chijianchun wrote:
> [...]
>
> In my mind, snapshots cannot occupy too much additional memory, so
> when a page of memory needs to be changed, the old page must be
> flushed to the file first. [...]
>
> Is this idea feasible? Are there any other thoughts?

This looks very similar to postcopy migration (you can Google it).  The
infrastructure for postcopy migration could be used for this as well.

Paolo



Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-09 Thread Anthony Liguori
Chijianchun chijianc...@huawei.com writes:

> Now in KVM, when taking a RAM snapshot the vcpus need to be stopped,
> which is an unfriendly restriction for users.
>
> Are there plans to achieve a RAM live snapshot feature?

I think you mean a live version of the savevm command.

You can approximate it by live migrating to a file, creating an
external disk snapshot, then resuming the guest.

Regards,

Anthony Liguori






Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-09 Thread Eric Blake
On 08/09/2013 09:45 AM, Anthony Liguori wrote:
> Chijianchun chijianc...@huawei.com writes:
>
>> Now in KVM, when taking a RAM snapshot the vcpus need to be stopped,
>> which is an unfriendly restriction for users.
>>
>> Are there plans to achieve a RAM live snapshot feature?
>
> I think you mean a live version of the savevm command.
>
> You can approximate it by live migrating to a file, creating an
> external disk snapshot, then resuming the guest.

And libvirt does just that, since libvirt 1.0.5, for its external RAM
snapshots.  The vcpu pause is a mere fraction of a second, so it is
generally not noticeable as any guest downtime.

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org
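
For reference, the sequence Anthony described and libvirt automates
looks roughly like this at the HMP monitor (the device name and paths
are illustrative; migration pauses the guest by itself once it
completes):

    (qemu) migrate -d "exec:gzip > /var/lib/libvirt/images/vmstate.gz"
    (qemu) info migrate
        ... repeat until the status is "completed"; the guest is now
        paused ...
    (qemu) snapshot_blkdev virtio0 /var/lib/libvirt/images/overlay.qcow2 qcow2
    (qemu) cont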


