On 2013-3-21 23:08, Eric Blake wrote:
> On 03/21/2013 08:56 AM, Stefan Hajnoczi wrote:
>> On Thu, Mar 21, 2013 at 02:42:23PM +0100, Paolo Bonzini wrote:
>>> On 21/03/2013 14:38, Stefan Hajnoczi wrote:
>>>> There already is a guest RAM cloning mechanism: fork the QEMU process.
>>>> Then you have a copy-on-write guest RAM.
>>>>
>>>> In a little more detail:
>>>>
>>>> 1. save non-RAM device state
>>>> 2. quiesce QEMU to a state that is safe for forking
>>>> 3. create an EventNotifier for live savevm completion signal
>>>> 4. fork and pass completion EventNotifier to child
>>>> 5. parent continues running VM
>>>> 6. child performs vmsave of copy-on-write guest RAM
>>>> 7. child signals completion EventNotifier and terminates
>>>> 8. parent raises live savevm completion QMP event
>>>
>>> Forking a threaded program is not so easy, but it could be done if the
>>> child is very simple and only uses syscalls to communicate back with the
>>> parent:
>>
>> On Linux you should be able to use clone(2) to spawn a thread with
>> copy-on-write memory. Too bad it's not portable because it gets around
>> the messy fork issues.
>
> And introduces its own messy issues - once you clone() using different
> flags than what fork() does, you have invalidated the use of a LOT of
> libc interfaces in that child; in particular, any use of pthread is
> liable to break.
>

I think the core of fork() here is snapshotting RAM pages in RAM, just
like LVM2's block snapshot. A very cool idea :). The problem is the
implementation: an API like the following is needed:

void *mem_snapshot(void *addr, uint64_t len);

From a brief look I have not found such an API on Linux, and I am not
sure whether it is available in the upstream Linux kernel or C library.
Making this API available and then using it in qemu would be much nicer.

It is very challenging to use the fork()/clone() approach in qemu. I
guess there will be a lot of scattered code preparing for fork(), some
resource handling code after fork(), code to query progress, exception
handling, a child/parent communication mechanism, ah... it seems complex.
But I am looking forward to seeing how good it is.

By comparison, migration to an image uses less memory at the cost of more
I/O, but it is much easier to implement and is portable. Maybe it can be
used as a simple improvement for "migrate to fd" until an underlying
memory snapshot API is available.

-- 
Best Regards
Wenchao Xia
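
For illustration only, steps 4 to 7 of the fork-based flow quoted above might be sketched roughly as below. This is a minimal sketch, not QEMU code: snapshot_ram_fork(), snapshot_wait() and snapshot_demo() are hypothetical names, a plain buffer stands in for guest RAM, a pipe stands in for both the save target and the completion EventNotifier, and the process is assumed to be single threaded (or fully quiesced) when fork() is called.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU or libc API): fork a child that writes
 * the contents of [ram, ram + len) as of fork() time to out_fd.
 * Returns the child's pid in the parent, or -1 if fork() failed.
 */
static pid_t snapshot_ram_fork(const uint8_t *ram, size_t len, int out_fd)
{
    assert(ram != NULL);

    pid_t pid = fork();
    if (pid != 0) {
        return pid;                 /* parent (pid > 0) or error (-1) */
    }

    /* Child: sees a copy-on-write view of RAM; only uses write/_exit. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(out_fd, ram + done, len - done);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            _exit(1);
        }
        done += (size_t)n;
    }
    _exit(0);                       /* exit status signals completion */
}

/* Reap the snapshot child; returns 0 if it exited successfully. */
static int snapshot_wait(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Demo: the parent dirties "RAM" after fork; the saved copy must not change. */
int snapshot_demo(void)
{
    enum { RAM_SIZE = 4096 };
    static uint8_t ram[RAM_SIZE];
    uint8_t saved[RAM_SIZE];
    int fds[2];

    memset(ram, 'A', RAM_SIZE);     /* state at snapshot time */
    if (pipe(fds) < 0) {
        return -1;
    }

    pid_t pid = snapshot_ram_fork(ram, RAM_SIZE, fds[1]);
    close(fds[1]);                  /* parent does not write to the pipe */
    if (pid < 0) {
        close(fds[0]);
        return -1;
    }

    memset(ram, 'B', RAM_SIZE);     /* parent keeps running and dirties RAM */

    size_t got = 0;
    while (got < RAM_SIZE) {
        ssize_t n = read(fds[0], saved + got, RAM_SIZE - got);
        if (n < 0 && errno == EINTR) {
            continue;
        }
        if (n <= 0) {
            break;
        }
        got += (size_t)n;
    }
    close(fds[0]);

    if (snapshot_wait(pid) != 0 || got != RAM_SIZE) {
        return -1;
    }
    for (size_t i = 0; i < RAM_SIZE; i++) {
        if (saved[i] != 'A') {
            return -1;              /* snapshot saw a later write */
        }
    }
    return ram[0] == 'B' ? 0 : -1;  /* parent's own view did change */
}
```

snapshot_demo() returns 0 when the child's saved copy still holds the pre-fork contents even though the parent overwrote the buffer afterwards. The sketch leaves out exactly the parts called complex above: quiescing threads before fork(), progress queries, and error reporting beyond an exit status.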