Hi Zhang Chen,

> -----Original Message-----
> From: Zhang, Chen [mailto:chen.zh...@intel.com]
> Sent: Wednesday, February 12, 2020 1:45 PM
> To: Zhanghailiang <zhang.zhanghaili...@huawei.com>; Dr. David Alan
> Gilbert <dgilb...@redhat.com>; Daniel Cho <daniel...@qnap.com>
> Cc: qemu-devel@nongnu.org
> Subject: RE: The issues about architecture of the COLO checkpoint
>
> > -----Original Message-----
> > From: Zhanghailiang <zhang.zhanghaili...@huawei.com>
> > Sent: Wednesday, February 12, 2020 11:18 AM
> > To: Dr. David Alan Gilbert <dgilb...@redhat.com>; Daniel Cho
> > <daniel...@qnap.com>; Zhang, Chen <chen.zh...@intel.com>
> > Cc: qemu-devel@nongnu.org
> > Subject: RE: The issues about architecture of the COLO checkpoint
> >
> > Hi,
> >
> > Thank you Dave,
> >
> > I'll reply here directly.
> >
> > -----Original Message-----
> > From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> > Sent: Wednesday, February 12, 2020 1:48 AM
> > To: Daniel Cho <daniel...@qnap.com>; chen.zh...@intel.com;
> > Zhanghailiang <zhang.zhanghaili...@huawei.com>
> > Cc: qemu-devel@nongnu.org
> > Subject: Re: The issues about architecture of the COLO checkpoint
> >
> > cc'ing in COLO people:
> >
> > * Daniel Cho (daniel...@qnap.com) wrote:
> > > Hi everyone,
> > > We have some issues setting up the COLO feature. We hope somebody
> > > can give us some advice.
> > >
> > > Issue 1:
> > > We enable the COLO feature dynamically on the PVM (2 cores, 16G
> > > memory), but the Primary VM pauses for a long time (depending on
> > > memory size) while waiting for the SVM to start. Is there any way
> > > to reduce the pause time?
> > >
> >
> > Yes, we do have some ideas to optimize this downtime.
> >
> > The main problem with the current version is that, for each checkpoint,
> > we have to send all of the PVM's pages to the SVM and then copy the whole
> > VM state into the SVM from the RAM cache, and both VMs must stay paused
> > during this process. Just as you said, the downtime depends on memory size.
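To make the cost of the current design concrete, here is a toy Python model. It is not QEMU code: `full_checkpoint`, `pvm`, `svm` and `ram_cache` are hypothetical names, and guest memory is just a list of pages. While both VMs are paused, every page is sent once and copied once, so the work grows with memory size no matter how few pages actually changed.

```python
# Toy model of the current (unoptimized) COLO checkpoint, as described above.
# Memory is modeled as a list of page contents; these names are hypothetical
# and do not correspond to QEMU data structures.

def full_checkpoint(pvm, svm, ram_cache):
    """Assumes both VMs are paused for the whole call.

    Returns the number of page transfers/copies performed while paused.
    """
    work = 0
    # Step 1: send every PVM page into the secondary side's RAM cache.
    for i, page in enumerate(pvm):
        ram_cache[i] = page
        work += 1
    # Step 2: copy the whole RAM cache into the SVM's memory.
    for i, page in enumerate(ram_cache):
        svm[i] = page
        work += 1
    return work
```

With 8 pages the call always performs 16 page operations, even if only one page differed between the two VMs.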
> >
> > First, we need to reduce the amount of data sent during a checkpoint.
> > We can migrate part of the PVM's dirty pages in the background while
> > both VMs are running, and temporarily load these pages into the RAM
> > cache (backup memory) on the SVM side. At checkpoint time, we then only
> > need to send the PVM's remaining dirty pages to the slave side and copy
> > the RAM cache into the SVM. Further, we don't have to send all of the
> > PVM's pages; we only need to send the pages dirtied by the PVM or the
> > SVM between two checkpoints. (If a page is dirtied by neither the PVM
> > nor the SVM, its contents stay the same in the SVM, the PVM, and the
> > backup memory.) This method reduces the time spent sending data.
> >
> > For the second problem, we can reduce the memory copy in two ways.
> > First, we don't have to copy all the pages in the RAM cache; we only
> > need to copy the pages dirtied by the PVM or the SVM since the last
> > checkpoint. Second, we can use the userfault missing-page mechanism to
> > reduce the time spent on the memory copy. (With the second method, in
> > theory, we can reduce the memory-copy time to the millisecond level.)
> >
> > You can find the first optimization in the attachment. It is based on
> > an old QEMU version (qemu-2.6); it should not be difficult to rebase it
> > onto master or your version. Please feel free to send the new version
> > to the community if you want ;)
> >
> >
>
> Thanks Hailiang!
> By the way, do you have time to push the patches upstream?
> I think this is a better and faster option.
>
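One possible reading of the first optimization, as a toy Python model. It is not the attached patch: `ColoDirtyTracker` and its methods are hypothetical, and Python sets stand in for QEMU's dirty bitmaps. The PVM's dirty pages are pushed to the RAM cache in the background while both VMs run, so only pages dirtied since the last push are sent at checkpoint time, and only pages dirtied by the PVM or the SVM since the last checkpoint are copied from the RAM cache into the SVM. The userfault-based lazy copy is not modeled.

```python
# Toy model of dirty-page-only checkpointing (hypothetical names; sets of
# page indices stand in for dirty bitmaps).

class ColoDirtyTracker:
    def __init__(self):
        self.pvm_unsent = set()      # dirtied by PVM, not yet in the RAM cache
        self.pvm_since_ckpt = set()  # dirtied by PVM since the last checkpoint
        self.svm_since_ckpt = set()  # dirtied by SVM since the last checkpoint

    def pvm_write(self, pvm, i, data):
        pvm[i] = data
        self.pvm_unsent.add(i)
        self.pvm_since_ckpt.add(i)

    def svm_write(self, svm, i, data):
        svm[i] = data
        self.svm_since_ckpt.add(i)

    def background_sync(self, pvm, ram_cache):
        """Both VMs running: push currently dirty PVM pages to the cache.

        A page written again later is re-marked unsent by pvm_write.
        Returns the number of pages sent.
        """
        for i in self.pvm_unsent:
            ram_cache[i] = pvm[i]
        sent = len(self.pvm_unsent)
        self.pvm_unsent.clear()
        return sent

    def checkpoint(self, pvm, svm, ram_cache):
        """Both VMs paused. Returns (pages sent, pages copied into SVM)."""
        # Only the PVM pages dirtied since the last background push remain.
        sent = len(self.pvm_unsent)
        for i in self.pvm_unsent:
            ram_cache[i] = pvm[i]
        self.pvm_unsent.clear()
        # Pages touched by neither VM are already identical everywhere;
        # everything else is refreshed in the SVM from the RAM cache.
        to_copy = self.pvm_since_ckpt | self.svm_since_ckpt
        for i in to_copy:
            svm[i] = ram_cache[i]
        self.pvm_since_ckpt.clear()
        self.svm_since_ckpt.clear()
        return sent, len(to_copy)
```

The sketch relies on the invariant stated in the reply: a page dirtied by neither VM is already identical in the PVM, the SVM and the RAM cache, so skipping it is safe, and the paused-time work scales with the number of dirty pages instead of total memory.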
Yes, I can do this. For the second optimization, we need time to implement and test it.

Thanks

> Thanks
> Zhang Chen
>
> > >
> > > Issue 2:
> > > In
> > > https://github.com/qemu/qemu/blob/master/migration/colo.c#L503,
> > > could we move start_vm() before line 488? At the first checkpoint
> > > the PVM waits for the SVM's reply, which causes the PVM to stop for
> > > a while.
> > >
> >
> > No, that makes no sense, because even if the PVM runs first, it still
> > needs to wait for the network packets from the SVM to compare against
> > before sending its own packets to the client side.
> >
> >
> > Thanks,
> > Hailiang
> >
> > > We set the COLO feature on a running VM, so we hope the running
> > > VM can continue serving users.
> > > Do you have any suggestions for these issues?
> > >
> > > Best regards,
> > > Daniel Cho
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
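A toy Python model of why starting the PVM earlier would not remove the wait in Issue 2. It uses hypothetical names, handles one connection only, ignores timeouts, and is not QEMU's colo-compare code. Each PVM output packet is held until the matching SVM packet arrives; a match releases the PVM's packet to the client, while a mismatch requests a checkpoint, after which (in this simplified model) the PVM's packet is released.

```python
from collections import deque


class ColoCompareSketch:
    """Toy model of COLO output comparison for a single connection."""

    def __init__(self):
        self.primary = deque()    # PVM packets waiting for comparison
        self.secondary = deque()  # SVM packets waiting for comparison
        self.released = []        # packets actually sent to the client
        self.checkpoints = 0      # mismatches that requested a checkpoint

    def from_primary(self, pkt):
        self.primary.append(pkt)
        self._compare()

    def from_secondary(self, pkt):
        self.secondary.append(pkt)
        self._compare()

    def _compare(self):
        # A PVM packet can only leave once an SVM packet is there to pair with.
        while self.primary and self.secondary:
            p = self.primary.popleft()
            s = self.secondary.popleft()
            if p != s:
                # Outputs diverged: request a checkpoint (not modeled here).
                self.checkpoints += 1
            # Simplification: after the checkpoint the PVM's packet goes out.
            self.released.append(p)
```

Even if the PVM produces a reply immediately, `released` stays empty until the SVM produces its reply, so the client still sees the SVM's startup delay.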