On Wed, Jul 29, 2020 at 03:54:46PM +0300, Denis Plotnikov wrote:
> > Besides the current solution, do you think we can make it simpler by
> > only delivering the fault request to the background thread?  We can
> > let the background thread do all the rest, and IIUC we can drop all
> > the complicated sync bitmaps and so on by doing so.  The process can
> > look like:
> >
> >   - The background thread runs the general precopy migration, and:
> >
> >     - it only does the ram_bulk_stage, which is the first loop,
> >       because for a snapshot there is no reason to send a page twice;
> >
> >     - after copying one page, it always does ram_set_rw(), so
> >       accessing this page will never trigger a write-protect page
> >       fault again;
> >
> >     - it takes requests from unqueue_page() just like what's done in
> >       this series, but instead of copying the page, the page request
> >       should look exactly like the postcopy one.  We don't need
> >       copy_page because the page won't be changed before we unprotect
> >       it, so it should be safe.  These pages will still have high
> >       priority, because being queued means a vCPU wrote to this
> >       protected page and faulted in userfaultfd.  We need to migrate
> >       these pages first to unblock them.
> >
> >   - The fault handler thread only needs to do one thing:
> >
> >     - when it gets a uffd-wp message, translate it into a
> >       postcopy-like request (calculate the ramblock and offset), then
> >       queue it.  That's all.
> >
> > I believe we can avoid the copy_page parameter that was passed
> > around, and we can also save the two extra bitmaps and the
> > complicated synchronizations.
> >
> > Do you think this would work?
>
> Yes, it would.  This scheme is much simpler; I like it, in general.
>
> I used such a complicated approach to reduce all possible vCPU delays:
> if the storage where the snapshot is being saved is quite slow, a
> write fault could freeze the vCPU until the page is fully written to
> the storage.  With the current approach, leaving aside the limit on
> the number of page copies, the worst case is that all of the VM's RAM
> is copied first and only then written to the storage.  In other words,
> the current scheme provides minimal vCPU delays, and thus minimal VM
> CPU performance slowdown, at the cost of host memory consumption.
>
> The new scheme is simple and doesn't consume extra host memory, but it
> can freeze vCPUs for a longer time, because:
>   * copying a memory page is usually faster than writing it to storage
>     (think of an HDD);
>   * writing a page to disk depends on the disk's performance and its
>     current load.
>
> So it seems that we have two different strategies:
>   1. lower vCPU delays
>   2. lower memory usage
>
> To be honest, I would start with your scheme, as it is much simpler,
> and add the other one later if needed.
>
> What do you think?

Looks good to me.
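For concreteness, the fault-handler side could be as small as the sketch
below.  This is a rough sketch only: it assumes a userfaultfd already
registered in write-protect mode over guest RAM, and it models the
queueing on postcopy's ram_save_queue_pages(); the wp_fault_thread()
name is made up, and exact QEMU-internal signatures and header paths may
differ between versions.

/*
 * Sketch: fault-handler thread for the simplified scheme.  It only
 * translates uffd-wp messages into postcopy-like page requests and
 * queues them; no page copy is needed, because the background thread
 * will send the page and only then un-protect it (ram_set_rw()).
 */
#include <poll.h>
#include <unistd.h>
#include <stdint.h>
#include <linux/userfaultfd.h>

#include "exec/ram_addr.h"   /* RAMBlock, qemu_ram_block_from_host() */
#include "migration/ram.h"   /* ram_save_queue_pages() */

static void *wp_fault_thread(void *opaque)
{
    int uffd = *(int *)opaque;

    for (;;) {
        struct pollfd pfd = { .fd = uffd, .events = POLLIN };
        struct uffd_msg msg;

        if (poll(&pfd, 1, -1) <= 0) {
            continue;
        }
        if (read(uffd, &msg, sizeof(msg)) != sizeof(msg)) {
            continue;
        }
        /* Only care about write-protect page faults */
        if (msg.event != UFFD_EVENT_PAGEFAULT ||
            !(msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)) {
            continue;
        }

        /* Translate faulting host address into (ramblock, offset)... */
        void *host = (void *)(uintptr_t)msg.arg.pagefault.address;
        ram_addr_t offset;
        RAMBlock *rb = qemu_ram_block_from_host(host, false, &offset);

        /* ...and queue it exactly like a postcopy page request */
        if (rb) {
            ram_save_queue_pages(qemu_ram_get_idstr(rb), offset,
                                 qemu_target_page_size());
        }
    }
    return NULL;
}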
Btw, IIUC scheme 1 can also be seen as a way to buffer the duplicated
pages in RAM.  If that's the case, another implementation (even if we
wanted to implement that in the future, though I still doubt it...) is
to add tunables to the current migration channel so that it can hold
more pages in its buffer cache.  Currently the migration channel does
its buffering in QEMUFile.buf[IO_BUF_SIZE], where IO_BUF_SIZE is a
constant of 32K.
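If we ever went that route, the shape could be something like the sketch
below.  This is a simplified stand-in for QEMUFile (the real struct,
with its fixed buf[IO_BUF_SIZE], lives in migration/qemu-file.c); the
SnapshotChannel name, snapshot_channel_new(), and the parameter plumbing
are invented purely for illustration.

/*
 * Hypothetical sketch: make the channel's buffer size a tunable
 * instead of the fixed 32K constant, so a snapshot channel could
 * absorb more copied pages before the writer blocks.
 */
#include <glib.h>
#include <stddef.h>
#include <stdint.h>

#define IO_BUF_SIZE 32768       /* today's fixed buffer size */

typedef struct SnapshotChannel {
    uint8_t *buf;               /* was: uint8_t buf[IO_BUF_SIZE]; */
    size_t buf_size;            /* new: set from a migration parameter */
    size_t buf_index;
} SnapshotChannel;

/* buf_size == 0 keeps today's 32K behavior. */
static SnapshotChannel *snapshot_channel_new(size_t buf_size)
{
    SnapshotChannel *c = g_new0(SnapshotChannel, 1);

    c->buf_size = buf_size ? buf_size : IO_BUF_SIZE;
    c->buf = g_malloc(c->buf_size);
    return c;
}

Anyway, we can start with the simple scheme.  Thanks!

-- 
Peter Xu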