On 03/19/2018 12:24 PM, Michael S. Tsirkin wrote:
On Sun, Mar 18, 2018 at 06:36:20PM +0800, Wei Wang wrote:
On 03/16/2018 11:16 PM, Michael S. Tsirkin wrote:
On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
OTOH it seems that if the thread stops, nothing will wake it up
when the VM is restarted. Such a behaviour change across vmstop/vmstart
is unexpected.
I do not understand why we want to increment the counter
on vm stop though. It does make sense to stop the thread
but why not resume where we left off when vm is resumed?
I'm not sure which counter is being incremented. But it should become
clear with a high-level view of how this works (it is actually symmetric).
Basically, we start the optimization when each round starts and stop it
at the end of each round (i.e. before we do the bitmap sync), as shown
below:
1) 1st Round starts --> free_page_start
2) 1st Round in progress..
3) 1st Round ends --> free_page_stop
4) 2nd Round starts --> free_page_start
5) 2nd Round in progress..
6) 2nd Round ends --> free_page_stop
......
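
To make the symmetry concrete, here is a minimal sketch of the round
sequence above. It is a toy model, not the patch code: struct
free_page_opt, its fields, run_rounds and the signatures of
free_page_start/free_page_stop are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the optimization state; not a QEMU structure. */
struct free_page_opt {
    bool active;   /* true between free_page_start and free_page_stop */
    int starts;    /* number of free_page_start calls so far */
    int stops;     /* number of free_page_stop calls so far */
};

/* Called when a migration round starts. */
static void free_page_start(struct free_page_opt *opt)
{
    assert(!opt->active);   /* the previous round must have been stopped */
    opt->active = true;
    opt->starts++;
}

/* Called when a migration round ends, i.e. before the bitmap sync. */
static void free_page_stop(struct free_page_opt *opt)
{
    opt->active = false;
    opt->stops++;
}

/* Every round is bracketed by exactly one start and one stop. */
static void run_rounds(struct free_page_opt *opt, int rounds)
{
    for (int round = 1; round <= rounds; round++) {
        free_page_start(opt);
        /* round in progress: pages are sent, free page hints are used */
        free_page_stop(opt);
        /* the bitmap sync would happen here */
    }
}
```

After any number of rounds the start and stop counts match and the
optimization is inactive, which is the symmetry I mean.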
For example, suppose the VM is stopped during 2). When
virtio_balloon_poll_free_page_hints finds that the vq is empty (i.e.
elem == NULL) and the runstate is stopped, the optimization thread exits
immediately. That is, this optimization thread is gone for good (the
optimization we can do for this round is done). We can't know when the
VM will be woken up:
A) If the VM is woken up very soon, while the migration thread is still
in 2), then in 4) a new optimization thread (not the one used for the
first round) will be created and will start the optimization for the
2nd round as usual. (If you are wondering about 3) in this case:
free_page_stop will do nothing but return, since the optimization
thread has already exited.)
B) If the VM is woken up after the whole migration has ended, there is
still no point in resuming the optimization.
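
A rough sketch of the exit path and of case A), again as a toy model
rather than the real code: struct opt_thread, poll_step and the
signatures below are hypothetical, and only the "vq empty and VM
stopped means exit" rule is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the optimization thread; not a QEMU type. */
struct opt_thread {
    bool running;     /* the thread exists and is still polling */
    int round;        /* migration round that created this thread */
    int hints_seen;   /* hints consumed by this thread */
};

/* Round start: always create a fresh thread for this round. */
static void free_page_start(struct opt_thread *t, int round)
{
    assert(t != NULL);
    t->running = true;
    t->round = round;
    t->hints_seen = 0;
}

/*
 * One polling step. elem == NULL models an empty vq.
 * Returns false once the thread has exited.
 */
static bool poll_step(struct opt_thread *t, const int *elem, bool vm_running)
{
    assert(t != NULL);
    if (!t->running) {
        return false;
    }
    if (elem == NULL) {
        if (!vm_running) {
            t->running = false;   /* empty vq and stopped VM: exit */
            return false;
        }
        return true;              /* empty vq, guest running: keep polling */
    }
    t->hints_seen++;
    return true;
}

/* Round end: returns true only if there was a thread left to stop. */
static bool free_page_stop(struct opt_thread *t)
{
    assert(t != NULL);
    if (!t->running) {
        return false;             /* thread already exited: just return */
    }
    t->running = false;
    return true;
}
```

In this model, stopping the VM in round 1 makes poll_step return false,
the following free_page_stop is a no-op, and free_page_start for round 2
brings up a new thread regardless of what happened to the old one.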
I think this is a simple enough design for the first release of this
optimization. It would be possible to improve case A) above by
continuing the optimization for the 1st round while it is still in
progress, but I don't think adding that complexity for this rare case
would be worthwhile (at least for now). What do you think?
Best,
Wei