On 02/09/2018 08:15 PM, Dr. David Alan Gilbert wrote:
* Wei Wang (wei.w.w...@intel.com) wrote:
This patch adds a timer to limit the time that host waits for the free
page hints reported by the guest. Users can specify the time in ms via
"free-page-wait-time" command line option. If a user doesn't specify a
time, host waits till the guest finishes reporting all the free page
hints. The policy (wait for all the free page hints to be reported or
use a time limit) is determined by the orchestration layer.
That's kind of a get-out; but there are at least two problems:
a) With a timeout of 0 (the default) we might hang forever waiting
for the guest; broken guests are just too common, we can't do
that.
b) Even if we were going to do that, you'd have to make sure that
migrate_cancel provided a way out.
c) How does that work during a savevm snapshot or when the guest is
stopped?
d) OK, the timer gives us some safety (except c); but how does the
orchestration layer ever come up with a 'safe' value for it?
Unless we can suggest a safe value that the orchestration layer
can use, or a way they can work it out, then they just won't use
it.
Hi Dave,
Sorry for my late response. Please see below:
a) I think people would just kill the guest if it is broken. We can also
change the default timeout to a non-zero value, for example 1 second,
which should be enough for the free page reporting.
b) How about changing it this way: if a timeout happens, the host sends
a stop command to the guest and makes
virtio_balloon_poll_free_page_hints() return immediately (without
waiting for the guest's acknowledgement). That return goes back to the
loop in the migration_thread function:
    while (s->state == MIGRATION_STATUS_ACTIVE ||
           s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
        ...
    }
migrate_cancel sets the state to MIGRATION_STATUS_CANCELLING, so the
loop condition above becomes false and the migration process stops.
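
As a rough sketch of the flow in b) (this is not the actual patch; every
name and type below is a made-up stand-in rather than QEMU's real
definition, and the clock is simulated so the snippet builds on its
own). It also assumes the timeout is always a real limit, i.e. there is
no "0 means wait forever" case:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not QEMU's real enums or structs. */
enum mig_state { MIG_ACTIVE, MIG_POSTCOPY_ACTIVE, MIG_CANCELLING };
enum poll_result { POLL_DONE, POLL_TIMEOUT, POLL_CANCELLED };

struct poll_ctx {
    enum mig_state state;  /* set to MIG_CANCELLING by a cancel request */
    bool guest_done;       /* guest has reported its last hint */
    bool stop_sent;        /* host told the guest to stop reporting */
    int64_t now_ms;        /* simulated clock, in milliseconds */
};

/* One polling step; here it only advances the simulated clock by 1 ms. */
static void poll_step(struct poll_ctx *c)
{
    c->now_ms += 1;
}

/*
 * Poll for hints until the guest is done, the time limit expires, or
 * the migration is cancelled. On timeout or cancel, mark the stop
 * command as sent and return at once, without waiting for the guest
 * to acknowledge it.
 */
static enum poll_result poll_free_page_hints(struct poll_ctx *c,
                                             int64_t timeout_ms)
{
    int64_t deadline = c->now_ms + timeout_ms;

    while (!c->guest_done) {
        if (c->state == MIG_CANCELLING) {
            c->stop_sent = true;
            return POLL_CANCELLED;
        }
        if (c->now_ms >= deadline) {
            c->stop_sent = true;
            return POLL_TIMEOUT;
        }
        poll_step(c);
    }
    return POLL_DONE;
}
```

Checking the cancel state inside the polling loop is what would give
migrate_cancel a way out even before the timer fires.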
c) This optimization depends on the guest reporting, so it can't work
while the guest is stopped. How about checking the run state
("RUN_STATE") before going into the optimization, and skipping it when
the guest isn't running?
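
Roughly, the check for c) could look like the following. The enum and
the helper name are placeholders I made up for illustration, not the
real run-state definitions:

```c
#include <stdbool.h>

/* Hypothetical run states, standing in for the real "RUN_STATE" values. */
enum run_state { RS_RUNNING, RS_PAUSED, RS_SAVE_VM };

/*
 * The free page optimization needs a running guest that supports
 * hint reporting; in any other case the host should skip it and
 * migrate all pages as usual.
 */
static bool free_page_hints_usable(enum run_state rs,
                                   bool guest_supports_hints)
{
    return guest_supports_hints && rs == RS_RUNNING;
}
```

With a guard like this, a savevm snapshot or a stopped guest would
simply fall back to the non-optimized path.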
d) Yes. Normally it is faster to wait for the guest to report all the
free pages. We could probably just hardcode a value (e.g. 1s) for now
instead of making it configurable by users; it would only be there to
handle the case where the guest is broken. What do you think?
Best,
Wei