On 3/28/2016 7:41 PM, Eric Blake wrote:
> On 03/27/2016 10:16 PM, Jitendra Kolhe wrote:
>> While measuring live migration performance for qemu/kvm guest, it
>> was observed that the qemu doesn’t maintain any intelligence for the
>> guest ram pages which are released by the guest balloon driver and
>> treat such pages as any other normal guest ram pages. This has direct
>> impact on overall migration time for the guest which has released
>> (ballooned out) memory to the host.
>>
>> In case of large systems, where we can configure large guests with 1TB
>> and with considerable amount of memory release by balloon driver to the,
>> host the migration time gets worse.
>
> s/the, host/the host,/
>
>>
>> The optimization gets temporarily disabled, if the balloon operation is
>
> s/disabled,/disabled/
>
>> in progress. Since the optimization skips scanning and migrating control
>> information for ballooned out pages, we might skip guest ram pages in
>> cases where the guest balloon driver has freed the ram page to the guest
>> but not yet informed the host/qemu about the ram page
>> (VIRTIO_BALLOON_F_MUST_TELL_HOST). In such case with optimization, we
>> might skip migrating ram pages which the guest is using. Since this
>> problem is specific to balloon leak, we can restrict balloon operation in
>> progress check to only balloon leak operation in progress check.
>>
>> The optimization also get permanently disabled (for all subsequent
>
> s/get/gets/
>
>> migrations) in case any of the migration uses postcopy capability. In case
>> of postcopy the balloon bitmap would be required to send after vm_stop,
>> which has significant impact on the downtime. Moreover, the applications
>> in the guest space won’t be actually faulting on the ram pages which are
>> already ballooned out, the proposed optimization will not show any
>> improvement in migration time during postcopy.
>>
>> Signed-off-by: Jitendra Kolhe <jitendra.ko...@hpe.com>
>> ---
>> Changed in v2:
>> - Resolved compilation issue for qemu-user binaries in exec.c
>> - Localize balloon bitmap test to save_zero_page().
>> - Updated version string for newly added migration capability to 2.7.
>> - Made minor modifications to patch commit text.
>
> I'll leave the technical review to others.
>
>> +++ b/qapi-schema.json
>> @@ -544,11 +544,14 @@
>> # been migrated, pulling the remaining pages along as needed. NOTE: If
>> # the migration fails during postcopy the VM will fail. (since 2.6)
>> #
>> +# @skip-balloon: Skip scanning ram pages released by virtio-balloon driver.
>> +# (since 2.7)
>> +#
>> # Since: 1.2
>> ##
>> { 'enum': 'MigrationCapability',
>> 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
>> - 'compress', 'events', 'postcopy-ram'] }
>> + 'compress', 'events', 'postcopy-ram', 'skip-balloon'] }
>
> Does this flag make sense to always have enabled (in which case we don't
> need it as a flag), or are there cases where we'd explicitly want to
> disable it?
>

Yes, the flag can stay enabled most of the time. The exceptions are
migrations that use postcopy-ram (the two are mutually exclusive) and
cases where the user is confident that the optimization brings no
benefit, e.g. when little or no balloon activity has happened on the VM,
so the penalty of the optimization would outweigh its gain.

Thanks,
- Jitendra
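
To make the enable/disable conditions discussed above concrete, here is a
rough Python model of the decision. It is only a sketch and not code from
the patch: MigrationState, skip_balloon_active() and should_skip_page()
are hypothetical names, and the balloon bitmap is modelled as a plain set
of page numbers.

```python
from dataclasses import dataclass


@dataclass
class MigrationState:
    """Hypothetical model of the state discussed in this thread (not QEMU code)."""
    skip_balloon_enabled: bool = True       # the proposed 'skip-balloon' capability
    postcopy_ever_used: bool = False        # sticky once any migration used postcopy
    balloon_leak_in_progress: bool = False  # guest is taking ballooned pages back


def skip_balloon_active(state):
    """Return True only if the optimization may be applied right now."""
    if not state.skip_balloon_enabled:
        return False  # user turned the capability off (no expected gain)
    if state.postcopy_ever_used:
        return False  # permanently disabled for all subsequent migrations
    if state.balloon_leak_in_progress:
        return False  # temporarily disabled while a leak is in flight
    return True


def should_skip_page(state, ballooned_pages, page):
    """Skip a page only if the optimization is active and the page is ballooned out."""
    return skip_balloon_active(state) and page in ballooned_pages
```

With the defaults, a page recorded as ballooned out is skipped; setting
either balloon_leak_in_progress or postcopy_ever_used makes every page
go through the normal migration path again.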