On 27/04/2015 18:28, Paolo Bonzini wrote:
> QEMU is currently accessing the dirty bitmaps very liberally,
> which is understandable since the accesses are cheap. This is
> however not good for squeezing maximum performance out of dataplane,
> and is also not good if the accesses become more expensive---as is
> the case when they use atomic primitives.
>
> This patch series does the following optimizations and cleanups:
>
> 1) it lets KVM code treat migration as "just another dirty bitmap
> client" instead of needing the special global_log_start/stop callbacks.
> These remain in use in Xen and vhost. This removes code and avoids
> bugs such as the one fixed in commit 4cc856f (kvm-all: Sync dirty-bitmap
> from kvm before kvm destroy the corresponding dirty_bitmap, 2015-04-02).
>
> 2) it avoids modifications to unused dirty bitmaps: code if TCG
> is disabled, migration if no migration is in progress, VGA for
> regions other than VRAM.
>
> and on top of this makes dirty bitmap access atomic. I'm not including
> the patch to make the migration thread synchronize the bitmap outside
> the big QEMU lock (thus removing the last source of jitter during the
> RAM copy phase of migration) but it is also enabled by these patches.
>
> Patches 1-4 are cleanups to DIRTY_MEMORY_VGA users.
>
> Patches 5-12 are the first cleanup (KVM treats migration as just
> another client). Patches 13-14 are a simple optimization that is enabled
> by these patches.
>
> Patches 15-18 are bonus cleanups to translate-all.c's dirty memory
> tracking for TCG.
>
> Patches 19-22 are the second cleanup (avoid modifications to unused
> dirty bitmaps).
>
> Patches 23-28 are Stefan's patches for atomic access to the dirty
> bitmap, which has no performance impact in the common case thanks to
> the previous work.
>
> Patch 29 is an unrelated strengthening of assertions, that mst spotted
> while reviewing v1.
>
> v1->v2: completed work on removing global_start/global_stop from KVM
>             listener
>         extra spelunking of TCG history so that the exec.c code
>             makes more sense
>         extra splitting of patches (Stefan)
>         keep memory_region_is_logging and memory_region_get_dirty_log_mask
>             APIs separate (mst)
>
> Paolo Bonzini (23):
>   memory: the only dirty memory flag for users is DIRTY_MEMORY_VGA
>   g364fb: remove pointless call to memory_region_set_coalescing
>   display: enable DIRTY_MEMORY_VGA tracking explicitly
>   display: add memory_region_sync_dirty_bitmap calls
>   memory: differentiate memory_region_is_logging and
>     memory_region_get_dirty_log_mask
>   memory: prepare for multiple bits in the dirty log mask
>   framebuffer: check memory_region_is_logging
>   ui/console: remove dpy_gfx_update_dirty
>   memory: track DIRTY_MEMORY_CODE in mr->dirty_log_mask
>   kvm: accept non-mapped memory in kvm_dirty_pages_log_change
>   memory: include DIRTY_MEMORY_MIGRATION in the dirty log mask
>   kvm: remove special handling of DIRTY_MEMORY_MIGRATION in the dirty
>     log mask
>   ram_addr: tweaks to xen_modified_memory
>   exec: use memory_region_get_dirty_log_mask to optimize dirty tracking
>   exec: move functions to translate-all.h
>   translate-all: remove unnecessary argument to tb_invalidate_phys_range
>   cputlb: remove useless arguments to tlb_unprotect_code_phys, rename
>   translate-all: make less of tb_invalidate_phys_page_range depend on
>     is_cpu_write_access
>   exec: pass client mask to cpu_physical_memory_set_dirty_range
>   exec: invert return value of cpu_physical_memory_get_clean, rename
>   exec: only check relevant bitmaps for cleanliness
>   memory: do not touch code dirty bitmap unless TCG is enabled
>   memory: strengthen assertions on mr->terminates
>
> Stefan Hajnoczi (6):
>   bitmap: add atomic set functions
>   bitmap: add atomic test and clear
>   memory: use atomic ops for setting dirty memory bits
>   migration: move dirty bitmap sync to ram_addr.h
>   memory: replace cpu_physical_memory_reset_dirty() with test-and-clear
>   memory: make cpu_physical_memory_sync_dirty_bitmap() fully atomic
>
>  arch_init.c                  |  46 +--------------
>  cputlb.c                     |   7 +--
>  exec.c                       |  99 +++++++++++++++----------------
>  hw/display/cg3.c             |   2 +
>  hw/display/exynos4210_fimd.c |  20 ++++---
>  hw/display/framebuffer.c     |   4 ++
>  hw/display/g364fb.c          |   3 +-
>  hw/display/sm501.c           |   2 +
>  hw/display/tcx.c             |   2 +
>  hw/display/vmware_vga.c      |   2 +-
>  hw/virtio/dataplane/vring.c  |   2 +-
>  hw/virtio/vhost.c            |   9 ++-
>  include/exec/cputlb.h        |   3 +-
>  include/exec/exec-all.h      |   6 +-
>  include/exec/memory.h        |  25 ++++++--
>  include/exec/ram_addr.h      | 138 ++++++++++++++++++++++++++++---------------
>  include/qemu/bitmap.h        |   4 ++
>  include/qemu/bitops.h        |  14 +++++
>  include/ui/console.h         |   4 --
>  kvm-all.c                    |  77 ++++++------------------
>  linux-user/mmap.c            |   7 ++-
>  memory.c                     |  76 ++++++++++++++++--------
>  translate-all.c              |  20 +++----
>  translate-all.h              |   7 +++
>  ui/console.c                 |  61 -------------------
>  user-exec.c                  |   1 +
>  util/bitmap.c                |  81 +++++++++++++++++++++++++
>  xen-hvm.c                    |  22 ++++---
>  28 files changed, 401 insertions(+), 343 deletions(-)
>

Ping?

Paolo
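
For anyone skimming the thread: the atomic set and atomic test-and-clear operations that the quoted series adds for the dirty bitmap (patches 23-28) can be illustrated with a minimal C11 sketch. This is not the code from the patches. The sketch_* names, the BITS_PER_WORD macro and the use of <stdatomic.h> are assumptions made purely for illustration.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of bits held by one bitmap word (hypothetical helper macro). */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: atomically set bit 'nr' in 'map'.
 * A single fetch-or means concurrent writers to the same word
 * cannot lose each other's bits.
 */
static inline void sketch_set_bit_atomic(atomic_ulong *map, size_t nr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);

    atomic_fetch_or(&map[nr / BITS_PER_WORD], mask);
}

/*
 * Hypothetical helper: atomically clear bit 'nr' and report whether it
 * was set beforehand.  Doing the test and the clear in one fetch-and
 * avoids the window in which a separate "test, then clear" sequence
 * could wipe out a bit set concurrently by another thread.
 */
static inline bool sketch_test_and_clear_bit_atomic(atomic_ulong *map,
                                                    size_t nr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);
    unsigned long old = atomic_fetch_and(&map[nr / BITS_PER_WORD], ~mask);

    return (old & mask) != 0;
}
```

A reader of the bitmap would call sketch_test_and_clear_bit_atomic() once per page and act only when it returns true, while writers call sketch_set_bit_atomic() whenever they dirty a page.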