Perf top tells me that qemu_get_ram_ptr consumes too many CPU cycles.

> 22.56%  qemu-kvm             [.] address_space_translate
> 13.29%  qemu-kvm             [.] qemu_get_ram_ptr
>  4.71%  qemu-kvm             [.] phys_page_find
>  4.43%  qemu-kvm             [.] address_space_translate_internal
>  3.47%  libpthread-2.19.so   [.] __pthread_mutex_unlock_usercnt
>  3.08%  qemu-kvm             [.] qemu_ram_addr_from_host
>  2.62%  qemu-kvm             [.] address_space_map
>  2.61%  libc-2.19.so         [.] _int_malloc
>  2.58%  libc-2.19.so         [.] _int_free
>  2.38%  libc-2.19.so         [.] malloc
>  2.06%  libpthread-2.19.so   [.] pthread_mutex_lock
>  1.68%  libc-2.19.so         [.] malloc_consolidate
>  1.35%  libc-2.19.so         [.] __memcpy_sse2_unaligned
>  1.23%  qemu-kvm             [.] lduw_le_phys
>  1.18%  qemu-kvm             [.] find_next_zero_bit
>  1.02%  qemu-kvm             [.] object_unref
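To illustrate why an address-keyed lookup can be costly on a hot path, here is a toy model in C. It is only a sketch: the Toy* types and toy_* functions are hypothetical stand-ins and are not QEMU's real RAMBlock, MemoryRegion or qemu_get_ram_ptr. It assumes that resolving a host pointer from a ram_addr_t-style offset means searching the blocks for the one containing that offset, while a cached block pointer needs no search at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT QEMU's real definitions. */
typedef uint64_t toy_ram_addr_t;

typedef struct ToyRAMBlock {
    uint8_t *host;              /* start of the host mapping */
    toy_ram_addr_t offset;      /* start within the flat RAM address space */
    toy_ram_addr_t length;      /* size in bytes */
    struct ToyRAMBlock *next;   /* next block in the list */
} ToyRAMBlock;

typedef struct ToyRegion {
    toy_ram_addr_t ram_addr;    /* address-keyed style: only the offset is kept */
    ToyRAMBlock *ram_block;     /* pointer style: the owning block is cached */
} ToyRegion;

/* Address-keyed lookup: walk the list until a block contains addr. */
static uint8_t *toy_ptr_by_addr(ToyRAMBlock *blocks, toy_ram_addr_t addr)
{
    for (ToyRAMBlock *b = blocks; b != NULL; b = b->next) {
        if (addr >= b->offset && addr - b->offset < b->length) {
            return b->host + (addr - b->offset);
        }
    }
    return NULL;                /* addr is not backed by any block */
}

/* Pointer-based lookup: the region already knows its block, so no search. */
static uint8_t *toy_ptr_by_block(const ToyRegion *mr, toy_ram_addr_t offset)
{
    assert(mr->ram_block != NULL);
    assert(offset < mr->ram_block->length);
    return mr->ram_block->host + offset;
}
```

In this model both functions return the same host pointer for the same byte. The first one pays for the search on every call, and the second one is a single addition.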
Paolo suggested that we can get rid of qemu_get_ram_ptr by storing the
RAMBlock pointer in the memory region instead of the ram_addr_t value.
After applying this change, I did indeed get much better performance.

BTW, patch 3 is something I came across by chance.

Gonglei (3):
  exec: store RAMBlock pointer into memory region
  memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length
  memory: Remove the superfluous code

 exec.c                | 48 ++++++++++++++++++++++++++++++------------------
 include/exec/memory.h |  7 +++----
 memory.c              |  3 ++-
 3 files changed, 35 insertions(+), 23 deletions(-)

-- 
1.8.5.2