On 20/02/2016 03:35, Gonglei wrote:
> Perf top tells me qemu_get_ram_ptr consumes too many CPU cycles.
>
>> 22.56%  qemu-kvm            [.] address_space_translate
>> 13.29%  qemu-kvm            [.] qemu_get_ram_ptr
>>  4.71%  qemu-kvm            [.] phys_page_find
>>  4.43%  qemu-kvm            [.] address_space_translate_internal
>>  3.47%  libpthread-2.19.so  [.] __pthread_mutex_unlock_usercnt
>>  3.08%  qemu-kvm            [.] qemu_ram_addr_from_host
>>  2.62%  qemu-kvm            [.] address_space_map
>>  2.61%  libc-2.19.so        [.] _int_malloc
>>  2.58%  libc-2.19.so        [.] _int_free
>>  2.38%  libc-2.19.so        [.] malloc
>>  2.06%  libpthread-2.19.so  [.] pthread_mutex_lock
>>  1.68%  libc-2.19.so        [.] malloc_consolidate
>>  1.35%  libc-2.19.so        [.] __memcpy_sse2_unaligned
>>  1.23%  qemu-kvm            [.] lduw_le_phys
>>  1.18%  qemu-kvm            [.] find_next_zero_bit
>>  1.02%  qemu-kvm            [.] object_unref
>
> And Paolo suggested that we can get rid of qemu_get_ram_ptr
> by storing the RAMBlock pointer into the memory region,
> instead of the ram_addr_t value. And after applying this change,
> I got much better performance indeed.

What's the gain like?

I've not reviewed the patch in depth, but what I can say is that I like
it a lot. It only does the bare minimum needed to provide the
optimization, but this also makes it very simple to understand.

More cleanups and further optimizations are possible (including removing
mr->ram_addr completely), but your patches really do one thing and do
it well. Good job!

Paolo
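
For readers following the thread, here is a minimal sketch of the idea discussed above. It is not QEMU's code: every type and function name (ToyRAMBlock, ToyMemoryRegion, toy_ptr_by_lookup, toy_ptr_direct, toy_demo) is hypothetical, and the block list is simplified to a plain linked list. It only illustrates why keeping a block pointer in the region avoids a per-access search keyed on a ram_addr_t-style offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real QEMU types. */
typedef uint64_t toy_ram_addr_t;

typedef struct ToyRAMBlock {
    uint8_t *host;              /* host memory backing this block */
    toy_ram_addr_t offset;      /* start of the block in the ram_addr space */
    toy_ram_addr_t length;      /* size of the block in bytes */
    struct ToyRAMBlock *next;   /* simplified global list of blocks */
} ToyRAMBlock;

typedef struct ToyMemoryRegion {
    toy_ram_addr_t ram_addr;    /* "before": only an offset is stored */
    ToyRAMBlock *ram_block;     /* "after": the block itself is stored */
} ToyMemoryRegion;

static ToyRAMBlock *toy_block_list;

/* "Before": every access searches the block list for the owning block. */
uint8_t *toy_ptr_by_lookup(const ToyMemoryRegion *mr, toy_ram_addr_t off)
{
    toy_ram_addr_t addr = mr->ram_addr + off;

    for (ToyRAMBlock *b = toy_block_list; b != NULL; b = b->next) {
        if (addr >= b->offset && addr - b->offset < b->length) {
            return b->host + (addr - b->offset);
        }
    }
    return NULL;
}

/* "After": the region already knows its block, so no search is needed. */
uint8_t *toy_ptr_direct(const ToyMemoryRegion *mr, toy_ram_addr_t off)
{
    ToyRAMBlock *b = mr->ram_block;

    return (b != NULL && off < b->length) ? b->host + off : NULL;
}

/* Returns 0 when both paths resolve to the same host address. */
int toy_demo(void)
{
    static uint8_t buf_a[4096], buf_b[4096];
    static ToyRAMBlock a, b;

    a = (ToyRAMBlock){ buf_a, 0, sizeof(buf_a), &b };
    b = (ToyRAMBlock){ buf_b, 4096, sizeof(buf_b), NULL };
    toy_block_list = &a;

    ToyMemoryRegion mr = { .ram_addr = 4096, .ram_block = &b };
    uint8_t *slow = toy_ptr_by_lookup(&mr, 16);
    uint8_t *fast = toy_ptr_direct(&mr, 16);

    return (slow == fast && fast == buf_b + 16) ? 0 : 1;
}
```

Both helpers return the same host pointer; the second simply skips the list walk. The real lookup path and locking in QEMU are more involved than this sketch assumes.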