On 08/09/17 12:16, Paolo Bonzini wrote:
> On 09/08/2017 12:00, Laszlo Ersek wrote:
>> On 08/09/17 09:26, Paolo Bonzini wrote:
>>> On 09/08/2017 03:06, Laszlo Ersek wrote:
>>>>>   20.14%  qemu-system-x86_64  [.] render_memory_region
>>>>>   17.14%  qemu-system-x86_64  [.] subpage_register
>>>>>   10.31%  qemu-system-x86_64  [.] int128_add
>>>>>    7.86%  qemu-system-x86_64  [.] addrrange_end
>>>>>    7.30%  qemu-system-x86_64  [.] int128_ge
>>>>>    4.89%  qemu-system-x86_64  [.] int128_nz
>>>>>    3.94%  qemu-system-x86_64  [.] phys_page_compact
>>>>>    2.73%  qemu-system-x86_64  [.] phys_map_node_alloc
>>>
>>> Yes, this is the O(n^3) thing. An optimized build should be faster
>>> because int128 operations will be inlined and become much more efficient.
>>>
>>>> With this patch, I only tested the "93 devices" case, as the slowdown
>>>> became visible to the naked eye from the trace messages, as the firmware
>>>> enabled more and more BARs / command registers (and inversely, the
>>>> speedup was perceivable when the firmware disabled more and more BARs /
>>>> command registers).
>>>
>>> This is an interesting observation, and it's expected. Looking at the
>>> O(n^3) complexity more in detail you have N operations, where the "i"th
>>> operates on "i" DMA address spaces, all of which have at least "i"
>>> memory regions (at least 1 BAR per device).
>>
>> - Can you please give me a pointer to the code where the "i"th operation
>>   works on "i" DMA address spaces? (Not that I dream about patching *that*
>>   code, wherever it may live :) )
>
> It's all driven by actions of the guest.
>
> Simply, by the time you get to the "i"th command register, you have
> enabled bus-master DMA on "i" devices (so that "i" DMA address spaces
> are non-empty) and you have enabled BARs on "i" devices (so that their
> BARs are included in the address spaces).
>
>> - You mentioned that changing this is on the ToDo list. I couldn't find
>>   it under <https://wiki.qemu.org/index.php/ToDo>. Is it tracked somewhere
>>   else?
>
> I've added it to https://wiki.qemu.org/index.php/ToDo/MemoryAPI (thanks
> for the nudge).
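
If I follow the complexity argument above correctly, the total work for
bringing up N devices comes out to roughly

  sum_{i=1}^{N} (i address spaces) * (at least i regions each)  ~=  N^3 / 3

region visits, since enabling the "i"th command register re-renders the
"i" non-empty DMA address spaces, each of which already contains at least
"i" memory regions. (That's just my own back-of-the-envelope reading of
the above, so please correct me if I'm off.)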
Thank you! Allow me one last question -- why (and since when) does each
device have its own separate address space? Is that related to the
virtual IOMMU?

Now that I look at the "info mtree" monitor output of a random VM, I see
the following "address-space"s:

- memory
- I/O
- cpu-memory
- bunch of nameless ones, with top level regions called "bus master
  container"
- several named "virtio-pci-cfg-as"
- KVM-SMRAM

I (sort of) understand MemoryRegions and aliases, but:

- I don't know why "memory" and "cpu-memory" exist separately, for
  example,
- I seem to remember that the "bunch of nameless ones" has not always
  been there? (I could be totally wrong, of course.)

... There is one address_space_init() call in "hw/pci/pci.c", and it
comes (most recently) from commit 3716d5902d74 ("pci: introduce a bus
master container", 2017-03-13). The earliest commit that added it seems
to be 817dcc536898 ("pci: give each device its own address space",
2012-10-03). The commit messages do mention IOMMUs.

Thanks!
Laszlo
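
P.S. For my own notes, here is what I believe the relevant code in
"hw/pci/pci.c" boils down to, as of commit 3716d5902d74. This is a
paraphrase from my reading of the tree, not a verbatim quote, so the
field names and sizes may be slightly off:

    /* do_pci_register_device(): every device gets a "bus master
     * container" MemoryRegion, plus an AddressSpace rooted in that
     * container -- presumably these are the per-device address spaces
     * that show up in "info mtree".
     */
    memory_region_init(&pci_dev->bus_master_container_region,
                       OBJECT(pci_dev), "bus master container",
                       UINT64_MAX);
    address_space_init(&pci_dev->bus_master_as,
                       &pci_dev->bus_master_container_region,
                       pci_dev->name);

    /* pci_init_bus_master(): the container gets a single subregion, an
     * alias of the bus's DMA region (the IOMMU region if there is one,
     * otherwise the system memory region). The alias starts out
     * disabled; it gets enabled/disabled as the guest toggles
     * PCI_COMMAND_MASTER, which is what makes the per-device address
     * space non-empty or empty.
     */
    AddressSpace *dma_as = pci_device_iommu_address_space(pci_dev);

    memory_region_init_alias(&pci_dev->bus_master_enable_region,
                             OBJECT(pci_dev), "bus master",
                             dma_as->root, 0,
                             memory_region_size(dma_as->root));
    memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
    memory_region_add_subregion(&pci_dev->bus_master_container_region, 0,
                                &pci_dev->bus_master_enable_region);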