On 09/08/2017 12:00, Laszlo Ersek wrote: > On 08/09/17 09:26, Paolo Bonzini wrote: >> On 09/08/2017 03:06, Laszlo Ersek wrote: >>>> 20.14% qemu-system-x86_64 [.] render_memory_region >>>> 17.14% qemu-system-x86_64 [.] subpage_register >>>> 10.31% qemu-system-x86_64 [.] int128_add >>>> 7.86% qemu-system-x86_64 [.] addrrange_end >>>> 7.30% qemu-system-x86_64 [.] int128_ge >>>> 4.89% qemu-system-x86_64 [.] int128_nz >>>> 3.94% qemu-system-x86_64 [.] phys_page_compact >>>> 2.73% qemu-system-x86_64 [.] phys_map_node_alloc >> >> Yes, this is the O(n^3) thing. An optimized build should be faster >> because int128 operations will be inlined and become much more efficient. >> >>> With this patch, I only tested the "93 devices" case, as the slowdown >>> became visible to the naked eye from the trace messages, as the firmware >>> enabled more and more BARs / command registers (and inversely, the >>> speedup was perceivable when the firmware disabled more and more BARs / >>> command registers). >> >> This is an interesting observation, and it's expected. Looking at the >> O(n^3) complexity more in detail you have N operations, where the "i"th >> operates on "i" DMA address spaces, all of which have at least "i" >> memory regions (at least 1 BAR per device). > > - Can you please give me a pointer to the code where the "i"th operation > works on "i" DMA address spaces? (Not that I dream about patching *that* > code, wherever it may live :) )
It's all driven by actions of the guest. Simply, by the time you get to the "i"th command register, you have enabled bus-master DMA on "i" devices (so that "i" DMA address spaces are non-empty) and you have enabled BARs on "i" devices (so that their BARs are included in the address spaces). > - You mentioned that changing this is on the ToDo list. I couldn't find > it under <https://wiki.qemu.org/index.php/ToDo>. Is it tracked somewhere > else? I've added it to https://wiki.qemu.org/index.php/ToDo/MemoryAPI (thanks for the nudge). Paolo > (I'm not trying to urge any changes in the area, I'd just like to learn > about the code & the tracker item, if there's one.) > > Thanks! > Laszlo > >> >> So the total cost is sum i=1..N i^2 = N(N+1)(2N+1)/6 = O(n^3). >> Expressing it as a sum shows why it gets slower as time progresses. >> >> The solution is to note that those "i" address spaces are actually all >> the same, so we can get it down to sum i=1..N i = N(N+1)/2 = O(n^2). >> >> Thanks, >> >> Paolo >> >