On 17 July 2018 at 21:46, BALATON Zoltan <bala...@eik.bme.hu> wrote:
> On Tue, 17 Jul 2018, Mark Cave-Ayland wrote:
>> Good question. A quick grep for 'asidx_from_attrs' shows that
>> cc->asidx_from_attrs() isn't set for PPC targets, so as a quick test does
>> replacing the inline function cpu_asidx_from_attrs() in include/qom/cpu.h
>> with a simple "return 0" change the profile at all?
>
> It does seem to lessen its impact but it's still higher than I expected:
It may be worth special-casing the CPU method lookups (or at least
that one) if we can, then...

> %        cum. %   linenr info             symbol name
> 10.7949  10.7949  exec-all.h:410          helper_lookup_tb_ptr
>  7.8663  18.6612  cputlb.c:793            io_readx
>  6.0265  24.6878  cputlb.c:114            tlb_flush_nocheck
>  4.0671  28.7548  sm501_template.h:62     draw_line16_32
>  4.0559  32.8107  object.c:765            object_class_dynamic_cast_assert
>  3.3780  36.1887  memory.c:1350           memory_region_access_valid
>  2.8920  39.0808  qemu-thread-posix.c:61  qemu_mutex_lock_impl
>  2.7187  41.7995  memory.c:1415           memory_region_dispatch_read
>  2.6011  44.4006  qht.c:487               qht_lookup_custom
>  2.5356  46.9362  softmmu_template.h:112  helper_ret_ldub_mmu
>
> Maybe it's called from somewhere else too? I know draw_line16_32, but I
> wonder where helper_lookup_tb_ptr and the TLB flushes could come from?
> Those seem to be significant. And io_readx in itself seems too high on
> the list too.

helper_lookup_tb_ptr is part of TCG -- it's where we look for the next
TB to go to. Any non-computed branch to a different page will result in
our calling this. So it's high on the profile because we do it a lot,
I think, but that's not necessarily a problem as such.

io_readx is the slow path for guest memory accesses -- any guest access
to something that's not RAM has to go through here. My first guess
(given the other things in the profile, especially helper_ret_ldub_mmu,
memory_region_dispatch_read and memory_region_access_valid) is that the
guest is in a tight loop doing reads of a device register a lot of the
time.

> I wonder if it may have something to do with the background task
> trying to read non-implemented i2c stuff frequently (as discussed in
> point 2. in http://zero.eik.bme.hu/~balaton/qemu/amiga/#morphos).

Could be, or some similar thing. If you suspect the i2c you could try
putting in an unimplemented-device stub in the right place and see how
often -d unimp yells about reads to it.
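For reference, a standalone sketch (not actual QEMU code -- in QEMU
itself you'd just call create_unimplemented_device() from the board
code) of what such a stub boils down to: claim the register range and
log every access, so the -d unimp style output shows how often the
guest pokes it. The "i2c" name and log format here are illustrative,
not QEMU's exact ones:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch of an unimplemented-device stub: reads log and return zero,
 * writes log and are discarded. Counting these log lines tells you
 * how hard the guest is hammering the unimplemented registers. */
static uint64_t unimp_read(const char *name, uint64_t offset, unsigned size)
{
    fprintf(stderr, "%s: unimplemented device read (size %u, offset 0x%"
            PRIx64 ")\n", name, size, offset);
    return 0;  /* reads-as-zero, like QEMU's stub device */
}

static void unimp_write(const char *name, uint64_t offset, unsigned size,
                        uint64_t value)
{
    fprintf(stderr, "%s: unimplemented device write (size %u, offset 0x%"
            PRIx64 ", value 0x%" PRIx64 ")\n", name, size, offset, value);
}
```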
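On the special-casing idea: a minimal standalone sketch of the shape of
the cpu_asidx_from_attrs() fast path (types and signatures simplified,
not QEMU's real definitions). Targets like PPC leave the hook NULL, so
the common case can short-circuit to address space 0 without any class
method dispatch:

```c
#include <assert.h>

/* Simplified stand-ins for QEMU's MemTxAttrs/CPUClass/CPUState */
typedef struct MemTxAttrs { int secure; } MemTxAttrs;

typedef struct CPUClass {
    /* Optional hook; NULL for targets with a single address space */
    int (*asidx_from_attrs)(MemTxAttrs attrs);
} CPUClass;

typedef struct CPUState {
    CPUClass *cc;
    int num_ases;
} CPUState;

static inline int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs)
{
    int ret = 0;

    /* Fast path: no hook means address space index 0, no dispatch */
    if (cpu->cc->asidx_from_attrs) {
        ret = cpu->cc->asidx_from_attrs(attrs);
        assert(ret >= 0 && ret < cpu->num_ases);
    }
    return ret;
}

/* Example hook in the ARM style: secure accesses use a second AS */
static int arm_asidx_from_attrs(MemTxAttrs attrs)
{
    return attrs.secure ? 1 : 0;
}
```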
So overall I'd be a little wary of optimizing based on this profile,
because I suspect it's atypical -- the guest is sat in a tight polling
loop and the profile says "all the functions in the code path for doing
device access are really hot". The fix is to improve our model so the
guest doesn't get stuck like that, not to try to slightly improve the
speed of device accesses (we call it the "slow path" for a reason :-))

(But places like asidx_from_attrs are likely to be on hot paths in
general, so having the QOM class lookup there be overly heavyweight is
maybe worth fixing anyhow.)

thanks
-- PMM