On 17/07/18 16:09, BALATON Zoltan wrote:
On Mon, 16 Jul 2018, Peter Maydell wrote:
Is this coming up as significant in profiling? In the past we've
This seems to depend on the workload. Of the cases I'm interested in,
AROS and AmigaOS on qemu-system-ppc -M sam460ex do not seem to be
affected much (object_class_dynamic_cast_assert is not in the top 10, at
<2%), but for MorphOS on mac99 it seems to be significant. This is with
the default configure (--enable-qom-cast-debug):
%        cum. %   linenr info             symbol name
9.7057   9.7057   exec-all.h:410          helper_lookup_tb_ptr
8.0330   17.7387  object.c:711            object_class_dynamic_cast_assert
6.9411   24.6798  cputlb.c:793            io_readx
6.3219   31.0017  sm501_template.h:62     draw_line16_32
5.3601   36.3617  cputlb.c:114            tlb_flush_nocheck
3.6170   39.9787  translate-all.c:749     page_trylock_add
3.1188   43.0975  translate-all.c:803     page_collection_lock
3.0405   46.1380  exec.c:3025             iotlb_to_section
2.7044   48.8424  softmmu_template.h:112  helper_ret_ldub_mmu
2.4154   51.2578  memory.c:1350           memory_region_access_valid
and it improves a bit (but not much) with --disable-qom-cast-debug:
%        cum. %   linenr info             symbol name
10.2063  10.2063  exec-all.h:410          helper_lookup_tb_ptr
7.1581   17.3644  object.c:711            object_class_dynamic_cast_assert
5.9297   23.2941  sm501_template.h:62     draw_line16_32
5.9227   29.2168  cputlb.c:793            io_readx
5.3030   34.5198  cputlb.c:114            tlb_flush_nocheck
3.6445   38.1643  memory.c:1350           memory_region_access_valid
3.5499   41.7142  softmmu_template.h:112  helper_ret_ldub_mmu
3.0383   44.7525  translate-all.c:803     page_collection_lock
2.9735   47.7260  memory.c:1415           memory_region_dispatch_read
2.9503   50.6763  translate-all.c:749     page_trylock_add
But the workloads may not have been 100% identical, so this is not
conclusive; maybe this debug code is not that expensive at the moment.
AROS on sam460ex has a different profile:
%        cum. %   linenr info             symbol name
8.9905   8.9905   translate-all.c:749     page_trylock_add
8.7658   17.7563  exec-all.h:410          helper_lookup_tb_ptr
7.7349   25.4911  translate-all.c:803     page_collection_lock
5.8246   31.3158  cputlb.c:924            victim_tlb_hit
3.1640   34.4797  cpus.c:347              cpu_get_clock
3.1538   37.6335  translate-all.c:788     tb_page_addr_cmp
2.7969   40.4304  exec.c:435              address_space_translate_internal
2.6647   43.0951  memory.c:571            access_with_adjusted_size
2.0615   45.1567  exec.c:569              flatview_do_translate
1.9586   47.1153  memory.c:1350           memory_region_access_valid
Would anyone be able to guess which places should be looked at, or what
to check to get more info on this?
My first thought is that there is a QOM cast somewhere in a hot path on
-M mac99 - can you show us the call stack information from the profile?
I had a similar issue with SPARC32 whereby each DMA request needs to be
manually word-swapped, so instead of adding the QOM cast into these
routines I did a direct C cast from the opaque to ensure that the
overhead was as little as possible.
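
To illustrate the difference, here is a rough standalone sketch. It does
not use the real QOM API: ToyType, ToyObject, ToyDMAState,
toy_checked_cast() and swap_halves() are all made-up names for
illustration, and the name-comparing walk up the parent chain is only an
assumed approximation of what a checked dynamic cast costs. The point is
just that the checked variant repeats that walk on every call, while the
direct variant trusts the opaque pointer registered with the callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a QOM-like type: a name plus a link to its parent. */
typedef struct ToyType {
    const char *name;
    const struct ToyType *parent;
} ToyType;

/* Every toy object starts with a pointer to its type. */
typedef struct ToyObject {
    const ToyType *type;
} ToyObject;

/* Hypothetical device state whose callback sits on a hot path. */
typedef struct ToyDMAState {
    ToyObject parent_obj;
    uint32_t last_word;
} ToyDMAState;

static const ToyType toy_device_type = { "toy-device", NULL };
static const ToyType toy_dma_type = { "toy-dma", &toy_device_type };

/* Checked cast: walk the parent chain comparing type names, then assert. */
static void *toy_checked_cast(void *opaque, const char *typename)
{
    const ToyType *t = ((ToyObject *)opaque)->type;

    while (t != NULL && strcmp(t->name, typename) != 0) {
        t = t->parent;
    }
    assert(t != NULL); /* abort if the object is not of the requested type */
    return opaque;
}

/* Placeholder for the per-request work: swap the two 16-bit halves. */
static uint32_t swap_halves(uint32_t v)
{
    return (v << 16) | (v >> 16);
}

/* Hot-path callback paying for the type check on every single call. */
static uint32_t toy_dma_write_checked(void *opaque, uint32_t val)
{
    ToyDMAState *s = toy_checked_cast(opaque, "toy-dma");

    s->last_word = swap_halves(val);
    return s->last_word;
}

/* Same callback with a direct C cast: no lookup, but also no safety net. */
static uint32_t toy_dma_write_direct(void *opaque, uint32_t val)
{
    ToyDMAState *s = opaque;

    s->last_word = swap_halves(val);
    return s->last_word;
}
```

Both callbacks give the same result for a valid object; the direct one
simply skips the check, so it is only safe where the opaque pointer is
known to have the right type, e.g. because the same device code
registered it.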
ATB,
Mark.