2012/2/14 Lluís Vilanova <vilan...@ac.upc.edu>:
> Artyom Tarasenko writes:
> [...]
>> QEMU 1.0.50 monitor - type 'help' for more information
>> (qemu) profile
>> unknown command: 'profile'
>> (qemu) info profile
>> async time 38505498320 (38.505)
>> qemu time 35947093161 (35.947)
>
>> Is there a way to find out more?
>
> Command "info jit" also has some information added when compiled with
> profiling support.
>
> Search for CONFIG_PROFILER to see which code is activated during profiling.
>
>
>> Next I tried gprof:
>
>> build-prof $ gprof sparc64-softmmu/qemu-system-sparc64 gmon.out
>> Flat profile:
>
>> Each sample counts as 0.01 seconds.
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls  Ts/call  Ts/call  name
>> 100.00      5.06     5.06                             main
>
>> Hmm. Not very informative. Is there a way to find out more details?
>
> Did you run QEMU for a reasonable amount of time? gprof uses sampling to
> capture its execution time statistics, so a small execution of QEMU will
> not be able to capture any meaningful information.
I did run it to the OpenBIOS prompt. But I think it's my setup which
makes gprof useless on the machine where I tested git master: the
"host" is a virtual machine itself running under VirtualBox, and it has
problems with the system timer. Will re-check on a bare-metal host.

> [...]
>> Here it looks like "compute_all_sub" and "compute_all_sub_xcc" are
>> good candidates for optimizing: together they take the same amount of
>> time as cpu_sparc_exec. I guess both operations would be trivial in
>> the x86_64 assembler. What would be the best strategy to make TCG take
>> the advantage of running on a x86_64 host?
>
> A quick look into the code reveals that these two are called from a TCG
> helper (helper_compute_psr), so I see two approaches here applicable to
> the most frequently used "sub-operations" in helper_compute_psr:
>
> * Define new simpler helpers for those sub-operations that can be
>   declared with TCG_CALL_CONST and generate the new psr/xcc values in
>   temporal registers. You must make sure any other code will still be
>   able to use the new psr/xcc values.

I don't see how to make get_C_sub_xcc even simpler: all it does is the
src1 < src2 check.

> * Reimplement these sub-operations in pure TCG code.

Are there already examples where we compute flags in pure TCG code?

> But first, make sure you run a proper benchmark to establish where are
> the hotspots in the sparc code for QEMU.

The problem here is to establish what a proper benchmark is :)

> :)

Artyom

-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/search/label/qemu
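
For reference, the flag arithmetic discussed above (the carry of a subtraction being just the src1 < src2 check) can be written out in plain C. This is only a minimal sketch and not code from the QEMU tree: the function names and the CC_* bit values are hypothetical, chosen for illustration. It suggests that each flag of a subtraction reduces to a compare or a few xor/and/shift operations, which is the kind of sequence a pure TCG version would have to emit, for example with a setcond-style operation for the carry.

```c
#include <stdint.h>

/* Hypothetical flag bits for illustration only (not QEMU's PSR layout). */
enum { CC_C = 1, CC_V = 2, CC_Z = 4, CC_N = 8 };

/* Carry (borrow) of src1 - src2 on 64 bits: a single unsigned compare. */
static inline uint32_t sub_carry_xcc(uint64_t src1, uint64_t src2)
{
    return src1 < src2;
}

/* Same for the 32-bit flags: compare only the low 32 bits. */
static inline uint32_t sub_carry_icc(uint64_t src1, uint64_t src2)
{
    return (uint32_t)src1 < (uint32_t)src2;
}

/* All four 64-bit flags of dst = src1 - src2. */
static inline uint32_t sub_flags_xcc(uint64_t src1, uint64_t src2)
{
    uint64_t dst = src1 - src2;
    uint32_t n = (uint32_t)(dst >> 63);          /* sign bit of the result */
    uint32_t z = (dst == 0);                     /* result is zero */
    /* Signed overflow: operands differ in sign and the result's sign
       differs from src1's sign. */
    uint32_t v = (uint32_t)(((src1 ^ src2) & (src1 ^ dst)) >> 63);
    uint32_t c = sub_carry_xcc(src1, src2);      /* unsigned borrow */
    return n * CC_N | z * CC_Z | v * CC_V | c * CC_C;
}
```

With these definitions, sub_flags_xcc(1, 2) yields CC_N | CC_C and sub_flags_xcc(5, 5) yields CC_Z.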