2012/2/14 Laurent Desnogues <laurent.desnog...@gmail.com>:
> 2012/2/14 Lluís Vilanova <vilan...@ac.upc.edu>:
>> Artyom Tarasenko writes:
> [...]
>>> Here it looks like "compute_all_sub" and "compute_all_sub_xcc" are
>>> good candidates for optimizing: together they take the same amount of
>>> time as cpu_sparc_exec. I guess both operations would be trivial in
>>> x86_64 assembler. What would be the best strategy to make TCG take
>>> advantage of running on an x86_64 host?
>>
>> A quick look into the code reveals that these two are called from a TCG
>> helper (helper_compute_psr), so I see two approaches applicable to the most
>> frequently used "sub-operations" in helper_compute_psr:
>>
>> * Define new, simpler helpers for those sub-operations that can be declared
>>   with TCG_CALL_CONST and generate the new psr/xcc values in temporary
>>   registers. You must make sure any other code will still be able to use
>>   the new psr/xcc values.
>>
>> * Reimplement these sub-operations in pure TCG code.
>>
>> But first, make sure you run a proper benchmark to establish where the
>> hotspots in the sparc code for QEMU are. The problem here is to establish
>> what a proper benchmark is :)
>
> Similar helpers are used in ARM translation, so I'm not surprised
> they show up (typically sub/flag instructions are used for loops).
>
> A good strategy is indeed to generate TCG code and let the
> NZ/C/etc. flags be global temps like the other CPU registers. This gains
> a few percent of speed.
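This sketch makes the cost under discussion concrete. The icc computation that SUBcc/SUBXcc require reduces to a handful of bitwise operations on the two operands and the result. It is a standalone illustration under stated assumptions, not QEMU's compute_all_sub. The function name icc_sub and the ICC_* macros are hypothetical. The bit positions follow the SPARC V8 PSR layout, where icc occupies bits 23..20 as N, Z, V, C.

```c
#include <stdint.h>

/* SPARC V8 PSR integer condition code bits (icc = PSR bits 23..20).
 * Macro names are hypothetical, chosen for this sketch. */
#define ICC_N (1u << 23)
#define ICC_Z (1u << 22)
#define ICC_V (1u << 21)
#define ICC_C (1u << 20)

/*
 * Hypothetical helper: compute the icc bits for SUBcc (carry_in == 0)
 * or SUBXcc (carry_in == 1) directly from the operands.
 * This is only the arithmetic such a helper boils down to, not QEMU code.
 */
static uint32_t icc_sub(uint32_t src1, uint32_t src2, uint32_t carry_in)
{
    uint32_t dst = src1 - src2 - carry_in;
    uint32_t flags = 0;

    /* N: sign bit of the 32-bit result */
    if (dst >> 31)
        flags |= ICC_N;
    /* Z: result is zero */
    if (dst == 0)
        flags |= ICC_Z;
    /* V: operands differ in sign and the result's sign differs from src1 */
    if (((src1 ^ src2) & (src1 ^ dst)) >> 31)
        flags |= ICC_V;
    /* C: borrow out of bit 31; this form also holds when carry_in is 1 */
    if (((~src1 & src2) | ((~src1 | src2) & dst)) >> 31)
        flags |= ICC_C;
    return flags;
}
```

For example, icc_sub(0, 1, 0) yields ICC_N | ICC_C, and icc_sub(0x80000000u, 1, 0) yields ICC_V. The 64-bit xcc variant would apply the same formulas at bit 63.

On an x86_64 host, a single SUB or SBB already sets SF/ZF/OF/CF with the same meaning (CF is the borrow). That is why the per-operation cost is small. Whether keeping the flags in TCG globals beats a helper call is the open question in this thread, and the sketch does not settle it.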
Can you give an example where a global temp would be faster than an inline helper? At first sight it's trading a cheap math operation (in the case of sub; a few cheap math operations in the case of subx) for a memory access.

Or do you mean using the global flag registers instead of CC_SRC{1,2} and always computing them?

Artyom

-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/search/label/qemu