2012/2/14 Lluís Vilanova <vilan...@ac.upc.edu>:
> Artyom Tarasenko writes:
> [...]
>> Here it looks like "compute_all_sub" and "compute_all_sub_xcc" are
>> good candidates for optimizing: together they take the same amount of
>> time as cpu_sparc_exec. I guess both operations would be trivial in
>> the x86_64 assembler. What would be the best strategy to make TCG take
>> the advantage of running on a x86_64 host?
>
> A quick look into the code reveals that these two are called from a TCG helper
> (helper_compute_psr), so I see two approaches here applicable to the most
> frequently used "sub-operations" in helper_compute_psr:
>
> * Define new simpler helpers for those sub-operations that can be declared with
>   TCG_CALL_CONST and generate the new psr/xcc values in temporal registers. You
>   must make sure any other code will still be able to use the new psr/xcc
>   values.
>
> * Reimplement these sub-operations in pure TCG code.
>
> But first, make sure you run a proper benchmark to establish where are the
> hotspots in the sparc code for QEMU. The problem here is to establish what a
> proper benchmark is :)

Similar helpers are used in the ARM translation, so I'm not surprised they show up (sub/flag instructions are typically used for loops). A good strategy is indeed to generate TCG code and keep the N, Z, C, etc. flags in global temps, like the other CPU registers. This gains a few percent of speed.

HTH,

Laurent
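
For illustration only, here is roughly what the flag computation for a 32-bit subtract boils down to, written as plain C rather than as TCG ops. This is a sketch, not the QEMU code: the flags32 struct and the sub_cc32 name are made up for this example, and keeping one field per flag simply stands in for the "one global temp per flag" idea. The point is that each flag is a single shift, compare or XOR/AND, which is why emitting these inline should be cheaper than calling a helper.

```c
#include <stdint.h>

/* Hypothetical container, one field per flag. In a translator each of
   these would be its own global temp instead of bits packed in psr. */
typedef struct {
    uint32_t n; /* negative: bit 31 of the result */
    uint32_t z; /* zero: result is 0 */
    uint32_t v; /* signed overflow */
    uint32_t c; /* carry, i.e. borrow for a subtract */
} flags32;

/* Hypothetical name. Computes dst = src1 - src2 and the four flags. */
static uint32_t sub_cc32(uint32_t src1, uint32_t src2, flags32 *f)
{
    uint32_t dst = src1 - src2;

    f->n = dst >> 31;
    f->z = (dst == 0);
    /* Overflow: operands differ in sign and the result's sign differs
       from src1's sign. */
    f->v = ((src1 ^ src2) & (src1 ^ dst)) >> 31;
    /* Borrow: unsigned src1 is smaller than unsigned src2. */
    f->c = (src1 < src2);

    return dst;
}
```

With this layout, a conditional branch that only needs Z reads one value and never has to rebuild the whole condition-code register.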