Lluís Vilanova writes:
[...]
> This was working on a much older version of instrumentation for QEMU, but I can
> implement something that does the first use-case point above and some filtering
> example (second use-case point) to see what's the performance difference.

Ok, so here are some numbers for the discussion (booting Emilio's ARM full
system image that immediately shuts down):

* Without instrumentation

    real    0m10,099s
    user    0m9,876s
    sys     0m0,128s

* Counting the number of memory writes, with the filtering done at execution
  time

    real    0m15,896s
    user    0m15,752s
    sys     0m0,108s

* Counting the same, but with the filtering done at translation time (i.e., no
  execution-time callback is generated if the access is not a write)

    real    0m11,084s
    user    0m10,880s
    sys     0m0,112s

  As Peter said, the filtering can be added into the API to take advantage of
  this "speedup", without exposing translation vs execution time callbacks.

* Counting the number of executed instructions, by instrumenting the beginning
  of each one of them

    real    0m24,583s
    user    0m24,352s
    sys     0m0,184s

* Counting the same, but per-TB numbers are collected at translation time, and
  we only generate one execution-time callback per TB that adds the
  corresponding number of instructions for that TB

    real    0m11,151s
    user    0m10,952s
    sys     0m0,092s

  This one really needs translation vs execution time callbacks to be exposed
  in order to take advantage of the "speedup".


Cheers,
  Lluis
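
For illustration only, the difference between the two instruction-counting strategies can be modelled with a small toy in Python. This is not QEMU code and does not use any QEMU API: the `TB` class, the `count_per_insn` and `count_per_tb` functions, and the example trace are all hypothetical names made up for this sketch. It also assumes every TB runs to completion, which ignores exceptions raised in the middle of a block.

```python
# Toy model, not QEMU code: every name here is hypothetical.
from dataclasses import dataclass


@dataclass
class TB:
    """A translated block: a straight-line run of guest instructions."""
    insns: list


def count_per_insn(trace):
    """One execution-time callback at the beginning of every instruction."""
    executed = 0
    callbacks = 0
    for tb in trace:
        for _ in tb.insns:
            callbacks += 1   # callback fires for each executed instruction
            executed += 1
    return executed, callbacks


def count_per_tb(trace):
    """Count instructions once at translation time, then fire a single
    execution-time callback per executed TB that adds the stored count."""
    translated = {}          # id(tb) -> instruction count, filled on first use
    executed = 0
    callbacks = 0
    for tb in trace:
        key = id(tb)
        if key not in translated:
            # "Translation time": happens once per TB, not once per execution.
            translated[key] = len(tb.insns)
        callbacks += 1       # one callback per executed TB
        executed += translated[key]
    return executed, callbacks


# A loop body executed 1000 times, followed by an exit block.
loop_body = TB(["ldr", "add", "str", "subs", "bne"])
exit_tb = TB(["mov", "bx"])
trace = [loop_body] * 1000 + [exit_tb]

per_insn = count_per_insn(trace)
per_tb = count_per_tb(trace)
print(per_insn, per_tb)
```

Both strategies report the same instruction count, but the per-TB variant fires far fewer execution-time callbacks, which is in line with the direction of the timings above.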
