Hi,
> With those changes in place, the same boot-to-kdm process
> requires only about 570000 translations to be made, and 2
> cache flushes to happen. Of course the cost is an extra
> 48M of memory use.
I faced a similar problem in Basilisk II. MacOS 8.x had a tendency to
invalidate the code cache approx. 1000 times per second. My poor
K6-2/300 was suffering a lot: about 45% of the time was spent
compiling code, and the desktop experience was very sluggish. Then I
came up with a very simple idea that I named "lazy cache flush".
Performance increased by 76%, compilation time dropped below 10%, and
the desktop experience became very smooth. I give more contemporary
results below.
So what is lazy invalidation of the translation cache? Well, the goal is
simple: keep translated code as long as possible. In practice, you
invalidate the complete translation cache only when it is full (a
"hard" flush). Any other explicit cache invalidation (CINV instructions
on 68k, icbi on ppc, etc.) is virtual (a "soft" flush). This means the
code is kept but put in a "dormant" state: its usual entry points (in
the hash table, or inter-block jumps) are redirected to check/recovery
code that checksums the source block again. If the checksum matches the
original one, the previously compiled code is brought back to life (its
entry points in the hash table and its inter-block links are restored).
Otherwise, the block is recompiled and the new code is used.
It's very simple and quite efficient. Since then, I have had no need to
increase the translation cache beyond 8MB.
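
To make the mechanism concrete, here is a minimal C sketch of the idea.
It is not the Basilisk II code: every name in it (block_t, soft_flush,
hard_flush, lookup_block, checksum_source, MAX_BLOCKS) is hypothetical,
the checksum is an arbitrary rotate-and-add, a linear search stands in
for the hash table, and a state flag stands in for redirecting entry
points to the check/recovery code. compile_block only records the
checksum; a real one would also emit host code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not Basilisk II or QEMU identifiers. */
enum block_state { BLOCK_ACTIVE, BLOCK_DORMANT };

typedef struct {
    const uint8_t   *src;       /* guest code the block was translated from */
    size_t           len;       /* size of that guest code in bytes */
    uint32_t         checksum;  /* checksum of the source at compile time */
    enum block_state state;     /* dormant = must be re-validated before use */
} block_t;

#define MAX_BLOCKS 1024         /* stands in for "translation cache is full" */

static block_t cache[MAX_BLOCKS];
static size_t  n_blocks;
static unsigned long n_soft, n_hard, n_checksums, n_compiles;

/* Arbitrary rotate-and-add checksum over the guest code (an assumption). */
static uint32_t checksum_source(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = ((sum << 1) | (sum >> 31)) + p[i];
    return sum;
}

/* Stand-in for the translator: remembers the source checksum only. */
static void compile_block(block_t *b)
{
    b->checksum = checksum_source(b->src, b->len);
    b->state = BLOCK_ACTIVE;
    n_compiles++;
}

/* Hard flush: really throw everything away (done only when the cache is full). */
static void hard_flush(void)
{
    n_blocks = 0;
    n_hard++;
}

/* Soft flush: the guest asked for an invalidation; keep the code, mark it dormant. */
static void soft_flush(void)
{
    for (size_t i = 0; i < n_blocks; i++)
        cache[i].state = BLOCK_DORMANT;
    n_soft++;
}

/* Return a usable block for the guest code at src, reviving or recompiling it. */
static block_t *lookup_block(const uint8_t *src, size_t len)
{
    assert(src != NULL && len > 0);
    for (size_t i = 0; i < n_blocks; i++) {
        block_t *b = &cache[i];
        if (b->src != src || b->len != len)
            continue;
        if (b->state == BLOCK_DORMANT) {
            /* Check/recovery path: checksum the source block again. */
            n_checksums++;
            if (checksum_source(src, len) == b->checksum)
                b->state = BLOCK_ACTIVE;    /* unchanged: bring it back to life */
            else
                compile_block(b);           /* source changed: recompile */
        }
        return b;
    }
    if (n_blocks == MAX_BLOCKS)
        hard_flush();                       /* the only real invalidation */
    block_t *b = &cache[n_blocks++];
    b->src = src;
    b->len = len;
    compile_block(b);
    return b;
}
```

With this sketch, looking a block up, calling soft_flush(), and looking
it up again costs one checksum and no recompilation, as long as the
guest code was not modified in between. If the guest code did change,
the second lookup recompiles the block instead.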
So, here are a few results on an Athlon64 3200+. The translation cache
is set to 8MB. The test consisted of booting MacOS 8, running all
Speedometer 4 tests, then shutting down the virtual Mac.
* Without lazy flush:
  Number of soft flushes           : 0
  Number of hard flushes           : 101387
  Number of checksums              : 0
  Number of calls to compile_block : 20244047
  Total emulation time             : 115.8 sec
  Total compilation time           : 59.4 sec (51.3%)

* With lazy flush:
  Number of soft flushes           : 405520
  Number of hard flushes           : 7
  Number of checksums              : 46545721
  Number of calls to compile_block : 340104
  Total emulation time             : 66.1 sec
  Total compilation time           : 1.8 sec (2.8%)
The results speak for themselves. ;-)
Speedometer 4 "Performance Rating" increased by 12%. More
interestingly, the Color QuickDraw tests improved by a factor of 12:
they scored 19.95 on average with lazy cache flush, versus 1.67 without.
Bye,
Gwenolé.
_______________________________________________
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel