Hi,

With those changes in place, the same boot-to-kdm process
requires only about 570000 translations to be made, and 2
cache flushes to happen.  Of course the cost is an extra
48M of memory use.

I faced a similar problem in Basilisk II. MacOS 8.x had a tendency to invalidate the code cache approximately 1000 times per second, and my poor K6-2/300 was suffering a lot. About 45% of the time was spent compiling code, and the desktop experience was very sluggish. Then I came up with a very simple idea that I named "lazy cache flush". Performance increased by 76%, compilation time dropped below 10%, and the desktop experience became very smooth. I give more contemporary results below.

So what is lazy invalidation of the translation cache? The goal is simple: keep translated code for as long as possible. In practice, you invalidate the complete translation cache only when it is full. Any other explicit cache invalidation (CINV instructions on 68k, icbi on ppc, etc.) is virtual: the code is kept but put in a "dormant" state. That is, the usual entry points (in the hash table, or inter-block jumps) are redirected to check/recovery code where the source block is checksummed again. If the checksum matches the original one, the previously compiled code is brought back to life (entry points in the hash table and inter-block links are restored). Otherwise, the block is recompiled and the new code is used.
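
To make the mechanism concrete, here is a minimal sketch in Python. It is a toy model, not the actual Basilisk II code: the names LazyTranslationCache, soft_flush, hard_flush and lookup are made up for illustration, CRC32 stands in for whatever checksum the real JIT uses, a fixed block length stands in for real basic-block boundaries, and a "dormant" flag stands in for redirecting the entry points to the check/recovery code.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    pc: int            # guest address of the source block
    length: int        # size of the source block in bytes
    checksum: int      # checksum of the source bytes at compile time
    code: bytes        # stand-in for the translated host code
    dormant: bool = False


class LazyTranslationCache:
    """Toy model of a translation cache with lazy (soft) flushes."""

    def __init__(self, memory, capacity, block_len=4):
        self.memory = memory          # guest memory (bytearray)
        self.capacity = capacity      # max number of cached blocks
        self.block_len = block_len    # assumption: fixed-size blocks
        self.blocks = {}              # pc -> Block (the "hash table")
        self.stats = {"soft_flushes": 0, "hard_flushes": 0,
                      "checksums": 0, "compiles": 0}

    def hard_flush(self):
        # Real invalidation: every translated block is thrown away.
        self.stats["hard_flushes"] += 1
        self.blocks.clear()

    def soft_flush(self):
        # Explicit guest invalidation (CINV, icbi, ...): keep the code,
        # but force a checksum check before it may run again.
        self.stats["soft_flushes"] += 1
        for block in self.blocks.values():
            block.dormant = True

    def _compile(self, pc):
        # Hard flush only when the cache is full and a new slot is needed.
        if pc not in self.blocks and len(self.blocks) >= self.capacity:
            self.hard_flush()
        self.stats["compiles"] += 1
        src = bytes(self.memory[pc:pc + self.block_len])
        block = Block(pc, self.block_len, zlib.crc32(src), src)
        self.blocks[pc] = block
        return block

    def lookup(self, pc):
        block = self.blocks.get(pc)
        if block is None:
            return self._compile(pc)
        if block.dormant:
            # Check/recovery path: checksum the source block again.
            self.stats["checksums"] += 1
            src = self.memory[pc:pc + block.length]
            if zlib.crc32(src) == block.checksum:
                block.dormant = False      # bring the old code back to life
            else:
                block = self._compile(pc)  # source changed: recompile
        return block
```

In this model, a lookup after a soft flush costs one checksum instead of one compilation as long as the guest code has not actually changed, which is the whole point of the scheme.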

It's very simple and quite efficient. Since then, I have had no need to increase the translation cache beyond 8MB.

So, here are a few results on an Athlon64 3200+. The translation cache is set to 8MB. The test consisted of booting MacOS 8, running all Speedometer 4 tests, then shutting down the virtual Mac.

* Without lazy flush:
Number of soft flushes: 0
Number of hard flushes: 101387
Number of checksums   : 0
Number of calls to compile_block : 20244047
Total emulation time   : 115.8 sec
Total compilation time : 59.4 sec (51.3%)

* With lazy flush:
Number of soft flushes: 405520
Number of hard flushes: 7
Number of checksums   : 46545721
Number of calls to compile_block : 340104
Total emulation time   : 66.1 sec
Total compilation time : 1.8 sec (2.8%)

The results speak for themselves. ;-)

Speedometer 4 "Performance Rating" increased by 12%. More interestingly, Color QuickDraw tests improved by a factor of 12: they scored 19.95 on average with lazy cache flush, versus 1.67 without.
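
For a rough sense of scale, the ratios implied by the figures quoted in this mail can be worked out as below. This is only arithmetic on those numbers; the variable names are mine.

```python
# Figures copied from the two runs reported in this mail.
without_lazy = {"compiles": 20244047, "emu_s": 115.8, "comp_s": 59.4}
with_lazy = {"compiles": 340104, "emu_s": 66.1, "comp_s": 1.8,
             "checksums": 46545721}

# Overall wall-clock speedup of the whole test run.
speedup = without_lazy["emu_s"] / with_lazy["emu_s"]

# How many times fewer blocks were compiled, and how much less time it took.
compile_ratio = without_lazy["compiles"] / with_lazy["compiles"]
compile_time_ratio = without_lazy["comp_s"] / with_lazy["comp_s"]

# Average number of checksum checks paid per remaining compilation.
checksums_per_compile = with_lazy["checksums"] / with_lazy["compiles"]

# Color QuickDraw score ratio (with / without lazy flush).
quickdraw_ratio = 19.95 / 1.67

print(f"emulation speedup      : {speedup:.2f}x")
print(f"fewer compile calls    : {compile_ratio:.1f}x")
print(f"less compilation time  : {compile_time_ratio:.1f}x")
print(f"checksums per compile  : {checksums_per_compile:.1f}")
print(f"Color QuickDraw ratio  : {quickdraw_ratio:.2f}x")
```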

Bye,
Gwenolé.


_______________________________________________
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel
