Re: [Qemu-devel] Update on TCG Multithreading
* Mark Burton (mark.bur...@greensocs.com) wrote:
> All - first a huge thanks for those who have contributed, and those who have expressed an interest in helping out.
>
> One issue I’d like to see more opinions on is the question of a cache per core, or a shared cache. I have heard anecdotal evidence that a shared cache gives a major performance benefit…. Does anybody have anything more concrete? (of course we will get numbers in the end if we implement the hybrid scheme as suggested in the wiki - but I’d still appreciate any feedback).
>
> Our next plan is to start putting an implementation plan together. Probably quite sketchy at this point, and we hope to start coding shortly.

I'd expect a shared one to be able to take advantage of code that's translated by one core and then used on another. On the other hand with one per core you can perform updates on the caches with a lot less locking; however you've still got to be able to do invalidates across all the caches if any core does the write, and that could also get tricky.

Dave

> Cheers
> Mark.

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
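To make the cross-cache invalidation point concrete, here is a minimal sketch of the per-core variant. It is not QEMU code: `CoreCache`, `cache_insert`, `cache_lookup`, `caches_invalidate`, the core count and the direct-mapped layout are all assumptions made for illustration. Each core takes only its own lock on the normal path, but a write to translated code has to walk every core's cache.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CORES   4     /* hypothetical number of guest cores */
#define CACHE_SLOTS 256   /* hypothetical direct-mapped cache size */

/* One cached translation, keyed by the guest address it came from. */
typedef struct {
    uint64_t guest_pc;
    bool     valid;
} Entry;

/* One code cache per core; its lock is contended only by invalidations. */
typedef struct {
    pthread_mutex_t lock;
    Entry           slots[CACHE_SLOTS];
} CoreCache;

static CoreCache caches[NUM_CORES];

static size_t slot_of(uint64_t pc)
{
    return (size_t)(pc % CACHE_SLOTS);
}

void caches_init(void)
{
    for (int core = 0; core < NUM_CORES; core++) {
        pthread_mutex_init(&caches[core].lock, NULL);
        for (size_t s = 0; s < CACHE_SLOTS; s++) {
            caches[core].slots[s].valid = false;
        }
    }
}

/* Record that `core` has translated the code at `pc`. */
void cache_insert(int core, uint64_t pc)
{
    assert(core >= 0 && core < NUM_CORES);
    CoreCache *c = &caches[core];
    pthread_mutex_lock(&c->lock);
    c->slots[slot_of(pc)] = (Entry){ .guest_pc = pc, .valid = true };
    pthread_mutex_unlock(&c->lock);
}

/* True if `core` already holds a translation for `pc`. */
bool cache_lookup(int core, uint64_t pc)
{
    assert(core >= 0 && core < NUM_CORES);
    CoreCache *c = &caches[core];
    pthread_mutex_lock(&c->lock);
    const Entry *e = &c->slots[slot_of(pc)];
    bool hit = e->valid && e->guest_pc == pc;
    pthread_mutex_unlock(&c->lock);
    return hit;
}

/* A guest write to `pc` must drop stale code from every core's cache.
 * Returns how many caches held a translation for it. */
int caches_invalidate(uint64_t pc)
{
    int dropped = 0;
    for (int core = 0; core < NUM_CORES; core++) {
        CoreCache *c = &caches[core];
        pthread_mutex_lock(&c->lock);
        Entry *e = &c->slots[slot_of(pc)];
        if (e->valid && e->guest_pc == pc) {
            e->valid = false;
            dropped++;
        }
        pthread_mutex_unlock(&c->lock);
    }
    return dropped;
}
```

In this sketch a translation made by core 0 is invisible to core 2, which is the duplicated work a shared cache would avoid, while `caches_invalidate` shows the cost on the other side: one lock acquisition per core for every write that hits translated code.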
Re: [Qemu-devel] Update on TCG Multithreading
On Mon, 1 Dec 2014, Mark Burton wrote:
> One issue I’d like to see more opinions on is the question of a cache per core, or a shared cache. I have heard anecdotal evidence that a shared cache gives a major performance benefit…. Does anybody have anything more concrete?

There is a theoretical and experimental comparison of these approaches in the PQEMU article (you've cited it on the wiki page). The authors just use different names: they call the cache-per-core approach Separate Code Cache (SCC) and the shared cache Unified Code Cache (UCC).

--
Kirill
Re: [Qemu-devel] Update on TCG Multithreading
Mark Burton writes:
> All - first a huge thanks for those who have contributed, and those who have expressed an interest in helping out.
>
> One issue I’d like to see more opinions on is the question of a cache per core, or a shared cache. I have heard anecdotal evidence that a shared cache gives a major performance benefit…. Does anybody have anything more concrete? (of course we will get numbers in the end if we implement the hybrid scheme as suggested in the wiki - but I’d still appreciate any feedback).

I think it makes sense to have a per-core pointer to a qom TCGCacheClass. That can then have its own methods for working with updates, making it much simpler to work with different implementations, like completely avoiding locks (per-cpu cache) or a hybrid approach like the one described in the wiki.

> Our next plan is to start putting an implementation plan together. Probably quite sketchy at this point, and we hope to start coding shortly.

BTW, I've added some links to the COREMU project, which was discussed long ago in this list.

Best,
  Lluis

--
And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer.
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
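A rough sketch of what a per-core pointer to a cache object with its own methods might look like, using a plain C function-pointer table as a stand-in for a QOM class. Nothing here is existing QEMU API: `TCGCacheOps`, `VCPU`, `percpu_ops`, `shared_ops` and `tcg_cache_init` are hypothetical names, and the locking in the shared flavour is only counted, not performed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64  /* hypothetical direct-mapped cache size */

typedef struct TCGCache TCGCache;

/* Hypothetical method table; a plain-C stand-in for a QOM class. */
typedef struct {
    void (*insert)(TCGCache *cache, uint64_t pc);
    int  (*lookup)(TCGCache *cache, uint64_t pc);
} TCGCacheOps;

struct TCGCache {
    const TCGCacheOps *ops;
    uint64_t pc[SLOTS];
    int      valid[SLOTS];
    unsigned lock_ops;   /* counts simulated lock acquisitions */
};

/* Each virtual CPU only holds a pointer, so CPUs may share one cache. */
typedef struct {
    int       index;
    TCGCache *cache;
} VCPU;

static void raw_insert(TCGCache *c, uint64_t pc)
{
    size_t s = (size_t)(pc % SLOTS);
    c->pc[s] = pc;
    c->valid[s] = 1;
}

static int raw_lookup(const TCGCache *c, uint64_t pc)
{
    size_t s = (size_t)(pc % SLOTS);
    return c->valid[s] && c->pc[s] == pc;
}

/* Per-CPU flavour: single owner, so no locking at all. */
static void percpu_insert(TCGCache *c, uint64_t pc) { raw_insert(c, pc); }
static int  percpu_lookup(TCGCache *c, uint64_t pc) { return raw_lookup(c, pc); }

/* Shared flavour: every access would take a lock (only counted here). */
static void shared_insert(TCGCache *c, uint64_t pc)
{
    c->lock_ops++;
    raw_insert(c, pc);
}

static int shared_lookup(TCGCache *c, uint64_t pc)
{
    c->lock_ops++;
    return raw_lookup(c, pc);
}

const TCGCacheOps percpu_ops = { percpu_insert, percpu_lookup };
const TCGCacheOps shared_ops = { shared_insert, shared_lookup };

void tcg_cache_init(TCGCache *c, const TCGCacheOps *ops)
{
    assert(c != NULL && ops != NULL);
    c->ops = ops;
    c->lock_ops = 0;
    for (size_t s = 0; s < SLOTS; s++) {
        c->valid[s] = 0;
    }
}
```

A caller would go through the table, for example `cpu->cache->ops->lookup(cpu->cache, pc)`. Pointing several `VCPU`s at one `TCGCache` gives the shared scheme, one cache each gives the per-CPU scheme, and a mix gives a hybrid. Every access is an indirect call, which matters if it sits on a hot path.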
Re: [Qemu-devel] Update on TCG Multithreading
On 01.12.14 22:00, Lluís Vilanova wrote:
> Mark Burton writes:
>> All - first a huge thanks for those who have contributed, and those who have expressed an interest in helping out.
>>
>> One issue I’d like to see more opinions on is the question of a cache per core, or a shared cache. I have heard anecdotal evidence that a shared cache gives a major performance benefit…. Does anybody have anything more concrete? (of course we will get numbers in the end if we implement the hybrid scheme as suggested in the wiki - but I’d still appreciate any feedback).
>
> I think it makes sense to have a per-core pointer to a qom TCGCacheClass. That can then have its own methods for working with updates, making it much simpler to work with different implementations, like completely avoiding locks (per-cpu cache) or a hybrid approach like the one described in the wiki.

I don't think you want to have indirect function calls in the fast path ;).

Alex
Re: [Qemu-devel] Update on TCG Multithreading
Alexander Graf writes:
> On 01.12.14 22:00, Lluís Vilanova wrote:
>> Mark Burton writes:
>>> All - first a huge thanks for those who have contributed, and those who have expressed an interest in helping out.
>>>
>>> One issue I’d like to see more opinions on is the question of a cache per core, or a shared cache. I have heard anecdotal evidence that a shared cache gives a major performance benefit…. Does anybody have anything more concrete? (of course we will get numbers in the end if we implement the hybrid scheme as suggested in the wiki - but I’d still appreciate any feedback).
>>
>> I think it makes sense to have a per-core pointer to a qom TCGCacheClass. That can then have its own methods for working with updates, making it much simpler to work with different implementations, like completely avoiding locks (per-cpu cache) or a hybrid approach like the one described in the wiki.
>
> I don't think you want to have indirect function calls in the fast path ;).

Ooops, true; at least probably, since you're never sure how much the HW prefetcher is going to outsmart you :)

Well, I guess that a define will have to do then. But I think it still makes sense to refactor tb_* functions and such to have a TCGCache as first argument.

Best,
  Lluis
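One way the "define" idea could look, as a sketch only: the cache flavour is fixed at build time, and the helpers take the cache as their first argument so the compiler can resolve and inline every call. The switch `TCG_CACHE_SHARED`, the macro `CPU_TCG_CACHE` and the helpers `tb_cache_insert` / `tb_cache_lookup` are made-up names for illustration; they are not the signatures of QEMU's existing tb_* functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build-time switch: 1 = one shared cache, 0 = one per CPU. */
#ifndef TCG_CACHE_SHARED
#define TCG_CACHE_SHARED 1
#endif

#define MAX_CPUS 4   /* hypothetical limit */
#define SLOTS    64  /* hypothetical direct-mapped cache size */

typedef struct TCGCache {
    uint64_t pc[SLOTS];
    bool     valid[SLOTS];
} TCGCache;

/* The macro picks the cache for a CPU with no run-time dispatch. */
#if TCG_CACHE_SHARED
static TCGCache shared_cache;
#define CPU_TCG_CACHE(cpu_index) (&shared_cache)
#else
static TCGCache percpu_cache[MAX_CPUS];
#define CPU_TCG_CACHE(cpu_index) (&percpu_cache[(cpu_index)])
#endif

/* Helpers take the cache explicitly, so they work for either flavour
 * and remain direct, inlinable calls. */
static inline void tb_cache_insert(TCGCache *cache, uint64_t pc)
{
    assert(cache != NULL);
    size_t s = (size_t)(pc % SLOTS);
    cache->pc[s] = pc;
    cache->valid[s] = true;
}

static inline bool tb_cache_lookup(const TCGCache *cache, uint64_t pc)
{
    assert(cache != NULL);
    size_t s = (size_t)(pc % SLOTS);
    return cache->valid[s] && cache->pc[s] == pc;
}
```

With the default setting, `tb_cache_lookup(CPU_TCG_CACHE(1), pc)` sees a block inserted through `CPU_TCG_CACHE(0)`; building with `-DTCG_CACHE_SHARED=0` gives each CPU its own cache without touching the call sites.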