OK, Alex: your implication is that it has to be atomic, so we need the sync…
:-( I have a horrid feeling that the atomicity of global flush can’t be causing the (almost, but not quite, reproducible) errors we’re seeing, but… anyway ;-)

Cheers

Mark.

> On 12 Feb 2015, at 15:45, Alexander Graf <ag...@suse.de> wrote:
>
>> On 12.02.2015, at 15:35, Mark Burton <mark.bur...@greensocs.com> wrote:
>>
>> TLB Flush:
>>
>> We have spent a few days on this issue, and still haven’t resolved the best path.
>>
>> Our solution seems to work most of the time, but we still have some strange issues, so I want to check that what we are proposing has a chance of working.
>>
>> Our plan is to allow all CPUs to continue. Potentially one CPU will want to write to the TLBs. Subsequent to the write, it requests a TLB flush.
>
> Local or global? For local TLB flushes you don't notify the other CPUs at all. For global ones, the semantics of the call usually dictate atomicity.
>
>> We are proposing to implement this by signalling all other CPUs to exit (and requesting that they flush before restarting). In other words, this would happen asynchronously.
>
> For global flushes, give them a pointer payload along with the flush request and tell all CPUs to increment it atomically. In your main thread, wait until *ptr == nKickedCpus.
>
> FWIW TLBs are always CPU local. When there's a "global TLB flush" instruction, it pretty much stalls the CPU, notifies the others to also flush their TLBs, waits, and then continues.
>
> If this really does become a performance bottleneck (which I doubt; almost nobody except x86 does global flushes), you can also do some nasty hacky tricks, such as (atomically) changing the valid bit in remote CPUs' TLB entries. But really only do this as a last resort if the clean version doesn't perform well.
>
> Alex
>
>> This means there is a theoretical period of time when one CPU is writing to the TLBs while other CPUs are executing. Our belief is that this has to be handled by software anyway, and this should not be an issue from Qemu’s point of view.
>>
>> The alternative would be to force all other CPUs to exit before writing the TLBs; this is both expensive and very painful to organise (as we get into horrid deadlocks whichever way we turn)…
>>
>> We’d appreciate some thoughts on this...
>>
>> Cheers
>>
>> Mark.
>>
>> +44 (0)20 7100 3485 x 210
>> +33 (0)5 33 52 01 77x 210
>>
>> +33 (0)603762104
>> mark.burton