* Alexander Graf (ag...@suse.de) wrote:
> 
> 
> On 18.12.14 10:12, Mark Burton wrote:
> > 
> >> On 17 Dec 2014, at 17:39, Peter Maydell <peter.mayd...@linaro.org> wrote:
> >> 
> >> On 17 December 2014 at 16:29, Mark Burton <mark.bur...@greensocs.com> wrote:
> >>>> On 17 Dec 2014, at 17:27, Peter Maydell <peter.mayd...@linaro.org> wrote:
> >>>> I think a mutex is fine, personally -- I just don't want
> >>>> to see fifteen hand-hacked mutexes in the target-* code.
> >>>> 
> >>> 
> >>> Which would seem to favour the helper function approach?
> >>> Or am I missing something?
> >> 
> >> You need at least some support from QEMU core -- consider
> >> what happens with this patch if the ldrex takes a data
> >> abort, for instance.
> >> 
> >> And if you need the "stop all other CPUs while I do this"
> > 
> > It looks like a corner case, but working this through: the "simple"
> > put-a-mutex-around-the-atomic-instructions approach would indeed need to
> > ensure that no other core was doing anything - which just happens to be true
> > for QEMU today (or we would have to put a mutex around all writes) - in
> > order to handle the case where a store-exclusive could potentially fail if a
> > non-atomic instruction wrote (a different value) to the same address. This
> > is currently guaranteed by the implementation in QEMU - how useful it is I
> > don't know, but if we break it, we run the risk that something will fail (at
> > the least, we could not claim to have kept things the same).
> > 
> > This also has implications for the idea of adding TCG ops, I think...
> > The ideal scenario is that we could "fall back" on the same semantics
> > that are there today - allowing specific target/host combinations to be
> > optimised (and to improve their functionality).
> > But that means that, from within the TCG op, we would need a mechanism
> > to cause the other TCGs to take an exit, etc. etc. In the end, I'm
> > sure it's possible, but it feels so awkward.
> 
> That's the nice thing about transactions - they guarantee that no other
> CPU accesses the same cache line at the same time. So you're safe
> against other vcpus even without blocking them manually.
> 
> For the non-transactional implementation we probably would need an "IPI
> others and halt them until we're done with the critical section"
> approach. But I really wouldn't concentrate on making things fast on old
> CPUs.
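
For concreteness, the kind of thing being suggested here - try a host hardware
transaction first, fall back to halting everyone else - might look roughly like
the sketch below. This is only an illustration, not QEMU code: it assumes an
RTM-capable x86 host, and stop_all_other_vcpus()/resume_all_other_vcpus() are
invented placeholders for core support that does not exist today.

/* Rough sketch only -- not QEMU code.  Assumes an RTM-capable x86 host
 * (built with -mrtm; a real implementation would check CPUID first).
 * The stop/resume helpers are invented names for a hypothetical
 * IPI-and-halt mechanism the core would have to provide. */
#include <immintrin.h>
#include <stdbool.h>
#include <stdint.h>

static void stop_all_other_vcpus(void)   { /* hypothetical IPI-and-halt */ }
static void resume_all_other_vcpus(void) { /* hypothetical resume */ }

/* Guest 32-bit compare-and-swap on a host pointer into guest RAM. */
static bool guest_cmpxchg32(uint32_t *hostaddr, uint32_t expected,
                            uint32_t desired)
{
    unsigned int status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        /* Fast path: the read-compare-write is one hardware transaction,
         * so no other host thread can interleave with it. */
        bool ok = (*hostaddr == expected);
        if (ok) {
            *hostaddr = desired;
        }
        _xend();
        return ok;
    }

    /* Slow path: transaction aborted (or never started) -- halt the
     * other vCPUs for the duration of the access instead. */
    stop_all_other_vcpus();
    bool ok = (*hostaddr == expected);
    if (ok) {
        *hostaddr = desired;
    }
    resume_all_other_vcpus();
    return ok;
}

The attraction is that the fast path needs no lock at all; the cost is that on
hosts without transactions everything goes through the halt-everyone fallback.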
Hang on; 99% of the world's CPUs don't have (working) transactional
memory, so it's a bit excessive to lump them all under "old CPUs".

> Also keep in mind that for the UP case we can always omit all the magic
> - we only need to detect when we move into an SMP case (linux-user clone
> or -smp on system).

It depends on the architecture whether IO breaks those types of ops.

Dave

> 
> > 
> > To re-cap where we are (for my own benefit if nobody else):
> > We have several propositions in terms of implementing atomic instructions:
> > 
> > 1/ We wrap the atomic instructions in a mutex using helper functions (this
> > is the approach others have taken; it's simple, but it is not clean, as
> > stated above).
> 
> This is horrible. Imagine you have this split approach with a load
> exclusive and then a store, where the load takes the mutex and the
> store releases it. At that point, if the store creates a segfault you'll
> be left with a dangling mutex.
> 
> This stuff really belongs in the TCG core.
> 
> > 
> > 1.5/ We add a mechanism to ensure that when the mutex is taken, all other
> > cores are "stopped".
> > 
> > 2/ We add some TCG ops to effectively do the same thing, but this would
> > give us the benefit of being able to provide better implementations. This
> > is attractive, but we would end up needing ops to cover at least exclusive
> > load/store and atomic compare-exchange. To me this looks less than elegant
> > (being pulled close to the target, rather than being able to generalise),
> > but it's not clear how we would implement the operations as we would
> > like, with a machine instruction, unless we did split them out along these
> > lines. This approach also (probably) requires the 1.5 mechanism above.
> 
> I'm still in favor of just forcing the semantics of transactions onto
> this. If the host doesn't implement transactions, tough luck - do the
> "halt all others" IPI.
> 
> > 
> > 3/ We have discussed a "h/w" approach to the problem. In this case, all
> > atomic instructions are forced to take the slow path, and additional
> > flags are added to the memory API. We then deal with the issue closer to
> > the memory, where we can record who has a lock on a memory address. For
> > this to work, we would also either
> > a) need to add an mprotect-type approach to ensure no "non-atomic"
> > writes occur, or
> > b) need to force all cores to mark the page containing the exclusive
> > memory as IO or similar, to ensure that all write accesses follow the
> > slow path.
> > 
> > 4/ There is an option to implement exclusive operations within the TCG
> > using mprotect (and signal handlers). I have some concerns about this:
> > would we need to have support for each host O/S? I also think we might
> > end up with a lot of protected regions causing a lot of SIGSEGVs because
> > an errant guest doesn't behave well - basically we will need to see the
> > impact on performance. Finally, this will be really painful to deal with
> > for cases where the exclusive memory is held in what QEMU considers IO
> > space!
> > In other words, putting the mprotect inside TCG looks to me like
> > it's mutually exclusive to supporting a memory-based scheme like (3).
> 
> Again, I don't think it's worth caring about legacy host systems too
> much. A few years from now transactional memory will be commodity,
> just like KVM is today.
> 
> 
> Alex
> 
> > My personal preference is for 3b): it is "safe", and it's where the
> > hardware is.
> 
> 3a is an optimization of that.
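
As a concrete reference for option 1 above (and for the dangling-mutex problem
Alex points out), the helper approach boils down to something like the
following sketch - invented helper names, not the actual patch under
discussion:

/* Sketch of option 1, for illustration only: invented helper names,
 * not the patch under discussion.  The target's load/store exclusive
 * become helper calls that bracket one global mutex. */
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;

static uint32_t helper_load_exclusive(uint32_t *hostaddr)
{
    /* The lock is taken at the load-exclusive... */
    pthread_mutex_lock(&exclusive_lock);
    return *hostaddr;
}

static void helper_store_exclusive(uint32_t *hostaddr, uint32_t val)
{
    /* (A real strex would also check the monitor and return a status;
     * omitted here.) */
    *hostaddr = val;
    /* ...and only released at the store-exclusive.  If the guest faults
     * (e.g. a data abort) between the two helpers, the lock is never
     * released: the dangling mutex Alex describes. */
    pthread_mutex_unlock(&exclusive_lock);
}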
> > To me, (2) is an optimisation again. We are effectively saying: if you are
> > able to do this directly, then you don't need to pass via the slow path.
> > Otherwise, you always have the option of reverting to the slow path.
> > 
> > Frankly, 1 and 1.5 are hacks - they are not optimisations, they are just
> > dirty hacks. However, their saving grace is that they are hacks that exist
> > and "work". I dislike patching the hack, but it did seem to offer the
> > fastest solution to get around this problem - at least for now. I am no
> > longer convinced.
> > 
> > 4/ is something I'd like other people's views on too. Is it a better
> > approach? What about the slow path?
> > 
> > I increasingly begin to feel that we should really approach this from the
> > other end, and provide a "correct" solution using the memory - then
> > worry about making that faster...
> > 
> > Cheers
> > 
> > Mark.
> > 
> >> semantics linux-user currently uses then that definitely needs
> >> core code support. (Maybe linux-user is being over-zealous
> >> there; I haven't thought about it.)
> >> 
> >> -- PMM
> > 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
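
For anyone wanting to picture option 4, the mprotect mechanism reduces to
something like the sketch below: write-protect the page containing the
exclusive address and let a SIGSEGV handler clear the monitor when anything
stores to it. This is only an illustration - host-side, single-threaded,
invented names, no error handling - not a proposal for how QEMU should
actually do it.

/* Minimal illustration of option 4 -- not QEMU code.  Host-side,
 * single-threaded, invented names, no error handling.  Mechanism:
 * write-protect the page holding the exclusive address and let the
 * SIGSEGV handler clear the monitor when anything stores to it. */
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile bool exclusive_valid;
static void *monitored_page;
static size_t page_size;

static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    void *fault_page = (void *)((uintptr_t)si->si_addr & ~(page_size - 1));
    if (fault_page == monitored_page) {
        /* Someone wrote to the monitored page: the exclusive fails. */
        exclusive_valid = false;
        mprotect(monitored_page, page_size, PROT_READ | PROT_WRITE);
        return;  /* the faulting store is retried and now succeeds */
    }
    /* A real implementation would have to tell this apart from an
     * ordinary guest fault and re-deliver it. */
}

static void start_exclusive(void *hostaddr)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    monitored_page = (void *)((uintptr_t)hostaddr & ~(page_size - 1));

    struct sigaction sa;
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    exclusive_valid = true;
    /* From here on, any write to this page traps into segv_handler. */
    mprotect(monitored_page, page_size, PROT_READ);
}

The store-exclusive itself would then check exclusive_valid (lifting the
protection before its own write), every other vCPU's stores to that page
become traps, and none of it helps when the exclusive address lives in what
QEMU treats as IO space - which is exactly the pain Mark is pointing at.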