Re: [Qemu-devel] [RFC PATCH] target-arm: protect cpu_exclusive_*.

Dr. David Alan Gilbert Thu, 18 Dec 2014 04:37:45 -0800

* Alexander Graf (ag...@suse.de) wrote:
> 
> 
> On 18.12.14 10:12, Mark Burton wrote:
> > 
> >> On 17 Dec 2014, at 17:39, Peter Maydell <peter.mayd...@linaro.org> wrote:
> >>
> >> On 17 December 2014 at 16:29, Mark Burton <mark.bur...@greensocs.com> 
> >> wrote:
> >>>> On 17 Dec 2014, at 17:27, Peter Maydell <peter.mayd...@linaro.org> wrote:
> >>>> I think a mutex is fine, personally -- I just don't want
> >>>> to see fifteen hand-hacked mutexes in the target-* code.
> >>>>
> >>>
> >>> Which would seem to favour the helper function approach?
> >>> Or am I missing something?
> >>
> >> You need at least some support from QEMU core -- consider
> >> what happens with this patch if the ldrex takes a data
> >> abort, for instance.
> >>
> >> And if you need the "stop all other CPUs while I do this???
> > 
> > It looks like a corner case, but working this through - the ???simple??? 
> > put a mutex around the atomic instructions approach would indeed need to 
> > ensure that no other core was doing anything - that just happens to be true 
> > for qemu today (or - we would have to put a mutex around all writes); in 
> > order to ensure the case where a store exclusive could potential fail if a 
> > non-atomic instruction wrote (a different value) to the same address. This 
> > is currently guarantee by the implementation in Qemu - how useful it is I 
> > dont know, but if we break it, we run the risk that something will fail (at 
> > the least, we could not claim to have kept things the same).
> > 
> > This also has implications for the idea of adding TCG ops I think...
> > The ideal scenario is that we could ???fallback??? on the same semantics 
> > that are there today - allowing specific target/host combinations to be 
> > optimised (and to improve their functionality). 
> > But that means, from within the TCG Op, we would need to have a mechanism, 
> > to cause other TCG???s to take an exit???. etc etc??? In the end, I???m 
> > sure it???s possible, but it feels so awkward.
> 
> That's the nice thing about transactions - they guarantee that no other
> CPU accesses the same cache line at the same time. So you're safe
> against other vcpus even without blocking them manually.
> 
> For the non-transactional implementation we probably would need an "IPI
> others and halt them until we're done with the critical section"
> approach. But I really wouldn't concentrate on making things fast on old
> CPUs.


Hang on; 99% of the worlds CPUs don't have (working) transactional memory;
so it's a bit excessive to lump them all under old CPUs.

> Also keep in mind that for the UP case we can always omit all the magic
> - we only need to detect when we move into an SMP case (linux-user clone
> or -smp on system).

Depends on the architecture to depend if IO breaks those type of ops.

Dave

> 
> > 
> > To re-cap where we are (for my own benefit if nobody else):
> > We have several propositions in terms of implementing Atomic instructions
> > 
> > 1/ We wrap the atomic instructions in a mutex using helper functions (this 
> > is the approach others have taken, it???s simple, but it is not clean, as 
> > stated above).
> 
> This is horrible. Imagine you have this split approach with a load
> exclusive and then store whereas the load starts mutex usage and the
> store stop is. At that point if the store creates a segfault you'll be
> left with a dangling mutex.
> 
> This stuff really belongs into the TCG core.
> 
> > 
> > 1.5/ We add a mechanism to ensure that when the mutex is taken, all other 
> > cores are ???stopped???.
> > 
> > 2/ We add some TCG ops to effectively do the same thing, but this would 
> > give us the benefit of being able to provide better implementations. This 
> > is attractive, but we would end up needing ops to cover at least exclusive 
> > load/store and atomic compare exchange. To me this looks less than elegant 
> > (being pulled close to the target, rather than being able to generalise), 
> > but it???s not clear how we would implement the operations as we would 
> > like, with a machine instruction, unless we did split them out along these 
> > lines. This approach also (probably) requires the 1.5 mechanism above.
> 
> I'm still in favor of just forcing the semantics of transactions onto
> this. If the host doesn't implement transactions, tough luck - do the
> "halt all others" IPI.
> 
> > 
> > 3/ We have discussed a ???h/w??? approach to the problem. In this case, all 
> > atomic instructions are forced to take the slow path - and a additional 
> > flags are added to the memory API. We then deal with the issue closer to 
> > the memory where we can record who has a lock on a memory address. For this 
> > to work - we would also either
> > a) need to add a mprotect type approach to ensure no ???non atomic??? 
> > writes occur - or
> > b) need to force all cores to mark the page with the exclusive memory as IO 
> > or similar to ensure that all write accesses followed the slow path.
> > 
> > 4/ There is an option to implement exclusive operations within the TCG 
> > using mprotect (and signal handlers). I have some concerns on this : would 
> > we need have to have support for each host O/S???. I also think we might 
> > end up the a lot of protected regions causing a lot of SIGSEGV???s because 
> > an errant guest doesn???t behave well - basically we will need to see the 
> > impact on performance - finally - this will be really painful to deal with 
> > for cases where the exclusive memory is held in what Qemu considers IO 
> > space !!!
> >     In other words - putting the mprotect inside TCG looks to me like 
> > it???s mutually exclusive to supporting a memory-based scheme like (3).
> 
> Again, I don't think it's worth caring about legacy host systems too
> much. In a few years from now transactional memory will be commodity,
> just like KVM is today.
> 
> 
> Alex
> 
> > My personal preference is for 3b) it  is ???safe??? - its where the 
> > hardware is.
> > 3a is an optimization of that.
> > to me, (2) is an optimisation again. We are effectively saying, if you are 
> > able to do this directly, then you dont need to pass via the slow path. 
> > Otherwise, you always have the option of reverting to the slow path.
> > 
> > Frankly - 1 and 1.5 are hacks - they are not optimisations, they are just 
> > dirty hacks. However - their saving grace is that they are hacks that exist 
> > and ???work???. I dislike patching the hack, but it did seem to offer the 
> > fastest solution to get around this problem - at least for now. I am no 
> > longer convinced.
> > 
> > 4/ is something I???d like other peoples views on too??? Is it a better 
> > approach? What about the slow path?
> > 
> > I increasingly begin to feel that we should really approach this from the 
> > other end, and provide a ???correct??? solution using the memory - then 
> > worry about making that faster???
> > 
> > Cheers
> > 
> > Mark.
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >> semantics linux-user currently uses then that definitely needs
> >> core code support. (Maybe linux-user is being over-zealous
> >> there; I haven't thought about it.)
> >>
> >> -- PMM
> > 
> > 
> >      +44 (0)20 7100 3485 x 210
> >  +33 (0)5 33 52 01 77x 210
> > 
> >     +33 (0)603762104
> >     mark.burton
> > 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC PATCH] target-arm: protect cpu_exclusive_*.

Reply via email to