> On Jan. 5, 2017, 4:32 p.m., Tony Gutierrez wrote: > > Is there anything holding this up from being shipped? > > Andreas Hansson wrote: > In my view the patch needs two things: > > 1) Some thought around the design. I am still hoping there is a less > invasive way of accommodating the functionality, possibly with some changes > to the existing functions of the cache. > > 2) A way to test it. This should preferably include both synthetic and > real use-cases. The memtester and memchecker may be a good starting point for > the synthetic part. > > It would be great if someone could dig into these issues. > > Steve Reinhardt wrote: > It's been quite a while since I've looked at this... in fact I was going > to say that there might be a memory leak, but looking back at the patch > history I see I fixed that already in May 2015 :). > > It's pretty easy to come up with a simple test case, just using C++11 > atomic ops or pthread mutexes. Unfortunately the ones I had were on my AMD > work system that I had to give back, and it looks like I missed them when I > went to copy off useful non-proprietary stuff before I left. I know I didn't > spend much time writing tests though, so they should be easy to recreate. > > As far as being less invasive, I think my reply to Andreas's comment of > May 7, 2015 is still relevant: given the current structure of the cache code, > I think the existing solution is really not that invasive. There isn't really > that much code being added here; most of it is comments. Generating a > synthetic request to allow allocating an MSHR so that we can accumulate the > deferred operations that happen while the lock is held is a little strange, > but it allows us to reuse the bulk of the code in recvTimingResp() with just > a few changes. > > If the cache code were heavily restructured, we could probably layer it > in such a way that the fake packets would not be necessary, and thus the > invasiveness of the locked RMW support could be reduced. However, I started > down this path a long time ago (shortly after his comment), and after putting > a fair amount of work into the restructuring, I still didn't even have the > code working at all, so I gave up. Thus I think the net invasiveness of > restructuring the cache to reduce the amount of change required specifically > for this feature is much much larger than the invasiveness of this patch by > itself. The end result would probably be better code, but it's a huge amount > of work that I don't see anyone signing up for right now, so I think it's > better to put this change in as is for now, and put "restructuring the cache" > on the to-do list. > > Nikos Nikoleris wrote: > I agree, the implementation of locked instructions wouldn't be trivial > with the current structure of the cache. It would be nice if we didn't have > to abuse the MSHR in that way, it makes it hard to reason about memory leaks, > and it can't solve the problem for atomic and functional accesses. > > Wouldn't it be easier to implement this in the same way as far atomics? > If I understand correctly the semantics of locked instructions, it would be > significantly simpler if we could perform these operations with a single RMW > packet rather than two separate, a read and then a write. Wouldn't the > AtomicOpFunctor in the packet class be sufficient? > > Tony would that work for you? > > Steve Reinhardt wrote: > Thanks for the input, Nikos. While it would be a lot simpler to do the > atomic ops in the cache using functors, x86 lets you put a LOCK prefix in > front of a fairly long list of instructions, and then you have to combine > that with all the different data sizes that each instruction supports. Plus > it would be a pretty bi disruption to the whole CPU ISA definition to create > and plumb those packets through to the memory interface, and it would make > the locked instructions copmletely different from their unlocked > counterparts. These are all the reasons we didn't do it this way to begin > with, though in retrospect it's not as infeasible as perhaps it seemed up > front; just complicated and a little ugly. Not necessarily worse than the > current patch. > > Actually though there is no problem with atomic and functional > accesses... it's easy to make them atomic, as you just issue the read and the > write within the same event tick/process() method (i.e., without going back > to the event queue). It's only timing accesses that are hard with this > interface. > > Nikos Nikoleris wrote: > I thought that the AtomicOpFunctor solves exactly this issue. While > decoding the instruction you pass a function pointer which operates on the > data in the memory system. Does the memory system need to be aware of the > specifics of the operation? If I understand correctly the cache will only > call the function pointer and it won't need to decode the instruction or know > anything about it for that matter. I am not really familiar with the LOCK > prefix but do we need to support something more than arithmetic operations? > > It would be great if this was not necessarily tied to the x86 LOCK prefix > but a generic mechanism to support far atomics across different architectures. > > Tony Gutierrez wrote: > The AtomicOpFunctors are supposed to allow various different atomic > operations to be sent to the memory or cache controllers in a simple way, > without needing to specify all the possible different atomic op types, and > provide their functionality, inside of the request. I think it is a very nice > mechanism for handling atomics, but ARM had previously stated their desire to > remove them altogether. Is this no longer the case, Nikos?
Tony, I only wanted to point out that all the complexity in Steve's implementation was due to the split of the operation in two packets. Implementing the lock prefix with a single packet in the form of an atomic would simplify things a lot. I took the AtomicOpFunctor as given as it is already commited. I am missing the context here as I haven't followed the discussion on the AtomicOpFunctor but if there are thoughts to change the interface I would imagine that the mechanism/functionality would be pretty much the same. As for the overall idea, I am pretty sure that we are interested in a mechanism for (far)-atomic operations in gem5. - Nikos ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2691/#review9228 ----------------------------------------------------------- On April 15, 2016, 5:42 a.m., Steve Reinhardt wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://reviews.gem5.org/r/2691/ > ----------------------------------------------------------- > > (Updated April 15, 2016, 5:42 a.m.) > > > Review request for Default. > > > Repository: gem5 > > > Description > ------- > > Changeset 11444:8a1419dbbfa6 > --------------------------- > mem: implement x86 locked accesses in timing-mode classic cache > > Add LockedRMW(Read|Write)(Req|Resp) commands. In timing mode, > use a combination of clearing permission bits and leaving > an MSHR in place to prevent accesses & snoops from touching > a locked block between the read and write parts of an locked > RMW sequence. > > > Diffs > ----- > > src/mem/cache/cache.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 > src/mem/cache/mshr.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 > src/mem/packet.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 > src/mem/packet.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 > > Diff: http://reviews.gem5.org/r/2691/diff/ > > > Testing > ------- > > > Thanks, > > Steve Reinhardt > > _______________________________________________ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev