> On Jan. 5, 2017, 4:32 p.m., Tony Gutierrez wrote:
> > Is there anything holding this up from being shipped?
> 
> Andreas Hansson wrote:
>     In my view the patch needs two things:
>     
>     1) Some thought around the design. I am still hoping there is a less 
> invasive way of accommodating the functionality, possibly with some changes 
> to the existing functions of the cache.
>     
>     2) A way to test it. This should preferably include both synthetic and 
> real use-cases. The memtester and memchecker may be a good starting point for 
> the synthetic part.
>     
>     It would be great if someone could dig into these issues.
> 
> Steve Reinhardt wrote:
>     It's been quite a while since I've looked at this... in fact I was going 
> to say that there might be a memory leak, but looking back at the patch 
> history I see I fixed that already in May 2015 :).
>     
>     It's pretty easy to come up with a simple test case, just using C++11 
> atomic ops or pthread mutexes. Unfortunately the ones I had were on my AMD 
> work system that I had to give back, and it looks like I missed them when I 
> went to copy off useful non-proprietary stuff before I left. I know I didn't 
> spend much time writing tests though, so they should be easy to recreate.
>     
>     As far as being less invasive, I think my reply to Andreas's comment of 
> May 7, 2015 is still relevant: given the current structure of the cache code, 
> I think the existing solution is really not that invasive. There isn't really 
> that much code being added here; most of it is comments. Generating a 
> synthetic request to allow allocating an MSHR so that we can accumulate the 
> deferred operations that happen while the lock is held is a little strange, 
> but it allows us to reuse the bulk of the code in recvTimingResp() with just 
> a few changes.
>     
>     If the cache code were heavily restructured, we could probably layer it 
> in such a way that the fake packets would not be necessary, and thus the 
> invasiveness of the locked RMW support could be reduced. However, I started 
> down this path a long time ago (shortly after his comment), and after putting 
> a fair amount of work into the restructuring, I still didn't even have the 
> code working at all, so I gave up. Thus I think the net invasiveness of 
> restructuring the cache to reduce the amount of change required specifically 
> for this feature is much much larger than the invasiveness of this patch by 
> itself. The end result would probably be better code, but it's a huge amount 
> of work that I don't see anyone signing up for right now, so I think it's 
> better to put this change in as is for now, and put "restructuring the cache" 
> on the to-do list.
> 
> Nikos Nikoleris wrote:
>     I agree, the implementation of locked instructions wouldn't be trivial 
> with the current structure of the cache. It would be nice if we didn't have 
> to abuse the MSHR in that way; it makes it hard to reason about memory leaks, 
> and it can't solve the problem for atomic and functional accesses.
>     
>     Wouldn't it be easier to implement this in the same way as far atomics? 
> If I understand correctly the semantics of locked instructions, it would be 
> significantly simpler if we could perform these operations with a single RMW 
> packet rather than two separate packets, a read and then a write. Wouldn't the 
> AtomicOpFunctor in the packet class be sufficient?
>     
>     Tony would that work for you?
> 
> Steve Reinhardt wrote:
>     Thanks for the input, Nikos. While it would be a lot simpler to do the 
> atomic ops in the cache using functors, x86 lets you put a LOCK prefix in 
> front of a fairly long list of instructions, and then you have to combine 
> that with all the different data sizes that each instruction supports. Plus 
> it would be a pretty big disruption to the whole CPU ISA definition to create 
> and plumb those packets through to the memory interface, and it would make 
> the locked instructions completely different from their unlocked 
> counterparts. These are all the reasons we didn't do it this way to begin 
> with, though in retrospect it's not as infeasible as perhaps it seemed up 
> front; just complicated and a little ugly. Not necessarily worse than the 
> current patch.
>     
>     Actually, though, there is no problem with atomic and functional 
> accesses... it's easy to make them atomic, as you just issue the read and the 
> write within the same event tick/process() method (i.e., without going back 
> to the event queue). It's only timing accesses that are hard with this 
> interface.
> 
> Nikos Nikoleris wrote:
>     I thought that the AtomicOpFunctor solves exactly this issue. While 
> decoding the instruction you pass a function pointer which operates on the 
> data in the memory system. Does the memory system need to be aware of the 
> specifics of the operation? If I understand correctly the cache will only 
> call the function pointer and it won't need to decode the instruction or know 
> anything about it for that matter. I am not really familiar with the LOCK 
> prefix but do we need to support something more than arithmetic operations?
>     
>     It would be great if this were not tied to the x86 LOCK prefix but were 
> a generic mechanism to support far atomics across different architectures.
> 
> Tony Gutierrez wrote:
>     The AtomicOpFunctors are supposed to allow various atomic 
> operations to be sent to the memory or cache controllers in a simple way, 
> without needing to specify all the possible atomic op types, and 
> provide their functionality, inside the request. I think it is a very nice 
> mechanism for handling atomics, but ARM had previously stated their desire to 
> remove them altogether. Is this no longer the case, Nikos?

Tony, I only wanted to point out that all the complexity in Steve's 
implementation comes from splitting the operation into two packets. 
Implementing the lock prefix with a single packet, in the form of an atomic, 
would simplify things a lot.

I took the AtomicOpFunctor as given, since it is already committed. I am missing 
the context here, as I haven't followed the discussion on the AtomicOpFunctor, but 
if there are thoughts of changing the interface, I would imagine that the 
mechanism/functionality would stay pretty much the same. As for the overall idea, 
I am pretty sure that we are interested in a mechanism for (far)-atomic 
operations in gem5.


- Nikos


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review9228
-----------------------------------------------------------


On April 15, 2016, 5:42 a.m., Steve Reinhardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2691/
> -----------------------------------------------------------
> 
> (Updated April 15, 2016, 5:42 a.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 11444:8a1419dbbfa6
> ---------------------------
> mem: implement x86 locked accesses in timing-mode classic cache
> 
> Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
> use a combination of clearing permission bits and leaving
> an MSHR in place to prevent accesses & snoops from touching
> a locked block between the read and write parts of a locked 
> RMW sequence.
> 
> 
> Diffs
> -----
> 
>   src/mem/cache/cache.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/cache/mshr.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
> 
> Diff: http://reviews.gem5.org/r/2691/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Steve Reinhardt
> 
>

_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
