Dear Eliot,

This is all invaluable, so thank you for taking the time to message.
This message is just my current thinking, so please let me know if I’ve misinterpreted anything.

From what I can now tell, the best way to go is to add a request flag to mem/request.hh and then issue the request with writeMemTiming from memhelpers.hh. Then, as you have done, it should be possible to extend the caches to respond to this request (though in the case of fence.t, up to the point of unification rather than the point of coherence? It seems you could just add the DST_POU flag to the request to achieve this). You could make each cache visit every block, with some added delay depending on the exact modelling. I’ve seen such a thing implemented by functional accesses in BaseCache::memWriteback and BaseCache::memInvalidate, but I am assuming your engine probably does this via timing writebacks on each block. From what I can see, Cache::writebackBlk seems to be timing, and any latency from determining dirty lines (depending on our particular model) could be added to the cycle count.

As for the writeback buffer issue: given any placement of fence.t, it should be conceptually valid to say that no channel exists across it, so you’d need to ensure the writeback buffer is emptied regardless. Is a memory fence able to achieve this, or does it require extending the caches further?

Then, I guess you would need some concept of worst-case execution time (as you have said, a fixed maximum), as otherwise fence.t in and of itself would become a communication channel. I imagine a basic first implementation could do this functionally, to verify that everything that should be flushed is, and then be made more timing-accurate afterwards.

At this point I’ve got the instruction decoding working and can flush an individual L1 block, so with respect to the caches I just need to extend the protocol appropriately.
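To make the "visit every block with some added delay" idea concrete, here is a rough, self-contained sketch in plain C++ (not gem5 code; the per-set scan cost and per-line writeback cost are made-up parameters) of how a flush engine might walk the cache and accumulate a cycle count:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch, not actual gem5 code: walk every set in a small
// cache model, write back (clear) each dirty line, and accumulate an
// assumed latency per writeback plus an assumed cost to scan each set.
struct Block { bool dirty = false; };

struct CacheModel {
    std::vector<std::vector<Block>> sets;  // sets[set][way]
    uint64_t scanCyclesPerSet = 1;         // assumed cost to test one set
    uint64_t writebackCycles  = 4;         // assumed cost per dirty line

    // Flush everything; return {linesWrittenBack, totalCycles}.
    std::pair<uint64_t, uint64_t> flushAll() {
        uint64_t lines = 0, cycles = 0;
        for (auto &set : sets) {
            cycles += scanCyclesPerSet;
            for (auto &blk : set) {
                if (blk.dirty) {
                    blk.dirty = false;     // stands in for a timing writeback
                    ++lines;
                    cycles += writebackCycles;
                }
            }
        }
        return {lines, cycles};
    }
};
```

In a real implementation the writebacks would of course be timing accesses (as Cache::writebackBlk appears to be) rather than a synchronous loop, but the cycle accounting would follow the same shape.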
I would appreciate a high-level but slightly more detailed explanation of the changes you made (particularly the engine) and the functions you called to get your implementation working while also keeping it timing-accurate, assuming that is easier to provide than producing a potentially quite complicated patch.

Thanks again for your support,
Ethan

From: Eliot Moss <m...@cs.umass.edu>
Sent: 14 March 2022 14:15
To: Ethan Bannister <qs18...@bristol.ac.uk>; gem5-users@gem5.org <gem5-users@gem5.org>
Subject: Re: [gem5-users] Modelling cache flushing on gem5 (RISC-V)

I just skimmed that paper (not surprised to see Gernot Heiser's name there!) and I think that, while it would be a little bit of work, it might not be *too* hard to implement something like fence.t for the caches. It would be substantially different from wbinvd. The latter speaks to the whole cache system, and I implemented it with a request that flows all the way up to the Point of Coherence (the memory bus) and back down as a new kind of snoop to all the caches that talk through one or more levels to memory. Then each cache essentially has a little engine for writing dirty lines back. It's that part that would be useful here; I guess we'd be looking at a variation on it, triggered in a slightly different way (not by a snoop, but by a different kind of request).

To get sensible timings, you'd need to decide what hardware mechanisms are available for finding dirty lines. I assumed they were indexed in some way such that finding a set with one or more dirty lines had no substantial overhead. An L1 cache is small enough that we might get by with that assumption. Alternatively, assuming each set provides an "at least one dirty line" bit, and that 64 of these set bits can be examined by a priority encoder to give you a set to work on (or to indicate that all 64 sets are clean), then a typical L1 cache would not need many cycles of reading those bits out to find the relevant sets.
For a 64 KB cache with 64 B lines and associativity 2, there are 512 sets, meaning we'd need to read 8 groups of 64 of these "dirty set" bits. The actual writing back would usually take most of the time. Presumably you would need to wait until all the dirty lines make it to L2, since if the writeback buffers are clogged there might still be a communication channel there. Still, by the time a context switch is complete, those buffers may be guaranteed to have cleared, provided we can make an argument that there is a fixed maximum amount of time needed for that to happen.

Anyway, I hope this helps.

Eliot Moss
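The dirty-set lookup described above can be sketched in a few lines of self-contained C++ (again, an illustration rather than gem5 code): 512 per-set "at least one dirty line" bits packed into 8 words of 64 bits, with a count-trailing-zeros builtin standing in for the priority encoder, and any all-clean group of 64 sets skipped in a single step.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kGroups = 8;  // 512 sets / 64 "dirty set" bits per word

// Return the indices of every dirty set, one 64-bit group at a time.
// __builtin_ctzll (GCC/Clang) models the priority encoder; a zero word
// means all 64 sets in that group are clean and is skipped at once.
std::vector<int> findDirtySets(const std::array<uint64_t, kGroups> &dirtyBits) {
    std::vector<int> sets;
    for (int g = 0; g < kGroups; ++g) {
        uint64_t w = dirtyBits[g];
        while (w) {
            sets.push_back(g * 64 + __builtin_ctzll(w));
            w &= w - 1;  // clear the lowest set bit
        }
    }
    return sets;
}
```

In hardware each word read and each priority-encode step would cost a cycle or so, which is why a typical L1 only needs a handful of cycles to locate all the sets that actually have dirty lines.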
_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org