Hi Nilay,

Overall, I believe we are in more agreement with each other than you may think. I'm glad you included pseudocode in your latest email; that is a great idea. I think part of our problem is that we are interpreting each other's textual descriptions in different ways.
Below are my responses:

> > Absolutely, we still need the ability to allocate or deallocate
> > entries within actions. I'm not advocating to completely eliminate
> > the set/unset cache and tbe entry functions. I just want to avoid
> > including those calls in the inports. I'm confused why the mandatory
> > queue is different than other queues. They all trigger events in the
> > same way.
>
> if (L1IcacheMemory.isTagPresent(in_msg.LineAddress)) {
>   // The tag matches for the L1, so the L1 asks the L2 for it.
>   trigger(mandatory_request_type_to_event(in_msg.Type),
>           in_msg.LineAddress);
> }
>
> Brad, the mandatory queue is just an example where an inport may
> perform a tag lookup before the cache and transaction buffer entries
> have been set. Above is an excerpt from the file
> MOESI_CMP_directory-L1cache.sm. Before trigger() is called,
> isTagPresent() is called. This means the tag lookup is being performed
> before the cache or transaction buffer entries have been set. Suppose
> the tag was present in the L1Icache; then, in the trigger() call, we
> will perform the lookup again.
>
> Similarly, there is an inport in the Hammer protocol implementation
> where getCacheEntry() is called before a call to trigger(). Now, why
> should we use getCacheEntry() in the inport and the cache entry in the
> action?

The reason is, as you pointed out, we ideally want to call
getCacheEntry() once. I believe your suggestion to use local variables
in the input ports gets us there.
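To make sure we are picturing the same mechanics, here is the core idea in plain C++ terms. This is a hypothetical sketch, not the actual generated code, and every name in it is illustrative: the inport performs the tag lookup exactly once into a local pointer variable, and that (possibly null) pointer travels through trigger() to the transition, so the actions never repeat the lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-ins for the generated SLICC classes; all names here
// are illustrative, not the real m5 interfaces.
struct AbstractCacheEntry { int state; };

// is_valid() reduces to a null-pointer check on the entry.
static bool is_valid(const AbstractCacheEntry* e) { return e != NULL; }

struct CacheMemory {
    std::map<long, AbstractCacheEntry> tags;
    // lookup() replaces the isTagPresent()/operator[] pair: one probe
    // that returns the entry, or NULL on a miss.
    AbstractCacheEntry* lookup(long addr) {
        std::map<long, AbstractCacheEntry>::iterator it = tags.find(addr);
        return (it == tags.end()) ? NULL : &it->second;
    }
};

// The inport looks the entry up exactly once, trying the D-cache first
// and falling back to the I-cache...
AbstractCacheEntry* getCacheEntry(CacheMemory& l1d, CacheMemory& l1i,
                                  long addr) {
    AbstractCacheEntry* cache_entry = l1d.lookup(addr);
    if (is_valid(cache_entry))
        return cache_entry;
    return l1i.lookup(addr);
}

// ...and trigger() carries the pointer to the transition/actions, which
// use it directly instead of probing the tag arrays again. Here a dummy
// transition just reads the state, or -1 when no entry exists.
int doTransition(AbstractCacheEntry* cache_entry) {
    return is_valid(cache_entry) ? cache_entry->state : -1;
}
```

The point of the sketch is only that the entry travels by pointer from the single lookup in the inport to the actions; whether the null sentinel is spelled NULL or OOD at the SLICC level is a surface question on top of this.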
Below is what I'm envisioning for the MOESI_hammer mandatory queue
in_port logic (at least the IFETCH half of the logic):

ENTRY getL1ICacheEntry(Address addr) {
  assert(is_valid(L1DcacheMemory.lookup(addr)) == FALSE);
  assert(is_valid(L2cacheMemory.lookup(addr)) == FALSE);
  return L1IcacheMemory.lookup(addr);
}

ENTRY getL1DCacheEntry(Address addr) {
  assert(is_valid(L1IcacheMemory.lookup(addr)) == FALSE);
  assert(is_valid(L2cacheMemory.lookup(addr)) == FALSE);
  return L1DcacheMemory.lookup(addr);
}

ENTRY getL2CacheEntry(Address addr) {
  assert(is_valid(L1IcacheMemory.lookup(addr)) == FALSE);
  assert(is_valid(L1DcacheMemory.lookup(addr)) == FALSE);
  return L2cacheMemory.lookup(addr);
}

in_port(mandatoryQueue_in, CacheMsg, mandatoryQueue, desc="...", rank=0) {
  if (mandatoryQueue_in.isReady()) {
    peek(mandatoryQueue_in, CacheMsg, block_on="LineAddress") {

      // Set the local entry variables
      ENTRY L1I_cache_entry = getL1ICacheEntry(in_msg.LineAddress);
      ENTRY L1D_cache_entry = getL1DCacheEntry(in_msg.LineAddress);
      TBE_Entry tbe_entry = getTBE(in_msg.LineAddress);

      // Check for data accesses to blocks in the I-cache and ifetches
      // to blocks in the D-cache
      if (in_msg.Type == CacheRequestType:IFETCH) {
        // *** INSTRUCTION ACCESS ***

        // Check to see if it is in the OTHER L1
        if (is_valid(L1D_cache_entry)) {
          // The block is in the wrong L1, try to write it to the L2
          if (L2cacheMemory.cacheAvail(in_msg.LineAddress)) {
            trigger(Event:L1_to_L2, in_msg.LineAddress, L1D_cache_entry,
                    tbe_entry);
          } else {
            replace_addr = L2cacheMemory.cacheProbe(in_msg.LineAddress);
            replace_cache_entry = getL2CacheEntry(replace_addr);
            replace_tbe_entry = getTBE(replace_addr);
            trigger(Event:L2_Replacement, replace_addr,
                    replace_cache_entry, replace_tbe_entry);
          }
        }

        if (is_valid(L1I_cache_entry)) {
          // The tag matches for the L1, so the L1 fetches the line.
          // We know it can't be in the L2 due to exclusion.
          trigger(mandatory_request_type_to_event(in_msg.Type),
                  in_msg.LineAddress, L1I_cache_entry, tbe_entry);
        } else {
          if (L1IcacheMemory.cacheAvail(in_msg.LineAddress)) {
            // The L1 doesn't have the line, but we have space for it
            // in the L1
            ENTRY L2_cache_entry = getL2CacheEntry(in_msg.LineAddress);
            if (is_valid(L2_cache_entry)) {
              // The L2 has it (maybe not with the right permissions)
              trigger(Event:Trigger_L2_to_L1I, in_msg.LineAddress,
                      L2_cache_entry, tbe_entry);
            } else {
              // We have room and the L2 doesn't have it, so the L1
              // fetches the line. L1I_cache_entry is invalid here, so
              // you could also say:
              //   trigger(mandatory_request_type_to_event(in_msg.Type),
              //           in_msg.LineAddress, OOD, OOD);
              trigger(mandatory_request_type_to_event(in_msg.Type),
                      in_msg.LineAddress, L1I_cache_entry, tbe_entry);
            }
          } else {
            // No room in the L1, so we need to make room
            if (L2cacheMemory.cacheAvail(
                  L1IcacheMemory.cacheProbe(in_msg.LineAddress))) {
              // The L2 has room, so we move the victim line from the
              // L1 to the L2
              replace_addr = L1IcacheMemory.cacheProbe(in_msg.LineAddress);
              replace_cache_entry = getL1ICacheEntry(replace_addr);
              replace_tbe_entry = getTBE(replace_addr);
              trigger(Event:L1_to_L2, replace_addr, replace_cache_entry,
                      replace_tbe_entry);
            } else {
              // The L2 does not have room, so we replace a line from
              // the L2
              replace_addr = L2cacheMemory.cacheProbe(
                  L1IcacheMemory.cacheProbe(in_msg.LineAddress));
              replace_cache_entry = getL2CacheEntry(replace_addr);
              replace_tbe_entry = getTBE(replace_addr);
              trigger(Event:L2_Replacement, replace_addr,
                      replace_cache_entry, replace_tbe_entry);
            }
          }
        }
      } else {
        // *** DATA ACCESS ***
        ...
      }

>   // The tag matches for the L1, so the L1 asks the L2 for it.
>   trigger(mandatory_request_type_to_event(in_msg.Type),
>           in_msg.LineAddress);
> }
>
> In MESI_CMP_directory, getState() is called from an inport. This means
> we cannot have an implementation of getState() that makes use of the
> cache entry variable, since it would not have been set.
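For concreteness, here is one way a getState()/setState() pair that receives the already-looked-up entries, rather than re-probing the tag arrays, might look in the generated C++. This is a hypothetical sketch: the type and field names are made up, and the TBE-before-cache priority is just the usual convention, not a quote from any protocol file.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-ins; none of these names come from the real code.
enum State { State_NP, State_I, State_S, State_M };

struct Entry { State CacheState; };  // stable state lives in the entry
struct TBE   { State TBEState;   };  // transient state lives in the TBE

// getState() consults the pointers it is handed instead of calling
// isTagPresent(): the TBE state takes priority, then the cache entry's
// state, and a null pair means the block is NotPresent.
State getState(TBE* tbe, Entry* cache_entry) {
    if (tbe != NULL)
        return tbe->TBEState;
    if (cache_entry != NULL)
        return cache_entry->CacheState;
    return State_NP;
}

// setState() uses the same pointers, so the two functions cannot
// disagree about where the state is stored.
void setState(TBE* tbe, Entry* cache_entry, State s) {
    if (tbe != NULL)
        tbe->TBEState = s;
    if (cache_entry != NULL)
        cache_entry->CacheState = s;
}
```

With this shape, the inport that already holds the local entry variables can hand them straight to getState(), and no isTagPresent() call is needed anywhere on the path.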
> Now, different implementations for setState() and getState() simply do
> not make sense to me, so in my opinion setState() will also not use
> the cache entry. These two functions (I just went through the profile
> output for MOESI hammer) account for about half of the calls to
> isTagPresent(), the function that we are trying to get rid of.

I think we need to modify MESI_CMP_directory's getState() function to
use the cache_entry variable. It doesn't make sense to me that it
wouldn't, since the state is part of the cache_entry. How else would
getState() and setState() access the cache state?

> > Maybe I should point out that I'm assuming that getCacheEntry can
> > return a NULL pointer and thus that can be passed into the trigger
> > call when no cache or tbe entry exists.
>
> You are correct that getCacheEntry() would have to return a NULL,
> another thing which I believe we earlier preferred to avoid. As an
> aside, we cannot use the keyword NULL since it is already in use, so
> we would have to rename NULL as something else.
>
> On second thought, I think it may not be necessary for getCacheEntry()
> to use the keyword.

Yes, I initially hoped to avoid the NULL (or the OOD) keyword, but over
the past few weeks I have realized that isn't possible. By the way, all
things being equal, I would prefer your OOD keyword over NULL. That way
we differentiate it from the C++ NULL, since the getCacheEntry
functions don't return pointers. At least, I'm still hoping to avoid
that.

> >> Another concern is in the implementation of getCacheEntry(). If
> >> this function has to return a pointer to a cache entry, we would
> >> have to provide support for local variables which internally SLICC
> >> would assume to be pointer variables.
>
> > Within SLICC, understanding that certain variables are actually
> > pointers is a little bit of a nuisance, but there already exist
> > examples where we make that distinction.
> > For instance, look at the "if para.pointer" conditionals in
> > StateMachine.py. We just have to treat cache and tbe entries in the
> > same fashion.
>
> I know it is possible to declare pointers in SLICC, CacheMemory being
> the foremost example. But we only allow the declaration of pointers.
> We tacitly assume that whenever they are used, they will only be used
> after being dereferenced. Now, in the case of getCacheEntry(), I do
> not see this happening. Below is the current implementation of
> getCacheEntry() from MOESI_CMP_directory-L1cache.sm.
>
> Entry getCacheEntry(Address addr), return_by_ref="yes" {
>   if (L1DcacheMemory.isTagPresent(addr)) {
>     return static_cast(Entry, L1DcacheMemory[addr]);
>   } else {
>     return static_cast(Entry, L1IcacheMemory[addr]);
>   }
> }
>
> I would probably convert it to something like this.
>
> AbstractCacheEntry getCacheEntry(Address addr) {
>   AbstractCacheEntry * cache_entry = L1DcacheMemory.lookup(addr);
>   if (is_valid(cache_entry)) return cache_entry;
>   return L1IcacheMemory.lookup(addr);
> }
>
> Now, if we assume that cache_entry will always be used in its
> dereferenced form, then we will face a problem in returning the cache
> entry. We can introduce a 'return_by_pointer' annotation for the
> function to handle this situation. But you have pointed out in your
> previous emails that it is possible for getCacheEntry() to do more
> things, which may cause problems for the compiler.

I think we always assume cache_entry is dereferenced whenever one of
its members or member functions is called. However, cache_entry itself
is a pointer, and getCacheEntry always returns a pointer.

> >> In my opinion, we should maintain one method for looking up cache
> >> entries. My own experience informs me that it is not difficult to
> >> incorporate calls to set/unset_cache_entry() in already existing
> >> protocol implementations.
> >> For implementing new protocols, I think the greater obstacle will
> >> be in implementing the protocol correctly, not in using entry
> >> variables correctly. If we document this change lucidly, there is
> >> no reason to believe a SLICC programmer will be exceptionally
> >> pushed because of this change.
> >>
> >> Assuming that this change does introduce some complexity in
> >> programming with SLICC, does that complexity outweigh the
> >> performance improvements?
>
> > My position is we can leverage SLICC as an intermediate language and
> > achieve the performance benefits of your change without
> > significantly impacting the programmability. I agree that we need
> > the set/unset_cache_entry calls in the allocate and deallocate
> > actions; I see no problem with that. I just want to treat these new
> > implicit cache and tbe entry variables like the existing implicit
> > variable "address". Therefore, I want to pass them into the trigger
> > operation like the address variable. I also want just one method for
> > looking up cache entries. I believe the only difference is that I
> > would like to set the cache and tbe entries in the trigger function,
> > as well as allowing them to be set in the actions.
>
> I still don't think that we can avoid those calls in inports. By
> avoiding those calls, the jobs of the compiler and the programmer will
> only become more difficult. Currently I am of the opinion that if we
> avoid those calls, we should avoid this change altogether.

We definitely should not give up on this patch. You've put a lot of
hard work into it and, as your performance data has suggested, it
should give us a noticeable speedup. I sympathize with your position
and I really appreciate the effort you've put in. If you'd like, I
would be happy to take over the patch.

Brad

_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev